WIS International is looking for a Site Reliability Engineer (SRE) to join us as a key member of our cloud infrastructure team!
The Site Reliability Engineering team is responsible for building and maintaining WIS's cloud-based infrastructure. This involves the use of modern tools and techniques to provision, deploy, configure, and monitor a wide array of technologies, from serverless web applications to database clusters, and from traditional web-server farms to AKS and more. As a member of the SRE team, you will help lead the planning, design, and implementation of many such services. You will work closely with various business units throughout the organization, mostly the development and networking teams, to whom the team provides infrastructure as a rigorously defined set of services. The successful candidate must possess, and be able to demonstrate, a solid foundation of technical knowledge, meaning a full understanding of the foundational concepts of Windows, cloud infrastructure, Infrastructure as Code (IaC), general networking (TCP/IP), and programming/scripting.
Requirements
- 3+ years of experience as an SRE supporting production infrastructure.
- 5+ years of overall software engineering experience in a development environment.
- Bachelor’s degree in computer science and/or a wide range of relevant work experience.
- Extensive experience with Azure and Windows systems.
- Experience with container orchestration platforms such as Kubernetes.
- Experience using IaC tools such as Terraform, Docker, Helm, Packer, Ansible, and ARM.
- Experience with configuration management tools such as Ansible, YAML and Terraform.
- Experience managing observability tools such as Grafana, Kibana and Prometheus.
- Experience with enterprise-grade software.
- Experience with software development.
- Experience with microservices architecture.
- At least two years of experience managing Kubernetes production systems.
- Experience with PowerShell and shell scripting.
- Strong verbal and written communication skills.
- Solid knowledge of web architecture and systems.
- Strong analytical and problem-solving skills.
Key Responsibilities
- Design and implement Kubernetes clusters according to business requirements, including scalability and security.
- Build and maintain Docker containers for use in the AKS environment.
- Develop and maintain monitoring systems to ensure the health and availability of SQL DBs, AKS clusters, file shares, service bus, web apps, etc. across production, dev, and staging environments.
- Build and own infrastructure through code, and work closely with the development, systems, and networking teams to automate CI/CD pipelines, removing repetitive manual processes and simplifying operations.
- Manage and optimize existing CI/CD pipelines.
- Design, architect, and develop cloud-native solutions on the Azure cloud platform using services such as AKS, Azure SQL, Azure Functions, Service Bus, and Data Factory.
- Create and maintain technical documentation and build books.
- Deploy application packages and new workloads to the production environment.
- Streamline and maintain QA and DEV environments so that our developers and quality assurance teams can work more effectively and efficiently.
- Perform regular disaster recovery (DR) drills and maintain the disaster recovery plan (DRP) in collaboration with the systems and development teams.
- Identify and diagnose deficiencies with existing systems, frameworks, tools, and processes, and recommend creative solutions based on best practices and industry standards.
- Create dashboards that provide visibility into production metrics.
To learn more about WIS, please visit our website at www.wisintl.com.
WIS International welcomes applications from people with disabilities and is committed to providing appropriate accommodations upon request for candidates taking part in all aspects of the selection process.
WIS thanks all applicants in advance but will only contact those we wish to interview.
Job Type: Full-time
Benefits:
- Dental care
- Extended health care
Schedule:
Work Location: Hybrid remote in Mississauga, ON L4Z 1W9