Site Reliability Engineer
The Site Reliability Engineer (SRE) role focuses on enhancing observability, managing release processes, and maintaining operations within the organization. This position aims to ensure system reliability and performance through various engineering practices.
Develop, test, and maintain high-quality software solutions, frameworks, and automations.
Collaborate with cross-functional teams to analyze requirements and design solutions centered on stability and reliability.
Participate in code reviews to ensure code quality and shared knowledge.
Identify, troubleshoot, and resolve incidents and problems in line with DevOps/SRE best practices.
Contribute to continuous improvement initiatives within the engineering team.
Design, build, and maintain scalable and reliable infrastructure.
Lead incident response efforts and develop runbooks for incident management.
Implement CI/CD workflows and automation tools to enhance system reliability.
Analyze system performance metrics and implement monitoring solutions to identify bottlenecks.
Ensure systems are secure and compliant with industry standards.
Create comprehensive documentation of systems, processes, and procedures.
Develop monitoring and alerting based on service level indicators and objectives (SLIs/SLOs) for applications and infrastructure.
Proficiency in one or more programming or scripting languages (e.g., Java or Python).
Experience with CI/CD setup and GitHub Actions.
Good knowledge of at least one major cloud service provider (Microsoft Azure or GCP).
Solid understanding of Agile development methodologies.
Strong troubleshooting skills within an ITIL framework.
Experience in observability setup using tools like Splunk or Grafana.
Familiarity with version control systems (Git).
Good problem-solving skills and a willingness to learn.
Strong communication and teamwork skills.
This role offers an opportunity to work in a dynamic environment focused on high-performance systems and infrastructure management. Remote work is available, and the role supports ongoing projects within the technology sector.