IT - Site Reliability Engineer

DevOps

Centrum, Pune

emagine Polska

Full-time

Any

Mid

Hybrid

Job description

Introduction & Summary

We are seeking a dedicated Site Reliability Engineer (SRE) to join our team. The ideal candidate will combine a strong technical background with operational discipline to ensure the reliability, availability, and performance of critical systems. You will play a key role in monitoring, troubleshooting, and resolving issues, and will apply your observability expertise to incident management.

Main Responsibilities

Your core duties will include:

Monitoring production systems and services using observability tools.
Responding to incidents, alerts, and outages in real time.
Participating in a rotating on-call schedule.
Designing, implementing, and maintaining observability solutions.
Collaborating with development and infrastructure teams to ensure system reliability.
Automating operational tasks and documenting procedures.
Conducting post-incident reviews and proposing monitoring enhancements.

Key Requirements

Bachelor's degree in Information Technology, Computer Science, or a related field.
2-5 years of experience in cloud and operations engineering.
Proficiency with Azure services; AWS and GCP experience is a plus.
Hands-on experience with Infrastructure-as-Code (IaC) tools like Terraform.
Strong scripting skills in Python, Bash, or PowerShell.
Familiarity with GitLab CI/CD tools integrated with Azure.
Proficiency in monitoring and logging tools.

Nice to Have

Master's degree or relevant certifications.

Other Details

This position involves a 24/7 shift rotation, ensuring continuous system reliability and performance. The role emphasizes proactive monitoring and efficient incident response in a collaborative environment.