Platform Observability Analyst
Role Purpose
The Platform Observability Analyst ensures the performance and reliability of Amelco’s platforms through strong monitoring, alerting, and operational insight. The role focuses on observability, incident response, and proactive system stability, providing early detection of issues and supporting rapid resolution.
You will work across Prometheus, Grafana, Pingdom, Cloudflare, and AWS CloudWatch to maintain system health and service reliability.
Key Responsibilities
Monitoring and Observability
Own and maintain dashboards for system health, performance, and uptime.
Manage Prometheus, Grafana, Pingdom, Cloudflare, and AWS CloudWatch monitoring.
Manage alerts, adjust thresholds, and configure notifications in line with operational SLAs.
Monitor system metrics and logs proactively.
Incident and Event Response
Respond to system alerts and operational issues.
Take immediate action on critical incidents, mitigate medium-impact issues, and escalate major events to Incident Management as needed.
Document all alerts, actions, and resolutions.
Proactive System Insights
Identify trends and early warning signs in system performance.
Recommend improvements for monitoring, alerting, and operational efficiency.
Support post-incident reviews and maintain operational documentation and runbooks.
Collaboration and Communication
Work closely with L2/L3 Support, Incident Management, and DevOps teams.
Provide clear technical insights to stakeholders.
Deliver structured shift handovers and summaries for continuity.
The role involves an evening and overnight shift pattern.
Required Skills and Experience
Experience with Prometheus, Grafana, and alerting pipelines.
Operational knowledge of AWS services (EC2, ECS/EKS, RDS, S3, CloudWatch).
Experience with Pingdom, Cloudflare, or similar uptime/performance monitoring tools.
Understanding of distributed systems, microservices, and cloud-native architecture.
Experience with log aggregation or observability tools (ELK, Loki, etc.).
Strong analytical mindset and problem-solving skills.
Clear documentation and communication skills.
Desirable Skills
Experience in gaming, fintech, or high-availability environments.
SQL skills for metrics and log analysis.
Knowledge of SRE principles and observability best practices.
Automation experience for monitoring or alerting workflows.
What We Offer
Competitive contractor rates.
B2B open-ended contract.
Full-time role with long-term prospects.
Exposure to modern cloud and observability tooling.
Opportunity to shape platform monitoring and reliability practices.
Strong collaboration with platform, DevOps, and operations teams.
Clear progression path toward SRE or Platform Engineering roles.
Knowledge-sharing opportunities.
Dynamic culture, working alongside industry experts.
Enthusiastic and energetic working environment.
Flat structure.
No dress code.
Sounds good? Please submit your CV in English using the "Apply" button.