Senior Site Reliability Engineer
Are you passionate about cutting edge technology?
Does solving some of the Internet's most difficult content delivery challenges interest you?
Join our highly skilled Site Reliability team
Our team designs, develops, and manages applications and infrastructure that support Akamai's Compute products and services. We do this while keeping Akamai's mission at the forefront of what we do: make life better for billions of people, billions of times a day.
Partner with the best
The Senior Engineer creates solutions to improve automation and efficiency for systems and teams. Responsibilities include optimizing workflows, infrastructure, and applications. Expertise in Linux administration, configuration management, and performance tuning is essential. Collaborate on deployment, monitoring, and resolving incidents. Focus on reliability, scalability, and efficiency through automation and resource optimization. Promote continuous improvement and operational excellence across all systems.
As a Senior Site Reliability Engineer, you will be:
Providing support and mentorship for other engineers within the department
Developing and maintaining automated tools and scripts to enhance system reliability, deployment processes, and incident response efficiency
Improving our system monitoring to speed up error detection and remediation, enhancing the performance and reliability of our virtualization platform
Participating in on-call rotations, guiding restoration and repair of service-impacting issues
Writing automation and tooling to reduce operational toil, improve deployment safety, and accelerate incident response
Contributing to capacity planning, autoscaling configuration, and workload scheduling for AI compute infrastructure
Do what you love
To be successful in this role you will:
Possess expert-level experience in a SysAdmin (Linux/Unix Administration), DevOps, or SRE role, working with large-scale distributed systems
Demonstrate expertise in Kubernetes and large-scale containerization systems
Be proficient in at least one programming language (Python/Golang) and in configuration management with Terraform/SaltStack/Ansible
Define SLOs and work with observability tools like Prometheus, Grafana, and distributed tracing to enhance system monitoring
Have experience architecting software and infrastructure at scale
Demonstrate accountability for reliability, develop automation and monitoring, and collaborate effectively with an engineering team unfamiliar with SRE practices
Build your career at Akamai
Our ability to shape digital life today relies on developing exceptional people like you. The kind that can turn impossible into possible. We’re doing everything we can to make Akamai a great place to work. A place where you can learn, grow and have a meaningful impact.
With our company moving so fast, it’s important that you’re able to build new skills, explore new roles, and try out different opportunities. There are so many different ways to build your career at Akamai, and we want to support you as much as possible. We have all kinds of development opportunities available, from programs such as GROW and Mentoring, to internal events like the APEX Expo and tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.
Learn more
Not sure if this job is the right match for you, or want to learn more about it before you apply? Schedule a 15-minute exploratory call with the Recruiter, who will be happy to share more details.