We are seeking a highly skilled and motivated Cloud Operations Engineer to join our IT Operations team. You will manage and optimize our hybrid cloud infrastructure, ensuring the high availability, performance, and security of our systems across AWS and on-premises environments. You will also play a crucial role in our migration to OpenShift Virtualization and help lead our transition to a containerized infrastructure.
Responsibilities:
- Cloud and On-Premises Infrastructure Management:
  - Provision, configure, and troubleshoot Linux servers in AWS and on-premises environments.
  - Administer key AWS services, including S3, ELB, EFS, and Auto Scaling.
  - Implement and maintain system monitoring and alerting to ensure optimal performance and stability.
- DevOps and Automation:
  - Utilize Git and GitHub Actions for CI/CD automation.
  - Develop and maintain Infrastructure-as-Code (IaC) using Terraform and Ansible.
  - Leverage scripting skills (Bash, PowerShell) and AWS CLI for automation tasks.
- Virtualization and Containerization:
  - Key Goal: Participate in migrating VMs from VMware to OpenShift Virtualization in Q1-Q2 2025.
  - Train the rest of the team on Kubernetes and OpenShift, enabling a smooth transition to containerized infrastructure.
  - Contribute to the planning and implementation of our broader containerization strategy.
- Incident Response and Support:
  - Participate in on-call rotation to address and resolve production issues.
  - Proactively identify and address potential system vulnerabilities and performance bottlenecks.
- Collaboration and Knowledge Sharing:
  - Create and maintain comprehensive documentation of systems and processes.
  - Actively share knowledge and collaborate effectively with team members.
Qualifications:
- Experience: 5+ years as a Cloud Operations Engineer or Systems Administrator, with hands-on experience managing cloud-based infrastructure.
- Cloud Expertise: Strong understanding of the AWS cloud platform, with experience managing core services (EC2, S3, VPC, ELB, EFS, Auto Scaling).
- Linux Proficiency: Deep understanding of Linux system administration, including networking, security, and performance tuning.
- Networking: Solid understanding of networking principles and protocols.
- DevOps Skills: Experience with Git, GitHub Actions, IaC tools (Terraform, Ansible), scripting (Bash, PowerShell), and the AWS CLI.
- Problem-Solving Abilities: Excellent analytical and problem-solving skills with a proactive approach to identifying and resolving issues.
- Communication and Teamwork: Strong communication and interpersonal skills, with the ability to work effectively in a collaborative team environment.
Highly Desired:
- Linux Certification (RHCSA or RHCE)
- AWS Certification (SysOps Administrator)
- Kubernetes Certification (CKAD or CKA)
- OpenShift Certification (EX280 or EX316)
Desired:
- Experience with VMware vSphere and virtualization technologies.
- Familiarity with other cloud service providers (GCP, OCI, Azure).
- Understanding of security best practices for cloud and on-premises environments.