Job Summary:
As a member of PubNub's Engineering organization, you will work alongside Engineers and Architects to design, develop, operate and scale PubNub's global Data Stream Network, with a focus on improving its reliability, scale and efficiency. The infrastructure you will manage generates billions of events and terabytes of data every day. You will have the opportunity to help architect PubNub's infrastructure to solve challenging problems in distributed systems, real-time messaging, and large-scale data management.
Responsibilities:
- Design processes for improving operational stability of PubNub services
- Identify, document and help resolve performance and operational efficiency challenges
- Assist in rationalizing PubNub's infrastructure-as-code and automation tooling
- Create tooling with documentation to scale our distributed systems
- Enforce best practices for application and network security
- Participate in the incident management on-call rotation and drive root cause analysis
- Collaborate with engineering teams, product owners and other stakeholders to develop tooling and CI/CD patterns
- Help define Service Level Objectives to assess release readiness of all services
- Support, monitor and manage cloud infrastructure and environments (AWS EC2, DNS, load balancers, and databases)
Experience & Skills Required:
- 2+ years of cloud platform experience (AWS preferred) and programming in Python, Go, Java, or equivalent
- Experience with configuration management and automation tools such as Ansible and Terraform
- Experience with CI/CD tools and implementing best practices
- Solid understanding of cloud fundamentals such as networking, load balancing, DNS, and security
- BS or MS in Computer Science or a related technical field
Preferred:
- Containerization experience (Docker, etc.)
- Experience managing container orchestration systems (Kubernetes, etc.)
- Experience developing, supporting or operating large-scale, distributed SaaS products
- Desire to automate tedious tasks and eliminate inefficiencies
- A passion for system stability, performance, scalability or customer success
- Previous participation in Incident Management teams