Sr. Site Reliability Engineer
About Us
Visa is a global leader in payments technology, enabling transactions between consumers, merchants, financial institutions, and governments in over 200 countries and territories. The company is committed to delivering secure, reliable, and innovative payment solutions worldwide.
At Visa, you have the opportunity to create real impact by working on meaningful challenges, developing your skills, and contributing to solutions used globally.
Job Description
The Senior Site Reliability Engineer is a senior individual contributor within the SRE Tribe, responsible for developing and maintaining a containerized platform that supports critical workloads.
This role focuses on platform reliability, resilience, and automation, ensuring infrastructure is designed and operated according to SRE and cloud-native best practices.
You will act as a technical expert, contributing hands-on while influencing cross-team initiatives, especially in infrastructure automation and orchestration at scale.
Work Model
Remote position (with occasional presence in a Visa office if required)
Key Responsibilities
Platform Ownership & Reliability
Own the full lifecycle of platform components (design, provisioning, upgrades, decommissioning), including:
Cloud infrastructure
Kubernetes clusters and services
Networking, ingress, and service discovery
Service Mesh and data-plane components
Ensure resilience using SRE principles:
Fault isolation and graceful degradation
Capacity planning and saturation control
Reduction of operational toil
Identify and mitigate reliability risks to improve platform stability
Infrastructure Automation & Orchestration
Design and implement infrastructure bootstrap processes:
Automated provisioning of clusters and environments
Repeatable platform setup and teardown
Dependency-aware orchestration across cloud and Kubernetes layers
Promote Infrastructure-as-Code and GitOps approaches:
Reproducible and auditable platform components
Automated, testable, and reversible changes
Minimal manual intervention
Identify automation gaps and drive improvements that reduce operational risk
SRE Practices & Operational Excellence
Apply and promote SRE best practices:
Clear ownership and runbooks
Participation in on-call rotations
Incident response and post-incident reviews
Improve operational efficiency:
Simplify maintenance and day-2 operations
Standardize upgrade and rollback strategies
Reduce MTTD and MTTR
Ensure compliance with security and internal standards
Qualifications
Technical Skills
Strong hands-on experience with:
Public cloud platforms (AWS preferred, or Azure)
Kubernetes at scale (production environments)
Service Mesh (e.g., Istio, App Mesh, Linkerd)
Strong understanding of:
Observability and Golden Signals
Incident management and on-call practices
Infrastructure as Code (Terraform)
Cloud-native microservices architecture
Additional Skills
Strong communication and collaboration skills
Ability to work across teams and drive technical initiatives
Additional Information
Visa is an equal opportunity employer and considers all qualified applicants in accordance with applicable laws and regulations.