System Administration experience on Linux / Unix & Windows,
Scripting ability in Bash, Ruby, or Python,
Experience in Production Application Support,
Experience using monitoring / alerting tools like AppDynamics, Nagios, Sensu, Grafana & VictorOps,
Experience with monitoring solutions such as CloudWatch, Prometheus, and the ELK stack,
Basic knowledge and understanding of IBM Message Queues,
Understanding of hardware or software Load Balancers in a large data center environment,
Knowledge of networking protocols such as HTTP and TCP/IP,
Understanding of firewall concepts and rules,
Excellent network analysis fundamentals and robust troubleshooting skills,
Strong customer focus and ownership,
Excellent written and verbal communication skills,
Ability to handle multiple projects simultaneously.
Education:
Bachelor's degree or equivalent,
Minimum of 3 years of related experience.
Responsibilities:
Assist in the migration of existing applications from the current data center to the AWS Cloud,
Automate routine tasks using Rundeck & Ansible,
Understand & manage application components in the AWS environment,
Understand & manage the applications hosted in Red Hat OpenShift running on top of Docker & Kubernetes,
Perform activities related to application deployment from development to production environments,
Field and manage technical customer issues via phone, chat and email,
Determine the tools needed to support system health and monitor system performance, and develop and maintain operational tools for performance monitoring and error tracking,
Attend incident and functional calls as required,
Carry out on-call rotation tasks as needed for the business,
Work closely with multiple internal groups, such as Development, QA, and Incident Management,
Assist system owners in maintaining the day-to-day operating compliance, stability, health, and performance of the applications and infrastructure,
Responsible for creation and/or review of application Change Records (CRs),
Responsible for release and deployment of application CRs across multiple platforms from integration through production,
Accountable for meeting or exceeding a 96% KPI on the successful implementation of CRs,
Responsible for providing daily health check information,
Identify areas of the release process that can be improved with advanced scripting and automation,
Assist system owners with runbooks and E2E documentation,