Ensure successful installation, integration, and deployment of monitoring tools, applications, and solutions across production and non-production environments.
Identify opportunities for streamlining processes through automation.
Prepare and maintain technical documentation to assist with the operation, maintenance, and development of monitoring tools.
Analyze reported incidents and problems for applications, assess patterns, make recommendations, and implement monitoring solutions.
Install monitoring tool applications, work with infrastructure teams, and report application metrics to prevent problems and improve IT effectiveness.
Create and execute application and system test procedures after tool implementation.
Identify critical business transactions and implement tools to manage alerts and responses.
Manage security remediation and ensure it is implemented for all findings. Lead problem management and resolution.
Provide instrumentation for proactive diagnostic insight into the health of applications.
Take end-to-end ownership of the monitoring tool applications, ensuring timely updates, hotfixes, and upgrades. Stay up to date with new features and versions of the tools.
Build custom monitors to support new requirements. Maintain services following ITIL principles.
Provide knowledge transfer to team members and IT teams on monitoring systems. Ensure that HLD, LLD, and SOP documents exist and are maintained for all supported technologies and tools.
Contribute to our long-term vision and strategy for our engineering processes. Participate in the interview process for new engineers.
Work with application and system owners to clarify monitoring requirements and business needs. Provide solution demonstrations.
Collaborate with teams throughout IT to ensure proper use of the monitoring system. Identify and create methods of solving complex monitoring issues.
Continuously suggest improvements to alert configuration and monitoring-related operational processes. Continually learn, share knowledge, and promote industry best practices.
Serve as an escalation point or resource to system administrators for monitoring issues.
Requirements:
At least 6 to 7 years of experience installing and maintaining infrastructure and application monitoring tools, with working knowledge of tools such as SolarWinds, OpsRamp, Dynatrace, New Relic, Prometheus and Grafana, ServiceNow ITOM, or equivalent.
Working knowledge of holistic application monitoring with drill-down into the platform layers (infrastructure, data, middleware, applications, etc.).
2 or more years of experience writing code in languages such as Python, shell script, and PowerShell, and working with databases such as MSSQL or Oracle. Basic understanding of modern software development methodologies (Object).
Basic knowledge of server, cloud, virtualization, database, network, and container administration.
Good verbal and written communication skills (English).
Bachelor's or Master's degree in Computer Science, Information Systems, or equivalent.