
    Site Reliability Engineer

    Antal Sp. z o.o., Kraków
    6 528 - 8 704 USD net/month (B2B)
    Type of work
    Full-time
    Experience
    Mid
    Employment Type
    B2B
    Operating mode
    Hybrid

    Tech stack

    • Jira/Confluence: regular
    • PM: regular
    • Problem management: regular
    • Java: regular

    Job description

    Grow Your Career with Us!

    If you’re looking for a career that will help you stand out, join us and fulfill your potential. Whether you aim to reach the top or simply explore an exciting new direction, we offer opportunities, support, and rewards that will take you further.


    Technologies We Use

    • Java SE
    • Spring Boot
    • Spring Cloud
    • Apache Beam
    • Apache Flink
    • GCP
    • Redis
    • REST APIs
    • Ansible
    • Jenkins


    Our Work Culture

    We invest heavily in an Agile culture and have adopted DevOps processes, CI/CD pipelines, and cloud technologies. We plan to establish a new development team in Kraków in 2023 as part of a long-term strategy to develop and support our platform in Europe.

    This is an exciting opportunity to join a team in its early stages and make a key contribution.


    Your Responsibilities

    • Manage application support operations, focusing on resiliency and availability and on monitoring system health and performance.
    • Coordinate resolution of production incidents, conducting post-mortem/RCA to identify root causes and improve processes.
    • Investigate, triage, and resolve production incidents with a focus on technical signals and root cause analysis.
    • Document post-incident recovery steps, contribute to process improvements, identify deviations, and build a knowledge base.
    • Actively participate in the service management community, engaging in Incident Management, Problem Management, and Service Delivery.
    • Define and deliver tactical and strategic service improvements across the technical and process landscape.
    • Apply SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability.
    • Develop observability tools and techniques for monitoring, alerting, incident detection, response, capacity management, and release safety.


    What You Need to Succeed in This Role

    • 4+ years of experience in developing and supporting distributed systems written in Java.
    • Experience with Disaster Recovery methods and processes.
    • A methodical approach to troubleshooting and strong problem-solving skills.
    • Experience with application lifecycle management tooling and practices: Jira/Confluence, Ansible, vulnerability remediation, and CI/CD automation.
    • Experience implementing and managing logging, monitoring, and alerting frameworks for hybrid cloud using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar.
    • Understanding of relational databases (RDBMS), cloud technologies, Unix/Linux, and job scheduling (e.g., Control-M or AutoSys).
    • Ability to lead technical conversations with various technical support groups.
    • Excellent communication skills and experience working with Agile methodologies.


    Join us and grow your career in a dynamic and innovative environment!
