Site Reliability Engineer
We're on the hunt for a talented and proactive individual to join our team, someone who thrives on ensuring our services are always available, secure, and performing at their best. If you enjoy the challenge of building, maintaining, optimizing, and troubleshooting technology services, you'll fit right in. You'll be instrumental in rolling out exciting changes to our services, shaping our best practices, developing robust SOPs, and enhancing our use of Infrastructure as Code (IaC) and Configuration as Code (CaC) to build reliable and scalable systems used every day by your colleagues at NAVBLUE.
The official job title in our structure is Senior Technical Operations Specialist.
This role offers a unique opportunity to own the reliability of our critical services, from planning new infrastructure and migrating existing systems to performing essential upgrades and responding to incidents. We're looking for someone who can not only solve today's problems but also anticipate tomorrow's challenges, continually improving our systems and processes. If you're ready to make a significant impact on our operational excellence and contribute to a culture of continuous improvement, we'd love to hear from you!
Responsibilities:
Ensuring availability across numerous services, whether they are custom software, commercial software, or free and open source solutions.
Monitoring system and application performance and logs.
Creating and testing backup and recovery procedures.
Responding to alerts and incidents when they occur.
Investigating and finding solutions to operational issues at the infrastructure, network, OS, and application levels.
Escalating issues to vendors or partners when appropriate.
Follow and improve the best practices and standards that help us keep services safe, secure, and reliable.
Improve or create our best practices to ensure the smooth operation of services and execution of procedures.
Develop and improve SOPs for the maintenance of our services and their underlying systems.
Develop and improve Infrastructure as Code (IaC) and Configuration as Code (CaC) used to maintain services and systems.
Ensuring our systems meet non-functional requirements such as performance or system protection requirements.
Planning and performing upgrades and/or updates to services and their underlying operating systems and infrastructure when required.
Manage and document the configuration of systems following established processes for change control with input from vendor documentation or vendor support.
Write and validate the procedures used to update the system configuration and/or upgrade the systems to new versions with input from vendor documentation applied to our architecture.
Perform software updates following the documented procedures and be ready to respond to and troubleshoot issues that may arise during this activity.
Communicate plans and progress with internal stakeholders.
Planning and deployment of new infrastructure.
Turn vendors' deployment documentation or IaC resources into repeatable and reliable deployments using Infrastructure as Code and Configuration as Code.
Adapt monitoring and backup strategies to new services and their architectures.
Migration of existing systems to new infrastructure when required.
Define scopes of work for third-party teams and monitor their performance, to help scale our work when required.
Must-Haves:
5+ years of proven experience in a technical role supporting multiple systems.
Experience using scripting languages to perform and automate tasks.
Experience using Infrastructure as Code and Configuration as Code tools such as Terraform and Ansible.
Experience deploying production services into cloud environments.
Knowledge, Skills, Demonstrated Capabilities & Competencies:
Solid knowledge of operating systems and the ability to troubleshoot them required.
Proven track record building and maintaining infrastructure in cloud environments.
Solid understanding of networking for enterprise environments required.
Demonstrated ability to identify the root cause of issues and to recommend permanent, long-term fixes.
Demonstrated ability to perform standard troubleshooting in AWS environments and to provide guidance to other teams.
Some experience with containers and container orchestration platforms such as Kubernetes.
Proactive, confident self-starter with effective interpersonal and communication skills
Fluent in spoken and written English
Technical Systems Proficiency:
Advanced proficiency with the operation and support of Linux and/or MS Windows operating systems, including:
Process and service management
Fundamentals of resource management and monitoring
Available troubleshooting tools
Deep knowledge of networks, shell scripting, databases (Postgres, MySQL, MSSQL), DNS, HTTPS, and typical 3-tier architectures.
Infrastructure as Code (Terraform preferred) and Configuration as Code (Ansible preferred)
Cloud environments and technologies, such as AWS and Kubernetes.
Working knowledge of Windows domain administration, patch management, IIS management and Group Policies
Licensure/Certifications:
AWS Cloud Practitioner
AWS SysOps Administrator or Solutions Architect Associate preferred
Linux Foundation Certified Systems Administrator (LFCS) or similar is a bonus
Education:
Successful completion of a post-secondary degree or diploma in computer science or technology (or equivalent)
We offer:
Stable employment based on a full-time job contract
Flexible working hours and work-from-home opportunities (3 days in office)
International working environment in a dynamic company
Access to the latest knowledge and technologies enabling professional development
Training and development possibilities
Participation in international projects and international trips
Competitive salary dependent on experience and qualifications
Private medical coverage for you and your family
Sports card
Life insurance for you and your family
Co-funding for meals
Employee stock ownership plan
Selection and Hiring Commitment
We thank all applicants for applying. Only selected applicants will be contacted.
NAVBLUE is committed to creating an environment and a culture where everyone feels like they belong no matter who they are or where they are from. We are committed to providing equal employment opportunities to all individuals based on job-related qualifications and ability to perform a job. We do not discriminate against any employee or applicant for employment because of race, colour, sex, age, national or ethnic origin, religion, sexual orientation, gender identity or expression, marital status, family status, genetic characteristics, record of offences, disability, or any other protected class. Accommodations will be available on request for candidates throughout the entire recruitment and selection process.
NAVBLUE operates within the Airbus Helicopters Polska structure.
At Airbus Helicopters Polska sp. z o.o. Gdańsk Plant, the "Internal procedure for reporting legal violations and taking follow-up actions at Airbus Helicopters Polska sp. z o.o. Gdańsk Plant" is in effect. This procedure was introduced in accordance with the Act of June 14, 2024, on the protection of whistleblowers (Journal of Laws of 2024, item 928). In the event of any detected violations, whistleblowers can submit reports to the following email address: talent@navblue.aero.