Regular Data Engineer (PySpark&Airflow)

VirtusLab
Szlak 49, Kraków
Full-time · B2B · Mid · Hybrid
Python
4 165 - 6 109 USD net per month (B2B)

Job description

We are #VLteam – tech enthusiasts constantly striving for growth. The team is our foundation, which is why we care most about a friendly atmosphere, plenty of self-development opportunities, and good working conditions. Trust and autonomy are two essential qualities that drive our performance. We simply believe in the idea of “measuring outcomes, not hours”. Join us & see for yourself!

About the role

Join our team to build heavy data pipelines in cooperation with data scientists and other engineers. You will work with distributed data processing tools such as Spark to parallelise computation for machine learning and data pipelines, diagnose and resolve technical issues, and champion best practices in code quality, security, and scalability.

Project Scope

The project aims at constructing, scaling and maintaining data pipelines for a simulation platform. You will work on a solution that provides connectivity between AWS S3 and Cloudian S3. A previously completed Proof of Concept used Airflow to spin up a Spark job for data extraction and then exposed the collected data via Airflow's built-in XCom feature. Further work involves productionizing the PoC solution, testing it at scale, or proposing an alternative solution.
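The PoC pattern described above – an upstream task extracts data and hands its result to a downstream task that exposes it – can be sketched in plain Python. This is a minimal illustration of the data flow only: in the real DAG the extraction would be a Spark job submitted from Airflow, and the hand-off would use Airflow's XCom backend; the function names and the example payload here are made up, and a plain dict stands in for XCom.

```python
# Sketch of the PoC data flow: an "extract" task produces a small result
# that an "expose" task later reads. In Airflow this hand-off happens via
# XCom; here a plain dict stands in for the XCom backend.

def extract_task(xcom: dict) -> None:
    # In the real pipeline this step would submit a Spark job (e.g. via a
    # Spark-submit operator) that reads from S3. The payload is illustrative.
    extracted = {"rows": 3, "source": "s3://bucket/prefix"}
    xcom["extract_task.return_value"] = extracted  # "push" to XCom

def expose_task(xcom: dict) -> dict:
    # The downstream task "pulls" the upstream result and exposes it.
    return xcom["extract_task.return_value"]

xcom_backend: dict = {}
extract_task(xcom_backend)
result = expose_task(xcom_backend)
print(result["rows"])  # 3
```

Note that XCom is intended for small metadata payloads rather than bulk data, which is one reason the posting mentions testing the PoC at scale or proposing an alternative solution.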

As a Data Engineer in Store Ops, you will dive into projects that streamline retail operations through analytics and ML, applying your Python, Spark, Airflow, and Kubernetes skills.

Responsibilities

  • Developing heavy data pipelines in cooperation with data scientists and other engineers.

  • Working with distributed data processing tools such as Spark to parallelise computation for Machine Learning and data pipelines.

  • Diagnosing and resolving technical issues, ensuring availability of high-quality solutions that can be adapted and reused.

  • Collaborating closely with different engineering and data science teams, providing advice and technical guidance to streamline daily work.

  • Championing best practices in code quality, security, and scalability by leading by example.

  • Making your own informed decisions to move the business forward.

Tech Stack

Python, PySpark, Airflow, Docker, Kubernetes, Dask, xgboost, pandas, scikit-learn, numpy, GitHub Actions, Azure DevOps, Terraform, Git @ GitHub

Project Challenges

  • Enhancing the monitoring, reliability, and stability of deployed solutions, including the development of automated testing suites.

  • Productionizing a new data pipeline responsible for exposing data on demand, and improving its performance in production.

  • Collaborating with cross-functional teams to enhance customer experiences through innovative technologies.

Team

5 engineers

What we expect in general:

  • Hands-on experience with Python.

  • Proven experience with PySpark.

  • Regular-level experience with Apache Airflow.

  • Proven experience with data manipulation libraries (Pandas, NumPy, and scikit-learn).

  • Strong background in ETL/ELT design.

  • Regular-level proficiency in Docker and Kubernetes to containerize and scale simulation platform components.

  • Ability to occasionally visit the Kraków office.

  • Good command of English (B2/C1).


Seems like lots of expectations, huh? Don’t worry! You don’t have to meet all the requirements.
What matters most is your passion and willingness to develop. Apply and find out!

A few perks of being with us

  • Building tech community

  • Flexible hybrid work model

  • Home office reimbursement

  • Language lessons

  • MyBenefit points

  • Private healthcare

  • Training Package

  • Virtusity / in-house training

  • And a lot more!

Tech stack

  • English – B2
  • Python – advanced
  • Airflow – regular
  • Docker – regular
  • Kubernetes – regular
  • XGBoost – regular
  • Pandas – regular
  • NumPy – regular
  • GitHub – regular
  • Azure DevOps – regular