Industry: banking
Location: Krakow, Warsaw, or remote
Languages: Polish and English
Contract: B2B
The Data Engineer role focuses on developing and maintaining data processing pipelines and systems, ensuring their performance, quality, and efficiency. The primary objective is to manage large datasets and automate processes while collaborating closely with cross-functional teams to support project goals.
RESPONSIBILITIES:
- Define and contribute to the Agile development process, with a focus on PySpark development and software design.
- Promote development standards and conduct code reviews while mentoring team members.
- Provide production support and troubleshooting for existing systems.
- Implement tools and processes focusing on performance, scale, availability, accuracy, and monitoring.
- Work with Business Analysts to accurately interpret and implement technical requirements.
- Participate in planning meetings, Sprint reviews, and retrospectives while contributing to system architecture and design.
REQUIREMENTS:
- PySpark or Scala development and design experience.
- Minimum of 5 years of professional experience.
- Experience with workflow scheduling tools such as Airflow.
- Proficiency in technologies such as Apache Hadoop, PySpark, Apache Spark, YARN, Hive, Python, ETL frameworks, MapReduce, SQL, and RESTful services.
- Strong knowledge of Unix/Linux platforms.
- Hands-on experience building data pipelines with Hadoop components (Hive, Spark, Spark SQL).
- Experience with version control (Git, GitHub) and automated deployment tools (Ansible, Jenkins).
- Knowledge of big data modeling techniques (relational and non-relational).
- Experience debugging code issues and communicating findings to the development team.
NICE TO HAVE:
- Experience with Elasticsearch.
- Development experience with Java APIs.
- Experience with data ingestion.
- Understanding of Cloud design patterns.
- Exposure to DevOps and Agile methodologies (Scrum, Kanban).