Position Overview:
We are looking for a skilled Data Engineer to design, implement, and manage data pipelines, systems, and architectures. The ideal candidate has a strong background in data engineering, software development, and database technologies, and a passion for optimizing data workflows while ensuring data quality, reliability, and performance.
Responsibilities:
- Collaborate with cross-functional teams to understand business requirements and design data solutions that meet the needs of stakeholders.
- Develop and maintain robust data pipelines for ingesting, processing, and transforming large volumes of structured and unstructured data from various sources.
- Implement data modeling techniques to design efficient and scalable data schemas and architectures that support analytics, reporting, and machine learning applications.
- Optimize data storage, retrieval, and query performance using database technologies such as SQL, NoSQL, and distributed storage systems.
- Ensure data quality and consistency by implementing data validation, cleansing, and enrichment processes, and monitoring data pipelines for errors and anomalies.
- Work closely with data scientists and analysts to provide them with access to clean and reliable data for analysis and modeling purposes.
- Automate data infrastructure deployment, configuration, and maintenance using infrastructure-as-code tools and techniques.
- Stay up to date with the latest advancements in data engineering technologies, tools, and methodologies, and recommend improvements to enhance efficiency and effectiveness.
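To make the validation and cleansing responsibility above concrete, here is a minimal sketch of the kind of ingest-time check a pipeline stage might perform. The record fields (`user_id`, `amount`) and the clean/reject split are illustrative assumptions, not a prescribed schema; in practice this logic would typically live inside a framework such as Spark or Beam rather than plain Python.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A cleaned record with a hypothetical two-field schema."""
    user_id: str
    amount: float


def validate_events(raw_records):
    """Split raw dicts into clean Events and rejected (record, reason) pairs.

    Rejects records that are missing a user_id, have a non-numeric
    amount, or have a negative amount; cleans the rest by trimming
    whitespace and coercing types.
    """
    clean, rejects = [], []
    for rec in raw_records:
        if not rec.get("user_id"):
            rejects.append((rec, "missing user_id"))
            continue
        try:
            amount = float(rec["amount"])
        except (KeyError, TypeError, ValueError):
            rejects.append((rec, "invalid amount"))
            continue
        if amount < 0:
            rejects.append((rec, "negative amount"))
            continue
        clean.append(Event(user_id=str(rec["user_id"]).strip(), amount=amount))
    return clean, rejects
```

Routing failures to a reject list (rather than dropping them silently) is what makes the monitoring responsibility possible: reject counts and reasons can be emitted as pipeline metrics and alerted on.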
Qualifications:
- Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or a related field.
- Proven experience (3+ years) in data engineering, ETL development, or a related field, including hands-on work building and optimizing data pipelines and systems.
- Proficiency in programming languages such as Python, Java, Scala, or SQL, with experience using data processing frameworks such as Apache Spark, Apache Flink, or Apache Beam.
- Strong understanding of database technologies, including relational databases (e.g., PostgreSQL, MySQL), NoSQL databases (e.g., MongoDB, Cassandra), and distributed storage systems (e.g., Hadoop, Amazon S3).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Effective communication skills and the ability to collaborate with cross-functional teams in a fast-paced environment.