Position Overview:
We are seeking a skilled Big Data Engineer to join our data engineering team. The ideal candidate will have extensive experience in building and managing large-scale data processing systems. This role involves designing, implementing, and optimizing data pipelines and infrastructure to support analytics, machine learning, and business intelligence initiatives.
MD rate: 16,600 – 20,000 PLN
Roles and Responsibilities:
- Design, develop, and maintain big data pipelines to process and analyze large datasets.
- Implement data ingestion, processing, and storage solutions using big data frameworks such as Apache Spark, Hadoop, and Kafka.
- Optimize data pipelines for performance, scalability, and fault tolerance.
- Collaborate with data scientists, analysts, and other stakeholders to ensure data availability and usability.
- Develop and maintain data storage solutions such as HDFS, Amazon S3, Google Cloud Storage, or Azure Data Lake.
- Ensure data quality and integrity through automated testing and validation processes.
- Monitor and troubleshoot big data infrastructure to ensure optimal performance and reliability.
- Document technical solutions, workflows, and best practices.
Required Skills and Experience:
- Proficiency in big data technologies such as Apache Spark, Hadoop, Kafka, or Flink.
- Strong programming skills in languages like Python, Scala, or Java.
- Experience with SQL and NoSQL databases such as PostgreSQL, MongoDB, or Cassandra.
- Familiarity with cloud platforms such as AWS, Azure, or Google Cloud and their managed big data services (e.g., Amazon EMR, Google BigQuery), as well as platforms such as Databricks.
- Knowledge of data modeling, ETL processes, and data pipeline orchestration tools like Apache Airflow, Luigi, or Dagster.
- Strong understanding of distributed computing principles and parallel processing.
- Experience with containerization tools such as Docker and orchestration tools like Kubernetes.
- Strong problem-solving skills and ability to troubleshoot large-scale data systems.
Nice to Have:
- Experience with real-time data processing and streaming technologies such as Kafka Streams, Amazon Kinesis, or Apache Pulsar.
- Familiarity with machine learning pipelines and integration with big data systems.
- Knowledge of data governance, security, and compliance in big data environments.
- Experience with CI/CD tools for automating data pipeline deployment and management.
- Exposure to Agile/Scrum methodologies.
- Understanding of data visualization tools such as Power BI, Tableau, or Looker.
Additional Information:
This role offers an opportunity to work on complex, large-scale data projects and help shape the future of data-driven decision-making. If you are passionate about big data technologies and thrive in a fast-paced, innovative environment, we encourage you to apply.