PySpark Developer

Python

Wrocław

Link Group

Full-time, B2B, Senior, Remote
5 136 - 6 070 USD net per month (B2B)

Job description

We are seeking a highly skilled PySpark Developer with at least 5 years of experience in big data processing and analytics. The ideal candidate will design, implement, and optimize large-scale data processing pipelines, leveraging the capabilities of Apache Spark and Python.


Key Responsibilities

  • Develop, test, and maintain PySpark-based ETL pipelines to process and analyze large datasets (a minimal sketch follows this list).
  • Collaborate with data engineers, data scientists, and business stakeholders to understand data requirements and design optimal solutions.
  • Optimize PySpark applications for performance and scalability in distributed computing environments.
  • Work with Hadoop-based data platforms and integrate with other tools like Hive, HDFS, or Kafka.
  • Ensure data quality and integrity through robust validation and monitoring practices.
  • Debug and resolve issues in production and pre-production environments.
  • Document technical solutions and best practices.
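
For illustration only, a minimal sketch of the kind of PySpark ETL pipeline these responsibilities describe. The paths, column names, and aggregation below are hypothetical, not part of any actual project codebase.

    # Minimal PySpark ETL sketch: read raw events, clean and aggregate
    # them, then write partitioned Parquet. All paths and names are
    # hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-etl").getOrCreate()

    # Extract: raw JSON events from a (hypothetical) landing zone.
    raw = spark.read.json("s3a://data-lake/landing/events/")

    # Transform: drop malformed rows, derive a date column, aggregate.
    daily = (
        raw.filter(F.col("user_id").isNotNull())
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Load: write partitioned Parquet for downstream consumers.
    daily.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3a://data-lake/curated/daily_event_counts/"
    )

    spark.stop()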


Requirements

Technical Skills:


  • 5+ years of experience in data engineering or big data development, with a strong focus on PySpark.
  • Proficiency in Python programming, with experience in libraries commonly used in data processing (e.g., Pandas, NumPy).
  • Strong understanding of Apache Spark concepts: Spark Core, Spark SQL, and Spark Streaming (see the Spark SQL sketch after this list).
  • Experience with distributed data processing frameworks and working in cloud-based environments (e.g., AWS, Azure, GCP).
  • Solid knowledge of big data technologies like Hadoop, Hive, HDFS, Kafka, or Airflow.
  • Hands-on experience with relational and NoSQL databases (e.g., PostgreSQL, Cassandra).
  • Familiarity with CI/CD pipelines and version control (e.g., Git).
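
As a small illustration of the Spark SQL and pandas interop mentioned above, the sketch below builds a toy DataFrame, queries it with Spark SQL, and collects the small result into pandas. All table and column names are invented, and pandas must be available on the driver.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

    # A toy DataFrame standing in for a real source table.
    orders = spark.createDataFrame(
        [(1, "books", 12.50), (2, "books", 7.99), (3, "games", 59.00)],
        ["order_id", "category", "amount"],
    )
    orders.createOrReplaceTempView("orders")

    # Declarative aggregation via Spark SQL.
    top_categories = spark.sql("""
        SELECT category, SUM(amount) AS revenue
        FROM orders
        GROUP BY category
        ORDER BY revenue DESC
    """)

    # toPandas() collects the full result to the driver, so it is only
    # appropriate for small aggregates like this one.
    print(top_categories.toPandas())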


Soft Skills:


  • Strong analytical and problem-solving skills.
  • Ability to work collaboratively in a team and communicate technical concepts effectively.
  • Detail-oriented, with a commitment to delivering high-quality code.


Preferred Qualifications:


  • Experience with streaming data using Spark Streaming or Kafka (a short Structured Streaming sketch follows this list).
  • Knowledge of machine learning workflows and integration with big data pipelines.
  • Understanding of containerization tools like Docker or orchestration with Kubernetes.
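
For the streaming qualification, a hedged sketch of consuming a Kafka topic with Spark Structured Streaming. The broker address, topic, and checkpoint path are placeholders, and the job additionally needs the spark-sql-kafka connector package on its classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Subscribe to a (hypothetical) Kafka topic.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "clickstream")
        .load()
    )

    # Kafka delivers key/value as binary; decode the payload before use.
    decoded = events.selectExpr("CAST(value AS STRING) AS json_payload")

    # Write to the console sink; the checkpoint makes the query restartable.
    query = (
        decoded.writeStream.format("console")
        .option("checkpointLocation", "/tmp/checkpoints/clickstream")
        .outputMode("append")
        .start()
    )

    query.awaitTermination()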

Tech stack

  • Data: advanced
  • PySpark: regular
  • Apache Spark: regular
Office location

Wrocław

Published: 28.11.2024