Position Overview:
We are looking for a skilled Data Engineer to design, implement, and manage data pipelines, systems, and architectures. The ideal candidate has a strong background in data engineering, software development, and database technologies, and a passion for optimizing data workflows while ensuring data quality, reliability, and performance.
Responsibilities:
- Collaborate with cross-functional teams to understand business requirements and design data solutions that meet the needs of stakeholders.
- Develop and maintain robust data pipelines for ingesting, processing, and transforming large volumes of structured and unstructured data from various sources.
- Implement data modeling techniques to design efficient and scalable data schemas and architectures that support analytics, reporting, and machine learning applications.
- Optimize data storage, retrieval, and query performance using database technologies such as SQL, NoSQL, and distributed storage systems.
- Ensure data quality and consistency by implementing data validation, cleansing, and enrichment processes, and monitoring data pipelines for errors and anomalies.
- Work closely with data scientists and analysts to provide them with access to clean and reliable data for analysis and modeling purposes.
- Automate data infrastructure deployment, configuration, and maintenance using infrastructure-as-code tools and techniques.
- Stay up to date with the latest advancements in data engineering technologies, tools, and methodologies, and recommend improvements to enhance efficiency and effectiveness.
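To make the validation and cleansing responsibility above concrete, here is a minimal sketch of the kind of ingest-time check a pipeline stage might perform. The record fields (`user_id`, `amount`) and the clean/reject split are illustrative assumptions, not a prescribed schema; in practice this logic would typically live inside a framework such as Spark or Beam rather than plain Python.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A cleaned record with a hypothetical two-field schema."""
    user_id: str
    amount: float


def validate_events(raw_records):
    """Split raw dicts into clean Events and rejected (record, reason) pairs.

    Rejects records that are missing a user_id, have a non-numeric
    amount, or have a negative amount; cleans the rest by trimming
    whitespace and coercing types.
    """
    clean, rejects = [], []
    for rec in raw_records:
        if not rec.get("user_id"):
            rejects.append((rec, "missing user_id"))
            continue
        try:
            amount = float(rec["amount"])
        except (KeyError, TypeError, ValueError):
            rejects.append((rec, "invalid amount"))
            continue
        if amount < 0:
            rejects.append((rec, "negative amount"))
            continue
        clean.append(Event(user_id=str(rec["user_id"]).strip(), amount=amount))
    return clean, rejects
```

Routing failures to a reject list (rather than dropping them silently) is what makes the monitoring responsibility possible: reject counts and reasons can be emitted as pipeline metrics and alerted on.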
Qualifications:
- Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or a related field.
- Proven experience (3+ years) in data engineering, ETL development, or a related field, including hands-on work building and optimizing data pipelines and systems.
- Proficiency in programming languages such as Python, Java, Scala, or SQL, with experience using data processing frameworks such as Apache Spark, Apache Flink, or Apache Beam.
- Strong understanding of database technologies, including relational databases (e.g., PostgreSQL, MySQL), NoSQL databases (e.g., MongoDB, Cassandra), and distributed storage systems (e.g., Hadoop, Amazon S3).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Effective communication skills and the ability to collaborate with cross-functional teams in a fast-paced environment.