You will be part of the team accountable for the design, modelling, and development of the whole GCP data ecosystem for one of our Clients (Cloud Storage, Cloud Functions, BigQuery).
You will be involved throughout the whole process, starting with gathering, analyzing, modelling, and documenting business/technical requirements. The role includes direct contact with clients.
Modelling data from various sources and technologies. Troubleshooting and resolving the most complex, high-impact problems to deliver new features and functionalities.
Designing and optimizing data storage architectures, including data lakes, data warehouses, or distributed file systems. Implementing techniques like partitioning, compression, or indexing to optimize data storage and retrieval. Identifying and resolving bottlenecks, tuning queries, and implementing caching strategies to enhance data retrieval speed and overall system efficiency.
Identifying and resolving issues related to data processing, storage, or infrastructure. Monitoring system performance, identifying anomalies, and conducting root cause analysis to ensure smooth and uninterrupted data operations.
Training and mentoring less experienced data engineers, providing guidance and knowledge transfer.
Requirements:
Must have
At least 6 years of experience as a Data Engineer, including min. 4 years of experience working with GCP cloud-based infrastructure & systems.
Deep knowledge of Google Cloud Platform and cloud computing services.
Extensive experience designing, building, and deploying data pipelines in the cloud to ingest data from various sources, such as databases, APIs, or streaming platforms.
Proficiency with database management systems, both SQL (BigQuery is a must) and NoSQL. The candidate should be able to design, configure, and manage databases to ensure optimal performance and reliability.
Programming skills (SQL, Python, other scripting).
Proficient in data modeling techniques and database optimization. Knowledge of query optimization, indexing, and performance tuning is necessary for efficient data retrieval and processing.
Knowledge of at least one orchestration and scheduling tool (Airflow is a must).
Experience with data integration tools and techniques, such as ETL and ELT. The candidate should be able to integrate data from multiple sources and transform it into a format suitable for analysis.
Knowledge of modern data transformation tools (such as dbt, Dataform).
Excellent communication skills to effectively collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders. Ability to convey technical concepts to non-technical stakeholders in a clear and concise manner.
Ability to actively participate/lead discussions with clients to identify and assess concrete and ambitious avenues for improvement.
Tools knowledge: Git, Jira, Confluence, etc.
Openness to learning new technologies and solutions.
Experience in multinational environments and distributed teams.
Nice to have
Certifications in big data technologies and/or cloud platforms.
Experience with BI solutions (e.g. Looker, Power BI, Tableau).
Experience with ETL tools (e.g. Talend, Alteryx).
Experience with Apache Spark, especially in a GCP environment.
Experience with Databricks.
Experience with Azure cloud-based infrastructure & systems.