In addition to a solid level of Java (which is typically just one of the required competencies), engineers should meet the following requirements:
- Very good programming skills in Java, Scala and Python
- Specialist skills in “big data” technologies. Spark is key for many of the positions (RDD / DataFrame / Dataset API, Spark functions, joins, aggregations, basic optimization techniques)
- Technical skills across the Cloudera stack
- Good knowledge of the Hadoop ecosystem (HDFS, Hive, Avro)
- Specialist skills in data warehousing architectures (ETL, etc.)
- Specialist skills in one or more data platforms (Oracle, MSSQL or similar)
- General knowledge of distributed systems, including data partitioning and replication
- Development tools: Git, Bash, Unix ecosystem
- Skills in BI/analytics
- Knowledge of Jenkins or Ansible would be a bonus