Founding ML Engineer (CUDA, ROCm, C++)
Other
DevsData LLC

Warszawa
7 900 - 8 300 USD net/month - B2B
Type of work: Full-time
Experience: Senior
Employment type: B2B
Operating mode: Hybrid
DevsData LLC


DevsData is a premium recruitment and software development agency specializing in custom software, artificial intelligence, and Big Data solutions. We work 100% remotely so that we can change the world from anywhere on Earth.


Tech stack

    GPU: advanced
    CUDA: advanced
    C++: advanced
    C: advanced
    Python: advanced

Job description

Online interview

Founding ML Engineer


  • 💰 Salary: $100 000 USD + equity (highly negotiable for the right candidate)
  • 🌎 Hybrid role: 2-3 days at an office in Warsaw/Gdansk
  • 🕦 Full-time position
  • 📝 B2B contract or contract of employment, negotiable
  • ☑️ Home office budget and relocation/travel costs included


A rapidly scaling startup, recently emerged from stealth mode and backed by a top-tier venture capital fund, is embarking on a mission to democratize AI across any hardware platform.


Our client's R&D team is building a highly efficient engine for deploying genAI models. The work ranges from fine-tuning GPU kernels to optimizing system performance. The Founding ML Engineer will play a pivotal role in improving GPU performance and in leading the company's AI and machine learning initiatives.


To tackle this mission, they are seeking an expert-level engineer in kernel, compiler, or runtime optimization, with a robust background in CUDA, ROCm, or Triton kernel optimization.


This role presents an exceptional opportunity to shape the technical direction of the company and contribute to groundbreaking advancements in AI technology.


Requirements:


  • Deep understanding and experience in GPU performance optimizations.
  • Proven track record of kernel optimizations on CUDA, ROCm, or other accelerators.
  • Proficiency in programming languages such as C/C++ and Python.
  • Experience with the training and deployment of ML models.
  • Familiarity with distributed systems development or distributed ML workloads.
  • Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
  • Excellent command of English, with strong communication and collaboration skills.


An exceptional candidate will also have:


  • Familiarity with OSS projects like FlashAttention, mlc-llm, and vLLM.
  • Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, or TensorRT.


You would be:


  • Analyzing bottlenecks in ML training and inference
  • Developing and optimizing compute kernels in CUDA, Triton, or ROCm
  • Working on GPU performance optimizations


Get to know DevsData


We are a technology consulting company and a recruitment agency, delivering software solutions to clients from Europe and the US. We work 100% remotely in an international team that includes people from Asia, London, and San Francisco. We employ people with experience in international corporations as well as students from the best technical and business universities.


Find out more: https://devsdata.com




Please note that the data controller is DevsData LLC, with its registered office at 1820 Avenue M #481, Brooklyn, NY 11230, USA (...