*Please note: this role is remote, and candidates should be based in Poland.
The Opportunity
Software Engineering plays a key role in insitro’s approach to rethinking drug development. Our team is responsible for the software platform at insitro, which includes orchestration and workflow software for our automated lab (LIMS, data review/visualization), data pipelines for ingesting data from pharma and research collaborators, bio/cheminformatic pipelines for our genomic and chemical datasets, and APIs and tooling for our data science and machine learning teams.
You will be joining insitro’s first satellite hub, which is emerging in Poland. Initially, this role will be remote, and you may work from your home office. As a Data Engineer (Chemistry), you will work closely with a cross-functional team of computational chemists, cheminformaticians, data scientists, and ML engineers to help accelerate their data access and modeling pipelines by maintaining, organizing, and enhancing TB-scale small-molecule datasets.
You will be joining a biotech startup that has long-term stability due to significant funding, providing many opportunities for meaningful impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
Specifically, you will:
- Onboard and continuously maintain, at scale, large virtual libraries from diverse vendors for internal use in our DNA-encoded library design and large-scale molecular inference (10^6 to 10^10 molecules)
- Work closely with ML engineers in integrating internal lab and public data into the ML workflow
- Onboard and continuously maintain, at scale, public and private chemical property datasets so they are easy to access for ML modeling and drug-discovery campaigns
- Design novel methods for harmonization and continuous integration of small-molecule datasets from various sources
- Develop and continuously deploy methods for reliable large-scale molecular inference on large virtual libraries
- Develop search and visualization tools to aid virtual screening of massive molecular libraries
- Enhance insitro’s current suite of informatics tools to advance our cheminformatics and drug design environment, and integrate them with our machine learning platform
- Design and develop novel tools for visualization of chemical, biological, and structural data
About You
- MS/PhD (or BS plus 3-4 years of industry experience) in computational chemistry or equivalent practical experience
- More than 5 years of relevant experience
- Expertise with relational databases and SQL querying and scripting
- Strong working knowledge of at least one common computational chemistry stack (such as RDKit, OpenEye, or Schrödinger)
- Strong working knowledge of SMILES string manipulation and common issues in chemical datasets
- Expertise in one or more general-purpose programming languages (such as Python, Java, Scala, C/C++, or Go)
- Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs)
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
- Familiarity with cloud computing services (AWS or GCP)
- Familiarity with web services and application frameworks (Django, Flask)
- Proficiency in a Linux environment (including shell scripting), and experience with version control practices and tools (Git, Mercurial, etc.)
- Experience working with machine learning and/or data science stakeholders to accelerate workflows
- Experience working with lab-scientist stakeholders in a life sciences or physical sciences field
- Experience with large datasets (100TB+) and associated technologies such as HPC/SLURM, Spark/BigQuery, etc.
- Passion for making a difference in the world
Nice to Have
- Experience with DNA-encoded library datasets
- Working knowledge of statistics and various flavors of statistical modeling techniques
- Experience with deploying convolutional neural networks and generative models
Benefits at insitro
- Flexible work schedule
- Health insurance benefits
About insitro
insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey. The company has established collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at www.insitro.com.
GDPR
The Controller of your personal data is Insitro, Inc., with offices at 279 East Grand Avenue, South San Francisco, California, United States. Your personal data is processed for the purposes of the current recruitment process. Providing your personal data is voluntary, but its processing and transfer to the United States by or on behalf of Insitro, Inc. is necessary for this purpose. You have the right to access, correct, modify, update, rectify, and request the transfer or deletion of your personal data.
You hereby consent to Insitro, Inc., with offices at 279 East Grand Avenue, South San Francisco, California, United States, retaining and processing your personal data after the current recruitment process is finished, for the purposes of future recruitment processes. You have the right to withdraw this consent at any time by sending a notification to recruiting@insitro.com.