Intern: Cheminformatics - Reaction Data Standardization
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
In Roche’s Computational Sciences - Center of Excellence (CS CoE), we enable Roche to make transformative medicines for patients in order to tackle some of the world’s toughest unmet healthcare needs. The CS CoE Molecular Modality Domain is the bridge between digital technology and science and brings together all competencies required to design, build, operate and evolve our digital landscape and the data backbone for molecule and reaction research. We enhance the full scientific data pipeline from structured data capture to powerful data integration and efficient data interrogation in order to create and develop successful drug molecules as efficiently as possible.
The Opportunity
Chemical reaction data frequently suffers from a lack of standardization, especially in how chemical reagents and reactants are named and identified. To make our internal reaction data more usable for machine learning predictions and insight generation, we are looking for a highly motivated chemistry student to help us build a robust synonym handling and reagent unification framework that will drastically improve our data mining capabilities.
As an intern on this project, you will play a critical role in standardizing our reaction data. Your core responsibilities will include:
- Exploring & Benchmarking Tools: Document common naming issues within our internal data.
- Leveraging AI: Explore the use of Large Language Models (LLMs) to clean and process ambiguous chemical names.
- Curating a Dictionary: Build and test a curated synonym dictionary covering the 500-2,000 most common reagents and catalysts, mapping abbreviations, systematic names, trivial names, and salt forms.
- Engaging with Lab Scientists: Learn from them how to accurately represent and structure key experimental variables within our reaction database.
Who You Are
- You are passionate about the intersection of chemistry and data science.
- You are currently pursuing an MSc in Chemistry or have completed one within the last 12 months.
- You are familiar with chemical nomenclature, including IUPAC names, trivial names, and common abbreviations.
- You have a strong interest in cheminformatics and chemical data handling.
- You have basic Python and SQL programming skills, exposure to RDKit or similar cheminformatics libraries, and an understanding of chemical identifiers such as SMILES, InChI, and CAS numbers.
- Please note that, due to regulations, non-EU/EFTA citizens must provide a certificate from the university stating that an internship is mandatory as part of the application documents, and must be continuously enrolled in their university or PhD program for the whole duration of this internship.
Ready to take the next step? We’d love to hear from you. Apply now to explore this exciting opportunity!
Who we are
A healthier future drives us to innovate. Together, more than 100’000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let’s build a healthier future, together.
Roche is an Equal Opportunity Employer.



