Jul 08, 2024


Responsible for enhancing and operating industry-scale preclinical data pipelines that expose, unify, and harmonize data from thousands of preclinical studies critical to achieving data42 (and partner) objectives.

About the Role

Major Accountabilities:

  • Work with technical and subject matter experts (SMEs) to plan and deliver on data42 (and partner) objectives that rely on preclinical data assets.

  • Supervise the operation of end-to-end data pipelines enabling the delivery of high-value preclinical data. 

  • Be responsive to new preclinical data needs raised by data42 partners. 

  • Own the ongoing development, maintenance, and customization of the preclinical data pipelines.

  • Support the implementation of essential data quality processes and metrics to ensure high-quality data products are delivered to collaborators. 

  • Support data unification and harmonization efforts performed by preclinical data SMEs.

  • Support the team’s semantic approach to ensure data and metadata standards are met.

  • Apply the FAIR (Findable, Accessible, Interoperable, Reusable) principles in the data pipelines built and operated within this role.

  • Collaborate with governance bodies and data strategy roles to ensure alignment of activities.  

  • Support the linking of preclinical data to other data assets including clinical data, in-vitro assay data and molecular/omics data.  

  • Support strategic decision-making around data standards such as CDISC SEND.

  • Serve as the single point of contact within data42 for all needs, questions, and concerns related to the preclinical data pipelines.

  • Define the roadmap and strategic planning of the preclinical data pipeline team.  

  • In conjunction with a Scrum Master, run the team using an agile methodology.

  • Collaboratively maintain documentation of the preclinical data pipelines.


 Key Performance Indicators  

  • Demonstration of scalable platform implementation and operation.  

  • Improvement in data quality and FAIR compliance metrics.

  • Responsiveness to changes in preclinical data requirements and priorities.

  • Collaboration with other data pipeline teams.

 Impact on the organization:  

Drive one of data42’s most important data assets through the platform at scale, creating insight-generation opportunities for hundreds of consumers of data42’s platform and data.


Ideal Background:

Master’s degree or higher in Computer Science, Data Engineering, Biostatistics, or Bioinformatics.

Experience/Professional requirements:  


  • 8+ years of experience working with preclinical in-vivo study data, e.g., from toxicology and pharmacokinetic studies.

  • Understanding of preclinical in-vivo study design, conduct, and data collection.

  • Detailed understanding of the CDISC SEND standard.

  • Experience with big data processing platforms, e.g., Spark.

  • Operational experience running large-scale data operations.

  • Knowledge of data pipeline architecture decisions and their trade-offs.

  • Proficiency in SQL; PySpark experience desirable.

  • Experience with semantic technologies and approaches.

  • Experience working in an environment that uses agile methodologies.

  • Honesty and transparency are core attributes.

    Why Novartis: Helping people with disease and their families takes more than innovative science. It takes a community of smart, passionate people like you. Collaborating, supporting and inspiring each other. Combining to achieve breakthroughs that change patients’ lives. Ready to create a brighter future together?

    Join our Novartis Network: Not the right Novartis role for you? Sign up to our talent community to stay connected and learn about suitable career opportunities as soon as they come up:

    Biomedical Research
    Pharma Research
    Hyderabad (Office)
    Research & Development
    Full time

    Novartis is committed to building an outstanding, inclusive work environment and diverse teams representative of the patients and communities we serve.


    Preclinical Pipeline Lead - Data42
