Open Source Science

Novartis is pioneering new informatics tools for drug discovery. We believe in the power of open-sourced, global collaboration for the greater good. Join us to help patients worldwide.

Peax

Peax is a tool for interactive concept learning and exploration of epigenomic patterns based on unsupervised machine learning with autoencoders.

GitHub Project | Download Peax

Jenkins-LSCI

Jenkins-LSCI enables research scientists to build workflows and data pipelines on the same robust framework and plugin ecosystem as Jenkins-CI, the widely used continuous integration server that supports building, deploying, and automating any software project.

GitHub Project

Habitat

Habitat is a simple yet powerful, self-contained object storage management system. Based on Amazon Web Services, it is capable of virtually unlimited storage. Rather than acting as a large centralized management system, Habitat can serve as a local repository for a single application, or it can be shared among many clients.

Habitat is best suited to situations where the producers and consumers of the files do not require a file system protocol interface and can access the store over HTTP(S).

GitHub Project | Download Habitat

YADA

Access any data, at any source, in any format, from any environment, using just a URL, with just one-time configuration.

Get data from multiple sources, in different formats, merge the results into one with uniform column names, on-the-fly, using one URL.

YADA's raisons d'être are to enable efficient, non-redundant development of data-dependent applications and utilities, data source querying, data analysis, processing pipelines, and extract, transform, and load (ETL) processes. YADA does all this while keeping data access fully decoupled from other aspects of application architecture, such as the user interface.
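As a rough illustration of what merging results from differently formatted sources under uniform column names involves, the following Python sketch combines a CSV source and a JSON source into one result set. It does not use YADA or its URL syntax; the sample data, the column mappings, and the function names are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample sources; in practice these would come from
# separate systems that name the same fields differently.
CSV_SOURCE = "gene_symbol,expr\nTP53,4.2\n"
JSON_SOURCE = '[{"symbol": "BRCA1", "expression": 3.1}]'

# Hypothetical mappings from each source's column names to uniform names.
CSV_COLUMNS = {"gene_symbol": "gene", "expr": "expression"}
JSON_COLUMNS = {"symbol": "gene", "expression": "expression"}


def rename(record, column_map):
    """Keep only mapped columns and rename them to the uniform names."""
    return {column_map[k]: v for k, v in record.items() if k in column_map}


def merged_rows():
    """Read both sources and return one list of uniformly named rows."""
    rows = [rename(r, CSV_COLUMNS) for r in csv.DictReader(io.StringIO(CSV_SOURCE))]
    rows += [rename(r, JSON_COLUMNS) for r in json.loads(JSON_SOURCE)]
    for row in rows:
        # CSV values arrive as strings, so normalize the numeric column.
        row["expression"] = float(row["expression"])
    return rows


merged = merged_rows()
```

Keeping the column mappings as data rather than code means a consumer only ever sees the uniform names, which is the kind of decoupling between data access and application logic described above.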

GitHub Project | Download YADA

OntoBrowser

The OntoBrowser tool was developed to manage ontologies and code lists. The primary goal of the tool is to provide an online collaborative solution for expert curators to map code list terms (sourced from multiple systems/databases) to preferred ontology terms. Other key features include visualisation of ontologies in hierarchical/graph format, advanced search capabilities, peer review/approval workflow and web service access to data.

GitHub Project | Download OntoBrowser

Railroadtracks

Railroadtracks is a Python toolkit to handle graphs of dependent tasks such as the ones found in bioinformatics pipelines.

It was created for comparing RNA-Seq pipelines and has since found use in other situations, such as writing a flexible system for the QC of NGS data.
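To make the idea of a graph of dependent tasks concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9 or later). It is not the Railroadtracks API; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "build_index": set(),
    "align_reads": {"build_index"},
    "count_reads": {"align_reads"},
    "qc_report": {"align_reads"},
    "summary": {"count_reads", "qc_report"},
}


def execution_order(deps):
    """Return the task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Any order the sorter returns runs `build_index` first and `summary` last; the relative order of `count_reads` and `qc_report` is not fixed, because neither depends on the other.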

GitHub Project | Download Railroadtracks | Documentation (PDF 0.5 MB)

Yet Another Pipeline

YAP is an extensible parallel framework, written in Python using OpenMPI libraries. It allows researchers to quickly build high-throughput big data pipelines without extensive knowledge of parallel programming. The user interacts with the framework through simple configuration files that capture analysis parameters and user-directed metadata, enabling reproducible research. Using YAP, analysts have achieved speedups of up to 36× in RNA-Seq workflow execution time.

YAP has been designed to be scalable and flexible. We have implemented YAP with a focus on next-generation sequencing (NGS), to meet the large data processing challenges at NIBR. However, the framework can be easily adapted for any kind of analysis. It can be executed on a local Linux workstation or on large HPC cluster systems. The framework achieves efficiency through data handling mechanisms such as parallel data distribution and the use of data streams and named pipes to avoid file I/O.
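The benefit of passing data between stages as streams rather than intermediate files can be sketched with plain Python generators. This is only an in-process analogy, not YAP's implementation (which relies on MPI and named pipes); the stage names and sample reads are hypothetical.

```python
def read_records(lines):
    """Yield non-empty, stripped records one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def trim(reads, length):
    """Yield each read truncated to the first `length` bases."""
    for read in reads:
        yield read[:length]


def count_gc(reads):
    """Yield (read, number of G or C bases) for each read."""
    for read in reads:
        yield read, sum(base in "GC" for base in read)


# Hypothetical input; a real pipeline would stream this from a file or pipe.
raw = ["ACGTGGCC\n", "\n", "TTTTAAAA\n", "GGGCCCAT\n"]

# Stages are chained lazily: each record flows through all three stages
# without any intermediate result being written to disk.
results = list(count_gc(trim(read_records(raw), 4)))
```

Because every stage is a generator, only one record is held in memory per stage at a time, which mirrors the motivation for avoiding file I/O between pipeline steps.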

GitHub Project | Download Yet Another Pipeline

RDKit

The RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. The core algorithms and data structures are written in C++, and wrappers are provided for using the toolkit from Python, Java, or C#. Additionally, the RDKit distribution includes a PostgreSQL-based cartridge that allows molecules to be stored in a relational database and retrieved via substructure and similarity searches.

Please see the RDKit Documentation for more information on installation, usage, cookbooks, and lots more.

GitHub Project | Download RDKit

GridVar

GridVar is a jQuery plugin that visualizes multi-dimensional datasets as layers organized in a row-column format. At each cell (i.e., rectangle at the intersection of a row and column), GridVar displays your data as a background color (like a color/heat map) and/or a glyph (shape). This enables different characteristics of your dataset to be layered on top of each other. For more information on usage, required libraries, and other developer information, please see our documentation on GitHub.

GitHub Project | Download GridVar