Our research focuses on large-scale empirical virtual screening (VS) models. VS by conventional docking has not achieved the accuracy needed to replace expensive and time-consuming experimental high-throughput screens (HTSs). Conventional QSAR can be accurate for compounds similar to the training set, but it generally fails for the novel chemical matter of interest. We use machine learning to build VS models of HTS quality. AutoShim creates accurate, target-customized scoring functions by adjusting the weights of pharmacophore “shims” in the protein binding site, with the weights optimized on a few hundred training IC50s. Kinase Surrogate AutoShim pre-docks the screening collection into an ensemble of eight diverse, representative kinases. These dockings are then “shimmed” to predict, quickly and accurately, the activities of the entire compound collection on hundreds of additional kinases, without further docking or additional protein structures. Profile-QSAR is a 2D ligand-based method that predicts activity for thousands of diverse assays with unparalleled accuracy, using the activities predicted by thousands of conventional single-assay QSAR models as the compound descriptors.
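The Profile-QSAR idea described above, in which predictions from existing single-assay models serve as the descriptors for a new model, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three `base_models` are hypothetical stand-ins for trained single-assay QSAR models, the data are synthetic, the "new assay" is constructed to be a linear function of the profile, and a small ridge regression stands in for whatever learner is actually used. It is not the published implementation.

```python
import random

def ridge_fit(X, y, lam=1e-8):
    """Solve (A^T A + lam*I) w = A^T y, where A is X plus a trailing column of ones.

    The last element of the returned list is the intercept. For simplicity the
    tiny penalty is applied to the intercept as well.
    """
    A = [list(row) + [1.0] for row in X]
    d = len(A[0])
    # Augmented normal equations: [A^T A + lam*I | A^T y]
    M = [
        [sum(a[i] * a[j] for a in A) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(d):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][d] for i in range(d)]

def ridge_predict(w, x):
    """Apply fitted weights to one descriptor vector (w[-1] is the intercept)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

# Hypothetical stand-ins for trained single-assay QSAR models: each maps a
# compound's raw descriptors to a predicted activity (made-up, pIC50-like).
base_models = [
    lambda x: 5.0 + 2.0 * x[0] * x[1],
    lambda x: 6.0 - 1.5 * x[2] ** 2,
    lambda x: 4.0 + x[0] + x[2],
]

def profile(x):
    """Profile descriptor: the vector of base-model predictions for one compound."""
    return [m(x) for m in base_models]

def new_assay(x):
    """Synthetic 'measured' activity for a new assay (toy ground truth)."""
    p = profile(x)
    return 0.5 + 0.6 * p[0] + 0.4 * p[1]

rng = random.Random(0)
compounds = [[rng.random() for _ in range(3)] for _ in range(60)]
train, test = compounds[:40], compounds[40:]

# Second-level model: learn the new assay from profile descriptors only.
w = ridge_fit([profile(x) for x in train], [new_assay(x) for x in train])
errors = [ridge_predict(w, profile(x)) - new_assay(x) for x in test]
rmse_profile = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"held-out RMSE using profile descriptors: {rmse_profile:.2e}")
```

Because the toy assay is linear in the profile by construction, the second-level fit is nearly exact here; real assays would not be this well behaved, and the sketch only shows the two-level structure.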
We will extend these methods and their applications in several directions:
Further enhance Profile-QSAR by adding 3D Surrogate AutoShim predictions to the current 2D ligand-based predictions
Develop Surrogate AutoShim ensembles for membrane-bound protein families such as GPCRs and ion channels, which stand to benefit greatly because few experimental structures exist for them
Develop Surrogate AutoShim ensembles for targets outside the large protein families
Adapt AutoShim beyond broad screening, for pose prediction and lead optimization
Build protein family trees based on cross-reactivity, which are more relevant to drug design than the current sequence-based trees that reflect evolutionary history
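One way a cross-reactivity-based tree might be built is sketched below. This is an assumption, not the authors' stated method: targets are compared by the Pearson correlation of their activity profiles across a shared compound set, and the tree is grown by average-linkage agglomerative clustering. The target names (`kinase_A` to `kinase_D`) and activity values are hypothetical toy data.

```python
from itertools import combinations
from statistics import mean

def pearson(u, v):
    """Pearson correlation between two equal-length activity vectors."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

def build_tree(profiles):
    """Average-linkage clustering on distance = 1 - correlation.

    Returns the tree as nested tuples whose leaves are target names.
    """
    # Each cluster is (subtree, list of member target names).
    clusters = [(name, [name]) for name in profiles]

    def linkage(pair):
        (_, members_a), (_, members_b) = pair
        return mean(
            1.0 - pearson(profiles[a], profiles[b])
            for a in members_a
            for b in members_b
        )

    while len(clusters) > 1:
        # Merge the two clusters whose members behave most alike.
        ca, cb = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not ca and c is not cb]
        clusters.append(((ca[0], cb[0]), ca[1] + cb[1]))
    return clusters[0][0]

def leaves(tree):
    """Flatten a nested-tuple tree into its list of target names."""
    if isinstance(tree, str):
        return [tree]
    return [name for sub in tree for name in leaves(sub)]

# Toy activity profiles: one made-up pIC50-like value per compound, per target.
profiles = {
    "kinase_A": [8.0, 7.5, 5.0, 4.5, 6.0],
    "kinase_B": [7.8, 7.9, 5.2, 4.4, 6.3],
    "kinase_C": [4.5, 5.0, 8.1, 7.6, 6.0],
    "kinase_D": [4.8, 4.6, 7.9, 8.0, 5.5],
}

tree = build_tree(profiles)
print(tree)
```

In this toy set, targets that bind the same compounds end up on the same branch regardless of any sequence relationship, which is the property such a tree is meant to capture.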