In a first phase of testing, the collaborative team has trained the PathAI system to look at slides from untreated patients and distinguish tumor from normal tissue. The system can also reliably identify different cell types on a slide. For a pathologist, these feats are akin to finding a needle in a haystack and then labeling every piece of straw.
The ability to label every cell is becoming increasingly important as cancer therapies evolve to include medicines that target not only cancer cells but also immune cells. If computers can analyze an entire slide at once and quantify cell types and locations, they could potentially reveal patterns that predict how well a patient might fare on a given therapy.
“Hopefully we can figure out which features correlate with survival or response to a drug,” says Meg McLaughlin, a pathologist and Director of the Oncology Pathology and Biomarkers group in the Oncology Translational Research team at the Novartis Institutes for BioMedical Research (NIBR).
With a recent explosion of experimental immuno-oncology options alongside therapies that target cancer-driving mutations, one of the biggest challenges for drug hunters is matching the most appropriate therapy to individual patients. While genomic information helps drive smart decisions, valuable clues in pathology slides could also help. “We want to create a platform that enables the field of pathology to support the accelerating pace of drug development,” says Andrew Beck, a pathologist, computer scientist and CEO of PathAI, located in Boston, Massachusetts, in the US.
Training the AI model
In collaboration with the Institute of Pathology at the University Hospital Basel in Switzerland, the Novartis team gained access to 400 pathology images from breast and lung cancer tissues along with anonymized information about the patients’ diagnoses and survival times.
The challenge for PathAI’s platform? Given an image, identify cancer, identify cell types and predict the patient’s probability of surviving five years.
One way to approach the challenge is to feed a set of untrained AI algorithms a subset of the data and see what they learn. Unlike a trained pathologist, the machine approaches the problem with no knowledge of cells or cancer.
“A human already has a lot of knowledge,” says NIBR data scientist Holger Hoefling, who is working on the project with PathAI and with an internal NIBR group aiming to use AI to assess safety concerns in pathology images. “Think about autonomous cars. To train a car to drive, the amount of time and data required for training is gigantic. In contrast, you put a human behind the wheel for 20 hours and let them drive.”
To give the untrained algorithms more knowledge about the training data, PathAI decided to feed them even richer data. A team of consulting pathologists marks up the slides, giving the algorithms more information to work with. It’s a bit like annotations in a hefty piece of literature that highlight and explain critical passages.
For example, when training the algorithms to distinguish cell types, PathAI diced the training slides into about 10,000 smaller images and had pathologists label the cell types in each one. “We had to think really hard about how we annotate the images,” says McLaughlin. “That step determines to a large extent what you get out of the AI model in the end.”
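The dicing step described above can be sketched in a few lines. This is a minimal illustration, not PathAI's actual pipeline: the article does not say what tile size or slicing strategy the team used, so the 256-pixel patch size here is an assumption.

```python
import numpy as np

def tile_image(image: np.ndarray, patch: int = 256):
    """Split an H x W x C image into non-overlapping patch x patch tiles.

    The patch size is illustrative; real whole-slide images are far larger
    and are usually tiled with specialized libraries.
    """
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tiles.append(image[y:y + patch, x:x + patch])
    return tiles

# A toy 1024 x 1024 RGB "slide" yields 16 tiles of 256 x 256 pixels,
# each of which a pathologist could then annotate with cell-type labels.
slide = np.zeros((1024, 1024, 3), dtype=np.uint8)
tiles = tile_image(slide)
print(len(tiles))  # 16
```

Each tile would then be paired with its pathologist-supplied labels to form one training example.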
What is a black box?
AI experts refer to the trained algorithms as a “black box” because it’s difficult to know what the system has learned from the training data or how it makes decisions.
Inside the black box is a set of machine learning algorithms. These algorithms are a cascade of formulas that recognize features, such as the presence of a certain shape, and associate them with real-world data, such as how long a patient actually survived.
As the algorithms see more and more images, they adjust their understanding of the patterns they see in the data. Eventually they learn that certain shapes in a slide predict likely health outcomes, such as having a good chance of living one year or a poor chance of surviving six months.
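The adjust-as-you-see-more-data loop described above can be illustrated with the simplest possible learner. This is a hedged sketch, not PathAI's model: the features, labels, and logistic-regression learner here are all stand-ins for the far deeper networks such platforms use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row holds simple features extracted from a
# slide (e.g. how often certain shapes appear); the label is whether the
# patient survived five years. Entirely illustrative, not real patient data.
n = 200
X = rng.normal(size=(n, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

# Logistic regression trained by gradient descent: the weights are nudged
# repeatedly so that predicted survival probabilities match the observed
# outcomes, mirroring how a cascade of formulas is tuned as it sees more
# and more examples.
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))          # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / n            # adjust toward the outcomes

accuracy = np.mean((1 / (1 + np.exp(-(X @ w))) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The "black box" aspect is visible even at this scale: the learned weights predict well, but reading biological meaning out of them requires separate scrutiny.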
The black box approach has the benefit of taking a fresh view of the data, so it can reveal unexpected biological patterns. But it can also discover patterns that have no biological meaning at all. Data scientists need to scrutinize the AI model’s output, identify the meaningless conclusions, and adjust the training data and algorithms in ways that weed them out.
Seeing through a machine’s eyes
After training, the PathAI platform lets users see pathology images through the machine’s eyes. Regions of the slides determined to be cancer glow bright red in a field of green surrounding tissue. Different cell types stand out in vivid colors like candies in a dish. The existing platform is for research use only, but PathAI aims to build applications that could be used by doctors in the future.
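A color overlay like the one described above can be produced by mapping each pixel's predicted class to a display color via a palette lookup. This is a minimal sketch under assumed conventions: the class codes and exact colors are hypothetical, chosen to match the article's red-on-green description.

```python
import numpy as np

# Hypothetical per-pixel output of a trained model:
# 0 = surrounding normal tissue, 1 = predicted cancer.
labels = np.zeros((4, 4), dtype=int)
labels[1:3, 1:3] = 1  # a small "tumor" region in the center

palette = np.array([
    [0, 180, 0],   # class 0: normal tissue -> green
    [220, 0, 0],   # class 1: cancer -> red
], dtype=np.uint8)

# Integer-array indexing maps every label to its RGB color at once,
# producing an H x W x 3 image ready to blend over the original slide.
overlay = palette[labels]
print(overlay.shape)  # (4, 4, 3)
```

In practice such an overlay would be alpha-blended onto the original slide image so the pathologist sees the tissue and the model's predictions together.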