A machine learning algorithm has identified an antibiotic that kills E. coli and many other disease-causing bacteria, including some strains that are resistant to all known antibiotics. To test it, the researchers deliberately infected mice with A. baumannii and C. difficile, and the antibiotic cleared both infections.
"The computer model, which can screen more than a hundred million chemical compounds in a matter of days, is designed to pick out potential antibiotics that kill bacteria using different mechanisms than those of existing drugs."
"The researchers also identified several other promising antibiotic candidates, which they plan to test further. They believe the model could also be used to design new drugs, based on what it has learned about chemical structures that enable drugs to kill bacteria."
"The machine learning model can explore, in silico, large chemical spaces that can be prohibitively expensive for traditional experimental approaches."
"Over the past few decades, very few new antibiotics have been developed, and most of those newly approved antibiotics are slightly different variants of existing drugs."
"We're facing a growing crisis around antibiotic resistance, and this situation is being generated by both an increasing number of pathogens becoming resistant to existing antibiotics, and an anemic pipeline in the biotech and pharmaceutical industries for new antibiotics."
"The researchers designed their model to look for chemical features that make molecules effective at killing E. coli. To do so, they trained the model on about 2,500 molecules, including about 1,700 FDA-approved drugs and a set of 800 natural products with diverse structures and a wide range of bioactivities."
"Once the model was trained, the researchers tested it on the Broad Institute's Drug Repurposing Hub, a library of about 6,000 compounds. The model picked out one molecule that was predicted to have strong antibacterial activity and had a chemical structure different from any existing antibiotics. Using a different machine-learning model, the researchers also showed that this molecule would likely have low toxicity to human cells."
"This molecule, which the researchers decided to call halicin, after the fictional artificial intelligence system from '2001: A Space Odyssey,' has been previously investigated as [a] possible diabetes drug. The researchers tested it against dozens of bacterial strains isolated from patients and grown in lab dishes, and found that it was able to kill many that are resistant to treatment, including Clostridium difficile, Acinetobacter baumannii, and Mycobacterium tuberculosis. The drug worked against every species that they tested, with the exception of Pseudomonas aeruginosa, a difficult-to-treat lung pathogen."
"Preliminary studies suggest that halicin kills bacteria by disrupting their ability to maintain an electrochemical gradient across their cell membranes. This gradient is necessary, among other functions, to produce ATP (molecules that cells use to store energy), so if the gradient breaks down, the cells die. This type of killing mechanism could be difficult for bacteria to develop resistance to, the researchers say."
"The researchers found that E. coli did not develop any resistance to halicin during a 30-day treatment period. In contrast, the bacteria started to develop resistance to the antibiotic ciprofloxacin within one to three days, and after 30 days, the bacteria were about 200 times more resistant to ciprofloxacin than they were at the beginning of the experiment."
Here is how the system works. The researchers developed a "directed message passing neural network", open-sourced as "Chemprop", that learns to predict molecular properties directly from the graph structure of a molecule, with atoms represented as nodes and bonds as edges. For each compound, the molecular graph was reconstructed from its simplified molecular-input line-entry system (SMILES) string, using an open-source package called RDKit to determine the set of atoms and bonds. From this, a feature vector was computed for each atom and each bond. The atom features are the number of bonds, formal charge, chirality, number of bonded hydrogens, hybridization, aromaticity, and atomic mass; the bond features are the bond type (single/double/triple/aromatic), conjugation, ring membership, and stereochemistry.
"Aromatic" refers to flat rings, such as benzene, whose pi electrons are delocalized around the whole ring instead of being fixed in particular bonds. "Conjugation" refers to those chemistry diagrams you see with alternating single and double (or sometimes triple) bonds: what's going on here is that the molecule has overlapping p orbitals, so some electrons move freely across several atoms instead of belonging to a single bond. "Stereochemistry" refers to the fact that molecules with the same formula and the same atom-to-atom connections can exist as different "stereoisomers", which differ only in the 3D arrangement of their atoms. Some pairs of stereoisomers are mirror images of each other (enantiomers); others, such as cis/trans isomers, are not.
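To make the graph representation concrete, here is a minimal sketch of turning a molecule into atom and bond feature vectors. It skips SMILES parsing and RDKit entirely: the hand-written ethanol molecule, the reduced feature set (no chirality or stereo), and the names `one_hot`, `atom_features` and `bond_features` are hypothetical simplifications, not Chemprop's real featurizer. The last step pairs each directed bond with its source atom's features, in the style of a directed message passing network's starting bond states.

```python
# Hypothetical toy featurizer: hand-coded molecule, no RDKit, reduced feature set.
ATOM_SYMBOLS = ["C", "N", "O"]  # assumed small vocabulary
HYBRIDIZATIONS = ["SP", "SP2", "SP3"]
BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]


def one_hot(value, choices):
    """One-hot encode value, with a trailing slot for 'unknown'."""
    vec = [0.0] * (len(choices) + 1)
    vec[choices.index(value) if value in choices else len(choices)] = 1.0
    return vec


def atom_features(atom, degree):
    return (one_hot(atom["symbol"], ATOM_SYMBOLS)
            + [float(degree), float(atom["charge"]), float(atom["num_h"])]
            + one_hot(atom["hybridization"], HYBRIDIZATIONS)
            + [1.0 if atom["aromatic"] else 0.0, atom["mass"] / 100.0])


def bond_features(bond):
    return (one_hot(bond["type"], BOND_TYPES)
            + [1.0 if bond["conjugated"] else 0.0, 1.0 if bond["in_ring"] else 0.0])


# Ethanol (CH3-CH2-OH), heavy atoms only; hydrogens are counted per atom.
atoms = [
    {"symbol": "C", "charge": 0, "num_h": 3, "hybridization": "SP3", "aromatic": False, "mass": 12.011},
    {"symbol": "C", "charge": 0, "num_h": 2, "hybridization": "SP3", "aromatic": False, "mass": 12.011},
    {"symbol": "O", "charge": 0, "num_h": 1, "hybridization": "SP3", "aromatic": False, "mass": 15.999},
]
bonds = [
    {"atoms": (0, 1), "type": "SINGLE", "conjugated": False, "in_ring": False},
    {"atoms": (1, 2), "type": "SINGLE", "conjugated": False, "in_ring": False},
]

# Number of bonds per atom (the "degree" feature).
degree = [0] * len(atoms)
for bond in bonds:
    for i in bond["atoms"]:
        degree[i] += 1

atom_vecs = [atom_features(a, degree[i]) for i, a in enumerate(atoms)]

# Each undirected bond becomes two directed edges, u->v and v->u, each
# starting from the source atom's features concatenated with the bond's.
directed_edges = {}
for bond in bonds:
    u, v = bond["atoms"]
    directed_edges[(u, v)] = atom_vecs[u] + bond_features(bond)
    directed_edges[(v, u)] = atom_vecs[v] + bond_features(bond)

print(len(atom_vecs[0]), len(directed_edges))
```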
From here, the model applies a series of message-passing steps in which it aggregates information from neighboring atoms and bonds to build an understanding of the local chemistry. The "directed" part of the name comes from treating each bond as a pair of directed edges (atom A to atom B is tracked separately from B to A). "On each step of message passing, each bond's featurization is updated by summing the featurization of neighboring bonds, concatenating the current bond's featurization with the sum, and then applying a single neural network layer with non-linear activation. After a fixed number of message-passing steps, the learned featurizations across the molecule are summed to produce a single featurization for the whole molecule. Finally, this featurization is fed through a feed-forward neural network that outputs a prediction of the property of interest. Since the property of interest in our application was the binary classification of whether a molecule inhibits the growth of E. coli, the model is trained to output a number between 0 and 1, which represents its prediction about whether the input molecule is growth inhibitory."
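The quoted procedure can be sketched in plain Python: sum the states of bonds pointing into a bond's source atom (skipping the reverse bond), concatenate with the bond's own state, apply one ReLU layer, repeat, sum everything into a molecule vector, and squash a feed-forward output through a sigmoid. This is a minimal illustration under assumptions: the weights are random and untrained, the hidden size `HIDDEN = 4`, the three-atom toy chain and the helper names are hypothetical, and the real Chemprop code differs in some details (for example, it folds bond states back into atom vectors before the readout).

```python
import math
import random

rng = random.Random(0)
HIDDEN = 4  # assumed hidden size


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def message_passing(h, W, steps):
    """h maps a directed bond (v, w) to its current feature vector."""
    for _ in range(steps):
        updated = {}
        for (v, w), h_vw in h.items():
            # Sum bonds k->v that point into v, excluding the reverse bond w->v.
            msg = [0.0] * HIDDEN
            for (k, target), h_kv in h.items():
                if target == v and k != w:
                    msg = [m + x for m, x in zip(msg, h_kv)]
            # Concatenate the bond's own state with the summed message,
            # then apply one layer with a non-linear activation.
            updated[(v, w)] = relu(matvec(W, h_vw + msg))
        h = updated
    return h


def predict(h0, steps=3):
    W_msg = rand_matrix(HIDDEN, 2 * HIDDEN)
    W_ffn = rand_matrix(HIDDEN, HIDDEN)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
    h = message_passing(h0, W_msg, steps)
    # Readout: sum the learned bond states into one molecule vector.
    mol = [sum(vals) for vals in zip(*h.values())]
    hidden = relu(matvec(W_ffn, mol))
    logit = sum(w * x for w, x in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: a number between 0 and 1


# Toy three-atom chain 0-1-2 with random initial bond states.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
h0 = {e: [rng.uniform(0, 1) for _ in range(HIDDEN)] for e in edges}
p = predict(h0)
print(f"predicted probability of growth inhibition: {p:.3f}")
```

In a trained model, `W_msg`, `W_ffn` and `w_out` would be learned from the labeled molecules; here they only show the data flow.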
The system has additional optimizations. One is 200 extra molecule-level features computed with RDKit. These address a weakness of the message-passing paradigm: it works well for local chemistry but does poorly with global molecular features, and this gets worse the larger the molecule and the more message-passing hops are needed to connect its distant parts.
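Mechanically, the extra molecule-level features are concatenated onto the learned message-passing vector before the feed-forward layers. The sketch below uses three hand-computed descriptors (molecular weight, heavy-atom count, ring count) as hypothetical stand-ins for the roughly 200 RDKit descriptors, and a made-up `learned` vector in place of real message-passing output; in practice such descriptors would typically be scaled before use.

```python
# Hypothetical global descriptors for a hand-coded molecule (no RDKit).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


def global_descriptors(formula_counts, num_rings):
    mol_weight = sum(ATOMIC_MASS[el] * n for el, n in formula_counts.items())
    heavy_atoms = sum(n for el, n in formula_counts.items() if el != "H")
    return [mol_weight, float(heavy_atoms), float(num_rings)]


def combine(learned_vector, descriptors):
    # Whole-molecule features are appended to the learned vector, so the
    # feed-forward network sees global properties that local message
    # passing may not capture on its own.
    return learned_vector + descriptors


ethanol = {"C": 2, "H": 6, "O": 1}
learned = [0.12, -0.40, 0.88, 0.05]  # stand-in for a message-passing output
features = combine(learned, global_descriptors(ethanol, num_rings=0))
print(features)
```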
They used a Bayesian hyperparameter optimization system, which optimized such things as the number of hidden and feed-forward layers in the neural network and the amount of dropout (a regularization technique) involved.
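Bayesian hyperparameter optimization treats "train with these settings and measure validation loss" as an expensive black-box function: it fits a cheap probabilistic surrogate to the settings tried so far and uses it to choose the next setting worth trying. Below is a generic Gaussian-process version with an expected-improvement rule over a small grid of layer counts and dropout rates. The search space, kernel length scale, and `hypothetical_val_loss` (a made-up formula standing in for actually training a model) are all assumptions, and this is not necessarily the optimizer Chemprop itself uses.

```python
import math
import random

NUM_LAYERS = [2, 3, 4, 5, 6]            # assumed search space
DROPOUTS = [0.0, 0.1, 0.2, 0.3, 0.4]


def hypothetical_val_loss(layers, dropout):
    """Stand-in for 'train a model with these settings, return validation loss'."""
    return 0.3 + 0.05 * (layers - 4) ** 2 + (dropout - 0.2) ** 2


def encode(config):
    layers, dropout = config
    return ((layers - 2) / 4, dropout / 0.4)  # scale both axes to [0, 1]


def rbf(a, b, length=0.3):
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(X, y, x_new, noise=1e-4):
    """Gaussian-process mean and standard deviation at x_new given (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    y_mean = sum(y) / len(y)
    alpha = solve(K, [v - y_mean for v in y])
    mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mu, math.sqrt(max(var, 1e-12))


def expected_improvement(mu, sigma, best):
    """Expected amount by which a point beats the lowest loss seen so far."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayesian_search(n_init=4, n_iter=8, seed=0):
    rng = random.Random(seed)
    grid = [(n, d) for n in NUM_LAYERS for d in DROPOUTS]
    history = [(c, hypothetical_val_loss(*c)) for c in rng.sample(grid, n_init)]
    for _ in range(n_iter):
        X = [encode(c) for c, _ in history]
        y = [loss for _, loss in history]
        tried = {c for c, _ in history}
        candidates = [c for c in grid if c not in tried]
        # Query the GP surrogate for each untried config; pick the highest EI.
        nxt = max(candidates,
                  key=lambda c: expected_improvement(*gp_posterior(X, y, encode(c)), min(y)))
        history.append((nxt, hypothetical_val_loss(*nxt)))
    return min(history, key=lambda item: item[1]), history


(best_config, best_loss), history = bayesian_search()
print(best_config, round(best_loss, 3))
```

Each real evaluation would mean training a full model, which is why spending a little computation on the surrogate to avoid wasted trials pays off.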
On top of that they used ensembling, which in this case involved independently training several copies of the same model and combining their output. They used an ensemble of 20 models.
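Ensembling itself is simple to sketch: train the same kind of model several times with different random seeds (which changes initialization and data order), then combine the outputs. This sketch assumes averaging as the combining rule, and uses tiny logistic-regression "members" on hypothetical 1-D toy data in place of full Chemprop networks; `train_member` and `ensemble_predict` are illustrative names.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_member(data, seed, epochs=50, lr=0.5):
    """Train one logistic-regression member; the seed changes init and data order."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            w -= lr * err * x
            b -= lr * err
    return w, b


def ensemble_predict(models, x):
    # Combine members by averaging their predicted probabilities.
    return sum(sigmoid(w * x + b) for w, b in models) / len(models)


# Hypothetical 1-D toy data: label 1 ("inhibits growth") for larger x.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
models = [train_member(data, seed) for seed in range(20)]  # ensemble of 20
print(round(ensemble_predict(models, 1.5), 3), round(ensemble_predict(models, -1.5), 3))
```

Averaging smooths out the idiosyncrasies of any single training run, which tends to make predictions more stable than those of one model.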
The training set was 2,335 molecules, with 120 of them having "growth inhibitory" effects against E. coli.
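The 120-out-of-2,335 split is worth a quick calculation, because it shows how lopsided the training labels are:

```python
positives = 120
total = 2335
negatives = total - positives

positive_rate = positives / total
# Accuracy of a useless model that labels every molecule "not inhibitory".
all_negative_accuracy = negatives / total

print(f"positive rate: {positive_rate:.1%}")
print(f"always-negative accuracy: {all_negative_accuracy:.1%}")
```

Only about 5.1% of the training molecules are positives, so a model that always answers "not inhibitory" would already be about 94.9% accurate while being useless for finding antibiotics. That is why, in general, ranking metrics such as ROC-AUC are preferred over plain accuracy for screens like this.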
Once trained, the system was set loose on three libraries: the Drug Repurposing Hub (6,111 molecules), the WuXi anti-tuberculosis library (9,997 molecules), and the parts of the ZINC15 database thought to contain likely antibiotic molecules (107,349,233 molecules).
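The post doesn't say how the scoring of a library that size was organized, but a common generic pattern is to stream (molecule, score) pairs through the model and keep only the top-ranked hits rather than storing every score. In this sketch, `hypothetical_library` yields random numbers as a stand-in for real model predictions.

```python
import heapq
import random


def hypothetical_library(n, seed=0):
    """Stand-in for streaming (molecule_id, score) pairs from a trained model."""
    rng = random.Random(seed)
    for i in range(n):
        yield f"mol_{i}", rng.random()


def top_k(scored_molecules, k):
    # heapq.nlargest keeps a heap of only k items while consuming the stream,
    # which matters when the library has on the order of 10^8 entries.
    return heapq.nlargest(k, scored_molecules, key=lambda item: item[1])


hits = top_k(hypothetical_library(100_000), k=10)
for mol_id, score in hits[:3]:
    print(mol_id, round(score, 4))
```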
A final set of 6,820 compounds was identified and then further narrowed down using the random forest and support vector machine classifiers from scikit-learn.
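The post doesn't say exactly how the random forest and SVM outputs were used to cut the list down. One simple possibility, shown here purely as an assumption, is a consensus rule: keep a compound only if every extra classifier also rates it as likely active. In a real pipeline the probabilities could come from scikit-learn's `RandomForestClassifier.predict_proba` and `SVC(probability=True).predict_proba`; here they are made-up numbers, and `consensus_filter` and the 0.5 threshold are hypothetical.

```python
def consensus_filter(candidates, scorers, threshold=0.5):
    """Keep candidates that every scorer rates at or above the threshold."""
    return [c for c in candidates
            if all(score(c) >= threshold for score in scorers.values())]


# Hypothetical per-model probabilities for four candidate molecules.
predictions = {
    "random_forest": {"mol_a": 0.91, "mol_b": 0.40, "mol_c": 0.77, "mol_d": 0.66},
    "svm": {"mol_a": 0.83, "mol_b": 0.71, "mol_c": 0.35, "mol_d": 0.58},
}
scorers = {name: table.get for name, table in predictions.items()}
shortlist = consensus_filter(["mol_a", "mol_b", "mol_c", "mol_d"], scorers)
print(shortlist)
```

Requiring agreement between models built on different principles is a cheap way to discard compounds that only one model happens to like.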
To predict the toxicity of the molecules, they trained Chemprop again on a different training set, the ClinTox dataset, which has 1,478 molecules labeled with clinical-trial toxicity and FDA approval status. The resulting model was then used to predict the toxicity of the candidate antibiotic molecules.
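Once there are two independent predictions per molecule, one for antibacterial activity and one for toxicity, prioritizing candidates becomes a simple two-criterion filter. This is a minimal sketch; the `Candidate` type, the thresholds (0.7 activity, 0.3 toxicity) and the scores are hypothetical, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    p_active: float  # from the antibacterial model
    p_toxic: float   # from the separate toxicity (ClinTox-trained) model


def prioritize(candidates, min_active=0.7, max_toxic=0.3):
    # Keep predicted actives that the toxicity model considers low-risk,
    # then rank the survivors by predicted activity.
    keep = [c for c in candidates if c.p_active >= min_active and c.p_toxic <= max_toxic]
    return sorted(keep, key=lambda c: c.p_active, reverse=True)


pool = [
    Candidate("cand_1", p_active=0.92, p_toxic=0.55),
    Candidate("cand_2", p_active=0.81, p_toxic=0.12),
    Candidate("cand_3", p_active=0.64, p_toxic=0.05),
    Candidate("cand_4", p_active=0.88, p_toxic=0.20),
]
print([c.name for c in prioritize(pool)])
```

Here the most active compound is dropped because of its predicted toxicity, which is exactly the trade-off a second model is meant to surface.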
At that point they hit the lab and grew E. coli in 96-well flat-bottomed assay plates. 63 molecules were tested. The chemical they named halicin did the best and went on to further testing against other bacteria and in mice.
Source: "Artificial intelligence yields new antibiotic" (news.mit.edu): A deep-learning model identifies a powerful new drug that can kill many species of antibiotic-resistant bacteria.