New manuscript and software package: AMAISE
Our group is interested in bringing sequencing technology to the bedside to improve the rapid diagnosis of infections, and has previously demonstrated the potential of real-time metagenomics to identify respiratory pathogens in a clinically relevant timeframe. Among the barriers to this goal is that we can now generate metagenomic data faster than our bioinformatic tools can make sense of it. A problem with many specimens (e.g., respiratory) is the astronomical host:bug ratio. In metagenomic sequencing results from sputum or bronchoalveolar lavage fluid, human DNA overwhelms microbial DNA by a ratio of >99.9:0.1, so you burn nearly all of your time and computational resources classifying human sequences.
For several years, we’ve been working with Jenna Wiens and Meera Krishnamoorthy (Computer Science & Engineering) to address this problem using machine learning. We’re excited to share AMAISE: A Machine Learning Approach to Index-Free Sequence Enrichment. This tool uses machine learning to perform "in silico host depletion," so you can jump faster to microbial classification. It quickly and accurately identifies and excludes host-derived sequences, with no need for time-intensive alignment of reads to the human genome.
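For readers who want a feel for what "in silico host depletion" means in practice, here is a minimal Python sketch of the general idea: score each read, then drop the ones that look host-derived before microbial classification. This is not AMAISE's code or interface. The names `deplete_host` and `toy_host_score`, the 0.6 threshold, and the GC-fraction scorer are all hypothetical stand-ins chosen only so the example runs; a real tool would put a trained model where the toy scorer sits.

```python
from typing import Callable, Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def deplete_host(
    reads: Iterable[Read],
    host_score: Callable[[str], float],
    threshold: float = 0.5,
) -> Iterator[Read]:
    """Yield only the reads whose host score is below the threshold."""
    for read_id, seq in reads:
        if host_score(seq) < threshold:
            yield read_id, seq


def toy_host_score(seq: str) -> float:
    """Placeholder scorer (an assumption, NOT AMAISE's model).

    Returns the fraction of G/C bases, which has no biological meaning
    as a host signal; it only gives the filter something to threshold.
    """
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


# Three made-up reads; the toy scorer flags the first as "host".
reads = [("r1", "GCGCGCGC"), ("r2", "ATATATAT"), ("r3", "GCATGCAT")]
kept = list(deplete_host(reads, toy_host_score, threshold=0.6))
kept_ids = [read_id for read_id, _ in kept]
```

The point of the design is that filtering happens read by read, without consulting an index or reference genome, so only the surviving reads go on to the slower microbial classification step.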
AMAISE: a machine learning approach to index-free sequence enrichment