Clinical metagenomics is the process of identifying foreign organisms in sequence reads obtained from a human clinical sample. Stand-alone clinical metagenomic analysis pipelines that produce results quickly help clinicians better understand the causative agent.


CMAP, a standalone UNIX shell script, is a 3-stage metagenomic analysis pipeline. It uses prinseq, a preprocessing tool, to filter out low-quality reads, and bowtie, a tool to align sequence reads to a reference.
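To illustrate the idea behind the low-quality read filtering step, here is a minimal Python sketch of a mean-quality filter. It is not prinseq and does not reproduce CMAP's actual settings: the function names, the Phred+33 encoding, and the threshold of 20 are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(
    records: Iterable[FastqRecord], min_mean_quality: float = 20.0
) -> Iterator[FastqRecord]:
    """Yield only reads whose mean quality meets the (assumed) threshold."""
    for header, sequence, quality in records:
        if mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Hypothetical reads: 'I' encodes Phred 40, '#' encodes Phred 2.
reads = [
    ("@read1", "ACGTACGT", "IIIIIIII"),
    ("@read2", "ACGTACGT", "########"),
]
kept = list(filter_reads(reads))
print([header for header, _, _ in kept])
```

Only reads that pass such a filter would be handed on to the alignment stage.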

CMAP can currently identify viral sequences within the reads of a given patient sample. It can also be configured to identify bacterial and fungal pathogens of interest. The pipeline is shown below:


Bio-raptor, a modified version of the CMAP tool, was used to analyze around 990 directories of the 1000 Genomes Project data sets. The aim of this exercise was to find viral sequences in publicly available data sets. The majority of the directories contained Epstein-Barr virus (Human herpesvirus 4) reads, which is consistent with the fact that the DNA for the 1000 Genomes Project was purified from EBV/HHV-4 transformed cell lines. Results analyzed after excluding the herpesvirus strains indicated the presence of possible viral pathogens.
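The step of re-examining results without the herpesvirus strains can be sketched as a simple tally over alignment hits. This is an illustrative Python sketch, not Bio-raptor's code: the `(read_id, reference_name)` input shape, the keyword list, and the reference names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed keywords used to recognise herpesvirus references by name.
EXCLUDED_KEYWORDS = ("herpesvirus", "epstein-barr")


def tally_viral_hits(
    hits: Iterable[Tuple[str, str]],
    excluded: Tuple[str, ...] = EXCLUDED_KEYWORDS,
) -> Counter:
    """Count aligned reads per reference, skipping excluded references."""
    counts: Counter = Counter()
    for _read_id, reference_name in hits:
        name = reference_name.lower()
        if any(keyword in name for keyword in excluded):
            continue  # drop EBV/HHV-4 and other herpesvirus hits
        counts[reference_name] += 1
    return counts


# Hypothetical hits; the virus names are placeholders, not real results.
hits = [
    ("r1", "Human herpesvirus 4"),
    ("r2", "Hypothetical virus A"),
    ("r3", "Hypothetical virus A"),
    ("r4", "Epstein-Barr virus"),
    ("r5", "Hypothetical virus B"),
]
remaining = tally_viral_hits(hits)
print(remaining.most_common())
```

Matching on reference names is the simplest possible filter; a real run would more likely match on reference accession identifiers.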

Based on the sample origin, a global pathogen distribution heat map was generated for a given organism.
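One way to prepare the values behind such a heat map is to compute, for a single organism, the fraction of positive samples per origin. The Python sketch below assumes a hypothetical input of `(origin, is_positive)` pairs and placeholder region labels; the source does not say how the map values were actually derived.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def positive_fraction_by_origin(
    samples: Iterable[Tuple[str, bool]]
) -> Dict[str, float]:
    """Return the fraction of samples positive for one organism, per origin."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for origin, is_positive in samples:
        totals[origin] += 1
        if is_positive:
            positives[origin] += 1
    return {origin: positives[origin] / totals[origin] for origin in totals}


# Hypothetical samples; region labels are placeholders.
samples = [
    ("Region A", True),
    ("Region A", False),
    ("Region B", False),
    ("Region B", False),
]
fractions = positive_fraction_by_origin(samples)
print(fractions)
```

Each resulting value could then be mapped to a colour intensity for the corresponding origin.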

Future scope

Reads from sequence simulators and samples spiked with known quantities of viruses are being fed to the pipeline in order to test its sensitivity and specificity.
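For a spiked sample, sensitivity and specificity can be computed by comparing the viruses reported by the pipeline against the viruses known to be present. The following is a minimal Python sketch under stated assumptions: evaluation is done per virus (not per read), `candidates` is the full set of viruses the pipeline could report, and all names are hypothetical.

```python
from typing import Set, Tuple


def sensitivity_specificity(
    expected: Set[str], detected: Set[str], candidates: Set[str]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) over a set of candidate viruses.

    expected:   viruses spiked into the sample (ground truth)
    detected:   viruses reported by the pipeline
    candidates: every virus in the reference set (assumed to contain both)
    """
    true_pos = len(expected & detected)
    false_neg = len(expected - detected)
    false_pos = len(detected - expected)
    true_neg = len(candidates - expected - detected)
    sensitivity = (
        true_pos / (true_pos + false_neg) if true_pos + false_neg else float("nan")
    )
    specificity = (
        true_neg / (true_neg + false_pos) if true_neg + false_pos else float("nan")
    )
    return sensitivity, specificity


# Hypothetical spike-in experiment with five candidate viruses.
candidates = {"A", "B", "C", "D", "E"}
expected = {"A", "B"}   # spiked in
detected = {"A", "C"}   # reported by the pipeline
sens, spec = sensitivity_specificity(expected, detected, candidates)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this example one of two spiked viruses is found (sensitivity 0.5) and two of three absent viruses are correctly not reported (specificity about 0.67).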