Metagenome Prediction Tutorial
This tutorial explains how to use PICRUSt to predict a microbial community metagenome from 16S (or other marker gene) data that has been processed as described in Picking OTUs for use in PICRUSt.
- Please note that, by default, PICRUSt uses the relatively new BIOM format to represent OTU tables and gene tables (e.g. KOs by samples). This format has several benefits: it makes integration with other software easier (e.g. QIIME, and others in the future), and it allows extra metadata about both the samples and the observations (OTUs/KOs) to be embedded in the table.
- However, PICRUSt also allows users to input OTU tables and export predictions in tab-delimited format by using the ‘-f’ option (see below).
- In addition, users can always convert between BIOM and tab-delimited formats using BIOM’s built-in conversion script.
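To show what such a conversion involves, here is a minimal sketch that turns a table in the BIOM 1.0 JSON layout into a classic tab-delimited OTU table. It assumes the JSON layout (later BIOM releases default to HDF5, which this does not handle), and the function name, OTU IDs, and sample IDs are hypothetical. For real data, use BIOM's own conversion script instead.

```python
import json


def biom_json_to_classic(biom_text: str) -> str:
    """Convert a BIOM 1.0 (JSON) table into a classic tab-delimited table string."""
    table = json.loads(biom_text)
    n_rows, n_cols = table["shape"]
    row_ids = [row["id"] for row in table["rows"]]
    col_ids = [col["id"] for col in table["columns"]]

    if table["matrix_type"] == "sparse":
        # Start from all zeros; sparse entries are [row_index, column_index, value].
        dense = [[0.0] * n_cols for _ in range(n_rows)]
        for i, j, value in table["data"]:
            dense[i][j] = value
    else:
        # Dense tables store one full list of values per row.
        dense = [list(row) for row in table["data"]]

    lines = ["#OTU ID\t" + "\t".join(col_ids)]
    for row_id, values in zip(row_ids, dense):
        lines.append(row_id + "\t" + "\t".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


# Hypothetical two-OTU, two-sample table; per-row/column metadata is left empty.
example = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "matrix_type": "sparse",
    "matrix_element_type": "float",
    "shape": [2, 2],
    "rows": [{"id": "OTU_1", "metadata": None}, {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None}],
    "data": [[0, 0, 5.0], [1, 1, 3.0]],
})
print(biom_json_to_classic(example))
```

The sketch discards the per-row and per-column metadata, which is exactly the extra information the BIOM format is able to carry and the classic format is not.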
Step 1: Normalize OTU Table
normalize_by_copy_number.py normalizes the OTU table by dividing the abundance of each OTU by its known or predicted 16S copy number.
Input is the user’s OTU table, which must have been generated by closed-reference OTU picking against Greengenes.
Input and output files are in BIOM format:
normalize_by_copy_number.py -i your_otu_table.biom -o normalized_otus.biom
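The arithmetic behind this step can be sketched in a few lines. This is an illustration, not PICRUSt's implementation: the function name, OTU IDs, and copy numbers are hypothetical, whereas the real script looks up precalculated copy numbers for Greengenes OTUs.

```python
def normalize_by_copy_number(otu_table, copy_numbers):
    """Divide each OTU's per-sample count by that OTU's 16S copy number.

    otu_table: {otu_id: [count per sample]}
    copy_numbers: {otu_id: known or predicted 16S copy number}
    """
    normalized = {}
    for otu_id, counts in otu_table.items():
        copies = copy_numbers[otu_id]  # raises KeyError if no estimate exists
        normalized[otu_id] = [count / copies for count in counts]
    return normalized


# Hypothetical data: OTU_1 carries four 16S copies per genome, OTU_2 carries one.
otu_table = {"OTU_1": [8, 4], "OTU_2": [3, 6]}
copy_numbers = {"OTU_1": 4.0, "OTU_2": 1.0}
print(normalize_by_copy_number(otu_table, copy_numbers))
# {'OTU_1': [2.0, 1.0], 'OTU_2': [3.0, 6.0]}
```

Without this correction, OTU_1 would look over-represented simply because each of its cells contributes four 16S reads.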
(Optional) The OTU table can be supplied as a tab-delimited “classic” OTU table instead of in BIOM format by using the ‘-f’ option:
normalize_by_copy_number.py -f -i your_otu_table.tab -o normalized_otus.biom
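For reference, a classic OTU table has one row per OTU and one column per sample, preceded by a header line beginning with "#OTU ID". The parser below is a sketch under assumptions: it treats other lines starting with "#" as comments and drops a trailing column named "taxonomy" if one is present. The function name and sample data are hypothetical.

```python
def parse_classic_otu_table(text):
    """Parse a classic tab-delimited OTU table into (sample_ids, {otu_id: counts})."""
    sample_ids = None
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            sample_ids = fields[1:]
            # Assumption: an optional last column named "taxonomy" holds lineages.
            if sample_ids and sample_ids[-1].lower() == "taxonomy":
                sample_ids = sample_ids[:-1]
            continue
        if line.startswith("#"):
            continue  # other comments, e.g. "# Constructed from biom file"
        if sample_ids is None:
            raise ValueError("missing '#OTU ID' header line")
        # Keep only the sample columns, ignoring any taxonomy string.
        table[fields[0]] = [float(v) for v in fields[1:1 + len(sample_ids)]]
    return sample_ids, table


text = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU_1\t8\t4\tk__Bacteria\n"
    "OTU_2\t3\t6\tk__Bacteria\n"
)
samples, counts = parse_classic_otu_table(text)
print(samples, counts)
```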
(Optional) The previous examples assume that the most recent Greengenes release was used for closed-reference OTU picking. Older versions can be specified using the --gg_version option:
normalize_by_copy_number.py --gg_version 18may2012 -i hmp_mock_16S.biom -o normalized_otus.biom
Step 2: Predict Functions For Metagenome
predict_metagenomes.py creates the final metagenome functional predictions. It multiplies each normalized OTU abundance by each predicted functional trait abundance to produce a table of functions (rows) by samples (columns).
Input is the normalized OTU table created by normalize_by_copy_number.py.
Output is in biom format by default:
predict_metagenomes.py -i normalized_otus.biom -o metagenome_predictions.biom
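The multiplication described above is, in effect, a matrix product of the normalized OTU table and a table of predicted gene counts per OTU. The sketch below writes it out with plain dictionaries. The function name and all values are hypothetical; the KO identifiers are used only as labels, and the gene counts are made up rather than taken from PICRUSt's precalculated files.

```python
def predict_metagenome(normalized_otus, traits, n_samples):
    """Return {function_id: [predicted abundance per sample]}.

    normalized_otus: {otu_id: [copy-number-normalized abundance per sample]}
    traits: {otu_id: {function_id: predicted copies of that function per genome}}
    """
    predictions = {}
    for otu_id, abundances in normalized_otus.items():
        for function_id, copies in traits[otu_id].items():
            row = predictions.setdefault(function_id, [0.0] * n_samples)
            for s, abundance in enumerate(abundances):
                # Each OTU contributes (its abundance) x (its gene copies) to sample s.
                row[s] += abundance * copies
    return predictions


normalized = {"OTU_1": [2.0, 1.0], "OTU_2": [3.0, 6.0]}
traits = {"OTU_1": {"K00001": 1, "K00002": 2}, "OTU_2": {"K00001": 3}}
print(predict_metagenome(normalized, traits, n_samples=2))
# {'K00001': [11.0, 19.0], 'K00002': [4.0, 2.0]}
```

The resulting dictionary corresponds to the functions (rows) by samples (columns) table that predict_metagenomes.py writes out.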
(Optional) The output format can be changed from BIOM to tab-delimited using the ‘-f’ option:
predict_metagenomes.py -f -i normalized_otus.biom -o metagenome_predictions.tab
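(Optional) NSTI (Nearest Sequenced Taxon Index) values for each sample can be obtained using the -a option. NSTI summarizes how distantly related a sample's OTUs are to sequenced reference genomes, so higher values indicate less reliable predictions. We strongly recommend this step; it adds little cost because NSTI values are precalculated for common inputs: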
predict_metagenomes.py -i normalized_otus.biom -o metagenome_predictions.biom -a nsti_per_sample.tab
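One way to picture a per-sample NSTI is as an abundance-weighted average of per-OTU NSTI values. The sketch below assumes that weighting is done with the normalized OTU abundances. This is an assumption made for illustration, so check the predict_metagenomes.py source before relying on it. The function name, OTU IDs, and NSTI values are hypothetical.

```python
import math


def weighted_nsti(normalized_otus, otu_nsti, n_samples):
    """Abundance-weighted mean NSTI for each sample (assumed weighting scheme).

    normalized_otus: {otu_id: [normalized abundance per sample]}
    otu_nsti: {otu_id: NSTI of that OTU}
    """
    result = []
    for s in range(n_samples):
        total = sum(abundances[s] for abundances in normalized_otus.values())
        if total == 0:
            result.append(math.nan)  # no OTUs observed in this sample
            continue
        weighted = sum(abundances[s] * otu_nsti[otu_id]
                       for otu_id, abundances in normalized_otus.items())
        result.append(weighted / total)
    return result


normalized = {"OTU_1": [2.0, 1.0], "OTU_2": [3.0, 6.0]}
otu_nsti = {"OTU_1": 0.05, "OTU_2": 0.10}
print(weighted_nsti(normalized, otu_nsti, n_samples=2))
```

Under this weighting, the second sample scores higher because it is dominated by OTU_2, which is assumed to be more distant from any sequenced genome.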
(Optional) The previous examples assume that the most recent Greengenes release was used for closed-reference OTU picking. Older versions can be specified using the --gg_version option:
predict_metagenomes.py --gg_version 18may2012 -i normalized_otus.biom -o metagenome_predictions.biom
(Optional) The previous examples assume that KEGG Ortholog (KO) predictions are wanted. Other types of functions (e.g. COGs) can be specified using the --type_of_prediction option:
predict_metagenomes.py --type_of_prediction cog -i normalized_otus.biom -o metagenome_predictions.biom