categorize_by_function.py – Collapse table data to a specified level in a hierarchy.
Description:
This script collapses hierarchical data to a specified level. For instance, it is often useful to examine KEGG results at a higher level of the pathway hierarchy. Many genes are involved in multiple pathways; in these cases (also known as a one-to-many relationship), the gene is counted once for each pathway. As a side effect, the total count of genes in the collapsed table can be larger than the total in the input table.
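The one-to-many counting described above can be illustrated with a minimal sketch. This is not the script's actual implementation: the function `collapse_counts`, the gene names, the counts, and the pathway labels are all hypothetical toy values chosen for illustration, and plain dictionaries stand in for a real table.

```python
from collections import defaultdict


def collapse_counts(gene_counts, gene_pathways, level):
    """Sum gene counts per category at a 1-based hierarchy level.

    Hypothetical helper for illustration only.
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    collapsed = defaultdict(int)
    for gene, count in gene_counts.items():
        # A gene may belong to several pathways (one-to-many), so its
        # count is added once for every pathway it belongs to.
        for pathway in gene_pathways.get(gene, []):
            if len(pathway) < level:
                continue  # this pathway is not specified that deeply
            collapsed[pathway[level - 1]] += count
    return dict(collapsed)


# Toy data (assumed): gene_A sits in two pathways, gene_B in one.
gene_counts = {"gene_A": 10, "gene_B": 5}
gene_pathways = {
    "gene_A": [
        ["Metabolism", "Carbohydrate Metabolism", "Pathway X"],
        ["Metabolism", "Lipid Metabolism", "Pathway Y"],
    ],
    "gene_B": [
        ["Metabolism", "Carbohydrate Metabolism", "Pathway Z"],
    ],
}

result = collapse_counts(gene_counts, gene_pathways, level=2)
print(result)
print(sum(gene_counts.values()), "->", sum(result.values()))
```

In this toy input the genes total 15 counts, but the collapsed table totals 25 because gene_A contributes its 10 counts to both level-2 categories.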
Usage: categorize_by_function.py [options]
Input Arguments:
[REQUIRED]
- -i, --input_fp
- The predicted metagenome table
- -o, --output_fp
- The resulting table
- -c, --metadata_category
- The metadata category that describes the hierarchy
- -l, --level
- The level in the hierarchy to collapse to. A value of 0 is not allowed, a value of 1 is the highest level, and higher values move toward the leaves of the hierarchy. For instance, if the hierarchy contains 4 levels, specifying 3 would collapse to one level above the fully specified (leaf) level.
[OPTIONAL]
- --ignore
- Ignore the comma-separated list of names. For instance, specifying --ignore unknown,unclassified will ignore those labels while collapsing. The default is to not ignore anything. [default: None]
- -f, --format_tab_delimited
- Output the predicted metagenome table in tab-delimited format [default: False]
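As a rough sketch of how the level and ignore options described above might be interpreted, the snippet below selects a label from a single hierarchy path. The helper `label_at_level` and the placeholder path are hypothetical, and the handling of ignored labels (skipping the entry entirely) is an assumption made for illustration; the script itself may treat them differently.

```python
def label_at_level(path, level, ignore=None):
    """Return the label of `path` at 1-based `level`, or None if skipped.

    Hypothetical helper for illustration only.
    """
    if level < 1:
        raise ValueError("a level of 0 is not allowed; 1 is the highest level")
    if level > len(path):
        return None  # the hierarchy is not specified this deeply
    label = path[level - 1]
    # Assumption: an ignored label causes the entry to be skipped.
    if ignore and label in ignore:
        return None
    return label


# The ignore option is a comma-separated list of names.
ignore = set("unknown,unclassified".split(","))

# Placeholder 4-level hierarchy: level 3 is one step above the leaves.
path = ["top", "second", "third", "leaf"]
print(label_at_level(path, 1))
print(label_at_level(path, 3))
print(label_at_level(["top", "unclassified"], 2, ignore))
```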
Output:
The output table contains gene counts at a higher level within the hierarchy.
Collapse predicted metagenome using KEGG Pathway metadata.
categorize_by_function.py -i predicted_metagenomes.biom -c KEGG_Pathways -l 3 -o predicted_metagenomes.L3.biom
Change output to tab-delimited format (instead of BIOM).
categorize_by_function.py -f -i predicted_metagenomes.biom -c KEGG_Pathways -l 3 -o predicted_metagenomes.L3.txt
Collapse predicted metagenome using taxonomy metadata (not one-to-many).
categorize_by_function.py -i observation_table.biom -c taxonomy -l 1 -o observation_table.L1.biom