pool_test_datasets.py – Pool character predictions within a directory, given directories of expected vs. observed test results


The script finds all paired expected and observed values in a set of directories and generates pooled .biom files in a specified output directory

Usage: pool_test_datasets.py [options]

Input Arguments:



-i, --trait_table_dir
The input trait table directory (files in biom format)
-e, --exp_trait_table_dir
The input expected trait table directory (files in biom format)
-o, --output_dir
The output directory


-f, --field_order
Pass comma-separated categories, in the order they appear in file names. Categories are “file_type”,”prediction_method”,”weighting_method”,”holdout_method” (randomization vs. holdout),”distance”,and “organism”. Example: “-f file_type,test_method,asr_method specifies that files will be in the form: predict_traits–distance_exclusion–wagner. Any unspecified values are set to “not_specified”. [default: file_type,prediction_method,weighting_method,holdout_method,distance,organism]
-p, --pool_by
Pass comma-separated categories to pool results by those metadata categories. Valid categories are: holdout_method, prediction_method,weighting_method,distance and organism. For example, pass “distance” to output results pooled by holdout distance in addition to holdout method and prediction method [default: False]


Outputs will be obs,exp data points for the comparison

Pool .biom files according to holdout_distance.

pool_test_datasets.py -i obs_otu_table_dir -e exp_otu_table_dir -p distance -o./evaluation_results/pooled_by_distance/