How to run PICRUSt2 (v2)?


Tags: processing, tool

https://github.com/picrust/picrust2/wiki/Infer-pathway-abundances

  1. Difference between unstratified and stratified

    1. In PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), unstratified and stratified outputs are two ways of presenting the predicted functional profile of a microbial community.
    2. 1. Unstratified output:
    3. Definition: the overall predicted abundance of each function (e.g., gene family or metabolic pathway) across the whole community in a sample.
    4. Characteristics:
    5. Summarized data: the predictions for all taxa in a sample are aggregated into a single abundance value per function.
    6. No taxonomic information: only the total abundance of each function is reported, not which taxa contribute to it.
    7. Use case: the overall functional potential of the community is of interest and per-taxon contributions are not needed. The table is smaller and simpler to analyze.
    8. Example: to obtain the total predicted abundance of a specific gene across all microbes in a sample, use the unstratified output.
    9. 2. Stratified output:
    10. Definition: the predicted abundance of each function, broken down by the taxon that contributes it.
    11. Characteristics:
    12. Detailed data: the table lists the predicted abundance of each function for each taxon in the community.
    13. Taxonomic breakdown: it shows how much each taxon (e.g., a specific species or genus) contributes to the predicted abundance of each function.
    14. Use case: understanding the functional contributions of specific taxa, i.e. which organisms are potentially driving a given function in the community.
    15. Example: to find out which microbes contribute to the abundance of a certain gene, use the stratified output, which lists the abundance of that gene for each taxon.
    16. Key differences:
    17. Level of detail: unstratified output is a high-level summary, whereas stratified output is a breakdown by taxon.
    18. Data granularity: stratified output is more granular and more complex; unstratified output is simpler.
    19. Purpose: choose unstratified output for the total functional potential of the community, and stratified output for the functional roles of specific taxa.
    20. Summary:
    21. Unstratified: overall predicted functional abundance without taxonomic breakdown.
    22. Stratified: predicted functional abundance with a taxonomic breakdown for each function.
    23. To test which pathways differ in predicted abundance between two sample groups, use the unstratified PICRUSt2 output as the input table for the comparison.
    24. Reasons for choosing the unstratified table:
    25. Focus on overall functional differences: when comparing two groups of samples, the primary question is usually which pathways are differentially abundant overall, regardless of which taxa cause the difference. The unstratified table gives the total abundance of each function or pathway in each sample, which is exactly what this comparison needs.
    26. Simpler and more direct comparison: because the predictions for all taxa within a sample are aggregated, there is a single value per function or pathway per sample, and differential abundance can be tested with standard statistics.
    27. Reduced complexity: the per-taxon breakdown in the stratified table adds a layer that is not needed to identify overall differential pathways between groups.
    28. When to use the stratified table:
    29. If you want to know which taxa are responsible for the differences in pathway abundance between the two groups, use the stratified table. It shows not only which pathways differ but also how the contribution to those pathways varies across taxa.
    30. Summary:
    31. To identify pathways that differ between two sample groups, use the unstratified table and ignore the taxonomic breakdown.
    32. Use the stratified table if you need to understand the taxonomic origin of these functional differences.
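The relationship between the two tables can be illustrated with a toy example: summing a stratified table over taxa gives the unstratified totals. This is only a sketch under stated assumptions. The four-column layout (sample, function, taxon, abundance) and all values are hypothetical and much simpler than the files PICRUSt2 actually writes.

```shell
# Toy stratified table (hypothetical layout): sample, function, taxon, abundance.
workdir=$(mktemp -d)
printf 'sample\tfunction\ttaxon\tabundance\n'  > "$workdir/strat.tsv"
printf 'S1\tK00001\tASV_1\t3\n'               >> "$workdir/strat.tsv"
printf 'S1\tK00001\tASV_2\t5\n'               >> "$workdir/strat.tsv"
printf 'S1\tK00002\tASV_1\t2\n'               >> "$workdir/strat.tsv"
printf 'S2\tK00001\tASV_2\t4\n'               >> "$workdir/strat.tsv"

# Sum the abundance over taxa for each (sample, function) pair.
# The result has one value per function per sample, i.e. it is unstratified.
awk -F'\t' 'BEGIN { OFS = "\t" }
  NR > 1 { total[$1 OFS $2] += $4 }
  END { for (key in total) print key, total[key] }' "$workdir/strat.tsv" \
  | sort > "$workdir/unstrat.tsv"

cat "$workdir/unstrat.tsv"
```

In this toy table, K00001 in sample S1 is contributed by two taxa (3 and 5), and the unstratified table keeps only their sum.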
  2. Pathway inference

    1. Input files:
    2. *_metagenome_out/*unstrat.tsv.gz
    3. Mapfiles:
    4. KEGG_pathways_to_KO.tsv
    5. KEGG_modules_to_KO.tsv
    6. * ec_level4_to_metacyc_rxn.tsv
    7. * metacyc_path2rxn_struc_filt_pro.txt
    8. metacyc_path2rxn_struc_filt_euk.txt
    9. metacyc_pathways_structured_filtered
    10. metacyc_path2rxn_struc_filt_fungi.txt
    11. metacyc_path2rxn_struc_filt_fungi_present.txt
    12. metacyc_rxn_to_level4ec.tsv
    13. Output files:
    14. ./MetaCyc_pathways_out/path_abun_unstrat.tsv
    15. ./KEGG_pathways_out/path_abun_unstrat.tsv
    16. # By default, EC numbers are mapped to MetaCyc reactions and then to MetaCyc pathways. PROBLEM: the runtime of the following command was too long, so it is commented out.
    17. #pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz -o pathways_out -p 80
    18. #FILE_GENERATED_FOR_DOWNSTREAM: Map EC numbers to MetaCyc pathways and get stratified output corresponding to contribution of predicted gene family abundances within each predicted genome:
    19. pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -o MetaCyc_pathways_out_per_seq_contrib -p 80 --per_sequence_contrib --per_sequence_abun EC_metagenome_out/seqtab_norm.tsv.gz --per_sequence_function EC_predicted.tsv.gz
    20. ## Mapping predicted KO abundances to legacy KEGG pathways (with stratified output that represents contributions to community-wide abundances). ERROR: pred_metagenome_strat.tsv.gz does not exist, so the command below is commented out.
    21. ## Why use '--no_regroup'? Without it, PICRUSt2 reports: "No rows remain after regrouping input table. The default pathway and regroup mapfiles are meant for EC numbers. Note that KEGG pathways are not supported since KEGG is a closed-source database, but you can input custom pathway mapfiles if you have access. If you are using a custom function database did you mean to set the --no-regroup flag and/or change the default pathways mapfile used?"
    22. #pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_strat.tsv.gz -o KEGG_pathways_out -p 80 --no_regroup --map /home/jhuang/Tools/picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv
    23. #FILE_GENERATED_FOR_DOWNSTREAM
    24. pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -o KEGG_pathways_out -p 80 --no_regroup --map /home/jhuang/Tools/picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv
    25. pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -o KEGG_pathways_out_per_seq_contrib -p 80 --per_sequence_contrib --per_sequence_abun KO_metagenome_out/seqtab_norm.tsv.gz --per_sequence_function KO_predicted.tsv.gz --no_regroup --map /home/jhuang/Tools/picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv
    26. #Note that the path of map files is under /home/jhuang/Tools/picrust2/picrust2/default_files/pathway_mapfiles
    27. # ERROR: COG identifiers do not match the KO-based mapfile KEGG_pathways_to_KO.tsv, so the following command is commented out.
    28. #pathway_pipeline.py -i COG_metagenome_out/pred_metagenome_contrib.tsv.gz -o COG_pathways_out -p 80 --no_regroup --map /home/jhuang/Tools/picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv
    29. # NOTE: KEGG_pathways_out/path_abun_unstrat.tsv and KEGG_pathways_out_per_seq_contrib/path_abun_unstrat.tsv are identical, which can be verified with:
    30. diff KEGG_pathways_out/path_abun_unstrat.tsv KEGG_pathways_out_per_seq_contrib/path_abun_unstrat.tsv
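The identity check above can also be scripted so that it gives a yes/no answer instead of a diff listing. The following is a minimal sketch on toy files; the helper name same_tables and the file contents are made up for illustration.

```shell
# Two toy pathway tables standing in for the two path_abun_unstrat.tsv files.
workdir=$(mktemp -d)
printf 'pathway\tS1\tS2\nko00010\t12.5\t3.0\n' > "$workdir/a.tsv"
cp "$workdir/a.tsv" "$workdir/b.tsv"

# cmp -s prints nothing and returns 0 only if both files are byte-identical.
same_tables() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}

result=$(same_tables "$workdir/a.tsv" "$workdir/b.tsv")
echo "$result"
```

For gzip-compressed tables it is safer to compare the decompressed content, since two .gz files with the same content can still differ byte for byte.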
  3. Add descriptions to the 5 gene-family and 2 pathway tables

    1. #description_mapfiles
    2. KEGG_pathways_info.tsv.gz
    3. KEGG_modules_info.tsv.gz
    4. metacyc_pathways_info.txt.gz
    5. ec_level4_info.tsv.gz
    6. cog_info.tsv.gz
    7. tigrfam_info.tsv.gz
    8. pfam_info.tsv.gz
    9. ko_info.tsv.gz
    10. #--6.1. Add descriptions in gene family tables
    11. # EC and MetaCyc form a pair: EC annotates gene families and MetaCyc annotates pathways. add_descriptions.py therefore has five -m options for the gene family tables (COG, EC, KO, PFAM, TIGRFAM) and one for the pathway abundance table (METACYC); KEGG pathways need a custom description mapfile (--custom_map_table).
    12. add_descriptions.py -i COG_metagenome_out/pred_metagenome_unstrat.tsv.gz -m COG -o COG_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
    13. add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC -o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
    14. add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO -o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
    15. add_descriptions.py -i PFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m PFAM -o PFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
    16. add_descriptions.py -i TIGRFAM_metagenome_out/pred_metagenome_unstrat.tsv.gz -m TIGRFAM -o TIGRFAM_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz
    17. #--6.2. Add descriptions in pathway abundance tables -m {METACYC,COG,EC,KO,PFAM,TIGRFAM}
    18. cd MetaCyc_pathways_out_per_seq_contrib
    19. add_descriptions.py -i path_abun_unstrat.tsv.gz -m METACYC -o path_abun_unstrat_descrip.tsv.gz
    20. gunzip path_abun_unstrat_descrip.tsv.gz
    21. cd ..
    22. cd KEGG_pathways_out_per_seq_contrib
    23. add_descriptions.py -i path_abun_unstrat.tsv.gz -o path_abun_unstrat_descrip.tsv.gz --custom_map_table /home/jhuang/Tools/picrust2/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv.gz
    24. gunzip path_abun_unstrat_descrip.tsv.gz
    25. cd ..
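The five gene-family commands above differ only in the database name, so they can be generated in a loop. The sketch below is a dry run: it only builds and prints the command lines, using the same paths as above, and does not call add_descriptions.py itself.

```shell
# Build one add_descriptions.py command line per gene-family database (dry run).
cmds=""
for db in COG EC KO PFAM TIGRFAM; do
  in_file="${db}_metagenome_out/pred_metagenome_unstrat.tsv.gz"
  out_file="${db}_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz"
  cmds="${cmds}add_descriptions.py -i ${in_file} -m ${db} -o ${out_file}
"
done

# Print the commands for inspection.
printf '%s' "$cmds"
```

After checking the printed lines, the output can be piped to a shell to execute the commands in an environment where PICRUSt2 is installed.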
  4. Difference between Kxxxxx (gene or protein) and koxxxxx (pathway)

    1. "ORTHOLOGY: K10989" and "ko00001" are two different kinds of identifiers in the KEGG (Kyoto Encyclopedia of Genes and Genomes) database.
    2. 1. ORTHOLOGY: K10989
    3. Definition: K10989 is a KEGG Orthology (KO) identifier, written as "K" followed by five digits.
    4. What it represents: a group of orthologous genes or proteins that perform the same function in different species, for example a particular enzyme that is conserved across many organisms.
    5. Usage: K numbers describe function at the gene/protein level. "ORTHOLOGY: K10989" in a gene entry means that the gene has been assigned to this ortholog group.
    6. 2. ko00001
    7. Definition: identifiers of the form "ko" followed by five digits are built on top of KO assignments. Most of them (for example ko00010, Glycolysis / Gluconeogenesis) are KEGG reference pathway maps.
    8. What it represents: a pathway map is a manually drawn diagram of a molecular interaction or reaction network, such as a metabolic or signaling pathway, in which the nodes are linked to KOs.
    9. Usage: ko00001 itself is a special case. It is the KEGG BRITE hierarchy that classifies all KOs into pathways and functional categories, not an individual pathway map. The "ko" prefix indicates that the entry is expressed in terms of KEGG Orthology groups.
    10. Summary of differences:
    11. Scope:
    12. K10989 is one specific ortholog group of genes/proteins.
    13. ko00001 and other ko-prefixed entries describe pathways or hierarchies that contain many KOs.
    14. Focus:
    15. K10989 focuses on the function of specific genes/proteins across species.
    16. ko-prefixed entries give a visual or hierarchical representation of biological processes.
    17. Level of detail:
    18. K10989 is detailed at the level of an individual gene or protein function.
    19. ko-prefixed entries give a broader overview of biological systems or networks.
    20. Together, these identifiers let researchers move between specific gene functions and broader biological processes within KEGG.
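When both kinds of identifiers appear in the same table, they can be told apart by their prefix and length. The function below is a small sketch; the name classify_kegg_id and the returned labels are made up for illustration, and the check only looks at the format of the string, not at whether the entry exists in KEGG.

```shell
# Classify an identifier by its format: "K" or "ko" followed by exactly five digits.
classify_kegg_id() {
  case "$1" in
    K[0-9][0-9][0-9][0-9][0-9])  echo "KO (ortholog group)" ;;
    ko[0-9][0-9][0-9][0-9][0-9]) echo "ko-prefixed KEGG identifier" ;;
    *)                           echo "unrecognized" ;;
  esac
}

# Try it on a few identifiers, including a Pfam accession that matches neither form.
for id in K10989 ko00001 ko00010 PF00667; do
  printf '%s\t%s\n' "$id" "$(classify_kegg_id "$id")"
done
```

The patterns are case-sensitive, so an uppercase "K" number is never confused with a lowercase "ko" entry.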
  5. Preparing the input files for STAMP, e.g. path_abun_unstrat_descrip.tsv.gz and metadata.tsv

    1. Input files needed for STAMP are:
    2. * pred_metagenome_unstrat_descrip.tsv.gz / path_abun_unstrat_descrip.tsv.gz (from STEP 3)
    3. * metadata.tsv (see below)
    4. cut -d$'\t' -f1 map_corrected.txt > 1
    5. cut -d$'\t' -f5 map_corrected.txt > 5
    6. cut -d$'\t' -f6 map_corrected.txt > 6
    7. paste -d$'\t' 1 5 > 1_5
    8. paste -d$'\t' 1_5 6 > metadata.tsv
    9. # NOTE_1: Modify '#SampleID' to 'SampleID' !!
    10. SampleID Group Sex_age
    11. 1 Group1 f.aged
    12. 2 Group1 f.aged
    13. 5 Group1 f.aged
    14. ...
    15. # NOTE_2: Loading EC[COG|KO|PFAM|TIGRFAM]_metagenome_out/pred_metagenome_unstrat_descrip.tsv into STAMP does not work. STAMP reports 'Data does not form a strict hierarchy. Child FAD binding domain has multiple parents (e.g., PF00667, PF00890)', i.e. the same description is attached to more than one ID.
    16. # NOTE_3: STAMP has to be restarted for each pathway type (e.g., KEGG or MetaCyc). For example settings, see STAMP_Screenshot.png.

    STAMP_Screenshot
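The metadata file can also be built in one pass, without the temporary files 1, 5, 6 and 1_5, and with the '#SampleID' fix from NOTE_1 applied automatically. The sketch below runs on a toy map_corrected.txt; the names and values of columns 2 to 4 are hypothetical, and only the positions of columns 1, 5 and 6 follow the commands above.

```shell
# Toy mapping file with six tab-separated columns (columns 2-4 are made up).
workdir=$(mktemp -d)
printf '#SampleID\tBarcode\tPrimer\tRun\tGroup\tSex_age\n'  > "$workdir/map_corrected.txt"
printf '1\tACGT\tP1\tR1\tGroup1\tf.aged\n'                  >> "$workdir/map_corrected.txt"
printf '2\tTGCA\tP1\tR1\tGroup1\tf.aged\n'                  >> "$workdir/map_corrected.txt"

# Keep columns 1, 5 and 6 (cut splits on tabs by default),
# then strip the leading '#' from the header line only.
cut -f1,5,6 "$workdir/map_corrected.txt" \
  | sed '1s/^#//' > "$workdir/metadata.tsv"

cat "$workdir/metadata.tsv"
```

Because the sed expression is restricted to line 1, a '#' at the start of any later line is left untouched.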

  6. Install STAMP

    1. #https://github.com/picrust/picrust2/wiki/STAMP-example
    2. conda activate base
    3. conda install mamba
    4. # -- Install method 1 (Failed) --
    5. #https://beikolab.cs.dal.ca/software/Quick_installation_instructions_for_STAMP
    6. mamba create -n stamp_py2 python=2 pyqt=4 numpy scipy matplotlib biom-format stamp
    7. #pip install matplotlib
    8. pip install STAMP
    9. #Alternative: mamba create -n stamp bioconda::stamp
    10. # -- Install method 2 (Failed) --
    11. cd ~/Tools/STAMP-2.1.3
    12. python setup.py install
    13. #byte-compiling /home/jhuang/miniconda3/envs/stamp_py2/lib/python2.7/site-packages/stamp/metagenomics/StringHelper.py to StringHelper.pyc
    14. #running install_scripts
    15. #copying build/scripts-2.7/checkHierarchy.py -> /home/jhuang/miniconda3/envs/stamp_py2/bin
    16. #copying build/scripts-2.7/STAMP -> /home/jhuang/miniconda3/envs/stamp_py2/bin
    17. #changing mode of /home/jhuang/miniconda3/envs/stamp_py2/bin/checkHierarchy.py to 775
    18. #changing mode of /home/jhuang/miniconda3/envs/stamp_py2/bin/STAMP to 775
    19. #running install_data
    20. #copying LICENSE.txt -> /home/jhuang/miniconda3/envs/stamp_py2/.
    21. #creating /home/jhuang/miniconda3/envs/stamp_py2/manual
    22. #copying ./manual/STAMP_Users_Guide.pdf -> /home/jhuang/miniconda3/envs/stamp_py2/./manual
    23. #copying README.md -> /home/jhuang/miniconda3/envs/stamp_py2/.
    24. #running install_egg_info
    25. #Writing /home/jhuang/miniconda3/envs/stamp_py2/lib/python2.7/site-packages/STAMP-2.1.3-py2.7.egg-info
    26. python STAMP_test.py -v
    27. python STAMP.py
    28. #BUG: Both methods above install STAMP without errors, but the program hangs on startup. Next attempt: install it on the notebook.
    29. curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    30. bash Miniforge3-$(uname)-$(uname -m).sh
    31. mamba install bioconda::stamp pyqt=4
    32. # -- Install method 3 (Failed) --
    33. #--Quick installation instructions for STAMP--
    34. #SUCCESSFUL install on Virtualbox 14 or 16
    35. sudo apt-get install libblas-dev liblapack-dev gfortran
    36. sudo apt-get install freetype* python-pip python-dev python-numpy python-scipy python-matplotlib
    37. sudo pip install STAMP # pip could not find the package; download the pip package manually and install it with the following command
    38. sudo python setup.py install # run inside the downloaded STAMP package directory
    39. #ImportError: No module named biom.parse
    40. sudo pip install --upgrade biom-format
    41. conda remove -n stamp --all
    42. #conda create -n stamp pyqt=4
    43. #conda activate stamp
    44. #conda install -c bioconda stamp
    45. conda config --show channels
    46. mamba create -n stamp_py2 pip python=2 pyqt=4 numpy scipy biom-format
    47. mamba activate stamp_py2
    48. #pip install matplotlib
    49. pip install stamp
    50. # -- Install method 4 (Failed) --
    51. conda remove stamp_pyqt4
    52. mamba install pyqt=4 stamp
    53. #conda install icu=56
    54. # -- Install method 5: Windows system on Virtualbox (Failed) --
    55. sudo apt update
    56. sudo apt install virtualbox
    57. sudo apt install virtualbox-ext-pack
    58. virtualbox
    59. #http://www.winwin7.com/Win7QiJianBan/XTZJWin7QiJianBan-116517.html
    60. #http://win.hgyji.com/fanqiexp.html
    61. #https://eprebys.faculty.ucdavis.edu/2020/04/08/installing-windows-xp-in-virtualbox-or-other-vm/
    62. #https://jingyan.baidu.com/article/a17d52851540e08098c8f219.html
    63. #https://msdn.cyanlemon.net/%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/Windows%20XP/%E4%B8%AD%E6%96%87-%E7%AE%80%E4%BD%93/
    64. #https://blog.51cto.com/u_16213618/11137698
    65. #https://msdn.itellyou.cn/
    66. # -- Install method 6: STAMP_2_1_3.exe on Windows 7 in VirtualBox (Successful) --
  7. ALDEx2 (Not_Used!)

    1. https://bioconductor.org/packages/release/bioc/html/ALDEx2.html
  8. Convert png to svg and pdf

    1. inkscape error_bar.png --export-plain-svg=error_bar.svg # embeds the PNG in the SVG instead of tracing it
    2. sudo apt update
    3. sudo apt install autotrace
    4. sudo apt-get install -y libpng-dev libtiff-dev imagemagick
    5. git clone https://github.com/autotrace/autotrace.git
    6. cd autotrace
    7. #sudo apt install intltool
    8. #sudo apt install gettext libglib2.0-dev
    9. #sudo apt install libtool libtool-bin
    10. #sudo apt install automake
    11. sudo apt-get install libxml-parser-perl
    12. ./autogen.sh
    13. ./configure
    14. make
    15. autotrace -output-format svg -output-file error_bar.svg error_bar.png


© 2023 XGenes.com Impressum