PyPHLAWD contains many scripts, and most of them can be run independently. Below is a list of some of the scripts and their functionality.

Main Programs

  • This is the main program; it runs the clustering analysis. To run it, specify your clade of interest, a database downloaded with phlawd_db_maker, and a premade output directory where the results of the analysis will be placed.
    • python CLADE PHLAWD_DB.db OUTDIR/
  • This runs the same analysis as the main program, but allows you to provide your own bait sequences. These sequences should be in FASTA format and placed together in a directory. If your sequences are not in FASTA format, the program pxs2fa from the phyx suite can be used to convert them.
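
Both main programs expect their inputs to exist before they start: the database file, a premade output directory and, for the bait variant, a directory of FASTA files. The following is a minimal sketch of a pre-flight check and is not part of PyPHLAWD itself. The function names `check_inputs` and `looks_like_fasta` are hypothetical, and the FASTA test only inspects the first non-blank line of each file.

```python
from pathlib import Path


def looks_like_fasta(path):
    """Return True if the first non-blank line of the file starts with '>'."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False  # empty file


def check_inputs(db_path, outdir, bait_dir=None):
    """Return a list of problems with the inputs; an empty list means none were found.

    Hypothetical helper: it only checks that the paths exist and that the
    bait files look like FASTA. It does not inspect the database contents.
    """
    problems = []
    if not Path(db_path).is_file():
        problems.append(f"database not found: {db_path}")
    if not Path(outdir).is_dir():
        problems.append(f"output directory does not exist: {outdir}")
    if bait_dir is not None:
        bait = Path(bait_dir)
        if not bait.is_dir():
            problems.append(f"bait directory does not exist: {bait_dir}")
        else:
            files = sorted(p for p in bait.iterdir() if p.is_file())
            if not files:
                problems.append(f"bait directory is empty: {bait_dir}")
            for path in files:
                if not looks_like_fasta(path):
                    problems.append(f"not FASTA format: {path}")
    return problems
```

Calling `check_inputs("PHLAWD_DB.db", "OUTDIR/")` before launching an analysis returns a list of messages that can be printed, so a missing directory is caught before any clustering begins.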

Other Scripts

  • This will align the FASTA sequences within a folder using mafft and the phyx program pxssort. It is useful to run this first if the sequences have not yet been aligned.
  • This runs as part of the main program, or can be run separately, to combine clusters. The user specifies a folder with the clusters to combine. The folder should contain a set of FASTA files ending in .fa and their corresponding alignments ending in .aln. The user can also specify a log file; otherwise the information is written to the log file pyphlawd.log.
  • This changes the names in a user-supplied FASTA file using a table of names. The table is a tab-delimited file with the current names in the first column and the replacement names in the second.
    • python Table.tsv InputFasta.fa OutputFile
  • This changes the names on a tree, allowing you to switch from NCBI names to species names. The table should be tab-delimited.
    • python Table.tsv Tree.tre OutTree.tre
  • If you plan on using Cython while running PyPHLAWD, this script will set it up.
    • bash
  • This program allows the user to split clades connected by a branch of a designated length. It is intended to divide clusters that have been brought together by misidentified orthology. The user specifies a branch length cutoff and the number of taxa for a clade. The input is a folder containing the trees to be cut and the file ending of those trees (e.g., .new, .tre, etc.).
  • This program will create an mcl file out of a blast output.
  • This script reads all the names from your database and outputs them as a table. This is useful if you want to quickly see whether species of interest are available. The table can also be used as input to the name-changing programs in order to change NCBI names to species names.
    • python Database.db OutputFileName
  • This program prints the length of every sequence in a FASTA file to standard output.
    • python FASTA
  • This program is designed to help clean trees after they have been inferred. You specify an absolute value and a relative value, and terminal branches that exceed these lengths are removed. This helps remove sequences that were included because of misidentified orthology, long branch attraction, or another source of systematic error. Details on choosing the absolute and relative values can be found in Yang and Smith, 2014. This trimming is performed during the regular analysis using the absolute and relative values specified in the file, but the program can also be used to refine final trees.
    • python TREE.tre REL_VALUE ABS_VALUE
  • This program is designed to extract the clusters identified through Markov clustering as implemented in mcl. The inputs are a FASTA file containing all the sequences that were clustered (typically the file on which the all-by-all blast was performed), the outfile from the clustering analysis, and the minimum number of taxa required for a cluster to be kept. This program requires a premade output folder, and the output will be all the clusters identified by mcl that meet the minimum number of taxa requirement.
    • python AllFasta.fa mcl_outfile minimum_taxa OUTDIR
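
The cluster extraction step in the last entry can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the script shipped with PyPHLAWD: it assumes the mcl outfile lists one cluster per line as whitespace-separated sequence IDs, that each ID matches the first word of a FASTA header, and that each sequence counts as one taxon. The function names and the `cluster<N>.fa` output naming are hypothetical.

```python
from pathlib import Path


def read_fasta(path):
    """Return {sequence_id: sequence}; the ID is the first word of each header."""
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                words = line[1:].split()
                name = words[0] if words else ""
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def write_clusters(fasta_path, mcl_path, min_taxa, outdir):
    """Write one FASTA file per mcl cluster that has at least min_taxa members.

    Assumes one cluster per line in the mcl outfile and one taxon per sequence.
    The output folder must already exist. Returns the list of files written.
    """
    outdir = Path(outdir)
    if not outdir.is_dir():
        raise FileNotFoundError(f"output folder does not exist: {outdir}")
    seqs = read_fasta(fasta_path)
    written = []
    with open(mcl_path) as handle:
        for number, line in enumerate(handle):
            members = line.split()
            if len(members) < min_taxa:
                continue  # cluster is too small to keep
            out_path = outdir / f"cluster{number}.fa"  # hypothetical naming
            with open(out_path, "w") as out:
                for seq_id in members:
                    # A KeyError here means an ID in the mcl outfile
                    # is missing from the FASTA file.
                    out.write(f">{seq_id}\n{seqs[seq_id]}\n")
            written.append(out_path)
    return written
```

A call such as `write_clusters("AllFasta.fa", "mcl_outfile", 4, "OUTDIR")` mirrors the argument order of the command above. Counting sequences rather than distinct taxa is a simplification; if several sequences can belong to the same taxon, the size check would need a mapping from sequence ID to taxon.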