Clustering Run Example

In the example below, I conduct the run in the examples/clustered directory. There isn’t really anything there; it is just a space for conducting the analyses. In this example, we will conduct a PyPHLAWD clustering run on the plant clade Adoxaceae.

Setting things up

We are going to assume that you have already installed all the dependencies (if not, head over to the installation instructions). The animation below starts from cloning the repo (again, assuming that all the dependencies are installed). It includes changing a couple of parameters in the configuration file and compiling the Cython file.

[Animation: setting things up]

Commands from gif

  • git clone
  • cd PyPHLAWD/src
  • bash
  • edit the configuration file: set DI to the directory where the PyPHLAWD src is, set smallest_size=500, and set length_limit=0.5
  • cd ../examples/clustered
  • python ../../src/ Adoxaceae ~/Desktop/pln.041118.db .
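
As a rough sketch, the three edited settings might look like the fragment below. The parameter names and the two numeric values come from the edit step above. The path is a placeholder, and the fragment assumes the settings are written as plain Python assignments; everything else in the file is left out.

```python
# Hypothetical excerpt: only the three settings changed in the step above.
# Assumes the settings file uses plain Python assignments.
DI = "/path/to/PyPHLAWD/src/"  # placeholder: directory where the PyPHLAWD src is
smallest_size = 500            # value used in this example
length_limit = 0.5             # value used in this example
```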

Starting a run

The run is going now (the gif is a little sped up). All of the commands that are run can be found in a file that the run creates. Check out the baited example if you want to see what that file looks like.

[Animation: starting a run]

Commands from gif

  • python ../../src/ Adoxaceae ~/Desktop/pln.041118.db .

Looking at the results

Inside the Adoxaceae_4206 directory, there is an info.html file, part of which is shown here. The clusters themselves can be found in Adoxaceae_4206/clusters.
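
If you want a quick overview of the clusters directory without opening info.html, something like the sketch below could help. It is not part of PyPHLAWD. It assumes the cluster files are FASTA files ending in ".fa" (an assumption; adjust the suffix to whatever you see in the directory), and the function name count_sequences is made up for this example.

```python
from pathlib import Path


def count_sequences(cluster_dir, suffix=".fa"):
    """Return {file name: number of FASTA records} for files ending in suffix.

    The ".fa" suffix is an assumption about how the cluster files are named.
    """
    counts = {}
    for path in sorted(Path(cluster_dir).glob(f"*{suffix}")):
        with path.open() as handle:
            # Each FASTA record starts with a ">" header line.
            counts[path.name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

Calling count_sequences("Adoxaceae_4206/clusters") would then give a dictionary you can sort by size to see which clusters hold the most sequences.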


Finding good clusters

You can manually pick the clusters you like, or you can use a script, as demonstrated here. This script will rename the clusters from GenBank ids to NCBI taxon ids. If you want to do this manually, the script for that is demonstrated in the baited example.


Commands from gif

  • python ../../src/ Adoxaceae_4206/
  • Do you want to rename these clusters? y/n/# y
  • Do you want to make trees and trim tips for these gene regions y/n n
  • Do you want to concat? y/n y
  • Do you want to make a constraint? y/n n
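
To give a feel for what the concat step produces, here is a minimal sketch of concatenating gene regions by taxon id. It is only an illustration of the general idea, not PyPHLAWD's own code. It assumes each gene region is an alignment held as a dictionary from taxon id to aligned sequence, and that taxa missing from a region are padded with gaps; the function name concatenate is made up.

```python
def concatenate(alignments):
    """Join per-region alignments into one sequence per taxon id.

    alignments: list of {taxon_id: aligned_sequence} dicts, one per gene region.
    Taxa absent from a region are padded with "-" (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        if not aln:
            continue  # skip empty regions
        # All sequences in one alignment are expected to have the same length.
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

For example, a taxon present in only the first of two regions ends up with its real sequence followed by a run of gaps as long as the second region.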

Changing the names

I went ahead and conducted a phylogenetic analysis on the concatenated dataset from the above procedure. The resulting tree has NCBI taxon ids as the tip labels. I haven’t memorized these, so I need taxon names instead. To change the labels, you can use the script demonstrated below.


Commands from gif

  • python ../../src/ Adoxaceae_4206/Adoxaceae_4206.table Adoxaceae_4206/RAxML_bestTree.ADOX Adoxaceae_4206/
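
Conceptually, the renaming swaps each tip label for the name that goes with that id. The sketch below shows the idea on a Newick string. It is not the PyPHLAWD script: the function name rename_tips is made up, and the id-to-name mapping is passed in as a plain dictionary because the layout of the .table file is not described here.

```python
import re


def rename_tips(newick, id_to_name):
    """Replace tip labels in a Newick string using the id_to_name mapping.

    Labels without an entry in the mapping are left unchanged.
    """
    def swap(match):
        label = match.group(2)
        return match.group(1) + id_to_name.get(label, label)

    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    # Internal node labels follow ")" and branch lengths follow ":", so
    # neither is touched by this pattern.
    return re.sub(r"([(,])([^(),:;]+)", swap, newick)
```

With made-up ids, rename_tips("((111:0.1,222:0.2):0.05,333:0.3);", {"111": "taxon_A", "222": "taxon_B"}) gives back the same tree with the first two tips renamed and 333 left as it was.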

What is in the directory

Here I am just showing some of the things in the directory. For example, the info.csv file has the occupancy matrix for the clusters.
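
If you want to summarize the occupancy matrix yourself, a sketch along these lines might work. The layout is an assumption: a header row, one row per taxon with the taxon in the first column, one column per cluster, and an empty cell or "0" meaning the taxon is absent from that cluster. Check the actual info.csv before relying on it. The function name cluster_occupancy is made up.

```python
import csv
import io


def cluster_occupancy(csv_text):
    """Return {cluster column: fraction of taxa present} from CSV text.

    Assumed layout: header row, first column is the taxon, remaining
    columns are clusters; "" or "0" means the taxon is absent.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    clusters = header[1:]
    present = dict.fromkeys(clusters, 0)
    n_taxa = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_taxa += 1
        for name, cell in zip(clusters, row[1:]):
            if cell.strip() not in ("", "0"):
                present[name] += 1
    if n_taxa == 0:
        return {}
    return {name: present[name] / n_taxa for name in clusters}
```

You could pass it the file contents with cluster_occupancy(open("Adoxaceae_4206/info.csv").read()) and then sort the result to find the best-sampled clusters.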