This directory contains the input sequences, the BEAST input files, and the analysis scripts used to generate the phylogenetic trees and mutational paths. Most of these files are BEAST input/output files or Python scripts that analyze them. The *.dot files contain the evolutionary trajectories in the DOT format read by the Graphviz program.

Some of the BEAST output files are very large (several gigabytes for the *.trees files), so the full files have been deleted to keep the overall size of this directory suitable for supplementary information. However, the first 875 lines of the NPhumanH3N2-MarkovJumps_1.trees file are included to show the format of the mutation-mapped trees and to make clear how the parser works. In principle, all of the files could be regenerated by using BEAST and the scripts in this directory.
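An excerpt like the 875-line one described above can be produced from a regenerated *.trees or *.trees.gz file with a few lines of Python. This is only a minimal sketch: the function name write_head is hypothetical and is not one of the scripts in this directory, and it is written for Python 3.

```python
import gzip
import itertools


def write_head(infile, outfile, nlines=875):
    """Copy the first nlines lines of infile to outfile; return lines written."""
    # Open the input transparently whether or not it is gzipped.
    opener = gzip.open if infile.endswith('.gz') else open
    count = 0
    with opener(infile, 'rt') as fin, open(outfile, 'w') as fout:
        # islice stops after nlines without reading the rest of a huge file.
        for line in itertools.islice(fin, nlines):
            fout.write(line)
            count += 1
    return count
```

Because the file is read lazily, this should work even on multi-gigabyte inputs without loading them into memory.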

The mapping of mutations utilizes the Markov Jumps feature of BEAST, which is currently implemented only in the development trunk and requires use of the BEAGLE library. The output format in which the mutations are written to the branches in the *.trees files is still under development, so depending on the version of BEAST being used, it may be necessary to modify the Python parser code in this directory that reads the mutations from the *.trees files.

Here is a list of the full files:
**********

NPhumanH3N2.fasta is the set of all full-length coding nucleotide sequences from human H3N2 influenza, excluding lab strains, as downloaded from the Influenza Virus Resource on May-19-2011.

Aichi68-NP.fasta, Nan95-NP.fasta, and BR10-NP.fasta are the coding sequences for the NP genes from A/Aichi/2/1968, A/Nanchang/933/1995, and A/Brisbane/10/2007 (H3N2), as taken from my pHW series plasmids with the same names.

NPhumanH3N2_unique_protein_alignment.fasta is an alignment of all unique proteins encoded by the sequences in NPhumanH3N2.fasta plus those in Aichi68-NP.fasta, Nan95-NP.fasta, and BR10-NP.fasta, after removing the outlier sequences indicated in parse_seqs.py and parse_seqs.out.  This file was generated by running parse_seqs.py.

NPhumanH3N2_unique_protein_alignment.nex is the Nexus file corresponding to NPhumanH3N2_unique_protein_alignment.fasta, generated by running convert_fasta_to_nexus.py.
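For readers unfamiliar with the two formats, the following is a minimal sketch of what a FASTA-to-Nexus conversion involves. It is not the code in convert_fasta_to_nexus.py: the function names, the quoting of taxon names, and the default datatype are all assumptions made for illustration, and it is written for Python 3.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, ''.join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, ''.join(chunks)))
    return records


def to_nexus(records, datatype='protein'):
    """Format aligned (name, sequence) records as a Nexus data block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError('sequences must be aligned to equal length')
    nchar = lengths.pop()
    lines = ['#NEXUS', 'begin data;',
             f'    dimensions ntax={len(records)} nchar={nchar};',
             f'    format datatype={datatype} missing=? gap=-;',
             '    matrix']
    for name, seq in records:
        # Quote names so characters such as '/' in strain names are safe;
        # a literal single quote is escaped by doubling it.
        quoted = name.replace("'", "''")
        lines.append(f"    '{quoted}' {seq}")
    lines += ['    ;', 'end;']
    return '\n'.join(lines) + '\n'
```

The datatype argument is left as a parameter because the appropriate value (for example protein or dna) depends on what the alignment actually contains.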

NPhumanH3N2-MarkovJumps_1.xml to NPhumanH3N2-MarkovJumps_5.xml are identical XML files that were used to generate the output files NPhumanH3N2-MarkovJumps_1.log, ..., NPhumanH3N2-MarkovJumps_5.log and NPhumanH3N2-MarkovJumps_1.trees.gz, ..., NPhumanH3N2-MarkovJumps_5.trees.gz. The *.trees files were gzipped to reduce their size. The runs were done with the development versions of BEAST and BEAGLE, using the run command "java -Xmx2024m -Xms2024m -Djava.library.path=/usr/local/lib -cp ~/BEAST/build/dist/beast.jar dr.app.beast.BeastMain -beagle NPhumanH3N2-MarkovJumps.xml". Note that none of the full BEAST output files (*.log, *.trees.gz) are included in this directory since they are too large to be distributed as Supplementary Material.

The NPhumanH3N2-MarkovJumps_treesample_*.trees, NPhumanH3N2-MarkovJumps_sites.trees, NPhumanH3N2-MarkovJumps_mutations.trees, NPhumanH3N2-MarkovJumps_paths.txt, NPhumanH3N2-MarkovJumps_digraph.dot, and NPhumanH3N2-MarkovJumps_sample_*.dot files were generated by running the script get_trees_and_paths.py on the *.trees.gz BEAST output files. These represent specific files from the posterior samples. The NPhumanH3N2-MarkovJumps_sites.trees, NPhumanH3N2-MarkovJumps_merged.trees, and NPhumanH3N2-MarkovJumps_mutations.trees files are not included in this directory because they are too large to be distributed as Supplementary Material.

NPhumanH3N2-MarkovJumps_digraph_handannotated.dot is a version of NPhumanH3N2-MarkovJumps_digraph.dot that has been hand-annotated to improve the visual appearance by adding invisible edges that slightly shift the node layout. The structure and weights of visible nodes/edges are NOT altered.
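In Graphviz, an edge with the attribute style=invis takes part in the layout but is not drawn, which is how invisible edges can move nodes without changing the visible graph. The annotation described above was done by hand; the sketch below only illustrates the idea programmatically. The function name and the node names in the usage example are hypothetical and do not come from the files in this directory.

```python
def add_invisible_edges(dot_text, edges):
    """Return dot_text with invisible edges inserted before the final '}'."""
    close = dot_text.rstrip().rfind('}')
    if close == -1:
        raise ValueError("no closing '}' found in DOT text")
    # style=invis edges influence node placement but are not rendered.
    extra = ''.join(f'    "{a}" -> "{b}" [style=invis];\n' for a, b in edges)
    return dot_text[:close] + extra + dot_text[close:]


if __name__ == '__main__':
    # Hypothetical two-node graph, used only to show the call.
    example = 'digraph G {\n    "A" -> "B";\n}\n'
    print(add_invisible_edges(example, [('A', 'C')]))
```

Because only invisible edges are appended, the visible nodes, edges, and weights in the input are left untouched.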

NPhumanH3N2-MarkovJumps_maxcladecredibility.trees is the maximum clade credibility tree from those in NPhumanH3N2-MarkovJumps_sites.trees, generated by running the command: java -Xms20048m -Xmx20048m -cp ~/BEASTv1.6.1/lib/beast.jar dr.app.tools.TreeAnnotator NPhumanH3N2-MarkovJumps_sites.trees NPhumanH3N2-MarkovJumps_maxcladecredibility.trees

mutation_dates.txt and mutationdates.pdf are a text file and a plot, respectively, showing the estimated dates of the mutations. They were generated by running the script datemutations.py, which also uses the manually created file mostprobablpath.txt.

NPhumanH3N2-MarkovJumps_annotated_maxcladecredibility.trees is an annotated version of NPhumanH3N2-MarkovJumps_maxcladecredibility.trees, generated by running annotate_maxcladecredibility.py.
