# Description of files used in the CAFE analysis:

# All CAFE script files provided were written for CAFE v4.2.1

# Based on results files from OrthoFinder and KinFin:
#	Orthogroups.csv provides sequence IDs for each gene family (provided as supplementary file)
#	Orthogroups.GeneCount.txt provides counts of gene family size (can also be derived from the file above)
#	cluster_functional_annotation*.tsv files from KinFin were combined into a single file, cluster_functional_annotation.tsv

# First, use Orthogroups.GeneCount.txt output from OrthoFinder as input for CAFE (requires minor reformatting)
# Next, follow CAFE manual for filtering sequence files before running CAFE
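
# The minor reformatting mentioned above could look roughly like the sketch below.
# It is not the script used in this analysis. It assumes that Orthogroups.GeneCount.txt
# is tab-delimited with the orthogroup ID in the first column and, optionally, a trailing
# "Total" column, and that CAFE expects a description column followed by a family ID column.
# The function name orthofinder_to_cafe and the output name cafe_input.tab are hypothetical.

```shell
# Hypothetical helper: reads OrthoFinder gene counts on stdin and
# writes a CAFE-style table (Desc, Family ID, per-species counts) on stdout.
orthofinder_to_cafe() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
        # Drop the last column only if it is the OrthoFinder "Total" column
        last = ($NF == "Total") ? NF - 1 : NF
    }
    {
        # Header line gets column labels; data lines get a placeholder description
        line = (NR == 1) ? "Desc" OFS "Family ID" : "(null)" OFS $1
        # Copy the per-species counts unchanged
        for (i = 2; i <= last; i++) line = line OFS $i
        print line
    }'
}

# Example usage (file names are assumptions):
# orthofinder_to_cafe < Orthogroups.GeneCount.txt > cafe_input.tab
```

# Check the resulting header against the CAFE manual before use, since column
# conventions differ between OrthoFinder versions.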

# Next, prior to CAFE model fitting, run cafe_error.sh through caferror.py as shown below.
# This estimates error rates in gene family sizes for the species.

python ~/bin/CAFE/cafe/caferror.py -i cafe_error.sh -l caferror.log -d reports/caferror_files -f 1

# Next, fit alternative models of gene family size evolution to compare a single global rate
# against models that allow different rates on certain branches.
# Note that the 5 rate model fit does not converge on these data

cafe cafe_1rate.sh
cafe cafe_2rate.sh
cafe cafe_5rate.sh

# To compare model fit, a null distribution under single-rate gene family size evolution can be produced by running
cafe cafe_genfamily.sh

# With the null distribution of gene families, it is possible to perform likelihood ratio tests
# for the 2 rate and 5 rate models against the null (single rate) model by running
cafe cafe_2rate_lhtest.sh
cafe cafe_5rate_lhtest.sh
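
# Once the likelihood ratio statistics have been extracted from the CAFE output, an empirical
# p-value can be obtained by comparing the observed statistic with the simulated null values.
# The sketch below is illustrative and makes two assumptions: the statistic is defined as
# 2 * (lnL_multi_rate - lnL_single_rate), so that larger values favor the multi-rate model,
# and the simulated statistics have already been written one per line to a plain-text file.
# The function name empirical_pvalue is hypothetical, and the extraction step is not shown.

```shell
# Hypothetical helper: empirical p-value for an observed likelihood ratio statistic.
#   $1 = observed statistic, assumed to be 2 * (lnL_multi_rate - lnL_single_rate)
#   $2 = file with one simulated null statistic per line (same definition)
empirical_pvalue() {
    awk -v obs="$1" '
        NF == 0 { next }                          # skip blank lines
        { n++; if ($1 + 0 >= obs + 0) hits++ }    # count null values at least as large
        END {
            if (n == 0) { print "NA"; exit }      # no simulated values available
            printf "%.4f\n", hits / n
        }' "$2"
}

# Example usage (the file name and value are placeholders):
# empirical_pvalue 12.3 null_lr_values.txt
```

# If the statistic is extracted with the opposite sign, the comparison in the helper
# has to be reversed accordingly.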

# Subsequent analyses were performed on the single global rate analysis results (provided as supplementary file)

# CAFE output was first parsed using commands in cafe_output_bash_process.sh
# Parsed CAFE output can then be analyzed using cafe_output_R_process.R
# Enrichment tests of CAFE output were performed using functions provided in ortho.R