Synopsis of analysis:

This analysis examines how the gene families (orthogroups) inferred by OrthoFinder
correspond to previously determined ohnologs, in order to test whether OrthoFinder
over-splits known ohnologs into separate gene families.

The analysis is based on previously determined ohnologs, which are publicly available at:
http://ohnologs.curie.fr/
Strict ohnologs were used for human, anole, possum, danio, and gar.
The main paper reports numbers for human only, as they were similar to those of the other
four species. Results for all five species are summarized in the supplementary text.

Step 1 (bash): Ohnolog files are reformatted for easier processing in R. SequenceIDs.txt 
from OrthoFinder is also reformatted so the Ensembl protein IDs can be matched to Ensembl
gene IDs using biomaRt.
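
The SequenceIDs.txt part of Step 1 could look roughly like the sketch below. It is not
the script used in the analysis. It assumes each line of SequenceIDs.txt has the layout
"<species>_<sequence>: <FASTA header>" and that the header begins with an Ensembl protein
accession carrying a version suffix. The file names and the IDs in the toy input are
made-up placeholders.

```shell
# Toy stand-in for OrthoFinder's SequenceIDs.txt (placeholder IDs, assumed layout).
cat > SequenceIDs_example.txt <<'EOF'
0_0: ENSP00000000001.2 pep chromosome:placeholder
0_1: ENSP00000000002.4 pep chromosome:placeholder
1_0: ENSACAP00000000001.3 pep scaffold:placeholder
EOF

# Keep the internal OrthoFinder ID and the protein accession only:
# drop the trailing colon from field 1 and the ".N" version suffix from field 2,
# then write the two fields tab-separated so R can read them directly.
awk '{ sub(/:$/, "", $1); sub(/\.[0-9]+$/, "", $2); print $1 "\t" $2 }' \
    SequenceIDs_example.txt > sequence_ids.tsv
```

The resulting two-column table (internal ID, protein ID without version) is the kind of
input that can then be passed to biomaRt in Step 2.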

Step 2 (R): The ohnolog txt files contain Ensembl gene IDs, whereas OrthoFinder clustered
Ensembl protein sequences. Hence, to match them up, we use biomaRt to look up the Ensembl
gene ID for every sequence used by OrthoFinder.

Step 3 (bash): Using the linkage between Ensembl gene IDs and protein IDs, IDs are matched
between the ohnolog files and the orthogroups to determine which orthogroups contain the
sequences of each ohnolog. Over-split ohnologs will hence match multiple orthogroups.
The number of orthogroups that each ohnolog matches is counted to determine how often
ohnologs appear in multiple orthogroups.
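
The matching and counting in Step 3 can be illustrated with the sketch below. It is not
the script used in the analysis. It assumes a tab-separated ohnolog file with one ohnolog
per line (gene IDs in columns) and a two-column gene-to-orthogroup table derived from
Step 2. All file names, gene names, and orthogroup labels are hypothetical.

```shell
# Hypothetical inputs: one ohnolog per line, and a gene -> orthogroup lookup table.
printf 'geneA\tgeneB\ngeneC\tgeneD\ngeneE\tgeneF\n' > ohnologs_example.tsv
printf 'geneA\tOG0000001\ngeneB\tOG0000001\ngeneC\tOG0000002\ngeneD\tOG0000003\ngeneE\tOG0000004\n' \
    > gene_to_orthogroup.tsv

# For each ohnolog line, count how many different orthogroups its genes fall into.
awk -F'\t' -v OFS='\t' '
    NR == FNR { og[$1] = $2; next }      # first file: load gene -> orthogroup
    {
        n = 0
        split("", seen)                  # portable way to empty the array
        for (i = 1; i <= NF; i++)
            if (($i in og) && !(og[$i] in seen)) { seen[og[$i]] = 1; n++ }
        print $0, n                      # append the orthogroup count
    }
' gene_to_orthogroup.tsv ohnologs_example.tsv > ohnolog_orthogroup_counts.tsv

# Tally how many ohnologs match 1, 2, ... orthogroups.
cut -f3 ohnolog_orthogroup_counts.tsv | sort -n | uniq -c
```

In this toy input, the second ohnolog (geneC, geneD) is the only one whose genes fall into
more than one orthogroup, so it would be counted as over-split. Genes absent from the
lookup table (geneF here) are simply ignored, which is a design choice of this sketch.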