Figure 5 - Source data 3:

This folder contains the results of the model selection process for all models used in Figure 5, Figure 5 - Figure supplement 1, and Figure 5 - Figure supplement 2.

As described in our Methods section "Training, evaluation and model selection", we tested whether each consensus model performed at least as well as the "worst" human expert on each validation image, measured as the F1 score against the estimated ground truth.

Note: In contrast to the "from scratch" and "fine-tuned" models, the "frozen" models are validated on all images (no training data is needed). Thus, they must also meet the selection criterion on all images to be selected.
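For illustration, a minimal Python sketch of this selection logic (function and variable names are hypothetical, not taken from the analysis code):

    def meets_criterion(model_f1, worst_expert_f1):
        # A model passes on an image if its F1 score is at least
        # as high as that of the "worst" human expert on that image.
        return model_f1 >= worst_expert_f1

    def is_selected(per_image_scores):
        # per_image_scores: iterable of (model_f1, worst_expert_f1)
        # pairs, one per validation image; the model is selected
        # only if it passes on every image.
        return all(meets_criterion(m, w) for m, w in per_image_scores)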

_______________________________________

Additional file descriptions:

*_selection_criterion.csv:

Contains the results of the model selection process for the respective lab (Mue, Inns1, Inns2, or Wue2), with the following columns:
 - initialization:      type of initialization ("from scratch", "fine-tuned", or "frozen")
 - model:               identifier of the model used for the analysis
 - validation_file_id:  image_id of the image used for validation in that split
 - f1_score:            F1 score of the model's predicted annotation compared to the estimated ground truth
 - reference_f1_score:  lowest F1 score among all manual annotations compared to the estimated ground truth on that image
 - selection_criterion: whether the selection criterion was met (True, i.e. f1_score >= reference_f1_score) or not (False)
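
As an example, the selection results can be summarized per model with pandas (a sketch assuming the column names above; the file name is a placeholder for one of the four lab files):

    import pandas as pd

    # Placeholder file name; substitute the respective lab's CSV
    # (Mue, Inns1, Inns2, or Wue2).
    df = pd.read_csv("Mue_selection_criterion.csv")

    # Re-derive the criterion from the score columns and list the
    # models that meet it on all of their validation images.
    passed = df["f1_score"] >= df["reference_f1_score"]
    selected = passed.groupby([df["initialization"], df["model"]]).all()
    print(selected[selected].index.tolist())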