BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 150629  Batch 150722  Batch 151119  Batch 170217
Condition crowned  4             3             5             0
Condition worker   0             3             2             5

Measures of confounding between Batch and Condition

  Standardized Pearson Correlation Coefficient Cramer’s V
Confounding Coefficients (0=no confounding, 1=complete confounding) 0.7956 0.6805
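
The two coefficients can be checked by hand from the sample counts. The sketch below assumes the standard definitions: Cramér's V as sqrt(chi2 / (N * (k - 1))), and the standardized Pearson coefficient as Pearson's contingency coefficient sqrt(chi2 / (chi2 + N)) divided by its maximum sqrt((k - 1) / k), where k is the smaller of the number of rows and columns. Whether BatchQC computes them exactly this way is an assumption; the function names are illustrative only.

```python
import math

# Sample counts copied from the Batch x Condition table in this report
# (rows: condition, columns: batch).
counts = [
    [4, 3, 5, 0],  # crowned
    [0, 3, 2, 5],  # worker
]


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def confounding_coefficients(table):
    """Return (standardized Pearson coefficient, Cramer's V)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Divide by the largest value Pearson's C can take for this table size.
    std_pearson = pearson_c / math.sqrt((k - 1) / k)
    return std_pearson, cramers_v


std_pearson, cramers_v = confounding_coefficients(counts)
print(round(std_pearson, 4), round(cramers_v, 4))
```

With the counts above, both values should agree with the reported 0.7956 and 0.6805 to the precision shown.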

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.     0.39                    0          0
1st Qu.  22.58                   0.267      19.67
Median   34.47                   1.151      32.19
Mean     34.62                   2.557      32.55
3rd Qu.  46.68                   3.355      45.16
Max.     81.98                   46.33      81.32

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                    Min.       1st Qu.  Median   Mean    3rd Qu.  Max.  Ps<0.05
Batch P-values      1.502e-06  0.01339  0.07447  0.1951  0.287    1     0.4321
Condition P-values  0.001211   0.4143   0.5539   0.564   0.7144   1     0.004216

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

         Condition: worker (logFC)  AveExpr  t       P.Value   adj.P.Val  B
LGALS2   -27.56                     20.68    -3.342  0.003726  0.9837     -4.595
DDO      -10569                     8228     -2.834  0.01119   0.9837     -4.595
PCDHB9   -40.56                     54.41    -2.597  0.01846   0.9837     -4.595
BIRC3    39.68                      74.41    2.309   0.0333    0.9837     -4.595
NKAIN1   -30.27                     51.41    -2.255  0.03713   0.9837     -4.595
EGF      36.34                      33.59    2.206   0.04093   0.9837     -4.595
CENPP    -10.27                     23.41    -2.205  0.041     0.9837     -4.595
ABI3BP   -22.02                     50.86    -2.166  0.0443    0.9837     -4.595
CH25H    12.37                      17.14    2.149   0.04579   0.9837     -4.595
HLA-DOA  12.24                      21.05    2.136   0.047     0.9837     -4.595
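
Every gene in the table shares the same adj.P.Val. That pattern is typical of the Benjamini-Hochberg (BH) adjustment, which is the default in limma's topTable; that the report used the default is an assumption. The sketch below applies BH to only the ten P.Value entries listed above, purely to illustrate how adjusted values collapse to a common number. The report adjusts over all tested genes, so these results will not match the 0.9837 shown. The function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# P.Value column of the LIMMA table in this report (top ten genes only).
top_pvalues = [0.003726, 0.01119, 0.01846, 0.0333, 0.03713,
               0.04093, 0.041, 0.0443, 0.04579, 0.047]

adjusted = benjamini_hochberg(top_pvalues)
for p, q in zip(top_pvalues, adjusted):
    print(p, round(q, 5))
```

In this ten-gene illustration, nine of the ten adjusted values are pulled down to the same number because the cap from the largest p-value propagates to the smaller ones. The same mechanism, applied across all genes, produces the repeated adj.P.Val in the table.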

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the input data matrix, showing batch effects and the variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a circular dendrogram of the input data matrix, colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

      Proportion    Cumulative     Percent Variation  Percent Variation  Condition     Percent Variation  Batch
      of Variance   Proportion of  Explained by       Explained by       Significance  Explained by       Significance
      (%)           Variance (%)   Either Condition   Condition          (p-value)     Batch              (p-value)
                                   or Batch
PC1   68.1          68.1           40.4               1.2                0.5214        38.9               0.03136
PC2   7.381         75.48          36.8               8.1                0.7127        36.3               0.08777
PC3   3.66          79.14          10.1               3.2                0.5891        8.5                0.7339
PC4   2.258         81.39          13.5               13.2               0.3098        7.9                0.9964
PC5   2.026         83.42          58.8               14.3               0.8149        58.6               0.00517
PC6   1.867         85.29          34.3               0.9                0.1656        26.2               0.06646
PC7   1.707         86.99          20.3               2.2                0.3529        16                 0.3114
PC8   1.44          88.43          7.6                0.1                0.4816        4.8                0.7107
PC9   1.349         89.78          9                  0.1                0.5133        6.6                0.6557
PC10  1.294         91.08          2.4                1.7                0.8449        2.2                0.9873
PC11  1.226         92.3           3.1                0.6                0.834         2.9                0.9292
PC12  1.04          93.34          19.8               0                  0.3823        16                 0.2773
PC13  0.9501        94.29          15.5               4.4                0.1239        2.5                0.54
PC14  0.9258        95.22          5.2                0.5                0.9864        5.2                0.8394
PC15  0.8654        96.09          5.2                0.2                0.4658        2.1                0.826
PC16  0.8473        96.93          6.6                5.5                0.6177        5.2                0.9766
PC17  0.6755        97.61          53.2               34.5               0.00328       21                 0.1186
PC18  0.6539        98.26          20                 1                  0.8669        19.8               0.2926
PC19  0.619         98.88          8.9                3.9                0.2887        2.4                0.8164
PC20  0.6021        99.48          20.2               0.2                0.4108        16.8               0.272
PC21  0.5167        100            9.2                4.3                0.2078        0                  0.8206
PC22  1.143e-28     100            27.5               5.7                0.812         27.3               0.2034
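
The cumulative column is the running sum of the per-component proportions. A minimal sketch, using the first five proportions as printed in the table (they are already rounded, so the sums can differ from the reported cumulative values in the last digit):

```python
from itertools import accumulate

# Proportion of variance (%) for PC1..PC5, copied from the table in this report.
proportion = [68.1, 7.381, 3.66, 2.258, 2.026]

# Cumulative proportion of variance (%) as printed in the report, for comparison.
reported_cumulative = [68.1, 75.48, 79.14, 81.39, 83.42]

cumulative = list(accumulate(proportion))

for pc, (computed, reported) in enumerate(zip(cumulative, reported_cumulative), start=1):
    print(f"PC{pc}: computed {computed:.3f}, reported {reported}")
```

Read this way, the first component alone carries 68.1% of the variance, and the first five together carry roughly 83%.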

Shape

This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch to reveal batch effects.

## Note: The sample-wise p-value is calculated for the variation across samples of the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.

Combat Plots

This is a plot showing whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.06147
## p-value = 0
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1389
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, for example when it is bimodal or highly skewed. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
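
The statistic D reported by the Kolmogorov-Smirnov test is the largest vertical gap between the empirical distribution of the estimates and the reference distribution. The sketch below computes a one-sample D against a normal reference. It is not the code that produced the output above: the sample is hypothetical and the function names are illustrative.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D for a continuous reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical sample, for illustration only (not data from this report).
sample = [-1.0, 0.0, 1.0]
d = ks_statistic(sample, normal_cdf)
print(round(d, 4))
```

With many thousands of genes, even a small D such as 0.06 yields a p-value that prints as 0, which is consistent with the note above that a significant KS test alone does not rule out the parametric prior.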

SVA

Summary

## Number of Surrogate Variables found in the given data: 2