BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                  | Batch 150629 | Batch 150722 | Batch 151218 | Batch 170217
Condition crowned |            4 |            3 |            5 |            0
Condition worker  |            0 |            3 |            2 |            6

Measures of confounding between Batch and Condition

Confounding coefficients (0 = no confounding, 1 = complete confounding):
Standardized Pearson Correlation Coefficient | 0.8108
Cramér's V                                   | 0.6998
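Both coefficients can be recomputed from the sample counts above. The Python sketch below is a minimal illustration. It assumes that the standardized Pearson coefficient is Pearson's contingency coefficient C = sqrt(χ²/(χ²+n)) divided by its maximum possible value sqrt((k-1)/k), and that Cramér's V is sqrt(χ²/(n(k-1))). Here χ² is the Pearson chi-square statistic, n is the number of samples, and k is the smaller of the table's two dimensions. The function name `confounding_coefficients` is hypothetical, not a BatchQC function. Under these assumptions the sketch reproduces the two values reported above.

```python
import math

def confounding_coefficients(table):
    """Return (standardized contingency coefficient, Cramer's V) for a
    rows-by-columns table of sample counts (condition x batch).
    Hypothetical helper; formulas are assumed, not taken from BatchQC."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    # Cramer's V: chi-square scaled by its maximum possible value n * (k - 1)
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    # Pearson's contingency coefficient C, divided by its maximum sqrt((k - 1) / k)
    contingency = math.sqrt(chi2 / (chi2 + n))
    standardized = contingency / math.sqrt((k - 1) / k)
    return standardized, cramers_v

# Sample counts from the table above (rows: crowned, worker; columns: the four batches)
counts = [[4, 3, 5, 0],
          [0, 3, 2, 6]]
std_c, v = confounding_coefficients(counts)
print(f"{std_c:.4f} {v:.4f}")  # 0.8108 0.6998
```

Most of the χ² comes from Batch 150629 and Batch 170217, each of which contains samples from only one condition.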

Variation Analysis

Variation explained by Batch and Condition

        | Full (Condition + Batch) (%) | Condition (%) | Batch (%)
Min.    | 0.41  | 0     | 0.212
1st Qu. | 26.32 | 0.522 | 24.65
Median  | 37.62 | 2.354 | 36.32
Mean    | 38.38 | 5.061 | 36.97
3rd Qu. | 49.85 | 7.007 | 48.84
Max.    | 91.34 | 47.12 | 89.81
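Each column summarizes, across genes, the percentage of a gene's expression variance attributed to a factor. A common way to compute such a percentage is the R² of a per-gene linear model. The sketch below assumes that reading and shows only the single-factor case, such as the Batch column, as a one-way R² (between-group sum of squares over total sum of squares). The Full column would instead come from a model containing both Condition and Batch, which needs a multiple regression that is not shown here. The helper `percent_explained` and the expression values are hypothetical.

```python
from statistics import fmean

def percent_explained(values, groups):
    """Percent of one gene's variance explained by a categorical factor:
    100 * between-group sum of squares / total sum of squares (one-way R^2)."""
    grand = fmean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    if ss_total == 0:
        return 0.0  # constant gene: nothing to explain
    ss_between = 0.0
    for g in set(groups):
        members = [x for x, lab in zip(values, groups) if lab == g]
        ss_between += len(members) * (fmean(members) - grand) ** 2
    return 100.0 * ss_between / ss_total

# Hypothetical expression values for one gene in six samples from two batches
expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]
print(round(percent_explained(expr, batch), 2))  # 97.4
```

Because Batch and Condition are strongly confounded in this data set, the variance they explain overlaps, so the Condition and Batch percentages need not add up to the Full value.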

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                   | Min.      | 1st Qu.  | Median  | Mean   | 3rd Qu. | Max.   | Proportion p < 0.05
Batch P-values     | 1.041e-09 | 0.008028 | 0.05534 | 0.1528 | 0.2107  | 0.9985 | 0.4851
Condition P-values | 0.001539  | 0.4855   | 0.6946  | 0.654  | 0.8553  | 1      | 0.006726
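Each row condenses one p-value per gene, from testing the Batch or the Condition effect, into a few summary numbers. The Python sketch below mirrors R's `summary()` (whose default quantiles match Python's `statistics.quantiles(..., method="inclusive")`). It assumes the last column is the proportion of genes with p < 0.05, which is consistent with the reported medians. The helper `summarize_pvalues` and the input p-values are hypothetical.

```python
from statistics import fmean, quantiles

def summarize_pvalues(pvals, alpha=0.05):
    """Six-number summary in the style of R's summary(), plus the share of p < alpha."""
    # method="inclusive" gives the same quartiles as R's default (type 7)
    q1, med, q3 = quantiles(pvals, n=4, method="inclusive")
    return {
        "Min.": min(pvals),
        "1st Qu.": q1,
        "Median": med,
        "Mean": fmean(pvals),
        "3rd Qu.": q3,
        "Max.": max(pvals),
        f"Proportion p < {alpha}": sum(p < alpha for p in pvals) / len(pvals),
    }

# Hypothetical per-gene batch p-values
print(summarize_pvalues([0.01, 0.02, 0.2, 0.5, 0.9]))
```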

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

        | Condition: worker (logFC) | AveExpr | t      | P.Value  | adj.P.Val | B
FCGR1A  | 28.76  | 66.61 | 2.995  | 0.007558 | 1 | -4.595
DDO     | -4495  | 4271  | -2.949 | 0.008366 | 1 | -4.595
SLC9A4  | 34.07  | 55.43 | 2.918  | 0.008946 | 1 | -4.595
NFATC2  | 22.24  | 79.26 | 2.742  | 0.01312  | 1 | -4.595
UNG     | 34.51  | 86.65 | 2.724  | 0.01363  | 1 | -4.595
PCDHB9  | -32.66 | 45.26 | -2.702 | 0.01429  | 1 | -4.595
FMNL1   | 121.4  | 531.3 | 2.535  | 0.02038  | 1 | -4.595
PPP1R3E | 9.073  | 10.74 | 2.523  | 0.02091  | 1 | -4.595
BIRC3   | 29.76  | 73.13 | 2.505  | 0.0217   | 1 | -4.595
PPFIBP2 | 143.5  | 572.3 | 2.488  | 0.02249  | 1 | -4.595
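The adj.P.Val column corrects P.Value for testing many genes at once. limma's `topTable` uses the Benjamini-Hochberg (BH) false discovery rate method by default. Here every adjusted value is 1, so none of the listed genes is significant after correction. The sketch below is a minimal stdlib implementation of BH adjustment, assuming that default was used. The helper name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Walk from the largest p-value to the smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print([round(x, 4) for x in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```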

Median Correlations

This plot helps identify outlying samples.
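A sample that correlates poorly with the rest shows up as a low value. The sketch below assumes the plotted quantity is each sample's median Pearson correlation with all other samples. BatchQC may differ in correlation method or preprocessing. The four tiny samples are made-up data, and the helpers `pearson` and `median_correlations` are hypothetical.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def median_correlations(samples):
    """For each sample (a list of gene values), the median Pearson correlation
    with every other sample."""
    result = []
    for i, s in enumerate(samples):
        corrs = [pearson(s, t) for j, t in enumerate(samples) if j != i]
        result.append(median(corrs))
    return result

samples = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [2.0, 3.0, 4.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],  # hypothetical outlier: reversed expression profile
]
meds = median_correlations(samples)
print(meds.index(min(meds)))  # 3
```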

Heatmaps

Heatmap

This heatmap of the data matrix shows the batch effects and the variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This circular dendrogram of the data matrix is colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

     | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value)
PC1  | 51.24     | 51.24 | 44.2 | 0.8  | 0.8729  | 44.1 | 0.01387
PC2  | 12.06     | 63.3  | 82.4 | 31.7 | 0.4103  | 81.7 | 2e-05
PC3  | 7.245     | 70.54 | 9.3  | 0.5  | 0.5679  | 7.6  | 0.6331
PC4  | 4.362     | 74.91 | 12.4 | 0.9  | 0.3639  | 8.2  | 0.515
PC5  | 3.722     | 78.63 | 64.8 | 0.6  | 0.917   | 64.8 | 0.00026
PC6  | 2.563     | 81.19 | 8.8  | 5    | 0.5967  | 7.4  | 0.8584
PC7  | 2.031     | 83.22 | 6.5  | 0.3  | 0.8703  | 6.4  | 0.7546
PC8  | 1.781     | 85    | 7    | 0    | 0.4276  | 3.6  | 0.72
PC9  | 1.654     | 86.66 | 10.4 | 7    | 0.1768  | 0.6  | 0.8766
PC10 | 1.538     | 88.19 | 6.7  | 1.6  | 0.9394  | 6.6  | 0.8079
PC11 | 1.492     | 89.69 | 8.9  | 0    | 0.4492  | 5.9  | 0.6309
PC12 | 1.41      | 91.1  | 7.2  | 0.4  | 0.399   | 3.3  | 0.7282
PC13 | 1.155     | 92.25 | 2.7  | 1.1  | 0.7166  | 1.9  | 0.9595
PC14 | 1.069     | 93.32 | 7.1  | 3.6  | 0.8079  | 6.8  | 0.8805
PC15 | 1.055     | 94.37 | 18.3 | 9.9  | 0.4098  | 15   | 0.6139
PC16 | 1.008     | 95.38 | 33.9 | 16.5 | 0.01437 | 6.9  | 0.231
PC17 | 0.873     | 96.26 | 4.1  | 1.5  | 0.9639  | 4.1  | 0.9214
PC18 | 0.8531    | 97.11 | 6.7  | 1.3  | 0.7784  | 6.3  | 0.7912
PC19 | 0.788     | 97.9  | 28.8 | 1.7  | 0.05399 | 11.9 | 0.1139
PC20 | 0.7564    | 98.65 | 2.1  | 0.3  | 0.9064  | 2.1  | 0.9506
PC21 | 0.6867    | 99.34 | 19.8 | 12.5 | 0.07855 | 4.3  | 0.6571
PC22 | 0.6594    | 100   | 8.1  | 2.9  | 0.2435  | 0.7  | 0.7983
PC23 | 4.987e-29 | 100   | 30.6 | 21.6 | 0.3018  | 26.3 | 0.5221
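The first two numeric columns follow directly from the principal component variances. The first is each component's share of the total variance, and the second is the running sum of those shares. The sketch below assumes the components come from something like R's `prcomp()`, whose `sdev` element holds the component standard deviations. The input values and the helper `variance_proportions` are hypothetical. The remaining columns appear to relate each component's scores to Condition and Batch, and this sketch does not cover them.

```python
from itertools import accumulate

def variance_proportions(sdev):
    """Percent and cumulative percent of variance per principal component,
    given component standard deviations (as in R's prcomp()$sdev)."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)
    proportions = [100.0 * v / total for v in variances]
    cumulative = list(accumulate(proportions))
    return proportions, cumulative

props, cum = variance_proportions([2.0, 1.0, 1.0])
print([round(p, 2) for p in props], [round(c, 2) for c in cum])
```

If the data are centered before PCA, as `prcomp()` does by default, 23 samples can support at most 22 components with nonzero variance. This explains why PC23's proportion is effectively zero (4.987e-29) and why its Condition and Batch percentages carry no real information.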

Shape

This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples grouped by batch, revealing batch effects in the shape of the distributions.
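The four shape measures can be computed for each sample from its expression values. The sketch below assumes population (moment-based) definitions, with kurtosis reported as m4/m2² rather than excess kurtosis. BatchQC may apply sample-size corrections or a different kurtosis convention. The helper `shape_measures` is hypothetical.

```python
from statistics import fmean

def shape_measures(values):
    """Mean, variance, skewness and kurtosis of one sample's expression values,
    using population (moment-based) definitions; kurtosis is NOT excess kurtosis."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

print(shape_measures([1.0, 2.0, 3.0, 4.0, 5.0]))  # (3.0, 2.0, 0.0, 1.7)
```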

## Note: The sample-wise p-value measures the variation across samples of each measure computed across genes. The gene-wise p-value measures the variation of each gene between batches of the measure computed within each batch. If the data are quantile normalized, the sample-wise measures across genes are the same for all samples, so the gene-wise p-value is the appropriate measure.
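Quantile normalization forces every sample onto the same distribution of values, which is why the sample-wise shape measures become identical. The sketch below is a minimal version of the textbook algorithm. Each value is replaced by the mean, across samples, of the values sharing its rank. Ties are broken by position, a simplification, because some implementations average tied ranks instead. The helper `quantile_normalize` and the three samples are hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (lists of gene values).
    Ties are broken by position, a simplification."""
    n_genes = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [sum(col[k] for col in sorted_cols) / len(samples)
                 for k in range(n_genes)]
    normalized = []
    for s in samples:
        ranks = sorted(range(n_genes), key=lambda g: s[g])  # gene indices, ascending value
        out = [0.0] * n_genes
        for k, g in enumerate(ranks):
            out[g] = reference[k]  # k-th smallest gene gets the k-th reference value
        normalized.append(out)
    return normalized

data = [[5.0, 2.0, 3.0], [4.0, 1.0, 4.5], [3.0, 4.0, 6.0]]
qn = quantile_normalize(data)
print([sorted(col) for col in qn])  # every sample now has the same sorted values
```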

ComBat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for these data. It also reports a Kolmogorov-Smirnov (KS) test comparing the parametric and non-parametric prior distributions.
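In ComBat's parametric prior, the per-gene batch mean estimates (`gamma.hat` in the log below) are modeled as Normal and the batch variances (`delta.hat`) as inverse gamma. The KS statistic D is the largest vertical gap between the empirical distribution of the estimates and the fitted prior. The sketch below computes D for the Normal case only, using hypothetical inputs. It is a plain one-sample KS statistic, not ComBat's own code, and the helper names `normal_cdf` and `ks_statistic_normal` are hypothetical. The KS test assumes continuous data, which is why R warns when ties are present.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a Normal(mean, sd) distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic_normal(data, mean, sd):
    """One-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDF of `data` and a Normal(mean, sd) CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

print(round(ks_statistic_normal([-1.0, 1.0], 0.0, 1.0), 4))  # 0.3413
```

When many genes contribute, even a small D such as 0.043 can produce a p-value that prints as 0, which is why the shape of the curves matters more than the p-value alone.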

## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.043
## p-value = 0
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1355
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run. We recommend it only when the shape of the non-parametric curve differs widely from the parametric one, for example when it is bimodal or highly skewed. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the KS test p-value above is significant.
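ComBat's "L/S model" in the log above treats batch as an additive location shift and a multiplicative scale factor for each gene. The sketch below shows only that core idea. It is a simplified per-gene adjustment without ComBat's empirical Bayes shrinkage of the batch parameters and without covariates. Each batch is shifted and scaled to the overall mean and the pooled within-batch standard deviation. The helper `simple_location_scale_adjust` and the values are hypothetical.

```python
from statistics import fmean, stdev

def simple_location_scale_adjust(values, batches):
    """Simplified per-gene batch adjustment WITHOUT ComBat's empirical Bayes
    shrinkage or covariate terms: each batch is shifted and scaled to the
    overall mean and the pooled within-batch standard deviation.
    Every batch needs at least two samples (stdev raises otherwise)."""
    grand = fmean(values)
    groups = {}
    for x, b in zip(values, batches):
        groups.setdefault(b, []).append(x)
    # Pooled within-batch variance
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups.values())
    pooled_sd = (ss_within / (len(values) - len(groups))) ** 0.5
    stats = {b: (fmean(g), stdev(g)) for b, g in groups.items()}
    # Standardize within each batch, then map onto the common location and scale
    return [(x - stats[b][0]) / stats[b][1] * pooled_sd + grand
            for x, b in zip(values, batches)]

expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch = ["A", "A", "A", "B", "B", "B"]
print(simple_location_scale_adjust(expr, batch))  # [6.0, 7.0, 8.0, 6.0, 7.0, 8.0]
```

Because this sketch ignores Condition, applying it to data as confounded as this set would also remove much of the Condition signal. ComBat's covariate adjustment, noted in the log above, exists to protect that signal.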

SVA

Summary

## Number of Surrogate Variables found in the given data: 2