BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

Condition | Batch 151110 | Batch 170217
crowned | 12 | 0
worker | 6 | 6

Measures of confounding between Batch and Condition

Confounding coefficient (0 = no confounding, 1 = complete confounding) | Value
Standardized Pearson Correlation Coefficient | 0.7071
Cramer’s V | 0.5774
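Both coefficients can be recomputed from the 2 × 2 sample-count table. The sketch below assumes the textbook definitions: Cramer's V = sqrt(chi2 / (n (k - 1))), and Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) standardized by its maximum sqrt((k - 1) / k), where k is the smaller of the number of rows and columns. With these definitions it returns 0.7071 and 0.5774 for the counts reported here. BatchQC's own implementation may differ in details, and the helper names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Assumes no all-zero row or column, so every expected count is positive.
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramer's V)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    c = math.sqrt(chi2 / (chi2 + n))    # Pearson contingency coefficient
    c_max = math.sqrt((k - 1) / k)      # largest possible C for this table size
    return c / c_max, cramers_v


# Rows: condition crowned, worker; columns: Batch 151110, Batch 170217.
counts = [[12, 0], [6, 6]]
std_c, v = confounding_coefficients(counts)
print(f"{std_c:.4f} {v:.4f}")  # 0.7071 0.5774
```

For this table chi2 = 8 with n = 24, which gives V = sqrt(1/3) and C = 0.5.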

Variation Analysis

Variation explained by Batch and Condition

Statistic | Full model, Condition+Batch (%) | Condition (%) | Batch (%)
Min. | 0 | 0 | 0
1st Qu. | 3.291 | 0.362 | 1.009
Median | 7.712 | 1.564 | 4.178
Mean | 10.17 | 3.284 | 6.956
3rd Qu. | 14.62 | 4.41 | 10.36
Max. | 79.65 | 46.08 | 79.55
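Each column summarizes, across genes, one per-gene percentage of variance explained. A common way to obtain such percentages is the R² of a linear model fitted to each gene. The full model uses both factors, and each single-factor column uses that factor alone. The sketch below assumes that definition, with dummy-coded factors and ordinary least squares solved through the normal equations so that only the standard library is needed. BatchQC may attribute variation to each factor differently, for example conditionally on the other factor. The helper names are hypothetical.

```python
def dummies(labels):
    """0/1 indicator columns for every level except the first-seen (reference) level."""
    levels = sorted(set(labels), key=labels.index)
    return [[1.0 if x == lev else 0.0 for x in labels] for lev in levels[1:]]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def percent_r2(y, columns):
    """R^2 (in %) of an OLS fit of y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)  # constant genes (tss == 0) must be filtered first
    return 100.0 * (1.0 - rss / tss)


def variation_explained(y, condition, batch):
    """Per-gene (full, condition-only, batch-only) percent of variance explained."""
    cond, bat = dummies(condition), dummies(batch)
    return percent_r2(y, cond + bat), percent_r2(y, cond), percent_r2(y, bat)


# Hypothetical gene whose expression differs only between batches.
gene = [1.0, 1.0, 5.0, 5.0]
print([round(v, 6) for v in variation_explained(gene, ["x", "y", "x", "y"], ["a", "a", "b", "b"])])
# [100.0, 0.0, 100.0]
```

Applying `variation_explained` to every gene and summarizing each of the three columns gives a table shaped like the one above. When condition and batch are partly confounded, the single-factor percentages need not add up to the full-model percentage.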

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

Effect | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Ps<0.05
Batch P-values | 4.116e-07 | 0.1262 | 0.3542 | 0.4074 | 0.6641 | 1 | 0.1234
Condition P-values | 0.00112 | 0.3249 | 0.5559 | 0.544 | 0.7749 | 1 | 0.0241
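The first six columns match R's `summary()` of the per-gene p-values. The last column is the fraction of genes with p < 0.05. If a factor had no effect, p-values would be roughly uniform and about 5% of genes would fall below 0.05. Here the figure is about 12% for batch and about 2% for condition. A minimal sketch of the summary follows. It assumes the p-values are already computed and uses R's default (type 7) quartiles, which is the interpolation that Python's `statistics.quantiles(..., method="inclusive")` performs. The function name is hypothetical.

```python
import statistics


def summarize_pvalues(pvalues, alpha=0.05):
    """R-style summary() of a list of p-values plus the fraction below alpha."""
    q1, median, q3 = statistics.quantiles(pvalues, n=4, method="inclusive")
    return {
        "Min.": min(pvalues),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(pvalues),
        "3rd Qu.": q3,
        "Max.": max(pvalues),
        f"Ps<{alpha}": sum(p < alpha for p in pvalues) / len(pvalues),
    }


# Hypothetical p-values for five genes, for illustration only.
print(summarize_pvalues([0.01, 0.2, 0.04, 0.9, 0.5]))
```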

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

Gene | Condition: worker (logFC) | AveExpr | t | P.Value | adj.P.Val | B
SLC40A1 | 398.7 | 1190 | 3.819 | 0.0009649 | 1 | -4.593
INHBB | -16.42 | 27.96 | -3.712 | 0.001248 | 1 | -4.593
COL24A1 | 8.333 | 6.417 | 3.497 | 0.002088 | 1 | -4.593
PGR | 158.2 | 195.2 | 3.477 | 0.002187 | 1 | -4.593
FIBIN | 63.75 | 90.79 | 3.355 | 0.002922 | 1 | -4.593
LUZP6 | 45.33 | 342.4 | 3.339 | 0.003038 | 1 | -4.593
TRNAU1AP | 21.25 | 71 | 3.255 | 0.003703 | 1 | -4.594
EXPH5 | 20.5 | 15.71 | 3.249 | 0.003752 | 1 | -4.594
PAK1 | 22.83 | 37.46 | 3.197 | 0.004243 | 1 | -4.594
CCBE1 | 22.75 | 23.83 | 3.189 | 0.004317 | 1 | -4.594
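Every adj.P.Val entry is 1 because limma's `topTable()` corrects the raw p-values for multiple testing across all genes, using the Benjamini-Hochberg method by default. Across all tested genes, even the smallest raw p-value (about 0.00096) does not survive the correction. The sketch below illustrates the BH adjustment with the standard library. It is intended to behave like R's `p.adjust(p, method = "BH")` but is written here for illustration and is not taken from limma.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```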

Median Correlations

This plot helps identify outlying samples.
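One common way to build such a plot correlates every pair of samples across genes. For each sample, it then records the median of that sample's correlations with all other samples. A sample whose median correlation is clearly lower than the rest is a candidate outlier. The sketch below is a stdlib version of that idea using Pearson correlation. The construction is assumed here rather than taken from BatchQC's source, and the sample names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_correlations(samples):
    """samples: dict of sample name -> expression values over the same genes."""
    names = list(samples)
    return {
        a: statistics.median([pearson(samples[a], samples[b]) for b in names if b != a])
        for a in names
    }


# Hypothetical data: s4 does not follow the pattern shared by s1-s3.
med = median_correlations({
    "s1": [1, 2, 3, 4, 5],
    "s2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "s3": [0.9, 2.1, 3.2, 3.9, 4.8],
    "s4": [5, 1, 4, 2, 3],
})
print(min(med, key=med.get))  # s4
```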

Heatmaps

Heatmap

This heatmap of the data matrix shows batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This circular dendrogram of the data matrix is colored by batch to show batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.
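The coordinates in this plot are each sample's scores on the first two principal components of the gene-centered samples × genes matrix. The sketch below finds those scores with the standard library. It runs power iteration with deflation on the small sample-by-sample Gram matrix X Xᵀ, whose eigenvectors scaled by the square roots of their eigenvalues are the PC scores. This is an illustration only and not BatchQC's implementation. It assumes genes are centered but not scaled, and a real analysis would use an SVD routine instead.

```python
import math


def center_genes(rows):
    """Center each gene (column) of a samples x genes matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def top_pc_scores(rows, k=2, iters=1000):
    """Scores of the top-k principal components via power iteration with deflation."""
    x = center_genes(rows)
    n = len(x)
    # Gram matrix G = X X^T (samples x samples).
    g = [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]
    scores = []
    for _ in range(k):
        # Non-constant start vector: the all-ones vector lies in the null space of G.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(gij * vj for gij, vj in zip(gi, v)) for gi in g]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
        lam = sum(vi * sum(gij * vj for gij, vj in zip(gi, v)) for vi, gi in zip(v, g))
        scores.append([math.sqrt(max(lam, 0.0)) * vi for vi in v])
        # Deflate: remove the component just found before looking for the next one.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores


# Hypothetical 4 samples x 2 genes; nearly all variation lies along the first gene.
pc1, pc2 = top_pc_scores([[-3.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [3.0, -0.1]])
```

Working with the samples × samples Gram matrix keeps the eigenproblem small when there are far more genes than samples. The sign of each component is arbitrary.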

Explained Variation

PC | Proportion of Variance (%) | Cumulative Proportion of Variance (%) | Percent Variation Explained by Either Condition or Batch | Percent Variation Explained by Condition | Condition Significance (p-value) | Percent Variation Explained by Batch | Batch Significance (p-value)
PC1 | 18.29 | 18.29 | 19.7 | 1.3 | 0.4429 | 17.3 | 0.04004
PC2 | 14.93 | 33.22 | 0 | 0 | 0.9677 | 0 | 0.9432
PC3 | 9.113 | 42.33 | 11.1 | 0.1 | 0.4334 | 8.4 | 0.1223
PC4 | 7.607 | 49.94 | 11.3 | 7.4 | 0.6046 | 10.2 | 0.3466
PC5 | 4.373 | 54.31 | 10.9 | 3.8 | 0.9863 | 10.9 | 0.2074
PC6 | 4.117 | 58.43 | 18.4 | 0.1 | 0.1803 | 10.9 | 0.04181
PC7 | 3.886 | 62.31 | 17.4 | 3 | 0.7053 | 16.8 | 0.07013
PC8 | 3.551 | 65.86 | 1.5 | 0.4 | 0.9562 | 1.5 | 0.6282
PC9 | 3.219 | 69.08 | 13.1 | 13 | 0.1448 | 3.6 | 0.9101
PC10 | 2.898 | 71.98 | 16.1 | 12.4 | 0.05961 | 0.2 | 0.349
PC11 | 2.804 | 74.79 | 12 | 10.9 | 0.1214 | 1.1 | 0.611
PC12 | 2.757 | 77.54 | 12.4 | 11.3 | 0.3031 | 7.7 | 0.6208
PC13 | 2.559 | 80.1 | 8.2 | 8.1 | 0.253 | 2.1 | 0.9164
PC14 | 2.434 | 82.53 | 2.8 | 1 | 0.9795 | 2.8 | 0.5427
PC15 | 2.26 | 84.79 | 4.7 | 4.4 | 0.3528 | 0.6 | 0.7962
PC16 | 2.168 | 86.96 | 4.7 | 3.5 | 0.3205 | 0 | 0.6059
PC17 | 2.033 | 88.99 | 7.8 | 0.8 | 0.2945 | 2.8 | 0.2182
PC18 | 2.02 | 91.01 | 6.9 | 5.8 | 0.2363 | 0.3 | 0.6215
PC19 | 1.981 | 93 | 5.2 | 1.1 | 0.3515 | 1.1 | 0.3547
PC20 | 1.887 | 94.88 | 3.6 | 3.6 | 0.4844 | 1.3 | 0.9864
PC21 | 1.776 | 96.66 | 11.1 | 7.9 | 0.1209 | 0 | 0.394
PC22 | 1.734 | 98.39 | 0.8 | 0.1 | 0.7461 | 0.3 | 0.6873
PC23 | 1.608 | 100 | 0.4 | 0.1 | 0.8004 | 0.1 | 0.7859
PC24 | 1.998e-29 | 100 | 1.2 | 0.1 | 0.8695 | 1 | 0.6381
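The first two columns follow directly from the component variances. Each PC's share is its variance divided by the total, and the cumulative column is the running sum. With 24 samples, centering leaves at most 23 non-degenerate components. That is why PC24's share is numerically zero (1.998e-29) and the cumulative proportion already reaches 100 (after rounding) at PC23. The remaining columns relate each PC to condition and batch and are not reproduced here. A stdlib sketch of the first two columns follows. It assumes per-component standard deviations like those returned by R's `prcomp()$sdev`, and the function name is hypothetical.

```python
def variance_table(sdev):
    """Percent and cumulative percent of variance from PC standard deviations."""
    variances = [s * s for s in sdev]
    total = sum(variances)
    rows, running = [], 0.0
    for i, v in enumerate(variances, start=1):
        pct = 100.0 * v / total
        running += pct
        rows.append((f"PC{i}", pct, running))
    return rows


# Hypothetical standard deviations; the last component is degenerate.
for name, pct, cum in variance_table([2.0, 1.0, 1.0, 0.0]):
    print(f"{name} {pct:.2f} {cum:.2f}")
```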

Shape

This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples grouped by batch, revealing batch-related differences in distribution shape.

## Note: The sample-wise p-value tests the variation across samples of each measure computed across genes. The gene-wise p-value tests the variation of each gene between batches for the measure computed within each batch. If the data are quantile normalized, the sample-wise measures across genes are identical for all samples, so the gene-wise p-value is the more informative measure.
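Quantile normalization forces every sample to share the same distribution, which makes sample-wise shape measures uninformative. The sketch below shows this effect. It computes the four per-sample measures and applies a simple quantile normalization: sort each sample, average across samples rank by rank, and put the averages back in each sample's original order. After normalization every sample holds the same set of values, so all four measures coincide. The estimators are assumptions: population variance, moment skewness m3 / m2^1.5, and non-excess kurtosis m4 / m2^2. Ties are broken by position. BatchQC's exact estimators may differ, and the function names are hypothetical.

```python
import statistics


def shape_measures(values):
    """Mean, population variance, skewness and kurtosis of one sample across genes."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def quantile_normalize(samples):
    """samples: list of equal-length value lists, one per sample."""
    n_genes = len(samples[0])
    ranked = [sorted(s) for s in samples]
    # Reference distribution: mean value at each rank across samples.
    ref = [statistics.fmean([r[k] for r in ranked]) for k in range(n_genes)]
    out = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])  # ties broken by position
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = ref[rank]
        out.append(normalized)
    return out


# Hypothetical 3 samples x 4 genes.
normed = quantile_normalize([[1.0, 5.0, 2.0, 8.0], [3.0, 2.0, 9.0, 4.0], [0.5, 7.0, 1.5, 2.5]])
for s in normed:
    print([round(x, 3) for x in shape_measures(s)])  # the same row for every sample
```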

Combat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for these data. It also reports Kolmogorov-Smirnov tests comparing the parametric prior distributions with the empirical (non-parametric) distributions.

## Found 2 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.02067
## p-value = 6.57e-06
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1583
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run. We recommend it only when the shape of the non-parametric curve differs markedly from the parametric one, for example when it is bimodal or highly skewed. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the KS test p-value above is significant.
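The statistic D reported above is the largest vertical distance between the empirical CDF of the per-gene estimates and the fitted parametric CDF. The sketch below computes the two-sided one-sample D against a normal distribution using only the standard library. It assumes the normal's mean and standard deviation are given. In the `ks.test` call shown in the warnings above, these are `gamma.bar[1]` and `sqrt(t2[1])`. Tied values make the empirical CDF jump by more than 1/n at a single point. The KS test's p-value assumes a continuous distribution without ties, which is why R warns that the p-value is approximate. The p-value itself is not computed here, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of N(mean, sd^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic_normal(values, mean, sd):
    """Two-sided one-sample Kolmogorov-Smirnov statistic D against N(mean, sd^2)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # The empirical CDF is (i-1)/n just below x and i/n at x; check both gaps.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical per-gene estimates compared with a standard normal.
print(ks_statistic_normal([-1.0, 0.0, 1.0], 0.0, 1.0))
```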

SVA

Summary

## Number of Surrogate Variables found in the given data: 4