BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 150629  Batch 150722  Batch 151119  Batch 170217
Condition crowned  4             2             5             0
Condition worker   0             3             2             6

Measures of confounding between Batch and Condition

Confounding Coefficients (0=no confounding, 1=complete confounding)
  Standardized Pearson Correlation Coefficient  0.8283
  Cramer’s V                                    0.7225
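
The two coefficients can be reproduced from the sample-count table with textbook definitions. The sketch below is in Python rather than the R used to generate this report. It assumes, without confirmation from the report itself, that Cramer's V is sqrt(chi2 / (n * (k - 1))) and that the standardized Pearson coefficient is Pearson's contingency coefficient divided by its maximum, sqrt((k - 1) / k), where k is the smaller table dimension. Under those assumptions the results come out close to the values reported above.

```python
import math

# Sample counts from the Batch-by-Condition table: rows are conditions, columns are batches.
counts = [
    [4, 2, 5, 0],  # Condition crowned
    [0, 3, 2, 6],  # Condition worker
]

def chi_square(table):
    """Return the Pearson chi-square statistic and the total count of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

chi2, n = chi_square(counts)
k = min(len(counts), len(counts[0]))  # smaller of the two table dimensions

# Assumed definition: Cramer's V.
cramers_v = math.sqrt(chi2 / (n * (k - 1)))

# Assumed definition: Pearson's contingency coefficient, scaled by its maximum.
pearson_c = math.sqrt(chi2 / (chi2 + n))
std_pearson = pearson_c / math.sqrt((k - 1) / k)

print(f"Cramer's V: {cramers_v:.4f}")
print(f"Standardized Pearson coefficient: {std_pearson:.4f}")
```

Both measures grow with the chi-square statistic, so the empty cells in the table (no crowned samples in Batch 170217, no worker samples in Batch 150629) are what push them toward 1.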

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.     0.33                    0          0.068
1st Qu.  22.28                   1.469      19.13
Median   38.26                   5.703      35.73
Mean     40.3                    8.407      38.1
3rd Qu.  58.23                   13.14      56.72
Max.     91.39                   59.47      91.38
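
As a rough illustration of what a per-gene "variation explained" figure means for a single factor, the sketch below computes the ratio of between-group to total sum of squares (the R-squared of a one-way model) as a percentage. This is an assumption about the general approach, not BatchQC's actual implementation, and it does not cover the combined Condition+Batch model. The function name and the expression values are hypothetical.

```python
from collections import defaultdict

def percent_variation_explained(values, groups):
    """Percent of the total sum of squares in `values` explained by the `groups` factor."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant gene has no variation to explain

    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Between-group sum of squares: group size times squared distance of group mean from grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return 100.0 * ss_between / ss_total

# Hypothetical expression values for one gene across six samples in two batches.
expression = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
batch = ["A", "A", "A", "B", "B", "B"]

pct_batch = percent_variation_explained(expression, batch)
print(f"Variation explained by batch: {pct_batch:.1f}%")
```

Applying such a function to every gene and summarizing the results would give a table of quartiles like the one above.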

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                    Min.       1st Qu.   Median   Mean    3rd Qu.  Max.    Ps<0.05
Batch P-values      6.946e-08  0.004556  0.07067  0.2021  0.3284   0.9994  0.4576
Condition P-values  7.839e-05  0.3952    0.6302   0.5966  0.8207   1       0.01456

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

        Condition: worker (logFC)  AveExpr  t       P.Value    adj.P.Val  B
SPON1   -909                       1103     -5.264  5.563e-05  0.8446     -4.595
BCAN    -680.2                     423.3    -4.26   0.0004881  1          -4.595
AFF2    -952.2                     949.3    -4.003  0.0008589  1          -4.595
MANEA   -120.3                     499.3    -3.928  0.001014   1          -4.595
GSTM1   -462.2                     2043     -3.811  0.001313   1          -4.595
BIRC3   45.55                      120.8    3.554   0.002318   1          -4.595
DDC     -845.8                     1656     -3.466  0.002812   1          -4.595
GPNMB   -64.82                     110.4    -3.454  0.002892   1          -4.595
GPR115  -56.76                     59.45    -3.442  0.002969   1          -4.595
THG1L   -34.1                      175      -3.278  0.004254   1          -4.595

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the given data matrix, showing batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a circular dendrogram of the given data matrix, colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

      Proportion   Cumulative     Percent Variation    Percent Variation  Condition     Percent Variation  Batch
      of Variance  Proportion of  Explained by Either  Explained by       Significance  Explained by       Significance
      (%)          Variance (%)   Condition or Batch   Condition          (p-value)     Batch              (p-value)
PC1   34           34             64.2                 15.7               0.8884        64.1               0.00188
PC2   20.96        54.96          50.8                 2.2                0.9594        50.8               0.00747
PC3   5.283        60.24          44.9                 20.7               0.3514        41.9               0.09618
PC4   4.125        64.37          19                   0.4                0.6552        18                 0.306
PC5   3.983        68.35          6.1                  1.6                0.5732        4.3                0.8428
PC6   3.49         71.84          35.4                 23.3               0.04156       16.9               0.3932
PC7   3.197        75.04          3.3                  0.5                0.8607        3.1                0.9198
PC8   2.917        77.95          7.7                  0                  0.4882        5                  0.7039
PC9   2.631        80.58          32.5                 0.5                0.2233        26.1               0.07942
PC10  2.575        83.16          0.7                  0.1                0.8889        0.6                0.9895
PC11  2.198        85.36          10.3                 5.2                0.1975        0.8                0.8083
PC12  1.959        87.31          6.2                  5.5                0.5494        4.1                0.9882
PC13  1.793        89.11          24.1                 6.9                0.1521        14.1               0.3127
PC14  1.653        90.76          1                    0                  0.9944        1                  0.9812
PC15  1.625        92.39          7.4                  2.1                0.9126        7.4                0.8042
PC16  1.479        93.87          6.1                  2.5                0.4524        2.9                0.8818
PC17  1.367        95.23          17.4                 5.9                0.08284       0.9                0.5172
PC18  1.27         96.5           20.1                 2.5                0.09075       5                  0.3242
PC19  1.205        97.71          4.2                  1.6                0.4446        0.7                0.9272
PC20  1.183        98.89          33.6                 1                  0.3613        30.2               0.07235
PC21  1.11         100            5                    1.8                0.4819        2.1                0.8989
PC22  4.684e-29    100            25.6                 2.8                0.5894        24.3               0.1969
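
The cumulative column is the running sum of the per-component proportions. A minimal Python check, using the first four "Proportion of Variance (%)" values copied from the table (the variable names are illustrative only):

```python
from itertools import accumulate

# First four per-component proportions of variance (%), taken from the table above.
proportion = [34.0, 20.96, 5.283, 4.125]

# Running sum gives the cumulative proportion of variance (%).
cumulative = list(accumulate(proportion))

for pc, (prop, cum) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{pc}: {prop:.3f}% of variance, {cum:.2f}% cumulative")
```

The sums agree with the reported cumulative values up to the rounding used in the table.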

Shape

This heatmap shows how the mean, variance, skewness, and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.

## Note: The sample-wise p-value is calculated for the variation across samples of the measure taken across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure taken within each batch. If the data are quantile normalized, the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.

Combat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04356
## p-value = 0
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.152
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as with a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
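
For readers unfamiliar with the statistic D reported above, the following Python sketch shows how a one-sample Kolmogorov-Smirnov statistic against a normal distribution can be computed. It is a generic illustration, not the code that produced this report: the batch-effect estimates are hypothetical values, the helper names are made up, and fitting the normal parameters from the sample mean and standard deviation is an assumption.

```python
import math
import statistics

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a reference `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical standardized batch-mean estimates for a handful of genes.
gamma_hat = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.5, 0.9, 1.6]

# Assumed reference: a normal distribution with the sample's own mean and standard deviation.
mu = statistics.mean(gamma_hat)
sigma = statistics.stdev(gamma_hat)

d = ks_statistic(gamma_hat, lambda x: normal_cdf(x, mu, sigma))
print(f"KS statistic D = {d:.4f}")
```

With many thousands of genes, even a small D can yield a p-value that rounds to 0, which is consistent with the note above that a significant KS test alone does not call for the non-parametric version.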

SVA

Summary

## Number of Surrogate Variables found in the given data: 0