BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 151113  Batch 170203  Batch 170208
Condition crowned            12             0             0
Condition worker              6             4             2

Measures of confounding between Batch and Condition

Confounding Coefficients (0=no confounding, 1=complete confounding)

Standardized Pearson Correlation Coefficient  0.7071
Cramer’s V                                    0.5774
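Both coefficients can be reproduced from the sample counts in the Batch by Condition table. The following is a minimal sketch, not BatchQC's own code. It assumes the standardized Pearson coefficient is Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) divided by its maximum attainable value sqrt((k - 1) / k), where k is the smaller table dimension. The function name `confounding_coefficients` is hypothetical.

```python
import math


def confounding_coefficients(table):
    """Return (standardized Pearson C, Cramer's V) for a contingency table.

    `table` is a list of rows of counts. Every row and column total must be
    non-zero, otherwise the expected counts below would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))
    pearson_c = math.sqrt(chi2 / (chi2 + n))
    # Assumption: "standardized" means dividing C by its upper bound.
    standardized_c = pearson_c / math.sqrt((k - 1) / k)
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return standardized_c, cramers_v


# Counts from the report: rows are conditions (crowned, worker),
# columns are batches (151113, 170203, 170208).
counts = [[12, 0, 0], [6, 4, 2]]
standardized_c, cramers_v = confounding_coefficients(counts)
print(round(standardized_c, 4), round(cramers_v, 4))  # 0.7071 0.5774
```

For these counts the chi-squared statistic is 8 with n = 24, which gives the two values reported above. The confounding comes from all 12 crowned samples falling in a single batch.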

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.                      0.013          0      0
1st Qu.                   8.312      0.838  5.114
Median                    16.63      3.771  12.77
Mean                      20.28       6.33  17.57
3rd Qu.                   29.03       9.73  26.58
Max.                      97.96      46.05  97.95
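To make the Condition and Batch columns concrete, here is a hedged sketch of how a per-gene "percent variation explained" by one categorical factor could be computed. It assumes each entry is 100 × R² of a per-gene linear model on that factor, which for a single factor equals the between-group sum of squares divided by the total sum of squares. This assumption is not confirmed by the report, and the function name `percent_variation_explained` is hypothetical. The Full (Condition+Batch) column would need a joint fit of both factors, which this sketch does not cover.

```python
from collections import defaultdict


def percent_variation_explained(values, groups):
    """Percent of the variance in `values` explained by the labels in `groups`.

    Equivalent to 100 * R^2 of a one-factor linear model (one mean per group).
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    if not values:
        raise ValueError("at least one value is required")

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant gene has no variation to explain

    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Between-group sum of squares: group size times squared mean shift.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return 100.0 * ss_between / ss_total


# Toy expression values for one gene across four samples in two batches
# (illustrative numbers only, not taken from the report).
print(percent_variation_explained([0.0, 2.0, 2.0, 4.0], ["a", "a", "b", "b"]))  # 50.0
```

Applying such a function to every gene, once with batch labels and once with condition labels, and then summarizing the results would yield a table of the same shape as the one above.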

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                           Min.    1st Qu.     Median       Mean    3rd Qu.       Max.    Ps<0.05
Batch P-values                0    0.07976      0.289     0.3589     0.6046          1     0.1923
Condition P-values    0.0009045      0.354     0.5976     0.5737      0.809          1    0.01706

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

          Condition: worker (logFC)  AveExpr       t    P.Value  adj.P.Val       B
KNTC1                            92    314.5   3.942  0.0007669          1  -4.595
GPR157                       -22.58    33.46  -3.672   0.001452          1  -4.595
POU2F2                        207.2    518.7   3.288   0.003565          1  -4.595
HELLS                         113.5    607.2   3.236   0.004026          1  -4.595
CTSO                         -212.3    913.8   -3.21   0.004277          1  -4.595
IKZF2                         146.6    545.5   3.148   0.004922          1  -4.595
STOML1                        60.67    190.2   3.083   0.005717          1  -4.595
SWSAP1                       -25.92    141.8  -3.041   0.006292          1  -4.595
CEP55                         38.75    99.67   3.016   0.006672          1  -4.595
FCAMR                         74.92    73.54   2.935   0.008006          1  -4.595
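Every gene in this table has adj.P.Val = 1 even though the raw P.Value entries are well below 0.05. A multiple-testing adjustment can produce this when many genes are tested and few have small p-values. The sketch below implements the Benjamini-Hochberg adjustment, which is the default `adjust.method` of limma's `topTable`. Whether this report used that default is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m

    # Walk from the largest p-value down, enforcing monotonicity with a
    # running minimum and capping at 1.
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (illustrative only, not taken from the report).
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.04, 0.05]))
```

Each raw p-value is scaled by m / rank, so with thousands of genes a raw value near 0.001 can easily be pushed up to the cap of 1.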

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the given data matrix showing batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

Columns (left to right): Proportion of Variance (%); Cumulative Proportion of Variance (%); Percent Variation Explained by Either Condition or Batch; Percent Variation Explained by Condition; Condition Significance (p-value); Percent Variation Explained by Batch; Batch Significance (p-value)
PC1 33.93 33.93 28.9 9.1 0.9626 28.9 0.08619
PC2 8.822 42.75 32.4 4 0.4455 30.3 0.03015
PC3 7.268 50.02 8.1 6.2 0.2879 2.7 0.8124
PC4 6.219 56.24 32.6 18.5 0.1274 24.1 0.1486
PC5 4.803 61.04 4.3 0.3 0.597 2.9 0.6654
PC6 4.368 65.41 24.2 2.9 0.9104 24.2 0.08399
PC7 3.84 69.25 2.9 0.2 0.6988 2.1 0.7602
PC8 3.064 72.31 27.7 3.9 0.8422 27.5 0.05826
PC9 2.857 75.17 19.9 0.3 0.8061 19.7 0.1109
PC10 2.795 77.97 2.8 0.1 0.6466 1.8 0.7582
PC11 2.519 80.48 32.2 22 0.00829 3.2 0.2444
PC12 2.382 82.87 10.7 0 0.6898 10 0.3224
PC13 2.231 85.1 0.5 0 0.8961 0.4 0.95
PC14 2.047 87.14 1.4 0.3 0.9254 1.4 0.8889
PC15 1.827 88.97 13.9 2.1 0.143 3.9 0.277
PC16 1.766 90.74 4.1 1.5 0.3868 0.4 0.7651
PC17 1.631 92.37 9.4 7 0.2495 3 0.7748
PC18 1.447 93.82 4.2 1.2 0.8075 3.9 0.7294
PC19 1.366 95.18 9.1 1.1 0.4559 6.5 0.4299
PC20 1.326 96.51 3.6 1.2 0.4432 0.7 0.7781
PC21 1.231 97.74 1.1 0.2 0.7498 0.6 0.9116
PC22 1.155 98.89 7.7 5.8 0.2535 1.4 0.8079
PC23 1.106 100 17.9 12 0.05222 0.4 0.5023
PC24 6.891e-29 100 18.9 4.2 0.8346 18.7 0.1881

Shape

This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.

## Note: The sample-wise p-value is calculated for the variation across samples of the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches of the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples, and the gene-wise p-value is a good measure.
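The four shape measures named above can be computed for one sample as in the sketch below. Definitions of skewness and kurtosis vary between packages. This sketch assumes the moment-based forms m3 / m2^1.5 and m4 / m2^2 (non-excess kurtosis), together with the usual n - 1 sample variance. Which definitions the report used is not stated, and the function name `shape_measures` is hypothetical.

```python
def shape_measures(values):
    """Return (mean, variance, skewness, kurtosis) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")

    mean = sum(values) / n
    # Central moments with divisor n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")

    variance = m2 * n / (n - 1)  # unbiased sample variance
    skewness = m3 / m2 ** 1.5    # 0 for symmetric data
    kurtosis = m4 / m2 ** 2      # non-excess: 3 for a normal distribution
    return mean, variance, skewness, kurtosis


# Toy values (illustrative only, not taken from the report).
print(shape_measures([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Computing these per sample across all genes gives the sample-wise measures. Computing them per gene within each batch gives the gene-wise measures described in the note.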

Combat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 3 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.01573
## p-value = 0.001456
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.1765
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible, and the parametric version is recommended even if the p-value of the KS test above is significant.
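The Statistic D values reported above are two-sided Kolmogorov-Smirnov distances between an empirical distribution and a reference distribution. The sketch below shows how such a distance is computed against a normal reference. It covers only the statistic, not the p-value, and is not the ComBat or `ks.test` implementation. The function names `normal_cdf` and `ks_statistic` are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic D.

    D is the largest vertical gap between the empirical CDF of `sample`
    and the reference `cdf`, checked just before and at each data point.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")

    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Toy sample (illustrative only, not taken from the report).
print(round(ks_statistic([-1.0, 1.0], normal_cdf), 4))  # 0.3413
```

With many genes, even a small D can yield a significant p-value, which is consistent with the note's advice to judge the shape of the curves rather than rely on the test alone.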

SVA

Summary

## Number of Surrogate Variables found in the given data: 1