BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                    Batch 150629   Batch 150722   Batch 151218   Batch 170208
Condition crowned   4              2              5              0
Condition worker    0              3              2              6

Measures of confounding between Batch and Condition

Confounding Coefficients (0=no confounding, 1=complete confounding)

  Standardized Pearson Correlation Coefficient   Cramer’s V
  0.8283                                         0.7225
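
Both coefficients can be recomputed from the sample counts in the Batch by Condition table. The following is a minimal sketch, assuming that the standardized Pearson coefficient is Pearson's contingency coefficient divided by its maximum attainable value and that Cramer's V uses the usual chi-square definition. The function names are illustrative and are not part of BatchQC.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and sample count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def confounding_coefficients(table):
    """Return (standardized Pearson coefficient, Cramer's V)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller table dimension
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    # Assumption: standardize by the maximum of the contingency coefficient.
    std_pearson = contingency / math.sqrt((k - 1) / k)
    return std_pearson, cramers_v

# Rows: crowned, worker. Columns: the four batches from the report.
counts = [[4, 2, 5, 0], [0, 3, 2, 6]]
std_pearson, cramers_v = confounding_coefficients(counts)
print(round(std_pearson, 4), round(cramers_v, 4))
```

Under these assumptions the sketch agrees with the reported 0.8283 and 0.7225 up to rounding, which suggests the report uses the same definitions.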

Variation Analysis

Variation explained by Batch and Condition

          Full (Condition+Batch)   Condition   Batch
Min.      0.219                    0           0.067
1st Qu.   21.26                    1.149       17.32
Median    37.07                    4.938       32.74
Mean      38.17                    8.321       34.3
3rd Qu.   54.26                    12.81       49.88
Max.      98.3                     61.59       98.18
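
As an illustration of what a per-gene value in this summary could represent, here is a minimal sketch that assumes the percentage is the R-squared of a one-way linear model, that is, the between-group sum of squares divided by the total sum of squares. The expression values and labels are made up, and the Full (Condition+Batch) column would need a two-factor fit that is not shown here.

```python
from collections import defaultdict

def percent_variation_explained(values, labels):
    """Percent of the total sum of squares explained by group membership."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variation to explain
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return 100.0 * between_ss / total_ss

# Hypothetical expression values for one gene across six samples.
expression = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
batch = ["A", "A", "A", "B", "B", "B"]
condition = ["crowned", "worker", "crowned", "worker", "crowned", "worker"]

batch_pct = percent_variation_explained(expression, batch)
condition_pct = percent_variation_explained(expression, condition)
print(batch_pct, condition_pct)
```

In this toy gene almost all variation follows the batch labels, the same pattern the summary shows for the typical gene in this data set.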

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                     Min.        1st Qu.    Median   Mean     3rd Qu.   Max.     Ps<0.05
Batch P-values       1.354e-14   0.009383   0.1059   0.2378   0.4017    0.9994   0.4033
Condition P-values   6.406e-05   0.2082     0.4499   0.4663   0.7136    1        0.06121
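
To make the columns concrete, here is a sketch of the same kind of summary for a list of p-values, assuming that Ps<0.05 is the fraction of genes whose p-value falls below 0.05. The p-values are invented, and the quartile method used here may differ slightly from the one that produced the report.

```python
import statistics

def summarize_p_values(p_values, alpha=0.05):
    """Six-number summary plus the fraction of p-values below alpha."""
    q1, median, q3 = statistics.quantiles(p_values, n=4, method="inclusive")
    return {
        "Min.": min(p_values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(p_values),
        "3rd Qu.": q3,
        "Max.": max(p_values),
        "Ps<0.05": sum(p < alpha for p in p_values) / len(p_values),
    }

# Hypothetical p-values, one per gene.
hypothetical_p = [0.001, 0.02, 0.04, 0.2, 0.5, 0.7, 0.9, 0.03, 0.6, 0.08]
summary = summarize_p_values(hypothetical_p)
for name, value in summary.items():
    print(name, value)
```

Read this way, the report says that about 40% of genes show a batch effect at the 0.05 level, against about 6% for condition.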

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

           Condition: worker (logFC)   AveExpr   t        P.Value     adj.P.Val   B
CTBP2      359.8                       1525      5.35     4.704e-05   0.4455      -4.539
ALDH18A1   404.4                       1443      4.989    0.000101    0.4455      -4.542
ATAD2      103                         299       4.984    0.0001022   0.4455      -4.542
NOL8       168.6                       878.5     4.912    0.0001192   0.4455      -4.543
APLF       148.1                       396.7     4.627    0.0002204   0.6012      -4.546
IFITM10    -719.7                      450.6     -4.515   0.0002814   0.6012      -4.547
TNFSF10    146.9                       362.2     4.421    0.0003457   0.6012      -4.548
MUT        1119                        2776      4.418    0.0003478   0.6012      -4.548
EHD3       333.9                       820.2     4.4      0.000362    0.6012      -4.548
SHF        -52.5                       110       -4.193   0.0005702   0.8036      -4.551
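
The adj.P.Val column holds p-values adjusted for multiple testing. Below is a minimal sketch of the Benjamini-Hochberg procedure, assuming that is the adjustment used here (it is the default in limma's topTable). The input p-values are made up. The running minimum in the loop is why several neighbouring genes can share one adjusted value, as with 0.4455 and 0.6012 in the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
hypothetical_p = [0.01, 0.04, 0.03, 0.02, 0.5]
adjusted_p = benjamini_hochberg(hypothetical_p)
print(adjusted_p)
```

In this toy input the four smallest p-values all end up with the same adjusted value, mirroring the tied entries in the table.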

Median Correlations

This plot helps identify outlying samples.

Heatmaps

Heatmap

This is a heatmap of the given data matrix, showing batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a Circular Dendrogram of the given data matrix colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

       Proportion    Cumulative      Percent Variation     Percent Variation   Condition      Percent Variation   Batch
       of Variance   Proportion of   Explained by Either   Explained by        Significance   Explained by        Significance
       (%)           Variance (%)    Condition or Batch    Condition           (p-value)      Batch               (p-value)
PC1    28.87         28.87           68.7                  3.7                 0.172          64.9                0.00021
PC2    16.67         45.53           59.5                  33.2                0.2924         56.7                0.03318
PC3    8.366         53.9            17.5                  0                   0.6989         16.7                0.3395
PC4    4.781         58.68           14                    5.8                 0.3728         9.7                 0.6637
PC5    4.628         63.31           6                     0.2                 0.5451         3.9                 0.7919
PC6    4.283         67.59           37.8                  7                   0.5049         36.1                0.07058
PC7    3.588         71.18           8.3                   6.8                 0.5163         5.9                 0.9623
PC8    2.991         74.17           32                    3.8                 0.03247        10.3                0.1083
PC9    2.891         77.06           14                    4.9                 0.2757         7.6                 0.6237
PC10   2.769         79.83           14.5                  0                   0.9324         14.5                0.4337
PC11   2.514         82.34           10.2                  4.7                 0.7125         9.5                 0.7898
PC12   2.346         84.69           8                     0.8                 0.3662         3.4                 0.7225
PC13   2.17          86.86           5.4                   3.2                 0.4367         1.9                 0.9372
PC14   1.979         88.84           19.7                  2.5                 0.7216         19.1                0.3351
PC15   1.842         90.68           8.2                   3.2                 0.2979         2                   0.817
PC16   1.768         92.45           17.9                  3.2                 0.3397         13.2                0.4127
PC17   1.721         94.17           23.9                  7                   0.04968        3.8                 0.3215
PC18   1.635         95.8            10.3                  1.4                 0.7361         9.7                 0.6455
PC19   1.481         97.29           13.9                  4                   0.1923         4.6                 0.5897
PC20   1.415         98.7            6                     3.9                 0.4572         2.8                 0.9447
PC21   1.299         100             4.3                   0.9                 0.7632         3.7                 0.8946
PC22   5.891e-29     100             43.8                  17.9                0.8862         43.7                0.08481

Shape

This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.

## Note: Sample-wise p-value is calculated for the variation across samples on the measure across genes. Gene-wise p-value is calculated for the variation of each gene between batches on the measure across each batch. If the data is quantile normalized, then the Sample-wise measure across genes is the same for all samples and the Gene-wise p-value is a good measure.
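
The four sample-wise measures can be sketched as follows, assuming population (biased) moment definitions; the estimator used by BatchQC may differ. The sample values are hypothetical. The third sample is a reordering of the first, which mimics what quantile normalization does: every sample ends up with the same set of values, so the sample-wise measures become identical.

```python
def shape_measures(values):
    """Mean, variance, skewness and kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "variance": m2,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Hypothetical expression values per sample (one list per sample).
samples = {
    "sample_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sample_2": [1.0, 1.0, 1.0, 2.0, 10.0],
    "sample_3": [5.0, 3.0, 1.0, 4.0, 2.0],  # same values as sample_1, reordered
}
measures = {name: shape_measures(vals) for name, vals in samples.items()}
for name, result in measures.items():
    print(name, result)
```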

Combat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 4 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04297
## p-value = 0
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.196
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible and the parametric version is recommended even if the p-value of the KS test above is significant.
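
For reference, the statistic D in the Kolmogorov-Smirnov output is the largest vertical distance between the empirical distribution of the estimates and the reference distribution. The following is a minimal sketch of that calculation against a normal distribution, using only the standard library. The data and the reference parameters are hypothetical, and the function names are illustrative rather than taken from ComBat or BatchQC.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Two-sided one-sample Kolmogorov-Smirnov statistic D."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical batch-mean estimates compared with a standard normal.
hypothetical = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
d_stat = ks_statistic(hypothetical, lambda x: normal_cdf(x, 0.0, 1.0))
print(d_stat)
```

The test assumes a continuous distribution, in which repeated values should not occur, which is why the output above warns about ties.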

SVA

Summary

## Number of Surrogate Variables found in the given data: 1