BatchQC Report

Tests for checking Batch Effects

Summary

Confounding

Number of samples in each Batch and Condition

                   Batch 151113  Batch 170208
Condition crowned  12            0
Condition worker   6             5

Measures of confounding between Batch and Condition

                                                                     Standardized Pearson Correlation Coefficient  Cramer’s V
Confounding Coefficients (0=no confounding, 1=complete confounding)  0.682                                         0.5505
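
Both coefficients can be reproduced from the sample counts above. The sketch below assumes the usual textbook definitions: Cramer’s V is sqrt(chi2 / (n (k - 1))), and the standardized Pearson coefficient is Pearson's contingency coefficient sqrt(chi2 / (n + chi2)) divided by its maximum sqrt((k - 1) / k), where k is the smaller table dimension. The function name is illustrative and is not taken from BatchQC.

```python
import math

def confounding_coefficients(table):
    """Return (standardized Pearson contingency coefficient, Cramer's V)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic, no continuity correction (assumption).
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    pearson_c = math.sqrt(chi2 / (n + chi2))
    # Rescale by the largest attainable value so the result spans 0 to 1.
    standardized_c = pearson_c / math.sqrt((k - 1) / k)
    return standardized_c, cramers_v

# Rows: crowned, worker. Columns: batch 151113, batch 170208.
counts = [[12, 0], [6, 5]]
std_c, v = confounding_coefficients(counts)
print(round(std_c, 3), round(v, 4))  # prints: 0.682 0.5505
```

The empty cell (no crowned samples in batch 170208) is what drives both coefficients well above zero.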

Variation Analysis

Variation explained by Batch and Condition

         Full (Condition+Batch)  Condition  Batch
Min.     0                       0          0
1st Qu.  5.697                   1.812      0.437
Median   10.72                   5.468      1.766
Mean     12.11                   7.289      4.337
3rd Qu.  16.39                   10.7       5.273
Max.     81.33                   54.47      80.83
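
As a rough illustration of what a per-gene "variation explained" value means, the sketch below computes 100 × R² for a single factor as the between-group share of a gene's total sum of squares. This is an assumption about the general approach, not BatchQC's actual code, and it covers only the single-factor columns (Condition or Batch); the Full column would need a two-factor regression, which is not shown. The expression values are hypothetical.

```python
def percent_variation_explained(values, labels):
    """100 * R^2 of a one-factor model for one gene."""
    mean = sum(values) / len(values)
    total_ss = sum((v - mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variation to explain
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    # Residual sum of squares after fitting one mean per group.
    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return 100.0 * (1.0 - within_ss / total_ss)

# Hypothetical expression values for one gene across six samples.
gene = [5.0, 6.0, 5.5, 9.0, 8.5, 9.5]
batch = ["151113", "151113", "151113", "170208", "170208", "170208"]
print(percent_variation_explained(gene, batch))
```

Applying such a function to every gene and summarizing the results would give a table of quartiles like the one above.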

P-value Analysis

Distribution of Batch and Condition Effect p-values Across Genes

                    Min.       1st Qu.  Median  Mean    3rd Qu.  Max.  Ps<0.05
Batch P-values      4.06e-07   0.2077   0.4219  0.4591  0.7031   1     0.04219
Condition P-values  9.497e-05  0.1129   0.2481  0.3332  0.5069   1     0.09364
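
The Ps<0.05 column appears to be the fraction of genes whose p-value falls below 0.05 (so 0.04219 would mean roughly 4.2% of genes). Under that reading, which is an assumption, it can be sketched as follows with hypothetical p-values.

```python
def fraction_below(p_values, alpha=0.05):
    """Fraction of p-values strictly below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical per-gene p-values, for illustration only.
p_values = [0.01, 0.20, 0.50, 0.04, 0.73, 0.31, 0.88, 0.12]
print(fraction_below(p_values))  # prints: 0.25
```

If no effect were present and the p-values were uniformly distributed, about 5% of them would be expected to fall below 0.05, which gives a baseline for reading this column.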

Differential Expression

Expression Plot

Boxplots of all values for each sample, colored by batch membership.

LIMMA

        Condition: worker (logFC)  AveExpr  t      P.Value    adj.P.Val  B
RND3    272.2                      446      4.697  0.0001297  0.4698     -4.37
CDKL2   18.67                      22.17    4.531  0.0001915  0.4698     -4.379
DUSP27  36.5                       36.09    4.203  0.0004159  0.4698     -4.397
GPR146  48.5                       128.2    4.086  0.0005488  0.4698     -4.404
PTPN14  69.92                      181      4.059  0.0005847  0.4698     -4.405
FAT2    112.3                      123      4.035  0.0006191  0.4698     -4.407
FREM2   44.17                      70.78    4.002  0.0006697  0.4698     -4.409
FAM83B  38.08                      58.7     3.968  0.0007262  0.4698     -4.411
MFSD6   99.58                      241.1    3.966  0.0007294  0.4698     -4.411
GRIK2   44.5                       105.7    3.951  0.0007552  0.4698     -4.412
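
All ten genes share the same adj.P.Val (0.4698) despite different raw p-values. That pattern is consistent with a Benjamini-Hochberg adjustment, which is the default in limma's topTable; whether this report used the default is an assumption. The sketch below, on hypothetical p-values, shows how the monotonicity step of the procedure produces such ties.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted values of all larger p-values (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values: every p * m / rank equals 0.05, so all tie.
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.04, 0.05]))
```

With many genes and no strong signal, the smallest p-values are often all capped by the same later value, so none of them reaches a conventional significance threshold.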

Median Correlations

This plot helps identify outlying samples.
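
One common way to compute such a value, sketched here as an assumption about the approach rather than BatchQC's exact code, is to take each sample's median Pearson correlation with every other sample; a sample with an unusually low median is a candidate outlier. The sample names and values are hypothetical.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def median_correlations(samples):
    """Map each sample name to its median correlation with the others."""
    names = list(samples)
    return {
        a: median(pearson(samples[a], samples[b]) for b in names if b != a)
        for a in names
    }

# Hypothetical expression profiles (same gene order in every sample).
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.5],
    "s3": [1.1, 2.1, 2.9, 4.2],
    "s4": [4.0, 1.0, 3.0, 2.0],  # deliberately unlike the others
}
result = median_correlations(samples)
print(min(result, key=result.get))  # prints: s4
```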

Heatmaps

Heatmap

This is a heatmap of the given data matrix, showing batch effects and variation across conditions.

Sample Correlations

This is a heatmap of the correlation between samples.

Circular Dendrogram

This is a circular dendrogram of the given data matrix, colored by batch to show the batch effects.

PCA: Principal Component Analysis

PCA

This is a plot of the top two principal components colored by batch to show the batch effects.

Explained Variation

      Proportion of  Cumulative      Percent Variation    Percent Variation  Condition     Percent Variation  Batch
      Variance (%)   Proportion of   Explained by Either  Explained by       Significance  Explained by       Significance
                     Variance (%)    Condition or Batch   Condition          (p-value)     Batch              (p-value)
PC1   46.33          46.33           12.6                 10.1               0.1082        0.2                0.463
PC2   10.56          56.89           6.4                  1.4                0.317         1.5                0.3137
PC3   9.053          65.94           8.4                  0.7                0.714         7.7                0.2124
PC4   4.48           70.42           45.3                 28                 0.2114        40.7               0.02073
PC5   3.771          74.19           32.8                 0.6                0.1939        26.7               0.00571
PC6   3.254          77.45           17.8                 16.7               0.1797        9.9                0.6039
PC7   2.23           79.68           6.8                  5.9                0.4954        4.5                0.6632
PC8   2.035          81.71           2.3                  2.1                0.6632        1.4                0.8447
PC9   1.897          83.61           0.1                  0.1                0.8981        0                  0.9996
PC10  1.845          85.45           6.6                  0.8                0.3508        2.3                0.2772
PC11  1.69           87.14           0.3                  0.1                0.9887        0.3                0.8558
PC12  1.534          88.68           2.1                  2.1                0.5933        0.7                0.9895
PC13  1.46           90.14           10.1                 5.4                0.1547        0.3                0.3149
PC14  1.308          91.44           4.7                  0.4                0.4485        1.8                0.3588
PC15  1.297          92.74           0.1                  0.1                0.9129        0                  0.9998
PC16  1.232          93.97           0.3                  0                  0.9574        0.3                0.817
PC17  1.166          95.14           7.2                  3.7                0.2356        0.2                0.4006
PC18  1.108          96.25           3.1                  2.6                0.4365        0.1                0.7397
PC19  1.064          97.31           8.1                  3.3                0.2193        0.7                0.3175
PC20  0.9734         98.28           1.8                  0.3                0.6179        0.5                0.5923
PC21  0.954          99.24           21.4                 13.9               0.03049       0.1                0.1833
PC22  0.7616         100             1.8                  1.6                0.5529        0.1                0.8105
PC23  4.698e-29      100             2.3                  0.5                0.5566        0.6                0.5446

Shape

This is a heatmap showing how the mean, variance, skewness and kurtosis of gene expression vary between samples, grouped by batch, to reveal batch effects.

## Note: The sample-wise p-value is calculated for the variation across samples on the measure across genes. The gene-wise p-value is calculated for the variation of each gene between batches on the measure across each batch. If the data is quantile normalized, then the sample-wise measure across genes is the same for all samples and the gene-wise p-value is a good measure.
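
The four shape measures can be sketched for a single sample as below. The moment conventions are assumptions (sample variance with an n - 1 denominator, moment-based skewness, and non-excess kurtosis, where a normal distribution gives 3); the report does not state which estimators BatchQC uses. The input values are hypothetical.

```python
def shape_summary(values):
    """Return (mean, variance, skewness, kurtosis) of one sample's values."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with an n denominator.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    variance = m2 * n / (n - 1)   # unbiased sample variance
    skewness = m3 / m2 ** 1.5     # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2       # non-excess kurtosis
    return mean, variance, skewness, kurtosis

# Hypothetical expression values for one sample across five genes.
print(shape_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Computing this tuple per sample and comparing the values between batches is the kind of comparison the heatmap visualizes.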

Combat Plots

This plot shows whether a parametric or non-parametric prior is appropriate for this data. It also shows the Kolmogorov-Smirnov test comparing the parametric and non-parametric prior distributions.

## Found 2 batches
## Adjusting for 1 covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(t2[1])): ties should not be present for the Kolmogorov-Smirnov test

## Warning in ks.test(gamma.hat[1, ], "pnorm", gamma.bar[1], sqrt(shinyInput$t2[1])): ties should not be present for the Kolmogorov-Smirnov test
## Warning in ks.test(delta.hat[1, ], invgam): p-value will be approximate in the presence of ties
## Batch mean distribution across genes: Normal vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.04411
## p-value = 0
## 
## 
## Batch Variance distribution across genes: Inverse Gamma vs Empirical distribution
## Two-sided Kolmogorov-Smirnov test
## Selected Batch: 1
## Statistic D = 0.159
## p-value = 0

Note: The non-parametric version of ComBat takes much longer to run, and we recommend it only when the shape of the non-parametric curve differs widely from the parametric one, such as a bimodal or highly skewed distribution. Otherwise, the difference in batch adjustment is negligible and the parametric version is recommended, even if the p-value of the KS test above is significant.
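
For reference, the Kolmogorov-Smirnov statistic D reported above is the largest gap between the empirical distribution of the per-gene estimates and the fitted parametric distribution. A minimal sketch for the normal case follows; the function names and input values are illustrative, and the inverse gamma comparison for the variances is not shown.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample, two-sided Kolmogorov-Smirnov statistic D."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical batch-mean estimates compared with a standard normal.
estimates = [-1.0, 0.0, 1.0]
print(ks_statistic(estimates, lambda x: normal_cdf(x, 0.0, 1.0)))
```

Because the test is run across all genes, even a small D can produce a p-value that rounds to 0, which is why the note above advises judging the shape of the curves rather than the p-value alone.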

SVA

Summary

## Number of Surrogate Variables found in the given data: 0