The aim of the experiment is to visualize the changes in total and cytosolic concentrations of EGFP-tagged Ede1 in the three strains used throughout the paper: wild-type background, 3∆, and overexpression driven by the ADH1 promoter.
All strains within one dataset were imaged on the same day and under the same conditions on the Nikon Ti microscope equipped with a Yokogawa CSU-W1 spinning disk and a Photometrics Prime 95B sCMOS camera. The Cellular and Cytosolic acquisitions were taken on different days, so they are not directly comparable; the only meaningful comparison is between different strains within one localization / dataset combination.
Two different series of exposures were taken: high exposure for cytosolic quantification (saturated condensates) and low exposure for total quantification (avoiding any saturation). The cytosolic quantification was done on cells co-expressing Abp1-mCherry, and the total on cells co-expressing Rvs167-mCherry (explanation below).
strain | nickname | mCherry |
---|---|---|
MKY0381 | wt | Abp1 |
MKY0583 | 3Δ | Abp1 |
MKY3448 | oe | Abp1 |
MKY4450 | wt | Rvs167 |
MKY3487 | 3Δ | Rvs167 |
MKY3488 | oe | Rvs167 |
Single planes were acquired at the equatorial plane using a 488 nm laser at 100% power, with 500 ms exposure and camera gain level 2.
The three strains used were the ones co-expressing Abp1-mCherry.
Two-color stacks were acquired with 0.2 μm spacing and 5 μm range around the equatorial plane.
Initially both the Cytosolic and Cellular datasets were taken during the same acquisition, using the Abp1-mCherry strains. However, Abp1 turned out to be suboptimal for masking entire cells (too much contrast between sites and cytosol), and I repeated the experiment with the Rvs167 strains. The diffuse signal of Rvs167-mCherry makes it a perfect protein to create cellular masks. The results were ultimately very similar on average, but I chose to show Ede1-GFP/Rvs167-mCherry because the quantification is more precise.
All images were background-subtracted using the ImageJ rolling ball algorithm with a 50 px radius.
5x5 px square regions away from the condensates and vacuoles were manually selected in ImageJ and mean pixel intensity was saved to file.
Individual cells were cropped in ImageJ and the function `batch_mask` from my personal Python package `mkimage` was used to quantify intensity. Briefly, the RFP channel was median-filtered with a disk brush of 10 px radius, and thresholded using Li’s method followed by one round of morphological opening to create a mask. The mask was applied to GFP images and summary statistics of the masked regions (mean, median, sd) were saved.
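The last step of that procedure (applying a binary mask to the GFP image and summarizing the masked pixels) can be sketched in plain Python. This is not the `batch_mask` implementation: the function name `masked_stats`, the nested-list image representation, and the use of the sample standard deviation are assumptions made for illustration, and the filtering and thresholding steps are left out.

```python
import statistics


def masked_stats(gfp, mask):
    """Summarize GFP pixel values at positions where the mask is True.

    Hypothetical helper: `gfp` and `mask` are equally shaped nested lists.
    """
    values = [
        pixel
        for gfp_row, mask_row in zip(gfp, mask)
        for pixel, keep in zip(gfp_row, mask_row)
        if keep
    ]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1), an assumption
    }


# Toy 3x4 "image" and a mask that selects the central 2x2 block.
gfp = [
    [0, 1, 1, 0],
    [1, 4, 6, 1],
    [0, 5, 5, 0],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
]

stats = masked_stats(gfp, mask)
print(stats)
```

Only the four masked pixels (4, 6, 5, 5) contribute, so bright or dark pixels outside the cell outline do not affect the per-cell summary.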
The mean pixel values of cytosolic regions and entire cells were loaded from text files using a separate R notebook. That notebook was used only to gather all observations in the tidy data structures accessible here, without any further processing.
The table below summarizes intensity measurements from each dataset, as well as normalized intensities. The normalization was performed for each dataset by dividing each observation by the mean wild-type intensity for that dataset.
localization | strain | dataset | n | intensity_mean | intensity_sd | intensity_median | intensity_se | normalized_mean | normalized_sd | normalized_median | normalized_se |
---|---|---|---|---|---|---|---|---|---|---|---|
Cellular | wt | 1 | 120 | 4.464492 | 0.9466484 | 4.2845 | 0.0864168 | 1.0000000 | 0.2120395 | 0.9596837 | 0.0193565 |
Cellular | wt | 2 | 120 | 4.341617 | 0.8611170 | 4.2470 | 0.0786089 | 1.0000000 | 0.1983402 | 0.9782070 | 0.0181059 |
Cellular | wt | 3 | 120 | 4.428617 | 0.9096351 | 4.3150 | 0.0830379 | 1.0000000 | 0.2053994 | 0.9743449 | 0.0187503 |
Cellular | 3Δ | 1 | 120 | 5.262183 | 1.2097965 | 5.2855 | 0.1104388 | 1.1786747 | 0.2709819 | 1.1838974 | 0.0247372 |
Cellular | 3Δ | 2 | 120 | 5.507100 | 1.2369592 | 5.4540 | 0.1129184 | 1.2684446 | 0.2849075 | 1.2562141 | 0.0260084 |
Cellular | 3Δ | 3 | 117 | 5.567872 | 1.2579994 | 5.4290 | 0.1163021 | 1.2572485 | 0.2840615 | 1.2258907 | 0.0262615 |
Cellular | oe | 1 | 120 | 13.907208 | 5.2141612 | 13.4065 | 0.4759856 | 3.1150710 | 1.1679182 | 3.0029175 | 0.1066159 |
Cellular | oe | 2 | 119 | 14.113403 | 5.8849060 | 13.5730 | 0.5394684 | 3.2507254 | 1.3554642 | 3.1262548 | 0.1242552 |
Cellular | oe | 3 | 120 | 17.306900 | 6.4476950 | 17.1480 | 0.5885913 | 3.9079698 | 1.4559163 | 3.8720895 | 0.1329064 |
Cytosolic | wt | 1 | 141 | 13.337979 | 3.3300969 | 12.9200 | 0.2804448 | 1.0000000 | 0.2496703 | 0.9686625 | 0.0210260 |
Cytosolic | wt | 2 | 140 | 14.197786 | 3.1509078 | 13.6800 | 0.2663003 | 1.0000000 | 0.2219295 | 0.9635305 | 0.0187565 |
Cytosolic | wt | 3 | 142 | 10.774648 | 2.1687790 | 10.7000 | 0.1819999 | 1.0000000 | 0.2012854 | 0.9930719 | 0.0168915 |
Cytosolic | 3Δ | 1 | 136 | 14.372368 | 4.1426283 | 13.8600 | 0.3552274 | 1.0775521 | 0.3105889 | 1.0391380 | 0.0266328 |
Cytosolic | 3Δ | 2 | 140 | 13.182186 | 3.9816074 | 12.9400 | 0.3365072 | 0.9284677 | 0.2804386 | 0.9114097 | 0.0237014 |
Cytosolic | 3Δ | 3 | 140 | 10.770571 | 2.5775639 | 10.4600 | 0.2178439 | 0.9996217 | 0.2392249 | 0.9707974 | 0.0202182 |
Cytosolic | oe | 1 | 135 | 13.519489 | 3.7559074 | 12.8400 | 0.3232570 | 1.0136085 | 0.2815949 | 0.9626646 | 0.0242358 |
Cytosolic | oe | 2 | 140 | 14.344321 | 3.0766393 | 14.0000 | 0.2600235 | 1.0103210 | 0.2166985 | 0.9860693 | 0.0183144 |
Cytosolic | oe | 3 | 140 | 11.012286 | 2.7679976 | 10.7000 | 0.2339385 | 1.0220553 | 0.2568991 | 0.9930719 | 0.0217119 |
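The per-dataset normalization used for the table can be illustrated with a short Python sketch. The observations below are hypothetical toy values, not measurements from this experiment, and the helper `standard_error` assumes that the `intensity_se` column is the standard deviation divided by the square root of `n`, which is consistent with the tabulated values.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy observations: (dataset, strain, intensity).
observations = [
    (1, "wt", 4.0), (1, "wt", 5.0), (1, "oe", 13.5),
    (2, "wt", 8.0), (2, "wt", 10.0), (2, "oe", 27.0),
]

# Mean wild-type intensity of each dataset.
wt_values = defaultdict(list)
for dataset, strain, intensity in observations:
    if strain == "wt":
        wt_values[dataset].append(intensity)
wt_mean = {d: statistics.mean(v) for d, v in wt_values.items()}

# Divide every observation by the wild-type mean of its own dataset.
normalized = [
    (dataset, strain, intensity / wt_mean[dataset])
    for dataset, strain, intensity in observations
]


def standard_error(sd, n):
    """Standard error of the mean from a standard deviation and a count."""
    return sd / math.sqrt(n)


for row in normalized:
    print(row)

# First table row (Cellular, wt, dataset 1): sd 0.9466484, n = 120.
print(round(standard_error(0.9466484, 120), 7))
```

Although dataset 2 is twice as bright as dataset 1 in absolute terms, both give a normalized `oe` value of 3.0. By construction, the normalized wild-type mean is exactly 1 in every dataset.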
Quick summary of normalized mean from above, expressed as average values of repeat-level means with 95% confidence intervals:
localization | strain | mean | lower | upper |
---|---|---|---|---|
Cellular | wt | 1.00 | 1.00 | 1.00 |
Cellular | 3Δ | 1.23 | 1.11 | 1.36 |
Cellular | oe | 3.42 | 2.37 | 4.48 |
Cytosolic | wt | 1.00 | 1.00 | 1.00 |
Cytosolic | 3Δ | 1.00 | 0.82 | 1.19 |
Cytosolic | oe | 1.02 | 1.00 | 1.03 |
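As a check on how such intervals arise, the sketch below recomputes two rows from the repeat-level normalized means listed in the first table. It assumes a Student's t interval with 2 degrees of freedom (three repeats); the function name is hypothetical and the critical value is hard-coded to keep the example free of dependencies.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 2 degrees of freedom
# (three repeats minus one), hard-coded instead of calling a stats library.
T_CRIT_DF2 = 4.302652729911275


def mean_ci_three_repeats(values):
    """Mean and t-based 95% CI for exactly three repeat-level means."""
    if len(values) != 3:
        raise ValueError("the hard-coded critical value assumes three repeats")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    half_width = T_CRIT_DF2 * se
    return mean, mean - half_width, mean + half_width


# normalized_mean values of datasets 1-3, copied from the table above.
repeat_means = {
    "Cellular 3Δ": [1.1786747, 1.2684446, 1.2572485],
    "Cellular oe": [3.1150710, 3.2507254, 3.9079698],
}

results = {name: mean_ci_three_repeats(v) for name, v in repeat_means.items()}
for name, (mean, lower, upper) in results.items():
    print(f"{name}: {mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

With only three repeats the critical value is about 4.3 rather than 1.96, which is why the interval for the overexpression strain is so wide. The wild-type rows collapse to 1.00 because every wild-type repeat mean is 1 after normalization.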
This hints that the true mean of cytosolic fluorescence is unchanged between strains, and that the 3Δ and overexpression strains have elevated (significantly?) total levels of Ede1. More accurate estimates and p-values can be found in the ‘Modeling’ tab.
I have chosen to show this data using the SuperPlot style. Each point shows mean intensity of a small region in the cytoplasm of an individual cell, or one entire cell (separated into facets).
Big colour points show mean measurements from three independent repeats.
Range is mean +/- SD, calculated based on the three independent repeats.
These plots show raw intensity measures. The dataset effect is not so bad in the Cellular data, which were imaged 3 days in a row. It is quite pronounced in the Cytosolic data, where the facility realigned the laser fiber in the microscope between datasets 2 and 3, affecting the absolute values.
Also, if it seems weird that the cytosolic values are higher than the cellular ones, that is not accidental. The exposure of the cellular data had to be kept low in order to avoid saturating the condensates. In the cytosolic data, the exposures were high in order to maximize the cytosolic signal, allowing for the condensates to saturate.
All observations have been scaled to the mean value of the wild-type strain of each dataset. This looks good, but has the unfortunate side-effect of zeroing the variance of the dataset means in the wild-type strain.
This breaks all the assumptions of ANOVA if we wanted to run it on repeat-level means. It will be better to show the raw data, and make a linear model which accounts for batch effects to get estimated means, effect sizes and p-values.
We want to find out how the mean intensity values of cytoplasm or entire cells depend on the strain used (null hypothesis is that they do not).
In other experiments I followed the recommendations of several papers and performed hypothesis tests on repeat-level means.
This approach takes between-experiment variability into account, and avoids p-value inflation, where tiny differences might end up ‘highly significant’ due to testing a large number of observations which are not necessarily independent.
But it’s also very conservative because using a small N of repeats means losing power when compared to the hundreds of cell-level observations.
Because we have a complete block design (all strains are represented in all batches), we can include the dataset as a fixed or random effect. Strain is obviously the fixed effect we want to analyze. Because we do not care about estimating the effect of any particular batch, a mixed model with the dataset as a random effect will be appropriate.
Fitting a linear mixed model using the `lmerTest` package (`lmer` comes from `lme4`, `lmerTest` just adds an interface for comparing models):
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: intensity ~ strain + (1 | dataset)
## Data: cell_data
##
## REML criterion at convergence: 5798.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2952 -0.3397 -0.0084 0.3135 7.2404
##
## Random effects:
## Groups Name Variance Std.Dev.
## dataset (Intercept) 0.4261 0.6528
## Residual 12.7450 3.5700
## Number of obs: 1076, groups: dataset, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 4.4116 0.4212 2.6582 10.473 0.003101 **
## strain3Δ 1.0392 0.2667 1071.0032 3.897 0.000103 ***
## strainoe 10.6995 0.2663 1070.9998 40.182 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strn3Δ
## strain3Δ -0.315
## strainoe -0.316 0.499
We have a significant effect of `strain` on intensity according to an F-test:
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## strain 24995 12497 2 1071 980.57 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
`dataset` does not seem to explain much variance, but it definitely could due to the nature of measuring absolute fluorescence values, as we see with the Cytosolic data. It makes sense to keep it in the model.
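To put a number on "not much variance", the share attributable to `dataset` can be computed from the two variance components in the random-effects block of the model output above. The sketch below is plain arithmetic on those printed values; the function name is hypothetical.

```python
def variance_share(group_var, residual_var):
    """Fraction of the random variation attributed to the grouping factor."""
    return group_var / (group_var + residual_var)


# Variance components from the model output: dataset (Intercept) and Residual.
share = variance_share(0.4261, 12.7450)
print(f"{share:.1%}")
```

For the Cellular model this comes to roughly 3% of the variance left after accounting for `strain`.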
In the plots we saw that maybe the data is not really normal, as is typical for fluorescence values. Model residuals are also not normal:
This is a big departure from normality, with a strong skew, although it looks better on the histogram. More importantly, the variance in the data was also not homogeneous (bigger spread in OE intensities). We can see this reflected in the residuals:
This is really not good, but then again the conclusions of the experiment are obvious and do not hang on the quality of the modeling exercise. A log-transform or a GLM could probably help; let’s stick to transforming the data.
We specify the LMM with `log(intensity) ~ strain + (1|dataset)`:
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log(intensity) ~ strain + (1 | dataset)
## Data: cell_data
##
## REML criterion at convergence: 479
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2742 -0.6000 0.0491 0.6506 3.6589
##
## Random effects:
## Groups Name Variance Std.Dev.
## dataset (Intercept) 0.002219 0.04711
## Residual 0.089617 0.29936
## Number of obs: 1076, groups: dataset, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.464e+00 3.144e-02 2.885e+00 46.546 3.07e-05 ***
## strain3Δ 2.056e-01 2.236e-02 1.071e+03 9.195 < 2e-16 ***
## strainoe 1.168e+00 2.233e-02 1.071e+03 52.310 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strn3Δ
## strain3Δ -0.354
## strainoe -0.355 0.499
We again have a strong effect of `strain` and a random effect with very little explanatory power for `dataset`. Let’s look at the Q-Q plot and histogram of model residuals:
The residual distribution is still fat-tailed and skews slightly left now, but the transformation helped. More importantly, the differences in spread also decreased, although they are still there:
We could try a gamma GLM, but again, the absolute quality of the model is not critical to the conclusions here, so let’s not overcomplicate.
I say we stop here and evaluate the group contrasts. The `emmeans` package has a wonderful interface for pairwise contrasts, and it computes group means with confidence intervals.
The means and CIs are computed in log-space and back-transformed, giving the absolute fluorescence values:
strain | response | std.error | df | conf.low | conf.high |
---|---|---|---|---|---|
wt | 4.321147 | 0.1358662 | 2.885930 | 3.900804 | 4.786785 |
3Δ | 5.307548 | 0.1670583 | 2.898204 | 4.791964 | 5.878605 |
oe | 13.895157 | 0.4370472 | 2.889984 | 12.544115 | 15.391711 |
We can also compute differences between groups along with their CIs and p-values for pairwise comparisons. Because the differences are computed in log-space, we get ratios after back-transforming. This is actually very useful because the ratio to wild-type is what we wanted to know in the first place.
contrast | ratio | SE | df | lower.CL | upper.CL | null | t.ratio | p.value |
---|---|---|---|---|---|---|---|---|
3Δ / wt | 1.228273 | 0.0274644 | 1071.005 | 1.165477 | 1.294452 | 1 | 9.195339 | 0 |
oe / wt | 3.215618 | 0.0718002 | 1071.001 | 3.051444 | 3.388625 | 1 | 52.310471 | 0 |
oe / 3Δ | 2.617999 | 0.0585798 | 1071.007 | 2.484062 | 2.759158 | 1 | 43.011254 | 0 |
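The back-transformation from log-space differences to ratios is simple enough to check by hand. The sketch below exponentiates the `strain3Δ` and `strainoe` estimates from the log-model output and converts their standard errors with the delta method (ratio times log-scale SE), which is an assumption about how the `SE` column was obtained. The interval limits are not reproduced here, since they appear to include a multiplicity adjustment for the three pairwise comparisons.

```python
import math


def ratio_from_log(estimate, se_log):
    """Back-transform a log-scale difference into a ratio.

    Returns the ratio, its delta-method standard error, and the
    corresponding percent change relative to the reference group.
    """
    ratio = math.exp(estimate)
    se_ratio = ratio * se_log
    percent_change = (ratio - 1.0) * 100.0
    return ratio, se_ratio, percent_change


# Fixed-effect estimates and standard errors from the log-model summary.
log_effects = {
    "3Δ / wt": (0.2056, 0.02236),
    "oe / wt": (1.168, 0.02233),
}

ratios = {name: ratio_from_log(*vals) for name, vals in log_effects.items()}
for name, (ratio, se, pct) in ratios.items():
    print(f"{name}: ratio {ratio:.2f}, SE {se:.4f}, change {pct:+.0f}%")
```

A ratio of 1.23 corresponds to a 23% increase over wild-type, and a ratio of 3.22 to an increase of about 220%.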
Finally, the `emmeans` interface allows us to plot the modeled means along with confidence intervals (bars) and confidence intervals for the group differences (arrows).
We can conclude that all groups are significantly different, with extremely low p-values (not that it matters). The 3Δ background increases Ede1 levels by a modest ~23%, and ADH1-driven overexpression by ~220% (a 3.2-fold increase over wild-type).
After looking at the plots, we go into this exercise already suspecting that `strain` has no effect. But let’s build a model as before:
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: intensity ~ strain + (1 | dataset)
## Data: cyto_data
##
## REML criterion at convergence: 6555.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4170 -0.6684 -0.1093 0.5928 3.5146
##
## Random effects:
## Groups Name Variance Std.Dev.
## dataset (Intercept) 2.925 1.710
## Residual 10.810 3.288
## Number of obs: 1254, groups: dataset, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.277e+01 1.000e+00 2.070e+00 12.766 0.00533 **
## strain3Δ -8.837e-04 2.270e-01 1.249e+03 -0.004 0.99689
## strainoe 1.936e-01 2.272e-01 1.249e+03 0.852 0.39434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strn3Δ
## strain3Δ -0.113
## strainoe -0.112 0.496
After modeling the fixed effect of `strain`, `dataset` explains ~21% of the remaining variance (2.925 out of 2.925 + 10.810). `strain` itself is not a significant predictor:
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## strain 10.45 5.2249 2 1249 0.4833 0.6168
We really could stop here, since `strain` does not explain anything, and that’s as clear from the model as from the plots. But let’s check model residuals:
This is actually not so bad, but we have some fat tails and a small difference in spread for larger values. The log-transform helped before, so let’s see what it does here:
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log(intensity) ~ strain + (1 | dataset)
## Data: cyto_data
##
## REML criterion at convergence: 146.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0764 -0.6464 0.0103 0.6726 2.9964
##
## Random effects:
## Groups Name Variance Std.Dev.
## dataset (Intercept) 0.01834 0.1354
## Residual 0.06437 0.2537
## Number of obs: 1254, groups: dataset, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.515e+00 7.916e-02 2.066e+00 31.770 0.000822 ***
## strain3Δ -1.259e-02 1.752e-02 1.249e+03 -0.719 0.472512
## strainoe 9.595e-03 1.753e-02 1.249e+03 0.547 0.584248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) strn3Δ
## strain3Δ -0.110
## strainoe -0.110 0.496
This is better, error-wise:
But `strain` is still not significant:
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## strain 0.10288 0.051442 2 1249 0.7991 0.4499
And if we use `emmeans` to get contrasts, we find no significant differences between groups:
contrast | ratio | SE | df | lower.CL | upper.CL | null | t.ratio | p.value |
---|---|---|---|---|---|---|---|---|
3Δ / wt | 0.9874892 | 0.0173002 | 1249.001 | 0.9477174 | 1.028930 | 1 | -0.7186153 | 0.7525132 |
oe / wt | 1.0096410 | 0.0176992 | 1249.001 | 0.9689525 | 1.052038 | 1 | 0.5473328 | 0.8478621 |
oe / 3Δ | 1.0224324 | 0.0179974 | 1249.000 | 0.9810616 | 1.065548 | 1 | 1.2602988 | 0.4180250 |
Since absolute fluorescence intensity values are not meaningful in themselves, I am skipping the estimated group means here.
We cannot detect any differences in cytosolic Ede1 intensities across different strains.
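We see significant differences in mean total cellular levels of Ede1, namely a ~1.2-fold increase in the 3Δ background and a ~3.2-fold increase in the ADH1-driven overexpression strain, both relative to wild-type.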
A table summarising all group contrasts:
contrast | ratio | lower.CL | upper.CL | localization |
---|---|---|---|---|
3Δ / wt | 0.99 | 0.95 | 1.03 | Cytosolic |
oe / wt | 1.01 | 0.97 | 1.05 | Cytosolic |
oe / 3Δ | 1.02 | 0.98 | 1.07 | Cytosolic |
3Δ / wt | 1.23 | 1.17 | 1.29 | Cellular |
oe / wt | 3.22 | 3.05 | 3.39 | Cellular |
oe / 3Δ | 2.62 | 2.48 | 2.76 | Cellular |
R session used to generate this document.
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] emmeans_1.7.2 lmerTest_3.1-3 lme4_1.1-27.1 Matrix_1.3-3
## [5] ggsignif_0.6.2 ggbeeswarm_0.6.0 broom_0.7.9 knitr_1.36
## [9] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [13] readr_2.0.1 tidyr_1.1.3 tibble_3.1.3 ggplot2_3.3.5
## [17] tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] TH.data_1.1-0 minqa_1.2.4 colorspace_2.0-2
## [4] ellipsis_0.3.2 estimability_1.3 htmlTable_2.2.1
## [7] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.13
## [10] farver_2.1.0 fansi_0.5.0 mvtnorm_1.1-3
## [13] lubridate_1.7.10 xml2_1.3.2 codetools_0.2-18
## [16] splines_4.1.0 Formula_1.2-4 jsonlite_1.7.2
## [19] nloptr_1.2.2.2 pbkrtest_0.5.1 cluster_2.1.2
## [22] dbplyr_2.1.1 png_0.1-7 compiler_4.1.0
## [25] httr_1.4.2 backports_1.2.1 assertthat_0.2.1
## [28] cli_3.1.0 htmltools_0.5.1.1 tools_4.1.0
## [31] gtable_0.3.0 glue_1.4.2 Rcpp_1.0.7
## [34] cellranger_1.1.0 jquerylib_0.1.4 vctrs_0.3.8
## [37] nlme_3.1-152 xfun_0.25 rvest_1.0.1
## [40] mime_0.11 lifecycle_1.0.0 MASS_7.3-54
## [43] zoo_1.8-9 scales_1.1.1 hms_1.1.0
## [46] parallel_4.1.0 sandwich_3.0-1 RColorBrewer_1.1-2
## [49] yaml_2.2.1 gridExtra_2.3 rpart_4.1-15
## [52] latticeExtra_0.6-29 stringi_1.7.3 highr_0.9
## [55] checkmate_2.0.0 boot_1.3-28 rlang_0.4.11
## [58] pkgconfig_2.0.3 evaluate_0.14 lattice_0.20-44
## [61] htmlwidgets_1.5.3 labeling_0.4.2 tidyselect_1.1.1
## [64] magrittr_2.0.1 R6_2.5.1 generics_0.1.0
## [67] Hmisc_4.5-0 multcomp_1.4-18 DBI_1.1.1
## [70] pillar_1.6.2 haven_2.4.3 foreign_0.8-81
## [73] withr_2.4.2 survival_3.2-11 nnet_7.3-16
## [76] modelr_0.1.8 crayon_1.4.1 utf8_1.2.2
## [79] tzdb_0.1.2 rmarkdown_2.11 jpeg_0.1-9
## [82] grid_4.1.0 readxl_1.3.1 data.table_1.14.0
## [85] reprex_2.0.1 digest_0.6.27 xtable_1.8-4
## [88] numDeriv_2016.8-1.1 munsell_0.5.0 beeswarm_0.4.0
## [91] vipor_0.4.5