The input to the following analyses is every transmission lineage containing more than 10 sequences from each location (Norway, Australia, USA and Europe). For each transmission lineage we run skygrowth v0.3.1 on the clade and extract the growth rate of the effective population size over the time interval 2010-2017.
## [1] Size of transmission lineages containing more than 10 sequences:
## [1] 185 121 90 85 70 57 56 51 27 23 20 18 17 17 14 14 13 12 11
## [20] 11 11 11 11
## [1] Number of transmission containing more than 10 sequences:
## [1] 23
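The size filter can be sketched in a few lines of base R. This is only an illustration: the vector `lineage_id` (one lineage label per sequence) and its contents are hypothetical, and the original code that produced the output above is not shown in this report.

```r
# Hypothetical assignment of sequences to transmission lineages (one label per sequence)
lineage_id <- c(rep("TL1", 12), rep("TL2", 11), rep("TL3", 10), rep("TL4", 3))

# Count sequences per lineage and sort from largest to smallest
sizes <- sort(table(lineage_id), decreasing = TRUE)

# Keep only lineages with strictly more than 10 sequences
large <- sizes[sizes > 10]

print(as.vector(large))  # sizes of the retained lineages
print(length(large))     # number of retained lineages
```

Note that the threshold is strict, so a lineage with exactly 10 sequences (TL3 in the toy data) is excluded, which matches the smallest reported size of 11.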
The estimated arrival times in Norway, taken as the midpoint of the ancestral branch of each transmission lineage, are:
## [1] Estimated arrival time in Norway
## [1] 1987.737 2003.177 2007.786 1981.407 2008.687 1945.542 1965.954 2005.998
## [9] 1942.936 2005.313 2008.090 2002.812 2015.061 2015.741 2008.050 1995.163
## [17] 2006.754 2005.377 1996.370 2000.257 2009.179 1979.664 2011.285
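As a small worked example of the midpoint rule, assume the ancestral branch runs from the parent node (outside the lineage) to the most recent common ancestor of the lineage. The node times below and the helper name `arrival_midpoint` are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical helper: arrival time as the midpoint of the ancestral branch.
# parent_time: time of the node just outside the lineage (decimal years)
# mrca_time:   time of the lineage's most recent common ancestor (decimal years)
arrival_midpoint <- function(parent_time, mrca_time) {
  (parent_time + mrca_time) / 2
}

# Illustrative node times for two lineages (not taken from the data)
parent_time <- c(1985.2, 2001.0)
mrca_time   <- c(1990.3, 2005.4)

arrival <- arrival_midpoint(parent_time, mrca_time)
print(arrival)
```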
From skygrowth we find that the average growth rates of the effective population size of these lineages over the time interval 2010-2017 are:
## [1] Average growth rate in 2010-2017:
## [1] 0.14957086 0.40586782 0.18823745 0.21193805 1.42939245 -0.06648479
## [7] -0.08292415 0.37414602 -0.15004728 0.72711437 0.34760232 0.31184957
## [13] 1.95753800 0.23908737 0.59424712 -0.04110421 0.13925043 0.12100560
## [19] -0.16718384 NA 0.45422935 -0.12434018 0.38957345
The fourth-to-last value (entry 20) is “NA” because this transmission lineage does not exist in the time interval [2010, 2017].
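One way such an NA can arise is sketched below. The sketch assumes the average is a plain mean of the growth-rate estimates whose time points fall inside the window; the report does not state how the average was computed, and the helper `mean_growth_in_window` and the toy vectors are hypothetical rather than part of skygrowth.

```r
# Hypothetical helper: mean growth rate on [from, to].
# Returns NA when the lineage has no estimates inside the window.
mean_growth_in_window <- function(time, growth, from = 2010, to = 2017) {
  inside <- time >= from & time <= to
  if (!any(inside)) return(NA_real_)
  mean(growth[inside])
}

# Toy lineage with estimates on a yearly grid from 2005 to 2015
time_a   <- seq(2005, 2015, by = 1)
growth_a <- seq(0, 1, by = 0.1)
avg_a <- mean_growth_in_window(time_a, growth_a)

# Toy lineage whose estimates end before 2010, so the window is empty
time_b   <- seq(2000, 2009, by = 1)
growth_b <- rep(0.2, 10)
avg_b <- mean_growth_in_window(time_b, growth_b)

print(c(avg_a, avg_b))
```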
Similarly, for Australia, the USA and Europe we find:
## [1] Size of transmission lineages containing more than 10 sequences:
## [1] 254 195 190 165 125 121 88 81 49 49 46 44 43 41 35 24 20 20 18
## [20] 18 16 15 15 15 14 13 12 12 12 11
## [1] Number of transmission containing more than 10 sequences:
## [1] 30
## [1] Estimated arrival time in Australia
## [1] 2009.832 2006.724 2012.104 2000.280 1985.136 2009.195 1999.664 1997.703
## [9] 2002.988 1993.818 1912.241 1980.317 2012.335 2011.512 2003.826 2013.257
## [17] 2010.635 1974.381 2010.632 2007.702 2004.582 2000.387 2009.536 2005.816
## [25] 2010.210 2011.727 2001.528 2012.804 2005.632 2014.190
## [1] Average growth rate in 2010-2017:
## [1] 0.73209313 1.11545224 1.00124041 0.31586499 0.46841077 0.86668767
## [7] 0.24254021 0.42180944 0.86064904 0.39355229 -0.12251161 0.13631717
## [13] 0.89808862 0.87791222 0.49186664 1.20640143 0.55365795 1.51277112
## [19] 0.51024777 0.33821078 0.05822009 0.09104814 0.48610210 0.29641919
## [25] 1.89199840 1.07345474 1.26148635 1.04581440 0.60096397 1.04929500
## [1] Size of transmission lineages containing more than 10 sequences:
## [1] 446 168 157 127 104 81 62 52 51 45 44 43 41 40 38 38 33 30 29
## [20] 29 27 27 25 25 22 22 19 18 16 15 14 14 13 13 11
## [1] Number of transmission containing more than 10 sequences:
## [1] 35
## [1] Estimated arrival time in USA
## [1] 1923.403 1900.930 1967.540 1917.941 1902.333 1997.361 1916.062 1890.556
## [9] 1923.501 1864.753 1971.060 1909.338 1985.211 1978.712 1939.994 1973.433
## [17] 2001.340 1974.247 1992.234 2001.689 1886.491 2003.135 1945.823 1974.156
## [25] 1966.038 1986.422 1963.833 2008.835 1890.400 1996.932 2002.723 2007.754
## [33] 1982.887 1991.598 2008.687
## [1] Average growth rate in 2010-2017:
## [1] 0.01705917 0.26534872 0.04530616 0.11855652 -0.01634738 0.21689110
## [7] -0.03370991 0.03230967 0.12271823 0.04961591 0.08800455 -0.09398067
## [13] 0.27308222 0.08980389 -0.08543731 -0.05440778 0.18701226 -0.12719074
## [19] NaN 0.30104825 0.05148252 0.19908136 -0.03968641 0.27248917
## [25] 0.03236765 0.13618597 NA 0.69691584 NA 0.06505375
## [31] 0.03942743 0.32173550 -0.11371844 -0.25372351 0.20878554
## [1] Size of transmission lineages containing more than 10 sequences:
## [1] 1084 525 294 265 212 168 111 98 97 84 69 67 65 64 63
## [16] 55 45 29 24 22 21 17 14 12 12 11 11 11 11 11
## [1] Number of transmission containing more than 10 sequences:
## [1] 30
## [1] Estimated arrival time in Europe
## [1] 1750.328 1909.398 1974.636 1976.957 1994.941 1991.804 1936.948 2007.786
## [9] 1974.315 1965.954 1987.639 1912.241 1985.873 1994.671 2006.973 2009.338
## [17] 1955.322 2003.033 1951.718 1957.023 1924.918 2015.741 1999.409 1993.150
## [25] 2005.377 2000.257 1977.074 2005.497 2003.715 2011.285
## [1] Average growth rate in 2010-2017:
## [1] -0.06587396 -0.10412601 -0.22890385 0.02746668 0.09484219 -0.16258100
## [7] 0.20671913 0.17773628 0.02506060 0.01871881 0.05809292 -0.06742322
## [13] -0.19023736 0.25169081 0.71211171 0.39924518 -0.05092549 -0.14266194
## [19] 0.09686703 -0.16964107 -0.23667508 0.80990472 0.03136371 NA
## [25] 0.12451574 NA NA 0.16936694 NA 0.38963834
To study how the growth rates relate to the metadata, we collect them in a data frame together with variables that describe the sex distribution, penA status and mtrD status of each transmission lineage.
The dataset now looks like this:
## GrowthRate locations tlSize penA mtr sexDistribution
## 1 0.14957086 Norway 185 non-mosaic non-mosaic 0.9944134
## 2 0.40586782 Norway 121 non-mosaic non-mosaic 0.9666667
## 3 0.18823745 Norway 90 non-mosaic non-mosaic 0.5795455
## 4 0.21193805 Norway 85 non-mosaic non-mosaic 0.9759036
## 5 1.42939245 Norway 70 non-mosaic mosaic4 0.9565217
## 6 -0.06648479 Norway 57 non-mosaic non-mosaic 0.8596491
## lineage_age
## 1 31.21321
## 2 15.76705
## 3 11.15803
## 4 37.50995
## 5 9.79867
## 6 73.27176
## 'data.frame': 118 obs. of 7 variables:
## $ GrowthRate : num 0.15 0.406 0.188 0.212 1.429 ...
## $ locations : chr "Norway" "Norway" "Norway" "Norway" ...
## $ tlSize : num 185 121 90 85 70 57 56 51 27 23 ...
## $ penA : chr "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
## $ mtr : chr "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
## $ sexDistribution: num 0.994 0.967 0.58 0.976 0.957 ...
## $ lineage_age : num 31.2 15.8 11.2 37.5 9.8 ...
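A data frame of this shape could be assembled roughly as follows. The male and female counts are hypothetical numbers chosen so that the ratio M/(M+F) reproduces the first three `sexDistribution` values shown above; the real counts and the original construction code are not given in this report.

```r
# Hypothetical counts of sequences from men (M) and women (F) per lineage
n_male   <- c(178, 116, 51)
n_female <- c(1, 4, 37)

# One row per transmission lineage; values copied from the first rows above
lineages <- data.frame(
  GrowthRate      = c(0.14957086, 0.40586782, 0.18823745),
  locations       = "Norway",
  tlSize          = c(185, 121, 90),
  penA            = "non-mosaic",
  mtr             = "non-mosaic",
  sexDistribution = n_male / (n_male + n_female),  # fraction M/(M+F)
  stringsAsFactors = FALSE
)

str(lineages)
```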
Here we investigate the growth rate and size of the transmission lineages, and the relation of these to the explanatory variables: penA, mtrD, sex distribution and location.
Lineage size versus explanatory variables
We carry out the analyses separately for two pairs of locations, Norway versus Australia and Europe versus the USA, since the regions within each pair are of more similar size. For the regression analyses we first consider the epidemiological factors alone: location, sex distribution (the fraction M/(M+F)) and their interaction. We run the regressions (1) with both factors included and (2) with each factor alone.
Next we look at the effects of the mtr and penA variants. Since we believe location and sexDistribution to be important, we keep both, together with their interaction, in the model.
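The models in this section write the interaction in two ways, `sexDistribution * locations` and `sexDistribution:locations`. In R's formula syntax `a * b` expands to `a + b + a:b`, so when both main effects are already present the two spellings define the same model. The sketch below checks this on simulated data; the data frame `toy` is invented for the illustration and is unrelated to the real lineages.

```r
set.seed(1)
n <- 40

# Simulated stand-in for the lineage data (not the real data)
toy <- data.frame(
  locations       = rep(c("Australia", "Norway"), each = n / 2),
  sexDistribution = runif(n, 0.5, 1)
)
toy$tlSize <- round(exp(3 + toy$sexDistribution + rnorm(n, sd = 0.5))) + 10

# Three spellings of the same model
fit_star  <- lm(log(tlSize) ~ sexDistribution + locations + sexDistribution * locations, data = toy)
fit_colon <- lm(log(tlSize) ~ sexDistribution + locations + sexDistribution:locations, data = toy)
fit_short <- lm(log(tlSize) ~ sexDistribution * locations, data = toy)

# All three give identical coefficients
print(all.equal(coef(fit_star), coef(fit_colon)))
print(all.equal(coef(fit_star), coef(fit_short)))
```

This is why summaries fitted with either spelling on the same data report identical estimates.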
##
## Call:
## lm(formula = log(tlSize) ~ locations + lineage_age, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0768 -0.7827 -0.3150 0.7229 2.0502
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.413471 0.206356 16.542 <2e-16 ***
## locationsNorway -0.296233 0.268154 -1.105 0.275
## lineage_age 0.009019 0.006680 1.350 0.183
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.945 on 49 degrees of freedom
## Multiple R-squared: 0.05154, Adjusted R-squared: 0.01283
## F-statistic: 1.331 on 2 and 49 DF, p-value: 0.2735
##
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8117 -0.3566 -0.1112 0.2210 1.6208
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.68920 0.08811 7.822 3.13e-10 ***
## locationsNorway -0.35245 0.13546 -2.602 0.0122 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4826 on 50 degrees of freedom
## Multiple R-squared: 0.1193, Adjusted R-squared: 0.1016
## F-statistic: 6.77 on 1 and 50 DF, p-value: 0.01216
##
## Call:
## lm(formula = log(tlSize) ~ sexDistribution, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3707 -0.7063 -0.1555 0.5329 2.0296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9851 0.6755 2.939 0.00498 **
## sexDistribution 1.7835 0.8004 2.228 0.03039 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9162 on 50 degrees of freedom
## Multiple R-squared: 0.09033, Adjusted R-squared: 0.07214
## F-statistic: 4.965 on 1 and 50 DF, p-value: 0.03039
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75885 -0.37941 -0.08939 0.34655 1.38511
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8162 0.3771 2.165 0.0352 *
## sexDistribution -0.3331 0.4468 -0.746 0.4594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5114 on 50 degrees of freedom
## Multiple R-squared: 0.011, Adjusted R-squared: -0.008784
## F-statistic: 0.5559 on 1 and 50 DF, p-value: 0.4594
##
## Call:
## lm(formula = log(tlSize) ~ locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73513 -0.76676 -0.00625 0.45569 2.85538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1330 0.2038 20.283 <2e-16 ***
## locationsUSA -0.4728 0.2784 -1.698 0.0952 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.039 on 54 degrees of freedom
## Multiple R-squared: 0.05071, Adjusted R-squared: 0.03313
## F-statistic: 2.885 on 1 and 54 DF, p-value: 0.09519
##
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32030 -0.14196 -0.04921 0.09541 0.72628
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.08363 0.04272 1.958 0.0555 .
## locationsUSA 0.01613 0.05837 0.276 0.7833
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2178 on 54 degrees of freedom
## Multiple R-squared: 0.001412, Adjusted R-squared: -0.01708
## F-statistic: 0.07638 on 1 and 54 DF, p-value: 0.7833
##
## Call:
## lm(formula = log(tlSize) ~ sexDistribution, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6967 -0.7352 -0.1383 0.5093 3.1308
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0107 0.6892 4.368 5.72e-05 ***
## sexDistribution 1.0839 0.8416 1.288 0.203
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.05 on 54 degrees of freedom
## Multiple R-squared: 0.0298, Adjusted R-squared: 0.01183
## F-statistic: 1.659 on 1 and 54 DF, p-value: 0.2033
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26150 -0.13946 -0.04598 0.10705 0.65959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1425 0.1393 -1.023 0.3108
## sexDistribution 0.2928 0.1701 1.722 0.0908 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2122 on 54 degrees of freedom
## Multiple R-squared: 0.05205, Adjusted R-squared: 0.03449
## F-statistic: 2.965 on 1 and 54 DF, p-value: 0.09082
##
## Call:
## lm(formula = log(tlSize) ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3550 -0.6703 -0.1953 0.5755 1.9347
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.8638 1.6994 -0.508 0.6136
## sexDistribution 5.2316 1.9975 2.619 0.0118 *
## locationsNorway 3.3344 1.8451 1.807 0.0770 .
## sexDistribution:locationsNorway -4.1718 2.1752 -1.918 0.0611 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8973 on 48 degrees of freedom
## Multiple R-squared: 0.1624, Adjusted R-squared: 0.11
## F-statistic: 3.102 on 3 and 48 DF, p-value: 0.03523
##
## Call:
## lm(formula = log(tlSize) ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7541 -0.6862 -0.1228 0.4973 2.8595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0463 1.0015 4.040 0.000177 ***
## sexDistribution 0.1057 1.1957 0.088 0.929903
## locationsUSA -1.7617 1.3708 -1.285 0.204410
## sexDistribution:locationsUSA 1.6449 1.6704 0.985 0.329322
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.037 on 52 degrees of freedom
## Multiple R-squared: 0.09025, Adjusted R-squared: 0.03777
## F-statistic: 1.72 on 3 and 52 DF, p-value: 0.1744
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.76053 -0.33307 -0.05834 0.25218 1.45709
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6632 0.9163 1.815 0.0758 .
## sexDistribution -1.1502 1.0770 -1.068 0.2909
## locationsNorway -1.0348 0.9949 -1.040 0.3035
## sexDistribution:locationsNorway 0.7876 1.1729 0.672 0.5051
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4838 on 48 degrees of freedom
## Multiple R-squared: 0.1502, Adjusted R-squared: 0.09712
## F-statistic: 2.829 on 3 and 48 DF, p-value: 0.04826
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.22576 -0.15290 -0.05402 0.10312 0.63554
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3796 0.2041 -1.860 0.0686 .
## sexDistribution 0.5648 0.2437 2.317 0.0244 *
## locationsUSA 0.4401 0.2794 1.575 0.1212
## sexDistribution:locationsUSA -0.5148 0.3404 -1.512 0.1365
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2113 on 52 degrees of freedom
## Multiple R-squared: 0.09559, Adjusted R-squared: 0.04341
## F-statistic: 1.832 on 3 and 52 DF, p-value: 0.1528
##
## Call:
## lm(formula = log(tlSize) ~ penA + mtr + sexDistribution + locations +
## sexDistribution:locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4655 -0.6599 -0.2189 0.6122 2.0111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.7513 1.8187 -0.963 0.3406
## penAnon-mosaic 0.6890 0.5199 1.325 0.1916
## mtrnon-mosaic 0.1762 0.2809 0.627 0.5336
## sexDistribution 5.3745 2.0037 2.682 0.0101 *
## locationsNorway 4.3380 2.0249 2.142 0.0375 *
## sexDistribution:locationsNorway -5.2701 2.3654 -2.228 0.0308 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8981 on 46 degrees of freedom
## Multiple R-squared: 0.1957, Adjusted R-squared: 0.1083
## F-statistic: 2.239 on 5 and 46 DF, p-value: 0.06627
##
## Call:
## lm(formula = log(GrowthRate) ~ penA + mtr + sexDistribution +
## locations + sexDistribution:locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2243 -0.4602 0.1175 0.5215 1.7666
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71123 1.69890 0.419 0.678
## penAnon-mosaic 0.49379 0.50734 0.973 0.336
## mtrnon-mosaic -0.02391 0.27523 -0.087 0.931
## sexDistribution -2.05756 1.85519 -1.109 0.274
## locationsNorway -1.54177 1.89170 -0.815 0.420
## sexDistribution:locationsNorway 1.37624 2.20957 0.623 0.537
##
## Residual standard error: 0.827 on 39 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.1193, Adjusted R-squared: 0.006352
## F-statistic: 1.056 on 5 and 39 DF, p-value: 0.3992
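The line "observations deleted due to missingness" in the summary above is consistent with how R handles `log()` of a non-positive number: the result is `NaN`, and `lm()` drops such rows under its default `na.action`. The sketch below shows the mechanism on a toy vector; the values and the predictor `x` are illustrative only.

```r
# Toy growth rates: three positive, two negative, one missing
toy <- data.frame(
  growth = c(0.15, 0.41, -0.07, 1.43, -0.15, NA),
  x      = c(1, 2, 3, 4, 5, 6)
)

# log() of a negative number is NaN (R also emits a warning)
log_growth <- suppressWarnings(log(toy$growth))
print(is.na(log_growth))

# lm() silently drops rows where the response is NA or NaN
fit <- suppressWarnings(lm(log(growth) ~ x, data = toy))
print(nobs(fit))                # number of rows actually used
print(length(fit$na.action))    # number of rows dropped
```

A consequence is that a model with `log(GrowthRate)` as the response is fitted only to lineages with positive growth, so it answers a different question than the models fitted to `GrowthRate` itself.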
##
## Call:
## lm(formula = log(tlSize) ~ penA + mtr + sexDistribution + locations +
## sexDistribution:locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4750 -0.5945 -0.1651 0.3951 2.6713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.4158 1.1171 3.058 0.00358 **
## penAnon-mosaic 0.7265 0.5875 1.237 0.22202
## mtrnon-mosaic 0.3886 0.3899 0.997 0.32372
## sexDistribution -0.2736 1.2920 -0.212 0.83318
## locationsUSA -2.2514 1.4461 -1.557 0.12581
## sexDistribution:locationsUSA 2.1109 1.7482 1.207 0.23293
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.034 on 50 degrees of freedom
## Multiple R-squared: 0.1302, Adjusted R-squared: 0.04324
## F-statistic: 1.497 on 5 and 50 DF, p-value: 0.2076
##
## Call:
## lm(formula = log(GrowthRate) ~ -1 + penA + mtr + sexDistribution +
## locations + sexDistribution:locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.22920 -0.74025 0.07409 0.81275 1.89249
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## penAmosaic 0.53877 2.14169 0.252 0.803
## penAnon-mosaic 0.04357 1.78836 0.024 0.981
## mtrnon-mosaic -0.36273 0.44516 -0.815 0.421
## sexDistribution -2.14690 1.96289 -1.094 0.282
## locationsUSA -2.33817 2.04238 -1.145 0.261
## sexDistribution:locationsUSA 2.65816 2.36280 1.125 0.269
##
## Residual standard error: 1.081 on 32 degrees of freedom
## (18 observations deleted due to missingness)
## Multiple R-squared: 0.8274, Adjusted R-squared: 0.795
## F-statistic: 25.56 on 6 and 32 DF, p-value: 6.677e-11
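The formula above contains `-1`, which removes the intercept. For a model without an intercept, `summary.lm()` computes R-squared relative to zero rather than relative to the mean of the response, so the reported value of 0.8274 is not comparable with the R-squared values of the other models in this section. The sketch below illustrates the effect on simulated data with no real group effect; all names and values are invented for the illustration.

```r
set.seed(2)

# Two groups with the same mean (-2), i.e. no true group effect
g <- factor(rep(c("mosaic", "non-mosaic"), each = 20))
y <- -2 + rnorm(40, sd = 1)

with_int <- lm(y ~ g)        # intercept plus one contrast
no_int   <- lm(y ~ -1 + g)   # one coefficient per group, no intercept

# Same fitted values, very different R-squared
print(all.equal(unname(fitted(with_int)), unname(fitted(no_int))))
print(summary(with_int)$r.squared)
print(summary(no_int)$r.squared)
```

The two fits describe the data equally well; only the baseline against which R-squared and the F-statistic are measured differs.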
##
## Call:
## lm(formula = log(tlSize) ~ sexDistribution + locations + sexDistribution:locations,
## data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7541 -0.6862 -0.1228 0.4973 2.8595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0463 1.0015 4.040 0.000177 ***
## sexDistribution 0.1057 1.1957 0.088 0.929903
## locationsUSA -1.7617 1.3708 -1.285 0.204410
## sexDistribution:locationsUSA 1.6449 1.6704 0.985 0.329322
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.037 on 52 degrees of freedom
## Multiple R-squared: 0.09025, Adjusted R-squared: 0.03777
## F-statistic: 1.72 on 3 and 52 DF, p-value: 0.1744