In this analysis, we investigate how the growth and size of transmission lineages in the period 2010-2017 relate to several explanatory variables: the sex distribution within the lineage, the presence of mosaic variants of the mtrD gene, the presence of mosaic variants of the penA gene, and the final size of the transmission lineage.
The data comprise all transmission lineages containing more than 10 sequences from each of Norway, Australia, the United States and Europe. Using the skygrowth v.0.3.1 R package, we analyze each lineage and extract the growth rate of its effective population size over the interval 2010-2017.
Sizes of the transmission lineages in Norway containing more than 10 sequences:
## [1] 185 121 90 85 70 57 56 51 27 23 20 18 17 17 14 14 13 12 11
## [20] 11 11 11 11
Number of transmission lineages containing more than 10 sequences:
## [1] 23
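The size filter above can be sketched as follows. This is a minimal illustration, assuming a hypothetical vector `lineage_id` that assigns each sequence to a transmission lineage; the names and toy data are not taken from the analysis code.

```r
# Hypothetical input: one lineage label per sequence (toy data, not the study data).
lineage_id <- c(rep("TL1", 15), rep("TL2", 11), rep("TL3", 10), rep("TL4", 3))

# Count sequences per lineage and sort from largest to smallest.
tl_sizes <- sort(table(lineage_id), decreasing = TRUE)

# Keep only lineages with strictly more than 10 sequences.
large_tl <- tl_sizes[tl_sizes > 10]

print(as.vector(large_tl))  # sizes of the retained lineages
print(length(large_tl))     # number of retained lineages
```

Note that a lineage with exactly 10 sequences (`TL3` in the toy data) is excluded, which matches the smallest reported size of 11.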
Estimated arrival times in Norway (midpoint of the ancestral branch of each transmission lineage):
## [1] 1987.737 2003.177 2007.786 1981.407 2008.687 1945.542 1965.954 2005.998
## [9] 1942.936 2005.313 2008.090 2002.812 2015.061 2015.741 2008.050 1995.163
## [17] 2006.754 2005.377 1996.370 2000.257 2009.179 1979.664 2011.285
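The midpoint definition of the arrival time can be written as a small helper. This is a sketch under the assumption that the dates of the two ends of the ancestral branch are available as decimal years; `arrival_midpoint`, `parent_time` and `root_time` are hypothetical names.

```r
# Hypothetical helper: arrival time as the midpoint of the ancestral branch,
# i.e. halfway between the parent node and the root node of the lineage.
arrival_midpoint <- function(parent_time, root_time) {
  stopifnot(all(parent_time <= root_time))
  (parent_time + root_time) / 2
}

# Toy example: a branch that starts in 2000.0 and ends at the lineage root in 2004.0.
arrival_midpoint(2000.0, 2004.0)
```

The function is vectorised, so it can be applied to all lineages of a location at once.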
From skygrowth we find that the average growth rates of the effective population size of these lineages on the time interval 2010-2017 are:
## [1] 0.161981919 0.370304650 1.070982756 0.350432551 2.829087867
## [6] 0.009947276 0.108618585 0.554324178 -0.067974605 1.120258747
## [11] 0.798340297 0.379926199 NaN NaN 0.844755871
## [16] 0.086119692 0.098421706 0.144338806 -0.064004236 NA
## [21] 0.754498999 -0.033264093 0.422261382
The fourth-from-last value (element 20) is “NA” because this transmission lineage does not exist on the time interval [2010, 2017].
Similarly, for Australia, the USA and Europe we find:
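Averaging a growth-rate trajectory over a fixed window, with a missing value when the lineage has no estimates inside the window, could look like the sketch below. It assumes the estimates are available as two equal-length numeric vectors (a time grid and the growth rate on that grid); `mean_growth` is a hypothetical helper and does not reproduce how the skygrowth output was actually post-processed.

```r
# Hypothetical helper: mean growth rate on the window [from, to].
# `time` and `growth` are assumed to be equal-length numeric vectors.
mean_growth <- function(time, growth, from = 2010, to = 2017) {
  stopifnot(length(time) == length(growth))
  inside <- time >= from & time <= to
  if (!any(inside)) return(NA_real_)  # no estimates inside the window
  mean(growth[inside])
}

# Toy data, not the study data.
mean_growth(c(2009, 2011, 2013), c(1, 2, 4))  # averages the 2011 and 2013 values
mean_growth(c(1990, 1995), c(0.2, 0.4))       # NA: nothing falls in 2010-2017
```

Returning `NA_real_` explicitly avoids the `NaN` that `mean()` gives for an empty vector.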
Sizes of the transmission lineages in Australia containing more than 10 sequences:
## [1] 254 195 190 165 125 121 88 81 49 49 46 44 43 41 35 24 20 20 18
## [20] 18 16 15 15 15 14 13 12 12 12 11
Number of transmission lineages containing more than 10 sequences:
## [1] 30
Estimated arrival time in Australia:
## [1] 2009.832 2006.724 2012.104 2000.280 1985.136 2009.195 1999.664 1997.703
## [9] 2002.988 1993.818 1912.241 1980.317 2012.335 2011.512 2003.826 2013.257
## [17] 2010.635 1974.381 2010.632 2007.702 2004.582 2000.387 2009.536 2005.816
## [25] 2010.210 2011.727 2001.528 2012.804 2005.632 2014.190
Average growth rate in 2010-2017:
## [1] 3.14592042 1.84739055 2.09423350 0.61522329 0.26073401 1.30711009
## [7] 0.22314530 0.24659985 0.99893470 0.23400796 -0.05415300 0.03360193
## [13] 1.41119636 1.49353044 0.63992922 2.26325653 0.78739922 3.01859170
## [19] 0.60333356 0.34408087 0.12936869 0.08941349 0.69528840 0.37681987
## [25] 4.50239164 1.92325452 2.57573431 1.60502161 1.10619019 2.08906976
Sizes of the transmission lineages in the USA containing more than 10 sequences:
## [1] 446 168 157 127 104 81 62 52 51 45 44 43 41 40 38 38 33 30 29
## [20] 29 27 27 25 25 22 22 19 18 16 15 14 14 13 13 11
Number of transmission lineages containing more than 10 sequences:
## [1] 35
Estimated arrival time in the USA:
## [1] 1923.403 1900.930 1967.540 1917.941 1902.333 1997.361 1916.062 1890.556
## [9] 1923.501 1864.753 1971.060 1909.338 1985.211 1978.712 1939.994 1973.433
## [17] 2001.340 1974.247 1992.234 2001.689 1886.491 2003.135 1945.823 1974.156
## [25] 1966.038 1986.422 1963.833 2008.835 1890.400 1996.932 2002.723 2007.754
## [33] 1982.887 1991.598 2008.687
Average growth rate in 2010-2017:
## [1] 0.017932017 0.033258929 0.097395508 0.036742079 -0.035354819
## [6] 0.291761073 0.047641387 -0.004619964 0.103667329 0.051802962
## [11] 0.009236953 -0.061109457 0.297533993 0.163869871 -0.042778020
## [16] -0.037154484 0.238011698 -0.012359792 NA 0.415485851
## [21] 0.010073001 0.302153495 0.004367296 0.232432700 0.035922997
## [26] 0.114836913 NA 0.971363763 NA 0.068930162
## [31] 0.182207973 0.408496420 0.076920677 -0.022076982 0.241367396
Sizes of the transmission lineages in Europe containing more than 10 sequences:
## [1] 1084 525 294 265 212 168 111 98 97 84 69 67 65 64 63
## [16] 55 45 29 24 22 21 17 14 12 12 11 11 11 11 11
Number of transmission lineages containing more than 10 sequences:
## [1] 30
Estimated arrival time in Europe:
## [1] 1750.328 1909.398 1974.636 1976.957 1994.941 1991.804 1936.948 2007.786
## [9] 1974.315 1965.954 1987.639 1912.241 1985.873 1994.671 2006.973 2009.338
## [17] 1955.322 2003.033 1951.718 1957.023 1924.918 2015.741 1999.409 1993.150
## [25] 2005.377 2000.257 1977.074 2005.497 2003.715 2011.285
Average growth rate in 2010-2017:
## [1] -0.058565227 -0.031248837 -0.031476623 0.131462499 0.312550634
## [6] 0.111307347 0.135660582 1.063180868 0.119435711 0.143631998
## [11] 0.086000922 -0.004122313 -0.074377383 0.345473148 0.845365055
## [16] 0.599537574 -0.041936884 0.016964461 0.082561180 -0.091813924
## [21] -0.154210474 NaN 0.146646135 NA 0.144339028
## [26] NA NA 0.213350689 NA 0.422261382
To study the relation of the growth rates to metadata we collect the growth rates in a data frame along with variables that describe the sex distribution, penA status and mtrD status for each transmission lineage.
The dataset now looks like:
## GrowthRate locations tlSize penA mtr sexDistribution
## 1 0.161981919 Norway 185 non-mosaic non-mosaic 0.9944134
## 2 0.370304650 Norway 121 non-mosaic non-mosaic 0.9666667
## 3 1.070982756 Norway 90 non-mosaic non-mosaic 0.5795455
## 4 0.350432551 Norway 85 non-mosaic non-mosaic 0.9759036
## 5 2.829087867 Norway 70 non-mosaic mosaic4 0.9565217
## 6 0.009947276 Norway 57 non-mosaic non-mosaic 0.8596491
## lineage_age
## 1 31.21321
## 2 15.76705
## 3 11.15803
## 4 37.50995
## 5 9.79867
## 6 73.27176
With the structure:
## 'data.frame': 118 obs. of 7 variables:
## $ GrowthRate : num 0.162 0.37 1.071 0.35 2.829 ...
## $ locations : chr "Norway" "Norway" "Norway" "Norway" ...
## $ tlSize : num 185 121 90 85 70 57 56 51 27 23 ...
## $ penA : chr "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
## $ mtr : chr "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
## $ sexDistribution: num 0.994 0.967 0.58 0.976 0.957 ...
## $ lineage_age : num 31.2 15.8 11.2 37.5 9.8 ...
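A data frame of this shape could be assembled roughly as follows. This is a sketch on toy values. The per-lineage counts `n_male` and `n_female` and the name `total_data` are hypothetical, and the sex distribution is taken to be the male fraction M/(M+F). The names `total_data_small_scale` and `total_data_large_scale` appear in the regression calls of this report, but how they were built is assumed here.

```r
# Toy inputs (hypothetical values, not the study data).
total_data <- data.frame(
  GrowthRate = c(0.16, 0.37, NA, 1.85, 0.02, -0.06),
  locations  = c("Norway", "Norway", "Norway", "Australia", "USA", "Europe"),
  tlSize     = c(185, 121, 11, 195, 446, 1084),
  penA       = c("non-mosaic", "non-mosaic", "mosaic", "mosaic", "non-mosaic", "non-mosaic"),
  mtr        = c("non-mosaic", "mosaic", "non-mosaic", "non-mosaic", "non-mosaic", "mosaic"),
  n_male     = c(178, 116, 9, 150, 300, 900),
  n_female   = c(1, 4, 2, 45, 146, 184),
  stringsAsFactors = FALSE
)

# Sex distribution as the male fraction M / (M + F).
total_data$sexDistribution <- with(total_data, n_male / (n_male + n_female))

# Split into the two comparisons used for the regressions.
total_data_small_scale <- subset(total_data, locations %in% c("Norway", "Australia"))
total_data_large_scale <- subset(total_data, locations %in% c("Europe", "USA"))

str(total_data_small_scale)
```

Rows whose `GrowthRate` is `NA` or `NaN` are dropped by `lm()` under R's default `na.action`, so they do not have to be removed by hand before fitting.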
Here we investigate the growth rate and size of the transmission lineages, and the relation of these to the explanatory variables: penA, mtrD, sex distribution and location.
We run all analyses separately for two comparisons, Norway versus Australia and Europe versus the USA, since the regions within each pair are of more similar size. For the regression analyses we first consider the epidemiological factors alone: location, sex distribution (the fraction M/(M+F)), and the interaction of these.
##
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2744 -0.5765 -0.1839 0.3755 3.2822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2202 0.1750 6.973 8.09e-09 ***
## locationsNorway -0.7233 0.2767 -2.614 0.0119 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9585 on 48 degrees of freedom
## Multiple R-squared: 0.1246, Adjusted R-squared: 0.1064
## F-statistic: 6.832 on 1 and 48 DF, p-value: 0.01192
##
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33149 -0.14057 -0.06315 0.07524 0.88590
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.17728 0.04969 3.568 0.000773 ***
## locationsUSA -0.04520 0.06728 -0.672 0.504631
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2484 on 53 degrees of freedom
## Multiple R-squared: 0.008443, Adjusted R-squared: -0.01027
## F-statistic: 0.4513 on 1 and 53 DF, p-value: 0.5046
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0125 -0.7400 -0.3310 0.4552 3.5493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7328 0.8458 0.866 0.391
## sexDistribution 0.2373 0.9980 0.238 0.813
##
## Residual standard error: 1.024 on 48 degrees of freedom
## Multiple R-squared: 0.001176, Adjusted R-squared: -0.01963
## F-statistic: 0.05653 on 1 and 48 DF, p-value: 0.8131
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29502 -0.15710 -0.07433 0.06330 0.92123
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.11101 0.16500 0.673 0.504
## sexDistribution 0.05214 0.20239 0.258 0.798
##
## Residual standard error: 0.2493 on 53 degrees of freedom
## Multiple R-squared: 0.001251, Adjusted R-squared: -0.01759
## F-statistic: 0.06636 on 1 and 53 DF, p-value: 0.7977
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1687 -0.6398 -0.2153 0.4464 3.4764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.232 1.825 1.771 0.0832 .
## sexDistribution -2.375 2.145 -1.107 0.2738
## locationsNorway -3.195 2.028 -1.575 0.1221
## sexDistribution:locationsNorway 2.938 2.388 1.230 0.2249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9634 on 46 degrees of freedom
## Multiple R-squared: 0.1525, Adjusted R-squared: 0.09722
## F-statistic: 2.759 on 3 and 46 DF, p-value: 0.05284
##
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution *
## locations, data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32355 -0.14188 -0.06385 0.08162 0.89312
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.15055 0.24813 0.607 0.547
## sexDistribution 0.03288 0.29878 0.110 0.913
## locationsUSA -0.05703 0.33736 -0.169 0.866
## sexDistribution:locationsUSA 0.01620 0.41282 0.039 0.969
##
## Residual standard error: 0.2532 on 51 degrees of freedom
## Multiple R-squared: 0.009255, Adjusted R-squared: -0.04902
## F-statistic: 0.1588 on 3 and 51 DF, p-value: 0.9235
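The interaction models above are written as `sexDistribution + locations + sexDistribution * locations`. In an R formula, `a * b` already expands to `a + b + a:b`, so the shorter form `sexDistribution * locations` specifies the same model. The sketch below checks this on simulated toy data; the data frame `toy` is hypothetical and unrelated to the study data.

```r
set.seed(1)
# Simulated toy data (hypothetical; not the study data).
toy <- data.frame(
  locations       = rep(c("Australia", "Norway"), each = 20),
  sexDistribution = runif(40, 0.5, 1)
)
toy$GrowthRate <- 1 - 0.5 * (toy$locations == "Norway") + rnorm(40, sd = 0.3)

# The long and the short formula describe the same model.
fit_long  <- lm(GrowthRate ~ sexDistribution + locations + sexDistribution * locations,
                data = toy)
fit_short <- lm(GrowthRate ~ sexDistribution * locations, data = toy)

all.equal(coef(fit_long), coef(fit_short))
summary(fit_short)$coefficients
```

The coefficient named `sexDistribution:locationsNorway` is the difference in the sex-distribution slope between the two locations, with the alphabetically first location as the reference level.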
Next we look at the effects of the mtrD and penA variants. Since we believe location and sex distribution to be important in the spread of N. gonorrhoeae, we keep these in the model when studying the effects of penA and mtrD.
##
## Call:
## lm(formula = GrowthRate ~ penA + sexDistribution + locations,
## data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2851 -0.6263 -0.1531 0.3459 3.3069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2829 0.8106 1.583 0.1203
## penAnon-mosaic 0.7975 0.5027 1.587 0.1195
## sexDistribution -0.9531 1.1084 -0.860 0.3943
## locationsNorway -0.6450 0.2810 -2.296 0.0263 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9534 on 46 degrees of freedom
## Multiple R-squared: 0.17, Adjusted R-squared: 0.1159
## F-statistic: 3.141 on 3 and 46 DF, p-value: 0.03412
##
## Call:
## lm(formula = GrowthRate ~ penA + sexDistribution + locations,
## data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34333 -0.12937 -0.06678 0.09928 0.87397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.079729 0.192067 0.415 0.680
## penAnon-mosaic 0.107020 0.135982 0.787 0.435
## sexDistribution 0.004147 0.210326 0.020 0.984
## locationsUSA -0.054358 0.069609 -0.781 0.438
##
## Residual standard error: 0.2516 on 51 degrees of freedom
## Multiple R-squared: 0.02111, Adjusted R-squared: -0.03647
## F-statistic: 0.3667 on 3 and 51 DF, p-value: 0.7773
##
## Call:
## lm(formula = GrowthRate ~ mtr + sexDistribution + locations,
## data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2256 -0.5676 -0.2066 0.4196 3.3334
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.38297 0.88668 1.560 0.1257
## mtrnon-mosaic -0.15270 0.30367 -0.503 0.6175
## sexDistribution -0.06597 0.96394 -0.068 0.9457
## locationsNorway -0.72520 0.28330 -2.560 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9764 on 46 degrees of freedom
## Multiple R-squared: 0.1294, Adjusted R-squared: 0.07261
## F-statistic: 2.279 on 3 and 46 DF, p-value: 0.09199
##
## Call:
## lm(formula = GrowthRate ~ mtr + sexDistribution + locations,
## data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30930 -0.13424 -0.07242 0.05939 0.90757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.22337 0.19813 1.127 0.265
## mtrnon-mosaic -0.08199 0.09763 -0.840 0.405
## sexDistribution 0.02399 0.20581 0.117 0.908
## locationsUSA -0.03634 0.06893 -0.527 0.600
##
## Residual standard error: 0.2514 on 51 degrees of freedom
## Multiple R-squared: 0.02274, Adjusted R-squared: -0.03475
## F-statistic: 0.3956 on 3 and 51 DF, p-value: 0.7567
##
## Call:
## lm(formula = GrowthRate ~ mtr + penA + sexDistribution + locations,
## data = total_data_small_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2678 -0.6456 -0.1427 0.3595 3.3242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.33711 0.87555 1.527 0.1337
## mtrnon-mosaic -0.05384 0.30688 -0.175 0.8615
## penAnon-mosaic 0.77788 0.52027 1.495 0.1419
## sexDistribution -0.95089 1.12034 -0.849 0.4005
## locationsNorway -0.64758 0.28435 -2.277 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9636 on 45 degrees of freedom
## Multiple R-squared: 0.1706, Adjusted R-squared: 0.09687
## F-statistic: 2.314 on 4 and 45 DF, p-value: 0.07194
##
## Call:
## lm(formula = GrowthRate ~ mtr + penA + sexDistribution + locations,
## data = total_data_large_scale)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32979 -0.12725 -0.06582 0.07597 0.88775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.15886 0.22027 0.721 0.474
## mtrnon-mosaic -0.07361 0.09890 -0.744 0.460
## penAnon-mosaic 0.09436 0.13764 0.686 0.496
## sexDistribution -0.00705 0.21179 -0.033 0.974
## locationsUSA -0.04621 0.07077 -0.653 0.517
##
## Residual standard error: 0.2527 on 50 degrees of freedom
## Multiple R-squared: 0.03184, Adjusted R-squared: -0.04561
## F-statistic: 0.4111 on 4 and 50 DF, p-value: 0.7998
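To compare one term across several of the fitted models without reading each summary in full, the relevant row of each coefficient table can be collected into a single table. The sketch below does this for the location effect on simulated toy data; `term_row` and `toy` are hypothetical and the numbers it prints are unrelated to the results above.

```r
# Hypothetical helper: pull one term's estimate and p-value out of an lm fit.
term_row <- function(fit, term) {
  tab <- summary(fit)$coefficients
  stopifnot(term %in% rownames(tab))
  data.frame(term = term,
             estimate = tab[term, "Estimate"],
             p_value = tab[term, "Pr(>|t|)"],
             row.names = NULL)
}

set.seed(2)
# Simulated toy data (hypothetical; not the study data).
toy <- data.frame(
  locations       = rep(c("Australia", "Norway"), each = 20),
  penA            = rep(c("mosaic", "non-mosaic"), times = 20),
  mtr             = rep(c("mosaic", "mosaic", "non-mosaic", "non-mosaic"), times = 10),
  sexDistribution = runif(40, 0.5, 1)
)
toy$GrowthRate <- 1 - 0.5 * (toy$locations == "Norway") + rnorm(40, sd = 0.3)

# The three model structures used in this section.
formulas <- list(
  penA = GrowthRate ~ penA + sexDistribution + locations,
  mtr  = GrowthRate ~ mtr + sexDistribution + locations,
  both = GrowthRate ~ mtr + penA + sexDistribution + locations
)
fits <- lapply(formulas, lm, data = toy)

# One row per model for the location term.
location_effect <- do.call(rbind, lapply(names(fits), function(nm) {
  cbind(model = nm, term_row(fits[[nm]], "locationsNorway"))
}))
print(location_effect)
```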