Research question

In this analysis, we investigate how the growth and size of transmission lineages in the period 2000-2017 relate to several explanatory variables: the sex distribution within the clade, the presence of mosaic variants of the mtrD gene, the presence of mosaic variants of the penA gene, and the final size of the transmission lineage.

The data comprise all transmission lineages containing a minimum of 10 sequences from each of Norway, Australia, the United States, and Europe. Using the skygrowth R package (v0.3.1), we analyze the clades and extract the growth rate of the effective population size within the specified time interval of 2000-2017.

Transmission lineages in Norway

Size of transmission lineages containing more than 10 sequences:

##  [1] 185 121  90  85  70  57  56  51  27  23  20  18  17  17  14  14  13  12  11
## [20]  11  11  11  11

Number of transmission lineages containing more than 10 sequences:

## [1] 23
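
The size filter above can be sketched as follows. This is a minimal illustration on toy data, not the code used for this report: `lineage_id` (one lineage label per sequence) and the labels `TL1` to `TL4` are hypothetical.

```r
# Hypothetical lineage assignment: one label per sequence (toy data).
lineage_id <- c(rep("TL1", 14), rep("TL2", 11), rep("TL3", 10), rep("TL4", 3))

# Number of sequences per lineage, largest first.
tl_sizes <- sort(table(lineage_id), decreasing = TRUE)

# Keep only lineages with more than 10 sequences.
large_tl <- tl_sizes[tl_sizes > 10]

as.vector(large_tl)  # sizes of the retained lineages
length(large_tl)     # number of retained lineages
```

Note that a strict `> 10` threshold drops a lineage of exactly 10 sequences (`TL3` in the toy data).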

With estimated arrival times (midpoint of the ancestral branch of the transmission lineage) in Norway:

##  [1] 1987.737 2003.177 2007.786 1981.407 2008.687 1945.542 1965.954 2005.998
##  [9] 1942.936 2005.313 2008.090 2002.812 2015.061 2015.741 2008.050 1995.163
## [17] 2006.754 2005.377 1996.370 2000.257 2009.179 1979.664 2011.285
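
As a small sketch of the midpoint definition used for the arrival times: assuming the ancestral branch runs from a parent node (outside the location) to the root node of the transmission lineage, the arrival time is the mean of the two node dates. The function name, argument names, and example dates below are hypothetical.

```r
# Midpoint of the ancestral branch, given the decimal dates of its two ends.
arrival_midpoint <- function(parent_time, node_time) {
  (parent_time + node_time) / 2
}

# Toy example: parent node dated 2002.5, lineage root dated 2004.0.
arrival_midpoint(2002.5, 2004.0)
```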

From skygrowth we find that the average growth rates of the effective population sizes of these lineages on the time interval 2010-2017 are:

##  [1]  0.161981919  0.370304650  1.070982756  0.350432551  2.829087867
##  [6]  0.009947276  0.108618585  0.554324178 -0.067974605  1.120258747
## [11]  0.798340297  0.379926199          NaN          NaN  0.844755871
## [16]  0.086119692  0.098421706  0.144338806 -0.064004236           NA
## [21]  0.754498999 -0.033264093  0.422261382

The fourth-from-last value (lineage 20) is “NA” because this transmission lineage does not exist on the time interval [2010, 2017].
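
One way to average a growth-rate trajectory over a window, and to return NA when the trajectory does not overlap that window, is sketched below. This is an assumption about the post-processing, not the report's actual code: the helper `mean_growth_in_window` and the toy vectors `time` and `growth` are hypothetical, and no skygrowth functions are called.

```r
# Average growth rate over [from, to]; NA if no time point falls in the window.
mean_growth_in_window <- function(time, growth, from = 2010, to = 2017) {
  in_window <- time >= from & time <= to
  if (!any(in_window)) {
    return(NA_real_)
  }
  mean(growth[in_window])
}

# Toy trajectory on a yearly grid from 2005 to 2017.
time <- seq(2005, 2017, by = 1)
growth <- rep(c(0.1, 0.3), length.out = length(time))
mean_growth_in_window(time, growth)

# Toy lineage whose trajectory lies entirely before 2010.
mean_growth_in_window(c(2001, 2003), c(0.2, 0.4))
```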

Similarly, for Australia, the USA, and Europe we find:

Transmission lineages in Australia

Size of transmission lineages containing more than 10 sequences:

##  [1] 254 195 190 165 125 121  88  81  49  49  46  44  43  41  35  24  20  20  18
## [20]  18  16  15  15  15  14  13  12  12  12  11

Number of transmission lineages containing more than 10 sequences:

## [1] 30

Estimated arrival time in Australia:

##  [1] 2009.832 2006.724 2012.104 2000.280 1985.136 2009.195 1999.664 1997.703
##  [9] 2002.988 1993.818 1912.241 1980.317 2012.335 2011.512 2003.826 2013.257
## [17] 2010.635 1974.381 2010.632 2007.702 2004.582 2000.387 2009.536 2005.816
## [25] 2010.210 2011.727 2001.528 2012.804 2005.632 2014.190

Average growth rate in 2010-2017:

##  [1]  3.14592042  1.84739055  2.09423350  0.61522329  0.26073401  1.30711009
##  [7]  0.22314530  0.24659985  0.99893470  0.23400796 -0.05415300  0.03360193
## [13]  1.41119636  1.49353044  0.63992922  2.26325653  0.78739922  3.01859170
## [19]  0.60333356  0.34408087  0.12936869  0.08941349  0.69528840  0.37681987
## [25]  4.50239164  1.92325452  2.57573431  1.60502161  1.10619019  2.08906976

Transmission lineages in USA

Size of transmission lineages containing more than 10 sequences:

##  [1] 446 168 157 127 104  81  62  52  51  45  44  43  41  40  38  38  33  30  29
## [20]  29  27  27  25  25  22  22  19  18  16  15  14  14  13  13  11

Number of transmission lineages containing more than 10 sequences:

## [1] 35

Estimated arrival time in USA

##  [1] 1923.403 1900.930 1967.540 1917.941 1902.333 1997.361 1916.062 1890.556
##  [9] 1923.501 1864.753 1971.060 1909.338 1985.211 1978.712 1939.994 1973.433
## [17] 2001.340 1974.247 1992.234 2001.689 1886.491 2003.135 1945.823 1974.156
## [25] 1966.038 1986.422 1963.833 2008.835 1890.400 1996.932 2002.723 2007.754
## [33] 1982.887 1991.598 2008.687

Average growth rate in 2010-2017:

##  [1]  0.017932017  0.033258929  0.097395508  0.036742079 -0.035354819
##  [6]  0.291761073  0.047641387 -0.004619964  0.103667329  0.051802962
## [11]  0.009236953 -0.061109457  0.297533993  0.163869871 -0.042778020
## [16] -0.037154484  0.238011698 -0.012359792           NA  0.415485851
## [21]  0.010073001  0.302153495  0.004367296  0.232432700  0.035922997
## [26]  0.114836913           NA  0.971363763           NA  0.068930162
## [31]  0.182207973  0.408496420  0.076920677 -0.022076982  0.241367396

Transmission lineages in Europe

Size of transmission lineages containing more than 10 sequences:

##  [1] 1084  525  294  265  212  168  111   98   97   84   69   67   65   64   63
## [16]   55   45   29   24   22   21   17   14   12   12   11   11   11   11   11

Number of transmission lineages containing more than 10 sequences:

## [1] 30

Estimated arrival time in Europe:

##  [1] 1750.328 1909.398 1974.636 1976.957 1994.941 1991.804 1936.948 2007.786
##  [9] 1974.315 1965.954 1987.639 1912.241 1985.873 1994.671 2006.973 2009.338
## [17] 1955.322 2003.033 1951.718 1957.023 1924.918 2015.741 1999.409 1993.150
## [25] 2005.377 2000.257 1977.074 2005.497 2003.715 2011.285

Average growth rate in 2010-2017:

##  [1] -0.058565227 -0.031248837 -0.031476623  0.131462499  0.312550634
##  [6]  0.111307347  0.135660582  1.063180868  0.119435711  0.143631998
## [11]  0.086000922 -0.004122313 -0.074377383  0.345473148  0.845365055
## [16]  0.599537574 -0.041936884  0.016964461  0.082561180 -0.091813924
## [21] -0.154210474          NaN  0.146646135           NA  0.144339028
## [26]           NA           NA  0.213350689           NA  0.422261382

Defining Metadata

To study the relation of the growth rates to metadata we collect the growth rates in a data frame along with variables that describe the sex distribution, penA status and mtrD status for each transmission lineage.

The dataset now looks like:

##    GrowthRate locations tlSize       penA        mtr sexDistribution
## 1 0.161981919    Norway    185 non-mosaic non-mosaic       0.9944134
## 2 0.370304650    Norway    121 non-mosaic non-mosaic       0.9666667
## 3 1.070982756    Norway     90 non-mosaic non-mosaic       0.5795455
## 4 0.350432551    Norway     85 non-mosaic non-mosaic       0.9759036
## 5 2.829087867    Norway     70 non-mosaic    mosaic4       0.9565217
## 6 0.009947276    Norway     57 non-mosaic non-mosaic       0.8596491
##   lineage_age
## 1    31.21321
## 2    15.76705
## 3    11.15803
## 4    37.50995
## 5     9.79867
## 6    73.27176

With the structure:

## 'data.frame':    118 obs. of  7 variables:
##  $ GrowthRate     : num  0.162 0.37 1.071 0.35 2.829 ...
##  $ locations      : chr  "Norway" "Norway" "Norway" "Norway" ...
##  $ tlSize         : num  185 121 90 85 70 57 56 51 27 23 ...
##  $ penA           : chr  "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
##  $ mtr            : chr  "non-mosaic" "non-mosaic" "non-mosaic" "non-mosaic" ...
##  $ sexDistribution: num  0.994 0.967 0.58 0.976 0.957 ...
##  $ lineage_age    : num  31.2 15.8 11.2 37.5 9.8 ...
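
A data frame of this shape could be assembled along the following lines. This is a sketch under assumptions: only three of the seven columns are shown, the values are a handful of rows copied from the listings above, the name `total_data` is hypothetical, and the way `total_data_small_scale` is subset here is a guess at how the regression data sets were derived.

```r
# A few Norway rows (sizes and growth rates taken from the listings above),
# including one lineage whose growth rate is NaN.
norway <- data.frame(
  GrowthRate = c(0.161981919, 0.370304650, NaN),
  locations  = "Norway",
  tlSize     = c(185, 121, 17),
  stringsAsFactors = FALSE
)

# The first two Australia rows.
australia <- data.frame(
  GrowthRate = c(3.14592042, 1.84739055),
  locations  = "Australia",
  tlSize     = c(254, 195),
  stringsAsFactors = FALSE
)

# Stack the per-location tables into one data frame.
total_data <- rbind(norway, australia)

# Assumed construction of the Norway/Australia subset: keep the two locations
# and drop lineages without a growth-rate estimate (is.na() is TRUE for NaN).
total_data_small_scale <- subset(
  total_data,
  locations %in% c("Norway", "Australia") & !is.na(GrowthRate)
)

str(total_data_small_scale)
```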

The relation of the growth rates and sizes of the transmission lineages to metadata

Here we investigate the growth rate and size of the transmission lineages, and the relation of these to the explanatory variables: penA, mtrD, sex distribution and location.

Growth rate

Modeling the drivers of lineage growth rate

We run all analyses separately for two pairs of locations, Norway/Australia and Europe/USA, since the regions within each pair are of more similar size. For the regression analyses we first consider the epidemiological factors alone: location, sex distribution (the fraction M/(M+F)), and their interaction.

The effect of location

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2744 -0.5765 -0.1839  0.3755  3.2822 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.2202     0.1750   6.973 8.09e-09 ***
## locationsNorway  -0.7233     0.2767  -2.614   0.0119 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9585 on 48 degrees of freedom
## Multiple R-squared:  0.1246, Adjusted R-squared:  0.1064 
## F-statistic: 6.832 on 1 and 48 DF,  p-value: 0.01192
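
As a worked check on how to read this output: with a single categorical predictor, the intercept is the mean growth rate of the reference location (Australia), and the intercept plus the `locationsNorway` coefficient is the mean for Norway. The sketch below recomputes the Norway mean from the growth rates listed earlier, assuming the model used exactly the 20 non-missing values; the variable names are hypothetical.

```r
# Norway growth rates as listed above (NaN and NA kept in place).
norway_growth <- c(
  0.161981919, 0.370304650, 1.070982756, 0.350432551, 2.829087867,
  0.009947276, 0.108618585, 0.554324178, -0.067974605, 1.120258747,
  0.798340297, 0.379926199, NaN, NaN, 0.844755871,
  0.086119692, 0.098421706, 0.144338806, -0.064004236, NA,
  0.754498999, -0.033264093, 0.422261382
)

# Mean over the non-missing values (na.rm = TRUE also drops NaN).
norway_mean <- mean(norway_growth, na.rm = TRUE)

# Fitted Norway mean from the regression table: intercept + locationsNorway.
fitted_norway <- 1.2202 + (-0.7233)

c(observed = norway_mean, fitted = fitted_norway)
```

The two numbers agree to the precision printed in the coefficient table, so the location effect of -0.7233 is simply the difference between the Norway and Australia mean growth rates.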

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ locations, data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33149 -0.14057 -0.06315  0.07524  0.88590 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.17728    0.04969   3.568 0.000773 ***
## locationsUSA -0.04520    0.06728  -0.672 0.504631    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2484 on 53 degrees of freedom
## Multiple R-squared:  0.008443,   Adjusted R-squared:  -0.01027 
## F-statistic: 0.4513 on 1 and 53 DF,  p-value: 0.5046

Observed sex distribution in the transmission lineages

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0125 -0.7400 -0.3310  0.4552  3.5493 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.7328     0.8458   0.866    0.391
## sexDistribution   0.2373     0.9980   0.238    0.813
## 
## Residual standard error: 1.024 on 48 degrees of freedom
## Multiple R-squared:  0.001176,   Adjusted R-squared:  -0.01963 
## F-statistic: 0.05653 on 1 and 48 DF,  p-value: 0.8131

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ sexDistribution, data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29502 -0.15710 -0.07433  0.06330  0.92123 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.11101    0.16500   0.673    0.504
## sexDistribution  0.05214    0.20239   0.258    0.798
## 
## Residual standard error: 0.2493 on 53 degrees of freedom
## Multiple R-squared:  0.001251,   Adjusted R-squared:  -0.01759 
## F-statistic: 0.06636 on 1 and 53 DF,  p-value: 0.7977

The interaction of sex distribution and location

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution * 
##     locations, data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1687 -0.6398 -0.2153  0.4464  3.4764 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                        3.232      1.825   1.771   0.0832 .
## sexDistribution                   -2.375      2.145  -1.107   0.2738  
## locationsNorway                   -3.195      2.028  -1.575   0.1221  
## sexDistribution:locationsNorway    2.938      2.388   1.230   0.2249  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9634 on 46 degrees of freedom
## Multiple R-squared:  0.1525, Adjusted R-squared:  0.09722 
## F-statistic: 2.759 on 3 and 46 DF,  p-value: 0.05284

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ sexDistribution + locations + sexDistribution * 
##     locations, data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32355 -0.14188 -0.06385  0.08162  0.89312 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   0.15055    0.24813   0.607    0.547
## sexDistribution               0.03288    0.29878   0.110    0.913
## locationsUSA                 -0.05703    0.33736  -0.169    0.866
## sexDistribution:locationsUSA  0.01620    0.41282   0.039    0.969
## 
## Residual standard error: 0.2532 on 51 degrees of freedom
## Multiple R-squared:  0.009255,   Adjusted R-squared:  -0.04902 
## F-statistic: 0.1588 on 3 and 51 DF,  p-value: 0.9235
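
A note on the formula used in the two interaction models: in R's formula syntax, `a * b` already expands to `a + b + a:b`, so writing the main effects explicitly alongside `sexDistribution * locations` does not change the fit. The sketch below illustrates this on simulated data; the data frame `toy` is hypothetical and unrelated to the report's data.

```r
set.seed(1)

# Simulated data with the same column names as the report's data frame.
toy <- data.frame(
  sexDistribution = runif(40, 0.5, 1),
  locations       = rep(c("Australia", "Norway"), each = 20)
)
toy$GrowthRate <- rnorm(40)

# Formula as written in the report.
fit_long <- lm(GrowthRate ~ sexDistribution + locations +
                 sexDistribution * locations, data = toy)

# Shorter, equivalent formula.
fit_short <- lm(GrowthRate ~ sexDistribution * locations, data = toy)

all.equal(coef(fit_long), coef(fit_short))
```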

The effects of penA and mtrD

Next we look at the effects of the mtrD and penA variants. Since we believe location and sex distribution to be important in the spread of N. gonorrhoeae, we keep these in the model when studying the effects of penA and mtrD.

penA

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ penA + sexDistribution + locations, 
##     data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2851 -0.6263 -0.1531  0.3459  3.3069 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       1.2829     0.8106   1.583   0.1203  
## penAnon-mosaic    0.7975     0.5027   1.587   0.1195  
## sexDistribution  -0.9531     1.1084  -0.860   0.3943  
## locationsNorway  -0.6450     0.2810  -2.296   0.0263 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9534 on 46 degrees of freedom
## Multiple R-squared:   0.17,  Adjusted R-squared:  0.1159 
## F-statistic: 3.141 on 3 and 46 DF,  p-value: 0.03412

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ penA + sexDistribution + locations, 
##     data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34333 -0.12937 -0.06678  0.09928  0.87397 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.079729   0.192067   0.415    0.680
## penAnon-mosaic   0.107020   0.135982   0.787    0.435
## sexDistribution  0.004147   0.210326   0.020    0.984
## locationsUSA    -0.054358   0.069609  -0.781    0.438
## 
## Residual standard error: 0.2516 on 51 degrees of freedom
## Multiple R-squared:  0.02111,    Adjusted R-squared:  -0.03647 
## F-statistic: 0.3667 on 3 and 51 DF,  p-value: 0.7773
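
The coefficient name `penAnon-mosaic` indicates that "mosaic" is the reference level, so the estimate is the difference of non-mosaic relative to mosaic lineages. If the opposite contrast is easier to read, the factor can be releveled, as in this sketch on simulated data; `toy` and `penA_releveled` are hypothetical names.

```r
set.seed(2)

# Simulated data: 8 mosaic and 32 non-mosaic lineages.
toy <- data.frame(
  penA       = rep(c("mosaic", "non-mosaic"), times = c(8, 32)),
  GrowthRate = rnorm(40)
)

# Default coding: levels are sorted, so "mosaic" is the reference.
fit_default <- lm(GrowthRate ~ penA, data = toy)

# Make "non-mosaic" the reference level instead.
toy$penA_releveled <- relevel(factor(toy$penA), ref = "non-mosaic")
fit_releveled <- lm(GrowthRate ~ penA_releveled, data = toy)

coef(fit_default)["penAnon-mosaic"]
coef(fit_releveled)["penA_releveledmosaic"]
```

The two coefficients have the same magnitude and opposite sign; the standard error and p-value of the contrast are unchanged.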

mtrD

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ mtr + sexDistribution + locations, 
##     data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2256 -0.5676 -0.2066  0.4196  3.3334 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      1.38297    0.88668   1.560   0.1257  
## mtrnon-mosaic   -0.15270    0.30367  -0.503   0.6175  
## sexDistribution -0.06597    0.96394  -0.068   0.9457  
## locationsNorway -0.72520    0.28330  -2.560   0.0138 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9764 on 46 degrees of freedom
## Multiple R-squared:  0.1294, Adjusted R-squared:  0.07261 
## F-statistic: 2.279 on 3 and 46 DF,  p-value: 0.09199

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ mtr + sexDistribution + locations, 
##     data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30930 -0.13424 -0.07242  0.05939  0.90757 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.22337    0.19813   1.127    0.265
## mtrnon-mosaic   -0.08199    0.09763  -0.840    0.405
## sexDistribution  0.02399    0.20581   0.117    0.908
## locationsUSA    -0.03634    0.06893  -0.527    0.600
## 
## Residual standard error: 0.2514 on 51 degrees of freedom
## Multiple R-squared:  0.02274,    Adjusted R-squared:  -0.03475 
## F-statistic: 0.3956 on 3 and 51 DF,  p-value: 0.7567

Full model with location, sex distribution, mtrD and penA

Norway and Australia

## 
## Call:
## lm(formula = GrowthRate ~ mtr + penA + sexDistribution + locations, 
##     data = total_data_small_scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2678 -0.6456 -0.1427  0.3595  3.3242 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      1.33711    0.87555   1.527   0.1337  
## mtrnon-mosaic   -0.05384    0.30688  -0.175   0.8615  
## penAnon-mosaic   0.77788    0.52027   1.495   0.1419  
## sexDistribution -0.95089    1.12034  -0.849   0.4005  
## locationsNorway -0.64758    0.28435  -2.277   0.0276 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9636 on 45 degrees of freedom
## Multiple R-squared:  0.1706, Adjusted R-squared:  0.09687 
## F-statistic: 2.314 on 4 and 45 DF,  p-value: 0.07194

Europe and USA

## 
## Call:
## lm(formula = GrowthRate ~ mtr + penA + sexDistribution + locations, 
##     data = total_data_large_scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32979 -0.12725 -0.06582  0.07597  0.88775 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.15886    0.22027   0.721    0.474
## mtrnon-mosaic   -0.07361    0.09890  -0.744    0.460
## penAnon-mosaic   0.09436    0.13764   0.686    0.496
## sexDistribution -0.00705    0.21179  -0.033    0.974
## locationsUSA    -0.04621    0.07077  -0.653    0.517
## 
## Residual standard error: 0.2527 on 50 degrees of freedom
## Multiple R-squared:  0.03184,    Adjusted R-squared:  -0.04561 
## F-statistic: 0.4111 on 4 and 50 DF,  p-value: 0.7998