Overview

Here we provide a summary of some of the important statistics for the transmission lineages. The transmission lineage explorer shows that in all four locations (Europe, the USA, Norway and Australia) a few transmission lineages account for a disproportionate share of the observed sequences, yet most transmission lineages are small. Here we compare the locations in terms of their transmission lineage size distributions.

It should be kept in mind that transmission lineages can be similar in size even though they have grown in very different ways. For example, a very old, slow-spreading transmission lineage may have a similar size to a young, fast-spreading one.
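To make this point concrete, here is a minimal sketch that assumes simple exponential growth, which is an illustrative assumption and not a model used in this analysis. The function name expected_size and the growth rates and ages are hypothetical values chosen so that the two lineages end up the same size.

```r
# Expected size under simple exponential growth: size = exp(rate * age).
# This growth model and all numbers below are illustrative assumptions.
expected_size <- function(rate_per_day, age_days) {
  exp(rate_per_day * age_days)
}

# An old, slow-spreading lineage and a young, fast-spreading lineage.
old_slow   <- expected_size(rate_per_day = 0.05, age_days = 100)
young_fast <- expected_size(rate_per_day = 0.25, age_days = 20)

# Both products rate * age equal 5, so the expected sizes coincide.
print(c(old_slow = old_slow, young_fast = young_fast))
```

Size alone therefore cannot separate the two scenarios; the age of the lineage is needed as well.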

We consider the research questions (RQs):

  1. What percentage of local cases do the 10 largest lineages explain, and what percentage is explained by singletons?
  2. How are the lineage sizes distributed? Do they follow a power-law distribution?

RQ 1: What percentage of local cases do the 10 largest lineages explain, and what percentage is explained by singletons?


summarize_lineages = function(Result) {
  sizes = Result$Lineage_sizes
  total = sum(sizes)
  # Percentage of all observations that fall in the ten largest lineages.
  # Sorting first avoids relying on Lineage_sizes being pre-sorted, and
  # head() avoids NA values when there are fewer than ten lineages.
  c1 = round(sum(head(sort(sizes, decreasing = TRUE), 10))/total*100, 4)
  # Percentage of all observations that are singletons (lineages of size 1)
  c2 = round(sum(sizes == 1)/total*100, 4)
  return(c(c1, c2))
}

cNOR = summarize_lineages(Result_NOR)
cAUS = summarize_lineages(Result_AUS)
cEUR = summarize_lineages(Result_EUR)
cUSA = summarize_lineages(Result_USA)
mat1 = rbind(cNOR, cAUS, cEUR, cUSA)
colnames(mat1)=c("Percentage of cases explained by 10 largest TL", "Percentage of cases explained by singletons")
rownames(mat1)=c("Norway", "Australia", "Europe","USA")
knitr::kable(mat1)
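|          | Percentage of cases explained by 10 largest TL | Percentage of cases explained by singletons |
|:---------|-----------------------------------------------:|--------------------------------------------:|
|Norway    |                                        44.3735 |                                     15.7193 |
|Australia |                                        59.7821 |                                      7.4898 |
|Europe    |                                        71.6061 |                                      4.8014 |
|USA       |                                        54.5110 |                                      6.5346 |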

RQ 2: How are the lineage sizes distributed? Do they follow a power-law distribution?

First we plot the lineage size distributions on linear and log-log scales for each location.
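The plots themselves are not reproduced here. As a rough sketch of how such a pair of panels could be drawn with base R graphics, the example below tabulates lineage sizes and plots the counts on linear and log-log axes. The helper size_frequency and the vector toy_sizes are hypothetical and invented for illustration; they are not the code or data behind the original figures.

```r
# Hypothetical helper: count how many lineages have each size.
size_frequency <- function(sizes) {
  tab <- table(sizes)
  data.frame(size = as.integer(names(tab)), count = as.integer(tab))
}

# Toy lineage sizes, invented for illustration only.
toy_sizes <- c(1, 1, 1, 1, 1, 2, 2, 3, 5, 40)
freq <- size_frequency(toy_sizes)

# Two panels side by side: linear axes (left) and log-log axes (right).
old_par <- par(mfrow = c(1, 2))
plot(freq$size, freq$count, type = "h",
     xlab = "Lineage size", ylab = "Number of lineages",
     main = "Linear scale")
plot(freq$size, freq$count, log = "xy", pch = 16,
     xlab = "Lineage size", ylab = "Number of lineages",
     main = "Log-log scale")
par(old_par)
```

On the log-log panel, a power-law distribution would appear as points scattered around a straight line with negative slope.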

Next we fit a power-law distribution using the fit_power_law function from the igraph R package. We use the Kolmogorov-Smirnov test included in the package to calculate a p-value indicating whether the data are significantly different from what we would expect under a power-law distribution with the estimated coefficient. A p-value below 0.05 leads us to reject the power-law distribution.

library(igraph)

# Fit a power law to the lineage sizes of each location, with sizes >= 1 included
powerNOR = fit_power_law(x = Result_NOR$Lineage_sizes, xmin=1)
powerAUS = fit_power_law(x = Result_AUS$Lineage_sizes, xmin=1)
powerEUR = fit_power_law(x = Result_EUR$Lineage_sizes, xmin=1)
powerUSA = fit_power_law(x = Result_USA$Lineage_sizes, xmin=1)
# One row per location: estimated exponent (alpha) and whether the
# Kolmogorov-Smirnov test rejects the power law at the 5% level
powerMatrix = rbind(c(powerNOR$alpha,paste(powerNOR$KS.p<0.05)),
                    c(powerAUS$alpha,paste(powerAUS$KS.p<0.05)),
                    c(powerEUR$alpha,paste(powerEUR$KS.p<0.05)),
                    c(powerUSA$alpha,paste(powerUSA$KS.p<0.05)))
colnames(powerMatrix)=c("Power law coefficient","Reject power law distribution")
rownames(powerMatrix)=c("Norway", "Australia", "Europe","USA")
knitr::kable(powerMatrix)
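|          | Power law coefficient | Reject power law distribution |
|:---------|:----------------------|:------------------------------|
|Norway    | 2.03033515380972      | FALSE                         |
|Australia | 1.85703596214718      | FALSE                         |
|Europe    | 1.87619239534857      | FALSE                         |
|USA       | 1.79110495883469      | FALSE                         |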

Lastly, we compare the fitted power-law distributions to the observed lineage size distributions by drawing the same number of transmission lineage sizes from each fitted power-law distribution 1000 times and comparing the percentiles.
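The comparison could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used to produce the results: the functions rpowerlaw_discrete and compare_percentiles are hypothetical names, the sampler truncates the power law at an assumed maximum size k_max to keep it simple, and the observed sizes here are simulated toy data.

```r
set.seed(1)

# Draw n sizes from a discrete power law P(k) proportional to k^(-alpha),
# k = 1, ..., k_max. Truncating at k_max is a simplifying assumption.
rpowerlaw_discrete <- function(n, alpha, k_max = 1e4) {
  k <- seq_len(k_max)
  sample(k, size = n, replace = TRUE, prob = k^(-alpha))
}

# Simulate n_rep data sets of the same length as the observed one and
# summarise each chosen percentile across the simulated data sets.
compare_percentiles <- function(observed, alpha, n_rep = 1000,
                                probs = c(0.5, 0.9, 0.99)) {
  sims <- replicate(
    n_rep,
    quantile(rpowerlaw_discrete(length(observed), alpha), probs = probs)
  )
  data.frame(
    percentile = probs * 100,
    observed   = unname(quantile(observed, probs = probs)),
    sim_median = unname(apply(sims, 1, median)),
    sim_lower  = unname(apply(sims, 1, quantile, probs = 0.025)),
    sim_upper  = unname(apply(sims, 1, quantile, probs = 0.975))
  )
}

# Toy "observed" sizes, simulated here purely for illustration.
toy_observed <- rpowerlaw_discrete(500, alpha = 2)
res <- compare_percentiles(toy_observed, alpha = 2, n_rep = 200)
print(res)
```

If an observed percentile falls well outside the simulated interval, the fitted power law describes that part of the size distribution poorly, even when the Kolmogorov-Smirnov test does not reject it.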