Here we provide a summary of some of the important statistics for the transmission lineages. The transmission lineage explorer shows that in all four locations (Europe, the USA, Norway and Australia) a few transmission lineages account for a disproportionate share of the observed sequences, while most transmission lineages are small. Here we compare the locations in terms of their transmission lineage size distributions.
It should be kept in mind that transmission lineages can be of similar size even though they have grown in very different ways. For example, a very old, slow-spreading transmission lineage may have a similar size to a young, fast-spreading one.
We consider the research questions (RQs):
summarize_lineages = function(Result) {
  sizes = Result$Lineage_sizes
  # Percentage of all sequences that fall in the ten largest lineages
  # (sorted first, so the result does not depend on the input order)
  c1 = round(sum(head(sort(sizes, decreasing = TRUE), 10))/sum(sizes)*100, 4)
  # Percentage of all sequences that fall in singleton lineages (size 1)
  c2 = round(sum(sizes == 1)/sum(sizes)*100, 4)
  return(c(c1, c2))
}
cNOR = summarize_lineages(Result_NOR)
cAUS = summarize_lineages(Result_AUS)
cEUR = summarize_lineages(Result_EUR)
cUSA = summarize_lineages(Result_USA)
mat1 = rbind(cNOR, cAUS, cEUR, cUSA)
colnames(mat1)=c("Percentage of cases explained by 10 largest TL", "Percentage of cases explained by singletons")
rownames(mat1)=c("Norway", "Australia", "Europe","USA")
knitr::kable(mat1)
| | Percentage of cases explained by 10 largest TL | Percentage of cases explained by singletons |
|---|---|---|
| Norway | 44.3735 | 15.7193 |
| Australia | 59.7821 | 7.4898 |
| Europe | 71.6061 | 4.8014 |
| USA | 54.5110 | 6.5346 |
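To make the two columns concrete, here is a small worked example on made-up lineage sizes. The vector `toy_sizes` is hypothetical and not taken from any of the four data sets; it is chosen to sum to 100 so that the percentages can be read off directly.

```r
# Toy lineage sizes (hypothetical, not from the study data); they sum to 100.
toy_sizes <- c(40, 25, 10, 5, 4, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)
total <- sum(toy_sizes)

# Share of all sequences that sit in the ten largest lineages.
top10_pct <- 100 * sum(head(sort(toy_sizes, decreasing = TRUE), 10)) / total

# Share of all sequences that sit in lineages of size one.
singleton_pct <- 100 * sum(toy_sizes == 1) / total

print(c(top10 = top10_pct, singletons = singleton_pct))
```

In this toy case the ten largest lineages hold 94% of the sequences and the seven singletons hold 7%, even though singletons make up 7 of the 16 lineages.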
First we plot the lineage size distributions on linear and log-log scales for each location.
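A minimal sketch of how such a pair of plots could be produced with base R graphics is given below. The helper `plot_size_distribution` is a hypothetical name introduced here for illustration and is not necessarily how the figures in this report were made.

```r
# Hypothetical helper: plot the number of lineages of each size on a linear
# scale and on a log-log scale, side by side.
plot_size_distribution <- function(sizes, main = "") {
  freq <- table(sizes)
  x <- as.integer(names(freq))  # distinct lineage sizes
  y <- as.integer(freq)         # number of lineages of each size
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(x, y, type = "h", xlab = "Lineage size", ylab = "Number of lineages",
       main = paste(main, "(linear)"))
  plot(x, y, log = "xy", xlab = "Lineage size", ylab = "Number of lineages",
       main = paste(main, "(log-log)"))
  invisible(data.frame(size = x, count = y))
}

# Toy input, not the study data.
toy_freq <- plot_size_distribution(c(1, 1, 1, 2, 2, 5), main = "Toy data")
```

Under these assumptions a call such as `plot_size_distribution(Result_NOR$Lineage_sizes, "Norway")` would draw both panels for one location. On the log-log panel, a power law shows up as an approximately straight line.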
Next we fit a power-law distribution using the fit_power_law function from the igraph R package. We use the Kolmogorov-Smirnov test included in the package to calculate a p-value indicating whether the data are significantly different from what we would expect under a power-law distribution with the estimated coefficient.
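For reference, a discrete power law has the standard form below, where $\alpha$ is the coefficient reported as `alpha` and $x_{\min}$ corresponds to the `xmin` argument (set to 1 here). That the discrete rather than the continuous form is the relevant one is an assumption, based on lineage sizes being integer counts.

```latex
P(X = k) = \frac{k^{-\alpha}}{\zeta(\alpha, x_{\min})},
\qquad k = x_{\min},\, x_{\min}+1,\, \dots,
\qquad
\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}
```

For $x_{\min} = 1$ the normalising constant is the Riemann zeta function $\zeta(\alpha)$. A distribution of this form has a finite mean only when $\alpha > 2$.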
library(igraph)  # provides fit_power_law()
powerNOR = fit_power_law(x = Result_NOR$Lineage_sizes, xmin=1)
powerAUS = fit_power_law(x = Result_AUS$Lineage_sizes, xmin=1)
powerEUR = fit_power_law(x = Result_EUR$Lineage_sizes, xmin=1)
powerUSA = fit_power_law(x = Result_USA$Lineage_sizes, xmin=1)
powerMatrix = rbind(c(powerNOR$alpha,paste(powerNOR$KS.p<0.05)),
c(powerAUS$alpha,paste(powerAUS$KS.p<0.05)),
c(powerEUR$alpha,paste(powerEUR$KS.p<0.05)),
c(powerUSA$alpha,paste(powerUSA$KS.p<0.05)))
colnames(powerMatrix)=c("Power law coefficient","Reject power law distribution")
rownames(powerMatrix)=c("Norway", "Australia", "Europe","USA")
knitr::kable(powerMatrix)
| | Power law coefficient | Reject power law distribution |
|---|---|---|
| Norway | 2.03033515380972 | FALSE |
| Australia | 1.85703596214718 | FALSE |
| Europe | 1.87619239534857 | FALSE |
| USA | 1.79110495883469 | FALSE |
Lastly, we compare the fitted power-law distributions to the observed lineage size distributions by drawing the same number of transmission lineage sizes from each fitted power-law distribution 1000 times and comparing the percentiles.
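The comparison described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the report: the helpers `rpowerlaw_trunc` and `simulate_percentiles` are hypothetical names, the power law is truncated at `k_max` so that it can be sampled with `sample()`, and the percentiles compared (50th, 90th, 99th) are chosen arbitrarily.

```r
# Hypothetical helper: draw n sizes from a discrete power law with
# P(k) proportional to k^(-alpha) for k = 1..k_max.
# Truncating at k_max is an assumption made for this sketch.
rpowerlaw_trunc <- function(n, alpha, k_max = 1e5) {
  k <- seq_len(k_max)
  sample(k, size = n, replace = TRUE, prob = k^(-alpha))
}

# Simulate n_rep data sets with as many lineages as observed and summarise,
# for each percentile, the simulated values next to the observed one.
simulate_percentiles <- function(sizes, alpha, probs = c(0.5, 0.9, 0.99),
                                 n_rep = 1000, k_max = 1e5) {
  sims <- replicate(n_rep,
                    quantile(rpowerlaw_trunc(length(sizes), alpha, k_max),
                             probs = probs, names = FALSE))
  sims <- matrix(sims, nrow = length(probs))  # one row per percentile
  data.frame(
    percentile = probs,
    observed   = quantile(sizes, probs = probs, names = FALSE),
    sim_lower  = apply(sims, 1, quantile, probs = 0.025, names = FALSE),
    sim_median = apply(sims, 1, median),
    sim_upper  = apply(sims, 1, quantile, probs = 0.975, names = FALSE)
  )
}

set.seed(1)
toy_sizes <- rpowerlaw_trunc(500, alpha = 2)  # toy data, not the study data
cmp <- simulate_percentiles(toy_sizes, alpha = 2, n_rep = 200)
print(cmp)
```

Applied to the fits above, a call would look like `simulate_percentiles(Result_NOR$Lineage_sizes, powerNOR$alpha)`. An observed percentile that falls outside the simulated interval would suggest that the fitted power law describes that part of the size distribution poorly.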