Transmission lineage statistics

Overview

Here we provide a summary of some of the important statistics for the transmission lineages. The transmission lineage explorer shows that in all four locations (Europe, the USA, Norway and Australia) a few transmission lineages account for a disproportionate share of the sequences observed, while most transmission lineages are small. Here we compare the locations in terms of their transmission lineage size distributions.

It should be kept in mind that transmission lineages can be of similar size even though they have grown in very different ways. For example, a very old, slow-spreading transmission lineage may have a similar size to a young, fast-spreading one.
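As a rough illustration of this point, the sketch below assumes deterministic exponential growth from a single introduction. The function name lineage_size and the growth rates and ages are hypothetical values chosen for illustration; they are not estimated from the data.

```r
# Illustrative assumption: a lineage grows as exp(growth_rate * age_days)
# from a single introduced case (no stochasticity, no sampling effects).
lineage_size = function(growth_rate, age_days) exp(growth_rate * age_days)

# Hypothetical old, slow-spreading lineage: 5% growth per day for 120 days
old_slow = lineage_size(growth_rate = 0.05, age_days = 120)

# Hypothetical young, fast-spreading lineage: 20% growth per day for 30 days
young_fast = lineage_size(growth_rate = 0.20, age_days = 30)

# Both have the same growth_rate * age_days, hence the same expected size
print(c(old_slow = old_slow, young_fast = young_fast))
```

Under this simplification, size depends only on the product of growth rate and age, so lineage size alone cannot separate the two scenarios.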

We consider the research questions (RQs):

What percentage of local cases do the 10 largest lineages explain, and what percentage is explained by singletons?
How are the lineage sizes distributed? Do they follow a power-law distribution?

RQ 1: What percentage of local cases do the 10 largest lineages explain, and what percentage is explained by singletons?


summarize_lineages = function(Result) {
  # Sort largest first so the top-10 sum does not depend on the input order
  sizes = sort(Result$Lineage_sizes, decreasing = TRUE)
  # The ten largest lineages account for this % of all observations
  c1 = round(sum(head(sizes, 10))/sum(sizes)*100,4)
  # The % of all observations that belong to singleton lineages
  c2 = round(sum(sizes==1)/sum(sizes)*100,4)
  return(c(c1,c2))
}

cNOR = summarize_lineages(Result_NOR)
cAUS = summarize_lineages(Result_AUS)
cEUR = summarize_lineages(Result_EUR)
cUSA = summarize_lineages(Result_USA)
mat1 = rbind(cNOR, cAUS, cEUR, cUSA)
colnames(mat1)=c("Percentage of cases explained by 10 largest TL", "Percentage of cases explained by singletons")
rownames(mat1)=c("Norway", "Australia", "Europe","USA")
knitr::kable(mat1)

	Percentage of cases explained by 10 largest TL	Percentage of cases explained by singletons
Norway	44.3735	15.7193
Australia	59.7821	7.4898
Europe	71.6061	4.8014
USA	54.5110	6.5346

RQ 2: How are the lineage sizes distributed? Do they follow a power-law distribution?

First we plot the lineage size distributions on linear and log-log scales for each location.

Next we fit a power-law distribution using the fit_power_law function from the igraph R package. We use the Kolmogorov-Smirnov test included in the package to calculate a p-value indicating whether the data are significantly different from what we expect under a power-law distribution with the estimated coefficient.

library(igraph)  # provides fit_power_law
powerNOR = fit_power_law(x = Result_NOR$Lineage_sizes, xmin=1)
powerAUS = fit_power_law(x = Result_AUS$Lineage_sizes, xmin=1)
powerEUR = fit_power_law(x = Result_EUR$Lineage_sizes, xmin=1)
powerUSA = fit_power_law(x = Result_USA$Lineage_sizes, xmin=1)
powerMatrix = rbind(c(powerNOR$alpha,paste(powerNOR$KS.p<0.05)),
                    c(powerAUS$alpha,paste(powerAUS$KS.p<0.05)),
                    c(powerEUR$alpha,paste(powerEUR$KS.p<0.05)),
                    c(powerUSA$alpha,paste(powerUSA$KS.p<0.05)))
colnames(powerMatrix)=c("Power law coefficient","Reject power law distribution")
rownames(powerMatrix)=c("Norway", "Australia", "Europe","USA")
knitr::kable(powerMatrix)

	Power law coefficient	Reject power law distribution
Norway	2.03033515380972	FALSE
Australia	1.85703596214718	FALSE
Europe	1.87619239534857	FALSE
USA	1.79110495883469	FALSE
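For intuition about what the power-law coefficient measures, the sketch below estimates the exponent of a discrete power law by maximum likelihood in base R. It is a minimal illustration and not necessarily what fit_power_law does internally. The helper name neg_log_lik, the truncation point x_max, and the synthetic data are assumptions made for this example.

```r
# Negative log-likelihood of a discrete power law with p(x) proportional to
# x^(-alpha) for x >= x_min. The normalising constant is approximated by a
# sum truncated at x_max (an assumption made to stay within base R).
neg_log_lik = function(alpha, sizes, x_min = 1, x_max = 1e6) {
  log_norm = log(sum((x_min:x_max)^(-alpha)))
  length(sizes) * log_norm + alpha * sum(log(sizes))
}

set.seed(42)
# Synthetic lineage sizes with a known exponent of 2 (placeholder data,
# not the Result_* objects used in this report)
support = 1:1e4
toy_sizes = sample(support, size = 5000, replace = TRUE, prob = support^(-2))

# One-dimensional search for the exponent that minimises the negative
# log-likelihood; the search interval is an arbitrary choice
fit = optimize(neg_log_lik, interval = c(1.05, 5), sizes = toy_sizes)
alpha_hat = fit$minimum
print(alpha_hat)
```

On the synthetic data the estimate should land close to the exponent used to generate it. A smaller coefficient corresponds to a heavier tail, that is, relatively more very large lineages.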

Lastly, we compare the fitted power-law distributions to the observed lineage size distributions by drawing the same number of transmission lineage sizes from each fitted power-law distribution 1000 times and comparing the percentiles.
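The comparison described above could be sketched as follows. This is a minimal illustration and not the code used to produce the results. The helpers r_power_law and simulate_percentiles are hypothetical, the power law is truncated at x_max so that it can be sampled in base R, and the observed sizes and the exponent are placeholders.

```r
# Hypothetical helper: draw n sizes from a discrete power law with
# p(x) proportional to x^(-alpha), truncated at x_max (an assumption).
r_power_law = function(n, alpha, x_min = 1, x_max = 1e4) {
  support = x_min:x_max
  sample(support, size = n, replace = TRUE, prob = support^(-alpha))
}

# Simulate n_rep data sets with as many lineages as were observed and
# return the requested percentiles of each (one row per replicate)
simulate_percentiles = function(n_lineages, alpha, probs, n_rep = 1000) {
  t(replicate(n_rep, quantile(r_power_law(n_lineages, alpha), probs = probs)))
}

set.seed(1)
# Placeholder data and exponent, not results from this analysis
observed_sizes = r_power_law(500, alpha = 2)
probs = c(0.5, 0.9, 0.99)
sim = simulate_percentiles(length(observed_sizes), alpha = 2, probs = probs)

# Observed percentiles next to the 2.5%, 50% and 97.5% quantiles of the
# simulated percentiles
comparison = cbind(observed = quantile(observed_sizes, probs),
                   t(apply(sim, 2, quantile, probs = c(0.025, 0.5, 0.975))))
print(comparison)
```

In a real analysis, observed_sizes would be replaced by a location's Lineage_sizes and alpha by the fitted coefficient for that location. An observed percentile far outside the simulated range would suggest a poor fit in that part of the distribution.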