Model parameters

adap is the only variable exported from MKtest module. It is a Mutable structure contaning the variables required to solve the analytical approach. Any value can be easly changed. Remember adap should be change before the execution, in other case, $\alpha_{(x)}$ will be solve with the default values. To change all the values at once, you can use MKtest.parameters in order to set specific models in a new variable.

MKtest.parameters — Type

Mutable structure containing the variables required to solve the analytical approach. All the functions are solve using the internal values of the structure. You should declare a mutable structure to the perform the analytical estimations.

Parameters

gam_flanking::Int64: Selection coefficient for deleterious alleles
gL::Int64: Selection coefficient for weakly benefical alleles
gH::Int64: Selection coefficient for strongly benefical alleles
al_low::Float64: Proportion of α due to weak selection
al_tot::Float64: α
θ_flanking::Float64: Mutation rate defining BGS strength
θ_coding::Float64: Mutation rate on coding region
al::Float64: DFE shape parameter
be::Float64: DFE scale parameter
B::Float64: BGS strength
B_bins::Array{Float64,1}: BGS values to simulate
ppos_l::Float64: Fixation probabily of weakly beneficial alleles
ppos_h::Float64: Fixation probabily of strongly beneficial alleles
N::Int64: Population size
n::Int64: Sample size
Lf::Int64: Flanking region length
ρ::Float64: Recombination rate
TE::Float64

MKtest.analytical_alpha — Function

analytical_alpha(param)

Analytical α(x) estimation. Solve α(x) generally. We used the expected fixation rates and SFSto approach the asympotic value accouting for background selection, weakly and strong positive selection. α(x) can be estimated taking into account the role of positive selected alleles or not. In this way we explore the role of linkage to deleterious alleles in the coding region.

\[\mathbb{E}[\alpha_{x}] = 1 - \left(\frac{\mathbb{E}[D_{s}]}{\mathbb{E}[D_{N}]}\frac{\mathbb{E}[P_{N}]}{\mathbb{E}[P_{S}]}\right)\]

Arguments

param::parameters

Returns

Tuple{Vector{Float64},Vector{Float64}}: α(x) accounting for weak adaptation, α(x) non-accouting for weak adaptation.

MKtest.Br — Function

Br(Lmax,theta)

Expected reduction in nucleotide diversity. Explored at Charlesworth B., 1994:

\[\frac{\pi}{\pi_{0}} = e^{\frac{4\mu L}{2rL+t}}\]

Arguments

param::parameters
theta::Float64

Returns

Float64: expected reduction in diversity given a non-coding length, mutation rate and defined recombination.

Missing docstring.

Missing docstring for MKtest.set_θ. Check Documenter's build log for details.

MKtest.binom_op — Function

binom_op(param)

Binomial convolution to sample the allele frequencies probabilites depending on background selection values and sample size.

Arguments

param::parameters
B_bins::Vector{Float64}

Returns

Array{Float64,2}: convoluted SFS for each B value defined in the model (B_bins). The estimations are saved at convolutedBn.bn.

MKtest.Φ — Function

Φ(param,s)

Reduction in fixation probabilty due to background selection and linkage. The formulas used have been subjected to several theoretical works (Charlesworth B., 1994, Hudson et al., 1995, Nordborg et al. 1995, Barton NH., 1995).

The fixation probabilty of selected alleles are reduce by a factor $\phi$:

\[\phi(t,s) = e^{[\frac{-2\mu}{t(1+\frac{rL}{t}+\frac{2s}{t})}]}\]

Multiplying across all deleterious linkes sites, we find:

\[\Phi = \prod_{1}^{L} = \phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]

\[\phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]

Arguments

param::parameters: mutable structure containing the variables required to solve the model
gamma::Int64: selection coefficient.

Returns

Float64: expected rate of positive fixations under background selection.

MKtest estimation

Fixations

MKtest.fix_neut — Function

fix_neut(parameters)

Expected neutral fixations rate reduce by B value.

\[\mathbb{E}[D_{s}] = (1 - p_{-} - p_{+}) B \frac{1}{2N}\]

Arguments

param::parameters: mutable structure containing the variables required to solve the model

Returns

Float64: expected rate of neutral fixations.

MKtest.fix_neg — Function

fix_neg(parameters,ppos)

Expected fixation rate from negative DFE.

\[\mathbb{E}[D_{n-}] = p_{-}\left(2^-\alpha\beta^\alpha\left(-\zeta[\alpha,\frac{2+\beta}{2}] + \zeta[\alpha,1/2(2-\frac{1}{N+\beta})]\right)\right)\]

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
ppos::Float64: positive selected alleles probabilty.

Returns

Float64: expected rate of fixations from negative DFE.

MKtest.p_fix — Function

p_fix(parameters,gam)

Expected positive fixation rate.

\[\mathbb{E}[D_{n+}] = p_{+} \cdot B \cdot (1 - e^{(-2s)})\]

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
gam::Int64: positive selection coefficient.

Returns

Float64: expected rate of positive fixation.

MKtest.fix_pos_sim — Function

fix_pos_sim(parameters,gamma,ppos)

Expected positive fixations rate reduced due to the impact of background selection and linkage. The probabilty of fixation of positively selected alleles is reduced by a factor Φ across all deleterious linked sites Analytical.Φ.

\[\mathbb{E}[D_{n+}] = \Phi \cdot \mathbb{E}[D_{n+}]\]

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
gam::Int64: positive selection coefficient.
ppos::Float64: positive selected alleles probabilty.

Returns

Float64: expected rate of positive fixations under background selection

Polymorphism

MKtest.sfs_neut — Function

sfs_neut(param,binom)

Expected rate of neutral allele frequency reduce by background selection. The spectrum depends on the number of individual []

\[\mathbb{E}[Ps_{(x)}] = \sum{x^{*}=x}{x^{*}=1}f_{B}(x)\]

Input

param::parameters: mutable structure containing the model.
SparseMatrixCSC{Float64, Int64}: convoluted SFS given the B value.

Return:

Vector{Float64}: expected rate of neutral alleles frequencies.

Missing docstring.

Missing docstring for MKtest.sfs_pos. Check Documenter's build log for details.

MKtest.sfs_neg — Function

sfs_neg(param,p,binom)

Expected rate of positive selected allele frequency reduce by background selection. Spectrum drawn on a gamma DFE. It depends on the number of individuals.

Arguments

param::parameters: mutable structure containing the model
p::Float64: positive selected alleles probabilty.
binom::SparseMatrixCSC{Float64, Int64}: convoluted SFS given the B value.

Return:

Array{Float64}: expected negative selected alleles frequencies.

Missing docstring.

Missing docstring for MKtest.cumulative_sfs. Check Documenter's build log for details.

MKtest.reduce_sfs — Function

reduce_sfs(sfs_tmp,bins)

Function to reduce the SFS into N bins.

Arguments

sfs_tmp::Vector{Float64}: SFS vector.
bins::Int64: bin size to collapse the SFS.

Output

Vector{Float64}: Binned SFS vector.

Rates

MKtest.rates — Function

rates(param,gH,gL,gam_flanking,gam_dfe,alpha,iterations,output)

Function to solve randomly N scenarios. The function will create N models, defined by parameters(), to solve analytically fixation rates and the expected SFS for each model. The rates will be used to compute summary statistics required at ABC inference. The function output a HDF5 file containing the solved models, the selected DAC and the analytical solutions.

Arguments

param::parameters: mutable structure containing the model.
gH::Array{Int64,1}: Range of strong selection coefficients.
gL::Union{Array{Int64,1},Nothing}: Range of weak selection coefficients.
gam_flanking::Array{Int64,1}: Range of deleterious selection coefficients at the flanking region.
gam_dfe::Array{Int64,1}: Range of deleterious selection coefficients at the coding region.
alpha::Vector{Float64}: Range of α value to solve.
iterations::Int64: Number of solutions.
output_file::String: File to output HDF5 file.

Returns

DataFrame: models solved.
Output_file: HDF5 file containing models solved and rates.

Summary statistics

MKtest.poisson_fixation — Function

poisson_fixation(observed_values,λds, λdn)

Divergence sampling from Poisson distribution. The expected neutral and selected fixations are subset through their relative expected rates (fix_neut, fix_neg_b, fix_pos_sim). Empirical values are used are used to simulate the locus L along a branch of time T from which the expected Ds and Dn raw count estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily given the success rate $\lambda$ in the distribution.

\[\mathbb{E}[D_N] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_+] + \mathbb{E}[D_-]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]

\[\mathbb{E}[D_S] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_0]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]

Arguments

observed_values::Array: Array containing the total observed divergence.
λds::Float64: expected neutral fixations rate.
λdn::Float64: expected selected fixations rate.

Returns

Array{Int64,1} containing the expected count of neutral and selected fixations.

MKtest.poisson_polymorphism — Function

poisson_polymorphism(observed_values,λps,λpn)

Polymorphism sampling from Poisson distributions. The total expected neutral and selected polimorphism are subset through the relative expected rates at the frequency spectrum (fix_neut, sfs_neut,). Empirical sfs are used to simulate the locus L along a branch of time T from which the expected Ps and Pn raw count are estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily from the whole sfs given each frequency success rate $\lambda$ in the distribution.

The success rate managing the Poisson distribution by the observed count each frequency. We considered both sampling variance and process variance is affecting the number of variable alleles we sample from SFS. This variance arises from the random mutation-fixation process along the branch. To incorporate this variance we do one sample per frequency-bin and use the total sampled variation and the SFS for the summary statistics.

\[\mathbb{E}[P_N] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]

\[\mathbb{E}[P_S] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{0(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]

Arguments

observed_values::Array{Int64,1}: Array containing the total observed divergence.
λps::Array{Float64,1}: expected neutral site frequency spectrum rate.
λpn::Array{Float64,1}: expected selected site frequency spectrum rate.

Returns

Array{Int64,2} containing the expected total count of neutral and selected polymorphism.

MKtest.sampling_summaries — Function

sampled_alpha(observed_values,λds, λdn)

Ouput the expected values from the Poisson sampling process. Please check poisson_fixation and poisson_polymorphism to understand the samplingn process. α(x) is estimated through the expected values of Dn, Ds, Pn and Ps.

Arguments

param::parameters: Array containing the total observed divergence.
d::Array: observed divergence.
afs::Array: observed polymorphism.
λdiv::Array{Float64,2}: expected fixations rate.
λdiv::Array{Float64,2}: expected site frequency spectrum rates.

Returns

αsummaries,expdn,expds,exppn,exp_ps,ssAlpha

Array{Int64,2} containing α(x) values.
Array{Int64,1} expected non-synonymous divergence.
Array{Int64,1} expected synonymous divergence.
Array{Int64,1} expected non-synonymous polymorphism.
Array{Int64,1} expected synonymous polymorphism.
Array{Int64,1} expected synonymous polymorphism.
Array{Int64,1} expected synonymous polymorphism.
Array{Int64,2} containing α(x) binned values.
samplingsummaries(gammaL,gammaH,pposl,ppos_h,observedData,nopos)

MKtest.summary_statistics — Function

summary_statistics(param,h5_file,sfs,divergence,output_folder,summstat_size)

Estimate summary statistics using observed data and analytical rates. output_folder will be used to output summary statistics

Arguments

param::parameters : Mutable structure containing the models
h5_file::String : HDF5 containing solved models, fixation and polymorphic rates
sfs::Vector: SFS data parsed from parse_sfs().
divergence::Vector: divergence data from parse_sfs().
output_folder::String : path to save summary statistics and observed data.
summstat_size::Int64 : number of summary statistics to sample.

Inference tools

MKtest.parse_sfs — Function

parse_sfs(param;data,gene_list,sfs_columns,div_columns,l_columns,bins,isolines)

Function to parse polymorphism and divergence by subset of genes. The input data is based on supplementary material described at Uricchio et al. 2019. Please be sure the file is tabulated.

GeneId	Pn	DAF seppareted by commas	Ps	DAF separated by commas	Dn	Ds
XXXX	0	,0,0,0,0,0,0,0,0	0	,0,0,0,0,0,0,0,0	0	0
XXXX	0	,0,0,0,0,0,0,0,0	0	,0,0,0,0,0,0,0,0	0	0
XXXX	0	,0,0,0,0,0,0,0,0	0	,0,0,0,0,0,0,0,0	0	0

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
data::S: path to polymorphic and divergence raw data.
gene_list:S: path to gene list.
sfs_columns::Vector{Int64}: column numbers corresponding to Pn and Ps frequencies. Default values [3,5]
div_columns::Vector{Int64}: column numbers corresponding to Dn and Ds frequencies. Default values [6,7]
l_columns::Vector{Int64}: column numbers corresponding to total number of non-synonymous and synonymous sites. Not require at data. Default values [7,8]
bins::Union{Nothing,Int64}: size to bin the SFS independtly of the sample size.

Returns

Vector{Vector{Float64}}: α_x values. Estimated using the cumulative SFS
Vector{Matrix{Float64}}: Site Frequency Spectrum (non-cumulative)
Vector{Matrix{Int64}}: Synonymous and non-synonymous divergence counts

MKtest.ABCreg — Function

ABCreg(output_folder, S, P, tol, abcreg)

Performing ABC inference using ABCreg. Please, be sure the input folder contain the observed and summary statistics produc ed by MKtest.summary_statistics().

Arguments

output_folder::String : folder containing the observed data and summary estatistics. It will be used to output the posterior distributions
P::Int64 : number of parameters to infer.
S::Int64 : number of summary stastitics to perform the inference.
tol::Float64 : tolerance value. It defines the number of accepted values at ABC inference.
abcreg::String : path to ABCreg binary.

Output

posteriors:Vector{Matrix{Float64}}: posteriors distributions estimated by ABCreg.

MKtest.summary_abc — Function

summary_abc(posteriors,stat)

Posterior distributions statistics. The function estimates Min., 0.5% Perc., Median, Mean, Mode, "99.5% Perc., Max values. The mode is estimated following R package abc mode function. The argument stat define the output estimation.

Arguments

posterior::Matrix{Float64} : posterior distribution.
stat::String: output estimation.

Output

DataFrame: posterior statistics.
DataFrame: chosen statistic inference.

MKtest

MKtest.aMK — Function

aMK(param,α;dac_rm,na_rm)

Function to estimate the asymptotic value of α(x).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Matrix: SFS data parsed from parse_sfs().
divergence::Matrix: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform imputedMK.
cumulative::Bool: Perform α(x) estimation using the cumulative SFS
dac_rm::Bool: Remove any value not selected at param.dac.
na_rm::Bool: Remove any NA value at α(x)

Returns

DataFrame

aMK(param,α;dac_rm,na_rm)

Function to estimate the asymptotic value of α(x).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Vector: SFS data parsed from parse_sfs().
divergence::Vector: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform imputedMK.
cumulative::Bool: Perform α(x) estimation using the cumulative SFS
dac_rm::Bool: Remove any value not selected at param.dac.
na_rm::Bool: Remove any NA value at α(x)

Returns

DataFrame

MKtest.imputedMK — Function

imputedMK(param,sfs,divergence;m,cutoff)

Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Vector: SFS data parsed from parse_sfs().
divergence::Vector: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform imputedMK.

Returns

DataFrame

imputedMK(param,sfs,divergence;m,cutoff)

Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Matrix{Float64}: SFS data parsed from parse_sfs().
divergence::Matrix{Int64}: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform imputedMK.

Returns

DataFrame

MKtest.fwwMK — Function

fwwMK(param,sfs,divergence;m,cutoff)

Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Vector: SFS data parsed from parse_sfs().
divergence::Vector: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform fwwMK.

Returns

DataFrame

fwwMK(param,sfs,divergence;m,cutoff)

Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).

Arguments

param::parameters: mutable structure containing the variables required to solve the model.
sfs::Matrix: SFS data parsed from parse_sfs().
divergence::Matrix: divergence data from parse_sfs().
threshold{Float64}: frequency cutoff to perform fwwMK.

Returns

DataFrame

MKtest.standardMK — Function

standardMK(sfs,divergence;m)

Function to estimate the original α value.

Arguments

sfs::Matrix{Float64}: SFS array
divergence::Array: divegence count array

Output

DataFrame

standardMK(sfs,divergence;m)

Function to estimate the original α value.

Arguments

sfs::Matrix{Float64}: SFS array
divergence::Array: divegence count array

Output

DataFrame

MKtest.grapes — Function

grapes(sfs,divergence,model)

Run grapes using SFS and divergence data. The function will install grapes using Conda from genomedk channel.

Arguments

sfs::Vector{Matrix{Float64}}: SFS data parsed from parse_sfs()
divergence::Vector{Matrix{Int64}}: divergence data parsed from parse_sfs()
model::String: grapes model (GammaZero, GammaExpo, DisplGamma, ScaledBeta, FGMBesselK and all)
n::Int64: smaller sample size to project the SFS
nearly_neutral::Int64: Sadp threshold of above which a mutation is considered adaptive
FWW_threshold::Float64: minimal allele frequency in FWW α
nb_rand_start::Int64: number of random starting values in model optimization (default=0); setting positive values will slow down the program but decrease the probability of being trapped in local optima.
anc_to_rec_Ne_ratio::Float64: divergence/polymorphism Ne ratio
no_div_data::Bool: only use divergence data to estimate DFE
no_div_param::Bool: implements so-called [-A] version in Galtier (2016); also called a DFE by Tataru et al. (2017), and indicated with * in Rousselle et al. (2018); irrelevant if model = GammaZero; (default=false)
no_syn_orient_error::Bool: force equal synonymous and non-synonymous mis-orientation rate(default=false)
fold::Bool: fold the SFS
fixed_param::String: this option should be used if one does not want to optimize every parameter, but rather have some parameters fixed to predefined values; parameter names and predefined values are passed via a control file; see example at the bottom of this file (default=none).

Output

DataFrame: grapes model estimation.