Model parameters

adap is the only variable exported from MKtest module. It is a Mutable structure contaning the variables required to solve the analytical approach. Any value can be easly changed. Remember adap should be change before the execution, in other case, $\alpha_{(x)}$ will be solve with the default values. To change all the values at once, you can use MKtest.parameters in order to set specific models in a new variable.

MKtest.parametersType

Mutable structure containing the variables required to solve the analytical approach. All the functions are solve using the internal values of the structure. You should declare a mutable structure to the perform the analytical estimations.

Parameters

  • gam_flanking::Int64: Selection coefficient for deleterious alleles
  • gL::Int64: Selection coefficient for weakly benefical alleles
  • gH::Int64: Selection coefficient for strongly benefical alleles
  • al_low::Float64: Proportion of α due to weak selection
  • al_tot::Float64: α
  • θ_flanking::Float64: Mutation rate defining BGS strength
  • θ_coding::Float64: Mutation rate on coding region
  • al::Float64: DFE shape parameter
  • be::Float64: DFE scale parameter
  • B::Float64: BGS strength
  • B_bins::Array{Float64,1}: BGS values to simulate
  • ppos_l::Float64: Fixation probabily of weakly beneficial alleles
  • ppos_h::Float64: Fixation probabily of strongly beneficial alleles
  • N::Int64: Population size
  • n::Int64: Sample size
  • Lf::Int64: Flanking region length
  • ρ::Float64: Recombination rate
  • TE::Float64
source
MKtest.analytical_alphaFunction
analytical_alpha(param)

Analytical α(x) estimation. Solve α(x) generally. We used the expected fixation rates and SFSto approach the asympotic value accouting for background selection, weakly and strong positive selection. α(x) can be estimated taking into account the role of positive selected alleles or not. In this way we explore the role of linkage to deleterious alleles in the coding region.

\[\mathbb{E}[\alpha_{x}] = 1 - \left(\frac{\mathbb{E}[D_{s}]}{\mathbb{E}[D_{N}]}\frac{\mathbb{E}[P_{N}]}{\mathbb{E}[P_{S}]}\right)\]

Arguments

  • param::parameters

Returns

  • Tuple{Vector{Float64},Vector{Float64}}: α(x) accounting for weak adaptation, α(x) non-accouting for weak adaptation.
source
MKtest.BrFunction
Br(Lmax,theta)

Expected reduction in nucleotide diversity. Explored at Charlesworth B., 1994:

\[\frac{\pi}{\pi_{0}} = e^{\frac{4\mu L}{2rL+t}}\]

Arguments

  • param::parameters
  • theta::Float64

Returns

  • Float64: expected reduction in diversity given a non-coding length, mutation rate and defined recombination.
source
Missing docstring.

Missing docstring for MKtest.set_θ. Check Documenter's build log for details.

MKtest.binom_opFunction
binom_op(param)

Binomial convolution to sample the allele frequencies probabilites depending on background selection values and sample size.

Arguments

  • param::parameters
  • B_bins::Vector{Float64}

Returns

  • Array{Float64,2}: convoluted SFS for each B value defined in the model (B_bins). The estimations are saved at convolutedBn.bn.
source
MKtest.ΦFunction
Φ(param,s)

Reduction in fixation probabilty due to background selection and linkage. The formulas used have been subjected to several theoretical works (Charlesworth B., 1994, Hudson et al., 1995, Nordborg et al. 1995, Barton NH., 1995).

The fixation probabilty of selected alleles are reduce by a factor $\phi$:

\[\phi(t,s) = e^{[\frac{-2\mu}{t(1+\frac{rL}{t}+\frac{2s}{t})}]}\]

Multiplying across all deleterious linkes sites, we find:

\[\Phi = \prod_{1}^{L} = \phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]

\[\phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model
  • gamma::Int64: selection coefficient.

Returns

  • Float64: expected rate of positive fixations under background selection.
source

MKtest estimation

Fixations

MKtest.fix_neutFunction
  • fix_neut(parameters)

Expected neutral fixations rate reduce by B value.

\[\mathbb{E}[D_{s}] = (1 - p_{-} - p_{+}) B \frac{1}{2N}\]

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model

Returns

  • Float64: expected rate of neutral fixations.
source
MKtest.fix_negFunction
fix_neg(parameters,ppos)

Expected fixation rate from negative DFE.

\[\mathbb{E}[D_{n-}] = p_{-}\left(2^-\alpha\beta^\alpha\left(-\zeta[\alpha,\frac{2+\beta}{2}] + \zeta[\alpha,1/2(2-\frac{1}{N+\beta})]\right)\right)\]

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • ppos::Float64: positive selected alleles probabilty.

Returns

  • Float64: expected rate of fixations from negative DFE.
source
MKtest.p_fixFunction
p_fix(parameters,gam)

Expected positive fixation rate.

\[\mathbb{E}[D_{n+}] = p_{+} \cdot B \cdot (1 - e^{(-2s)})\]

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • gam::Int64: positive selection coefficient.

Returns

  • Float64: expected rate of positive fixation.
source
MKtest.fix_pos_simFunction
fix_pos_sim(parameters,gamma,ppos)

Expected positive fixations rate reduced due to the impact of background selection and linkage. The probabilty of fixation of positively selected alleles is reduced by a factor Φ across all deleterious linked sites Analytical.Φ.

\[\mathbb{E}[D_{n+}] = \Phi \cdot \mathbb{E}[D_{n+}]\]

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • gam::Int64: positive selection coefficient.
  • ppos::Float64: positive selected alleles probabilty.

Returns

  • Float64: expected rate of positive fixations under background selection
source

Polymorphism

MKtest.sfs_neutFunction
sfs_neut(param,binom)

Expected rate of neutral allele frequency reduce by background selection. The spectrum depends on the number of individual []

\[\mathbb{E}[Ps_{(x)}] = \sum{x^{*}=x}{x^{*}=1}f_{B}(x)\]

Input

  • param::parameters: mutable structure containing the model.
  • SparseMatrixCSC{Float64, Int64}: convoluted SFS given the B value.

Return:

  • Vector{Float64}: expected rate of neutral alleles frequencies.
source
Missing docstring.

Missing docstring for MKtest.sfs_pos. Check Documenter's build log for details.

MKtest.sfs_negFunction
sfs_neg(param,p,binom)

Expected rate of positive selected allele frequency reduce by background selection. Spectrum drawn on a gamma DFE. It depends on the number of individuals.

Arguments

  • param::parameters: mutable structure containing the model
  • p::Float64: positive selected alleles probabilty.
  • binom::SparseMatrixCSC{Float64, Int64}: convoluted SFS given the B value.

Return:

  • Array{Float64}: expected negative selected alleles frequencies.
source
Missing docstring.

Missing docstring for MKtest.cumulative_sfs. Check Documenter's build log for details.

MKtest.reduce_sfsFunction
reduce_sfs(sfs_tmp,bins)

Function to reduce the SFS into N bins.

Arguments

  • sfs_tmp::Vector{Float64}: SFS vector.
  • bins::Int64: bin size to collapse the SFS.

Output

  • Vector{Float64}: Binned SFS vector.
source

Rates

MKtest.ratesFunction
rates(param,gH,gL,gam_flanking,gam_dfe,alpha,iterations,output)

Function to solve randomly N scenarios. The function will create N models, defined by parameters(), to solve analytically fixation rates and the expected SFS for each model. The rates will be used to compute summary statistics required at ABC inference. The function output a HDF5 file containing the solved models, the selected DAC and the analytical solutions.

Arguments

  • param::parameters: mutable structure containing the model.
  • gH::Array{Int64,1}: Range of strong selection coefficients.
  • gL::Union{Array{Int64,1},Nothing}: Range of weak selection coefficients.
  • gam_flanking::Array{Int64,1}: Range of deleterious selection coefficients at the flanking region.
  • gam_dfe::Array{Int64,1}: Range of deleterious selection coefficients at the coding region.
  • alpha::Vector{Float64}: Range of α value to solve.
  • iterations::Int64: Number of solutions.
  • output_file::String: File to output HDF5 file.

Returns

  • DataFrame: models solved.
  • Output_file: HDF5 file containing models solved and rates.
source

Summary statistics

MKtest.poisson_fixationFunction
poisson_fixation(observed_values,λds, λdn)

Divergence sampling from Poisson distribution. The expected neutral and selected fixations are subset through their relative expected rates (fix_neut, fix_neg_b, fix_pos_sim). Empirical values are used are used to simulate the locus L along a branch of time T from which the expected Ds and Dn raw count estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily given the success rate $\lambda$ in the distribution.

\[\mathbb{E}[D_N] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_+] + \mathbb{E}[D_-]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]

\[\mathbb{E}[D_S] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_0]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]

Arguments

  • observed_values::Array: Array containing the total observed divergence.
  • λds::Float64: expected neutral fixations rate.
  • λdn::Float64: expected selected fixations rate.

Returns

  • Array{Int64,1} containing the expected count of neutral and selected fixations.
source
MKtest.poisson_polymorphismFunction
poisson_polymorphism(observed_values,λps,λpn)

Polymorphism sampling from Poisson distributions. The total expected neutral and selected polimorphism are subset through the relative expected rates at the frequency spectrum (fix_neut, sfs_neut,). Empirical sfs are used to simulate the locus L along a branch of time T from which the expected Ps and Pn raw count are estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily from the whole sfs given each frequency success rate $\lambda$ in the distribution.

The success rate managing the Poisson distribution by the observed count each frequency. We considered both sampling variance and process variance is affecting the number of variable alleles we sample from SFS. This variance arises from the random mutation-fixation process along the branch. To incorporate this variance we do one sample per frequency-bin and use the total sampled variation and the SFS for the summary statistics.

\[\mathbb{E}[P_N] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]

\[\mathbb{E}[P_S] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{0(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]

Arguments

  • observed_values::Array{Int64,1}: Array containing the total observed divergence.
  • λps::Array{Float64,1}: expected neutral site frequency spectrum rate.
  • λpn::Array{Float64,1}: expected selected site frequency spectrum rate.

Returns

  • Array{Int64,2} containing the expected total count of neutral and selected polymorphism.
source
MKtest.sampling_summariesFunction
sampled_alpha(observed_values,λds, λdn)

Ouput the expected values from the Poisson sampling process. Please check poisson_fixation and poisson_polymorphism to understand the samplingn process. α(x) is estimated through the expected values of Dn, Ds, Pn and Ps.

Arguments

  • param::parameters: Array containing the total observed divergence.
  • d::Array: observed divergence.
  • afs::Array: observed polymorphism.
  • λdiv::Array{Float64,2}: expected fixations rate.
  • λdiv::Array{Float64,2}: expected site frequency spectrum rates.

Returns

αsummaries,expdn,expds,exppn,exp_ps,ssAlpha

  • Array{Int64,2} containing α(x) values.

  • Array{Int64,1} expected non-synonymous divergence.

  • Array{Int64,1} expected synonymous divergence.

  • Array{Int64,1} expected non-synonymous polymorphism.

  • Array{Int64,1} expected synonymous polymorphism.

  • Array{Int64,1} expected synonymous polymorphism.

  • Array{Int64,1} expected synonymous polymorphism.

  • Array{Int64,2} containing α(x) binned values.

    samplingsummaries(gammaL,gammaH,pposl,ppos_h,observedData,nopos)

source
MKtest.summary_statisticsFunction
summary_statistics(param,h5_file,sfs,divergence,output_folder,summstat_size)

Estimate summary statistics using observed data and analytical rates. output_folder will be used to output summary statistics

Arguments

  • param::parameters : Mutable structure containing the models
  • h5_file::String : HDF5 containing solved models, fixation and polymorphic rates
  • sfs::Vector: SFS data parsed from parse_sfs().
  • divergence::Vector: divergence data from parse_sfs().
  • output_folder::String : path to save summary statistics and observed data.
  • summstat_size::Int64 : number of summary statistics to sample.
source

Inference tools

MKtest.parse_sfsFunction
parse_sfs(param;data,gene_list,sfs_columns,div_columns,l_columns,bins,isolines)

Function to parse polymorphism and divergence by subset of genes. The input data is based on supplementary material described at Uricchio et al. 2019. Please be sure the file is tabulated.

GeneIdPnDAF seppareted by commasPsDAF separated by commasDnDs
XXXX0,0,0,0,0,0,0,0,00,0,0,0,0,0,0,0,000
XXXX0,0,0,0,0,0,0,0,00,0,0,0,0,0,0,0,000
XXXX0,0,0,0,0,0,0,0,00,0,0,0,0,0,0,0,000

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • data::S: path to polymorphic and divergence raw data.
  • gene_list:S: path to gene list.
  • sfs_columns::Vector{Int64}: column numbers corresponding to Pn and Ps frequencies. Default values [3,5]
  • div_columns::Vector{Int64}: column numbers corresponding to Dn and Ds frequencies. Default values [6,7]
  • l_columns::Vector{Int64}: column numbers corresponding to total number of non-synonymous and synonymous sites. Not require at data. Default values [7,8]
  • bins::Union{Nothing,Int64}: size to bin the SFS independtly of the sample size.

Returns

  • Vector{Vector{Float64}}: α_x values. Estimated using the cumulative SFS
  • Vector{Matrix{Float64}}: Site Frequency Spectrum (non-cumulative)
  • Vector{Matrix{Int64}}: Synonymous and non-synonymous divergence counts
source
MKtest.ABCregFunction
ABCreg(output_folder, S, P, tol, abcreg)

Performing ABC inference using ABCreg. Please, be sure the input folder contain the observed and summary statistics produc ed by MKtest.summary_statistics().

Arguments

  • output_folder::String : folder containing the observed data and summary estatistics. It will be used to output the posterior distributions
  • P::Int64 : number of parameters to infer.
  • S::Int64 : number of summary stastitics to perform the inference.
  • tol::Float64 : tolerance value. It defines the number of accepted values at ABC inference.
  • abcreg::String : path to ABCreg binary.

Output

  • posteriors:Vector{Matrix{Float64}}: posteriors distributions estimated by ABCreg.
source
MKtest.summary_abcFunction
summary_abc(posteriors,stat)

Posterior distributions statistics. The function estimates Min., 0.5% Perc., Median, Mean, Mode, "99.5% Perc., Max values. The mode is estimated following R package abc mode function. The argument stat define the output estimation.

Arguments

  • posterior::Matrix{Float64} : posterior distribution.
  • stat::String: output estimation.

Output

  • DataFrame: posterior statistics.
  • DataFrame: chosen statistic inference.
source

MKtest

MKtest.aMKFunction
aMK(param,α;dac_rm,na_rm)

Function to estimate the asymptotic value of α(x).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Matrix: SFS data parsed from parse_sfs().
  • divergence::Matrix: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform imputedMK.
  • cumulative::Bool: Perform α(x) estimation using the cumulative SFS
  • dac_rm::Bool: Remove any value not selected at param.dac.
  • na_rm::Bool: Remove any NA value at α(x)

Returns

  • DataFrame
source
aMK(param,α;dac_rm,na_rm)

Function to estimate the asymptotic value of α(x).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Vector: SFS data parsed from parse_sfs().
  • divergence::Vector: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform imputedMK.
  • cumulative::Bool: Perform α(x) estimation using the cumulative SFS
  • dac_rm::Bool: Remove any value not selected at param.dac.
  • na_rm::Bool: Remove any NA value at α(x)

Returns

  • DataFrame
source
MKtest.imputedMKFunction
imputedMK(param,sfs,divergence;m,cutoff)

Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Vector: SFS data parsed from parse_sfs().
  • divergence::Vector: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform imputedMK.

Returns

  • DataFrame
source
imputedMK(param,sfs,divergence;m,cutoff)

Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Matrix{Float64}: SFS data parsed from parse_sfs().
  • divergence::Matrix{Int64}: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform imputedMK.

Returns

  • DataFrame
source
MKtest.fwwMKFunction
fwwMK(param,sfs,divergence;m,cutoff)

Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Vector: SFS data parsed from parse_sfs().
  • divergence::Vector: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform fwwMK.

Returns

  • DataFrame
source
fwwMK(param,sfs,divergence;m,cutoff)

Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).

Arguments

  • param::parameters: mutable structure containing the variables required to solve the model.
  • sfs::Matrix: SFS data parsed from parse_sfs().
  • divergence::Matrix: divergence data from parse_sfs().
  • threshold{Float64}: frequency cutoff to perform fwwMK.

Returns

  • DataFrame
source
MKtest.standardMKFunction
standardMK(sfs,divergence;m)

Function to estimate the original α value.

Arguments

  • sfs::Matrix{Float64}: SFS array
  • divergence::Array: divegence count array

Output

  • DataFrame
source
standardMK(sfs,divergence;m)

Function to estimate the original α value.

Arguments

  • sfs::Matrix{Float64}: SFS array
  • divergence::Array: divegence count array

Output

  • DataFrame
source
MKtest.grapesFunction
grapes(sfs,divergence,model)

Run grapes using SFS and divergence data. The function will install grapes using Conda from genomedk channel.

Arguments

  • sfs::Vector{Matrix{Float64}}: SFS data parsed from parse_sfs()
  • divergence::Vector{Matrix{Int64}}: divergence data parsed from parse_sfs()
  • model::String: grapes model (GammaZero, GammaExpo, DisplGamma, ScaledBeta, FGMBesselK and all)
  • n::Int64: smaller sample size to project the SFS
  • nearly_neutral::Int64: Sadp threshold of above which a mutation is considered adaptive
  • FWW_threshold::Float64: minimal allele frequency in FWW α
  • nb_rand_start::Int64: number of random starting values in model optimization (default=0); setting positive values will slow down the program but decrease the probability of being trapped in local optima.
  • anc_to_rec_Ne_ratio::Float64: divergence/polymorphism Ne ratio
  • no_div_data::Bool: only use divergence data to estimate DFE
  • no_div_param::Bool: implements so-called [-A] version in Galtier (2016); also called a DFE by Tataru et al. (2017), and indicated with * in Rousselle et al. (2018); irrelevant if model = GammaZero; (default=false)
  • no_syn_orient_error::Bool: force equal synonymous and non-synonymous mis-orientation rate(default=false)
  • fold::Bool: fold the SFS
  • fixed_param::String: this option should be used if one does not want to optimize every parameter, but rather have some parameters fixed to predefined values; parameter names and predefined values are passed via a control file; see example at the bottom of this file (default=none).

Output

  • DataFrame: grapes model estimation.
source