Model parameters
adap is the only variable exported from MKtest module. It is a Mutable structure contaning the variables required to solve the analytical approach. Any value can be easly changed. Remember adap should be change before the execution, in other case, $\alpha_{(x)}$ will be solve with the default values. To change all the values at once, you can use MKtest.parameters
in order to set specific models in a new variable.
MKtest.parameters
— TypeMutable structure containing the variables required to solve the analytical approach. All the functions are solve using the internal values of the structure. You should declare a mutable structure to the perform the analytical estimations.
Parameters
gam_flanking::Int64
: Selection coefficient for deleterious allelesgL::Int64
: Selection coefficient for weakly benefical allelesgH::Int64
: Selection coefficient for strongly benefical allelesal_low::Float64
: Proportion of α due to weak selectional_tot::Float64
: αθ_flanking::Float64
: Mutation rate defining BGS strengthθ_coding::Float64
: Mutation rate on coding regional::Float64
: DFE shape parameterbe::Float64
: DFE scale parameterB::Float64
: BGS strengthB_bins::Array{Float64,1}
: BGS values to simulateppos_l::Float64
: Fixation probabily of weakly beneficial allelesppos_h::Float64
: Fixation probabily of strongly beneficial allelesN::Int64
: Population sizen::Int64
: Sample sizeLf::Int64
: Flanking region lengthρ::Float64
: Recombination rateTE::Float64
MKtest.analytical_alpha
— Functionanalytical_alpha(param)
Analytical α(x) estimation. Solve α(x) generally. We used the expected fixation rates and SFSto approach the asympotic value accouting for background selection, weakly and strong positive selection. α(x) can be estimated taking into account the role of positive selected alleles or not. In this way we explore the role of linkage to deleterious alleles in the coding region.
\[\mathbb{E}[\alpha_{x}] = 1 - \left(\frac{\mathbb{E}[D_{s}]}{\mathbb{E}[D_{N}]}\frac{\mathbb{E}[P_{N}]}{\mathbb{E}[P_{S}]}\right)\]
Arguments
param::parameters
Returns
Tuple{Vector{Float64},Vector{Float64}}
: α(x) accounting for weak adaptation, α(x) non-accouting for weak adaptation.
MKtest.Br
— FunctionBr(Lmax,theta)
Expected reduction in nucleotide diversity. Explored at Charlesworth B., 1994:
\[\frac{\pi}{\pi_{0}} = e^{\frac{4\mu L}{2rL+t}}\]
Arguments
param::parameters
theta::Float64
Returns
Float64
: expected reduction in diversity given a non-coding length, mutation rate and defined recombination.
Missing docstring for MKtest.set_θ
. Check Documenter's build log for details.
MKtest.binom_op
— Functionbinom_op(param)
Binomial convolution to sample the allele frequencies probabilites depending on background selection values and sample size.
Arguments
param::parameters
B_bins::Vector{Float64}
Returns
Array{Float64,2}
: convoluted SFS for each B value defined in the model (B_bins). The estimations are saved at convolutedBn.bn.
MKtest.Φ
— FunctionΦ(param,s)
Reduction in fixation probabilty due to background selection and linkage. The formulas used have been subjected to several theoretical works (Charlesworth B., 1994, Hudson et al., 1995, Nordborg et al. 1995, Barton NH., 1995).
The fixation probabilty of selected alleles are reduce by a factor $\phi$:
\[\phi(t,s) = e^{[\frac{-2\mu}{t(1+\frac{rL}{t}+\frac{2s}{t})}]}\]
Multiplying across all deleterious linkes sites, we find:
\[\Phi = \prod_{1}^{L} = \phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]
\[\phi(t,s) = e^{\frac{-2t\mu(\psi[1,\frac{r+2s+L}{r}] - \psi[1,\frac{r(L+1)+2s+t}{r}])}{r^2}}\]
Arguments
param::parameters
: mutable structure containing the variables required to solve the modelgamma::Int64
: selection coefficient.
Returns
Float64
: expected rate of positive fixations under background selection.
MKtest estimation
Fixations
MKtest.fix_neut
— Function- fix_neut(parameters)
Expected neutral fixations rate reduce by B value.
\[\mathbb{E}[D_{s}] = (1 - p_{-} - p_{+}) B \frac{1}{2N}\]
Arguments
param::parameters
: mutable structure containing the variables required to solve the model
Returns
Float64
: expected rate of neutral fixations.
MKtest.fix_neg
— Functionfix_neg(parameters,ppos)
Expected fixation rate from negative DFE.
\[\mathbb{E}[D_{n-}] = p_{-}\left(2^-\alpha\beta^\alpha\left(-\zeta[\alpha,\frac{2+\beta}{2}] + \zeta[\alpha,1/2(2-\frac{1}{N+\beta})]\right)\right)\]
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.ppos::Float64
: positive selected alleles probabilty.
Returns
Float64
: expected rate of fixations from negative DFE.
MKtest.p_fix
— Functionp_fix(parameters,gam)
Expected positive fixation rate.
\[\mathbb{E}[D_{n+}] = p_{+} \cdot B \cdot (1 - e^{(-2s)})\]
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.gam::Int64
: positive selection coefficient.
Returns
Float64
: expected rate of positive fixation.
MKtest.fix_pos_sim
— Functionfix_pos_sim(parameters,gamma,ppos)
Expected positive fixations rate reduced due to the impact of background selection and linkage. The probabilty of fixation of positively selected alleles is reduced by a factor Φ across all deleterious linked sites Analytical.Φ
.
\[\mathbb{E}[D_{n+}] = \Phi \cdot \mathbb{E}[D_{n+}]\]
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.gam
::Int64: positive selection coefficient.ppos::Float64
: positive selected alleles probabilty.
Returns
Float64
: expected rate of positive fixations under background selection
Polymorphism
MKtest.sfs_neut
— Functionsfs_neut(param,binom)
Expected rate of neutral allele frequency reduce by background selection. The spectrum depends on the number of individual []
\[\mathbb{E}[Ps_{(x)}] = \sum{x^{*}=x}{x^{*}=1}f_{B}(x)\]
Input
param::parameters
: mutable structure containing the model.SparseMatrixCSC{Float64, Int64}
: convoluted SFS given the B value.
Return:
Vector{Float64}
: expected rate of neutral alleles frequencies.
Missing docstring for MKtest.sfs_pos
. Check Documenter's build log for details.
MKtest.sfs_neg
— Functionsfs_neg(param,p,binom)
Expected rate of positive selected allele frequency reduce by background selection. Spectrum drawn on a gamma DFE. It depends on the number of individuals.
Arguments
param::parameters
: mutable structure containing the modelp::Float64
: positive selected alleles probabilty.binom::SparseMatrixCSC{Float64, Int64}
: convoluted SFS given the B value.
Return:
Array{Float64}
: expected negative selected alleles frequencies.
Missing docstring for MKtest.cumulative_sfs
. Check Documenter's build log for details.
MKtest.reduce_sfs
— Functionreduce_sfs(sfs_tmp,bins)
Function to reduce the SFS into N bins.
Arguments
sfs_tmp::Vector{Float64}
: SFS vector.bins::Int64
: bin size to collapse the SFS.
Output
Vector{Float64}
: Binned SFS vector.
Rates
MKtest.rates
— Functionrates(param,gH,gL,gam_flanking,gam_dfe,alpha,iterations,output)
Function to solve randomly N scenarios. The function will create N models, defined by parameters()
, to solve analytically fixation rates and the expected SFS for each model. The rates will be used to compute summary statistics required at ABC inference. The function output a HDF5 file containing the solved models, the selected DAC and the analytical solutions.
Arguments
param::parameters
: mutable structure containing the model.gH::Array{Int64,1}
: Range of strong selection coefficients.gL::Union{Array{Int64,1},Nothing}
: Range of weak selection coefficients.gam_flanking::Array{Int64,1}
: Range of deleterious selection coefficients at the flanking region.gam_dfe::Array{Int64,1}
: Range of deleterious selection coefficients at the coding region.alpha::Vector{Float64}
: Range of α value to solve.iterations::Int64
: Number of solutions.output_file::String
: File to output HDF5 file.
Returns
DataFrame
: models solved.Output_file
: HDF5 file containing models solved and rates.
Summary statistics
MKtest.poisson_fixation
— Functionpoisson_fixation(observed_values,λds, λdn)
Divergence sampling from Poisson distribution. The expected neutral and selected fixations are subset through their relative expected rates (fix_neut
, fix_neg_b
, fix_pos_sim
). Empirical values are used are used to simulate the locus L along a branch of time T from which the expected Ds and Dn raw count estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily given the success rate $\lambda$ in the distribution.
\[\mathbb{E}[D_N] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_+] + \mathbb{E}[D_-]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]
\[\mathbb{E}[D_S] = X \in Poisson\left(\lambda = D \times \left[\frac{\mathbb{E}[D_0]}{\mathbb{E}[D_+] + \mathbb{E}[D_-] + \mathbb{E}[D_0]}\right]\right)\]
Arguments
observed_values::Array
: Array containing the total observed divergence.λds::Float64
: expected neutral fixations rate.λdn::Float64
: expected selected fixations rate.
Returns
Array{Int64,1}
containing the expected count of neutral and selected fixations.
MKtest.poisson_polymorphism
— Functionpoisson_polymorphism(observed_values,λps,λpn)
Polymorphism sampling from Poisson distributions. The total expected neutral and selected polimorphism are subset through the relative expected rates at the frequency spectrum (fix_neut
, sfs_neut
,). Empirical sfs are used to simulate the locus L along a branch of time T from which the expected Ps and Pn raw count are estimated given the mutation rate ($\mu$). Random number generation is used to subset samples arbitrarily from the whole sfs given each frequency success rate $\lambda$ in the distribution.
The success rate managing the Poisson distribution by the observed count each frequency. We considered both sampling variance and process variance is affecting the number of variable alleles we sample from SFS. This variance arises from the random mutation-fixation process along the branch. To incorporate this variance we do one sample per frequency-bin and use the total sampled variation and the SFS for the summary statistics.
\[\mathbb{E}[P_N] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]
\[\mathbb{E}[P_S] = \sum_{x=0}^{x=1} X \in Poisson\left(\lambda = SFS_{(x)} \times \left[\frac{\mathbb{E}[P_{0(x)}]}{\mathbb{E}[P_{+(x)}] + \mathbb{E}[P_{-(x)}] + \mathbb{E}[P_{0(x)}]}\right]\right)\]
Arguments
observed_values::Array{Int64,1}
: Array containing the total observed divergence.λps::Array{Float64,1}
: expected neutral site frequency spectrum rate.λpn::Array{Float64,1}
: expected selected site frequency spectrum rate.
Returns
Array{Int64,2}
containing the expected total count of neutral and selected polymorphism.
MKtest.sampling_summaries
— Functionsampled_alpha(observed_values,λds, λdn)
Ouput the expected values from the Poisson sampling process. Please check poisson_fixation
and poisson_polymorphism
to understand the samplingn process. α(x) is estimated through the expected values of Dn, Ds, Pn and Ps.
Arguments
param::parameters
: Array containing the total observed divergence.d::Array
: observed divergence.afs::Array
: observed polymorphism.λdiv::Array{Float64,2}
: expected fixations rate.λdiv::Array{Float64,2}
: expected site frequency spectrum rates.
Returns
αsummaries,expdn,expds,exppn,exp_ps,ssAlpha
Array{Int64,2}
containing α(x) values.Array{Int64,1}
expected non-synonymous divergence.Array{Int64,1}
expected synonymous divergence.Array{Int64,1}
expected non-synonymous polymorphism.Array{Int64,1}
expected synonymous polymorphism.Array{Int64,1}
expected synonymous polymorphism.Array{Int64,1}
expected synonymous polymorphism.Array{Int64,2}
containing α(x) binned values.samplingsummaries(gammaL,gammaH,pposl,ppos_h,observedData,nopos)
MKtest.summary_statistics
— Functionsummary_statistics(param,h5_file,sfs,divergence,output_folder,summstat_size)
Estimate summary statistics using observed data and analytical rates. output_folder will be used to output summary statistics
Arguments
param::parameters
: Mutable structure containing the modelsh5_file::String
: HDF5 containing solved models, fixation and polymorphic ratessfs::Vector
: SFS data parsed from parse_sfs().divergence::Vector
: divergence data from parse_sfs().output_folder::String
: path to save summary statistics and observed data.summstat_size::Int64
: number of summary statistics to sample.
Inference tools
MKtest.parse_sfs
— Functionparse_sfs(param;data,gene_list,sfs_columns,div_columns,l_columns,bins,isolines)
Function to parse polymorphism and divergence by subset of genes. The input data is based on supplementary material described at Uricchio et al. 2019. Please be sure the file is tabulated.
GeneId | Pn | DAF seppareted by commas | Ps | DAF separated by commas | Dn | Ds |
---|---|---|---|---|---|---|
XXXX | 0 | ,0,0,0,0,0,0,0,0 | 0 | ,0,0,0,0,0,0,0,0 | 0 | 0 |
XXXX | 0 | ,0,0,0,0,0,0,0,0 | 0 | ,0,0,0,0,0,0,0,0 | 0 | 0 |
XXXX | 0 | ,0,0,0,0,0,0,0,0 | 0 | ,0,0,0,0,0,0,0,0 | 0 | 0 |
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.data::S
: path to polymorphic and divergence raw data.gene_list:S
: path to gene list.sfs_columns::Vector{Int64}
: column numbers corresponding to Pn and Ps frequencies. Default values [3,5]div_columns::Vector{Int64}
: column numbers corresponding to Dn and Ds frequencies. Default values [6,7]l_columns::Vector{Int64}
: column numbers corresponding to total number of non-synonymous and synonymous sites. Not require at data. Default values [7,8]bins::Union{Nothing,Int64}
: size to bin the SFS independtly of the sample size.
Returns
Vector{Vector{Float64}}
: α_x values. Estimated using the cumulative SFSVector{Matrix{Float64}}
: Site Frequency Spectrum (non-cumulative)Vector{Matrix{Int64}}
: Synonymous and non-synonymous divergence counts
MKtest.ABCreg
— FunctionABCreg(output_folder, S, P, tol, abcreg)
Performing ABC inference using ABCreg. Please, be sure the input folder contain the observed and summary statistics produc ed by MKtest.summary_statistics().
Arguments
output_folder::String
: folder containing the observed data and summary estatistics. It will be used to output the posterior distributionsP::Int64
: number of parameters to infer.S::Int64
: number of summary stastitics to perform the inference.tol::Float64
: tolerance value. It defines the number of accepted values at ABC inference.abcreg::String
: path to ABCreg binary.
Output
posteriors:Vector{Matrix{Float64}}
: posteriors distributions estimated by ABCreg.
MKtest.summary_abc
— Functionsummary_abc(posteriors,stat)
Posterior distributions statistics. The function estimates Min., 0.5% Perc., Median, Mean, Mode, "99.5% Perc., Max values. The mode is estimated following R package abc mode function. The argument stat
define the output estimation.
Arguments
posterior::Matrix{Float64}
: posterior distribution.stat::String
: output estimation.
Output
DataFrame
: posterior statistics.DataFrame
: chosen statistic inference.
MKtest
MKtest.aMK
— FunctionaMK(param,α;dac_rm,na_rm)
Function to estimate the asymptotic value of α(x).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Matrix
: SFS data parsed from parse_sfs().divergence::Matrix
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform imputedMK.cumulative::Bool
: Perform α(x) estimation using the cumulative SFSdac_rm::Bool
: Remove any value not selected at param.dac.na_rm::Bool
: Remove any NA value at α(x)
Returns
DataFrame
aMK(param,α;dac_rm,na_rm)
Function to estimate the asymptotic value of α(x).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Vector
: SFS data parsed from parse_sfs().divergence::Vector
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform imputedMK.cumulative::Bool
: Perform α(x) estimation using the cumulative SFSdac_rm::Bool
: Remove any value not selected at param.dac.na_rm::Bool
: Remove any NA value at α(x)
Returns
DataFrame
MKtest.imputedMK
— FunctionimputedMK(param,sfs,divergence;m,cutoff)
Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Vector
: SFS data parsed from parse_sfs().divergence::Vector
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform imputedMK.
Returns
DataFrame
imputedMK(param,sfs,divergence;m,cutoff)
Function to estimate the imputedMK (https://doi.org/10.1093/g3journal/jkac206).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Matrix{Float64}
: SFS data parsed from parse_sfs().divergence::Matrix{Int64}
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform imputedMK.
Returns
DataFrame
MKtest.fwwMK
— FunctionfwwMK(param,sfs,divergence;m,cutoff)
Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Vector
: SFS data parsed from parse_sfs().divergence::Vector
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform fwwMK.
Returns
DataFrame
fwwMK(param,sfs,divergence;m,cutoff)
Function to estimate the fwwMK (https://doi.org/10.1038/4151024a).
Arguments
param::parameters
: mutable structure containing the variables required to solve the model.sfs::Matrix
: SFS data parsed from parse_sfs().divergence::Matrix
: divergence data from parse_sfs().threshold{Float64}
: frequency cutoff to perform fwwMK.
Returns
DataFrame
MKtest.standardMK
— FunctionstandardMK(sfs,divergence;m)
Function to estimate the original α value.
Arguments
sfs::Matrix{Float64}
: SFS arraydivergence::Array
: divegence count array
Output
DataFrame
standardMK(sfs,divergence;m)
Function to estimate the original α value.
Arguments
sfs::Matrix{Float64}
: SFS arraydivergence::Array
: divegence count array
Output
DataFrame
MKtest.grapes
— Functiongrapes(sfs,divergence,model)
Run grapes using SFS and divergence data. The function will install grapes using Conda from genomedk channel.
Arguments
sfs::Vector{Matrix{Float64}}
: SFS data parsed from parse_sfs()divergence::Vector{Matrix{Int64}}
: divergence data parsed from parse_sfs()model::String
: grapes model (GammaZero, GammaExpo, DisplGamma, ScaledBeta, FGMBesselK and all)n::Int64
: smaller sample size to project the SFSnearly_neutral::Int64
: Sadp threshold of above which a mutation is considered adaptiveFWW_threshold::Float64
: minimal allele frequency in FWW αnb_rand_start::Int64
: number of random starting values in model optimization (default=0); setting positive values will slow down the program but decrease the probability of being trapped in local optima.anc_to_rec_Ne_ratio::Float64
: divergence/polymorphism Ne rationo_div_data::Bool
: only use divergence data to estimate DFEno_div_param::Bool
: implements so-called [-A] version in Galtier (2016); also called a DFE by Tataru et al. (2017), and indicated with * in Rousselle et al. (2018); irrelevant if model = GammaZero; (default=false)no_syn_orient_error::Bool
: force equal synonymous and non-synonymous mis-orientation rate(default=false)fold::Bool
: fold the SFSfixed_param::String
: this option should be used if one does not want to optimize every parameter, but rather have some parameters fixed to predefined values; parameter names and predefined values are passed via a control file; see example at the bottom of this file (default=none).
Output
DataFrame
: grapes model estimation.