Estimating fixation and polymorphic rates considering a generalized model of selection and linkage
Before executing the rate estimation, you need to load the Distributed module and add some worker processes:
using Distributed
addprocs(7) # add 7 worker processes
Then you need to load the Analytical module on all the workers using the @everywhere macro. Otherwise, the Analytical module will perform the estimation using only the main process:
@everywhere using Analytical, ParallelUtilities
using CSV, DataFrames, JLD2, ProgressMeter
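Optionally, you can confirm that the workers are available before running the estimation; nworkers is part of the Distributed standard library:
nworkers() # should return 7, matching the processes added above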
Declare a variable containing some basic information about your model. We used a sample size of 661 to later perform the analysis over TGP data. The selected Derived Allele Counts (DAC) will be used to compute summary statistics and perform ABC inference. It is possible to subset any of the selected DAC values when computing summary statistics.
adap = Analytical.parameters(n=661,dac=[1,2,4,5,10,20,50,100,200,400,500,661,925,1000])
Note that adap contains information about mutation rates, recombination rates, the DFE, BGS, and fixation probabilities. To check all the arguments, you can access the function documentation using @doc Analytical.parameters
@doc Analytical.parameters
Mutable structure containing the variables required to solve the analytical approach. All the functions are solved using the internal values of the structure. For this reason, adap is the only exported variable. adap should be changed before performing the analytical approach; otherwise, $\alpha_{(x)}$ will be solved with the default values.
Parameters
≡≡≡≡≡≡≡≡≡≡≡≡
• gamNeg::Int64: Selection coefficient for deleterious alleles
• gL::Int64: Selection coefficient for weakly beneficial alleles
• gH::Int64: Selection coefficient for strongly beneficial alleles
• alLow::Float64: Proportion of α due to weak selection
• alTot::Float64: α
• thetaF::Float64: Mutation rate defining BGS strength
• thetaMidNeutral::Float64: Mutation rate on coding region
• al::Float64: DFE shape parameter
• be::Float64: DFE scale parameter
• B::Float64: BGS strength
• bRange::Array{Float64,1}: BGS values to simulate
• pposL::Float64: Fixation probability of weakly beneficial alleles
• pposH::Float64: Fixation probability of strongly beneficial alleles
• N::Int64: Population size
• n::Int64: Sample size
• Lf::Int64: Flanking region length
• rho::Float64: Recombination rate
• TE::Float64
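As an illustrative sketch only (not taken from the package documentation), the fields listed above can presumably be set through the same keyword constructor shown before; check @doc Analytical.parameters on your installed version before relying on these keyword names:
# Hypothetical example: overriding DFE- and BGS-related fields at construction time
adap_custom = Analytical.parameters(n=661, dac=[2,4,5,10,20,50,200,661], gamNeg=-457, gH=500, B=0.999)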
We will estimate the empirical adaptation rate from TGP data using the DFE parameters estimated in Boyko et al. (2008). Nonetheless, the DFE shape and scale parameters are flexible in our model.
adap.al = 0.184           # DFE shape parameter (Boyko et al. 2008)
adap.be = abs(0.184/-457) # DFE scale parameter: shape divided by the absolute deleterious selection coefficient (-457)
Before automating the fixation and polymorphic rate estimation, you must convolute the binomial distribution to obtain the downsampled SFS:
convoluted_samples = Analytical.binomial_dict()  # empty structure to store the binomial convolution
Analytical.binomOp!(adap,convoluted_samples.bn)  # fill it in place for each B value in adap.bRange
Note that Analytical.binomOp! performs the estimation in place at convoluted_samples, given the BGS range defined in adap.bRange
@doc Analytical.binomOp!
binomOp!(param, convoluted_samples)
Binomial convolution to sample the allele frequency probabilities depending on the background selection values and the sample size.
Arguments
≡≡≡≡≡≡≡≡≡≡≡
• param::parameters
• convoluted_samples::binomial_dict
Returns
≡≡≡≡≡≡≡≡≡
• Array{Float64,2}: convoluted SFS for each B value defined in the model. Results are saved at param.bn.
Now the variable adap contains the sample size, DAC, and DFE information. The function Analytical.rates will perform the analytical estimation of N independent models regarding the DFE, BGS, mutation rate, and recombination rate. In the following example, we use Analytical.rates to input the prior distributions. The function will randomize the input values to solve N independent estimations under our model.
@doc Analytical.rates
rates(param::parameters,iterations::Int64,divergence::Array,sfs::Array)
Function to solve N scenarios at random. The function will create N models, defined by Analytical.parameters(), to analytically estimate the fixation and polymorphic rates for each model. The rates will be used to compute the summary statistics required by the ABC. The function outputs an HDF5 file containing the solved models, the selected DAC, and the analytical rates.
If rho and/or theta are set to nothing, the function will input random values given the range 0.0005:0.0005:0.01. Otherwise, you can fix the values.
If gL is set to nothing, the function will not account for the role of the weakly selected alleles in the estimation.
Arguments
≡≡≡≡≡≡≡≡≡≡≡
• param::parameters: mutable structure containing the model
• convoluted_samples::binomial_dict : structure containing the binomial convolution
• gH::Array{Int64,1} : Range of strong selection coefficients
• gL::Union{Array{Int64,1},Nothing}: Range of weak selection coefficients
• gamNeg::Array{Int64,1} : Range of deleterious selection coefficients
• theta::Union{Float64,Nothing} : Population-scaled mutation rate on coding region
• rho::Union{Float64,Nothing} : Population-scaled recombination rate
• shape::Float64=0.184 : DFE shape parameter
• iterations::Int64 : Number of solutions
• output::String : Path to the output HDF5 file
Returns
≡≡≡≡≡≡≡≡≡
• Array: summary statistics
• Output: HDF5 file containing models solved and rates.
@time df = Analytical.rates(param=adap, convoluted_samples=convoluted_samples, gH=collect(200:2000), gL=collect(1:10), gamNeg=collect(-2000:-200), iterations=10^5, shape=adap.al, output="analysis/rates.jld2");
The function will create an HDF5 file containing the solved models, fixation rates, polymorphic rates, and the selected DAC. This information will be used later to estimate summary statistics.
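If you want to inspect the output, here is a minimal sketch using JLD2 (already loaded above); the key names stored inside the file depend on the Analytical version, so list them before indexing:
jldopen("analysis/rates.jld2", "r") do file
    println(keys(file)) # show the top-level groups written by Analytical.rates
end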
Note that Analytical.rates is the most resource- and time-consuming function. In our case, the function will estimate 10^5 independent models. Each model solves the estimation for all possible BGS values; we used BGS values from 0.1 to 0.999 in 5% increments. In total, the example will produce 3.7 million estimates. We used a hierarchical data format (HDF5) to facilitate the storage of model parameters and rates.
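As a rough sanity check (a sketch, not part of the pipeline), the total number of estimates is the number of models multiplied by the number of BGS values stored in adap.bRange:
total_estimates = 10^5 * length(adap.bRange) # ≈ 3.7 million with the BGS grid described above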
The example above took about 1.5 hours to execute on the hardware described in the section Inferring the rate and strength of adaptation.
If you have a system with limited resources, it is possible to download pre-computed rates for the TGP and DGN data. Please go to the next section to continue the inference using the pre-computed rates.