Skip to content

bogdanlab/BEAVR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BEAVR

BEAVR is a Bayesian framework designed to perform local polygenicity estimation within a region.

Basic dependencies

  • Python2.7 (we recommend using anaconda)
  • numpy
  • scikit-learn

Installation

git clone https://github.com/bogdanlab/BEAVR.git

Running with example data

cd BEAVR/
cd scripts/

Change the MASTER_PATH varible to reflect the directory where you installed BEAVR. See below:

#!/usr/bin/env sh

MASTER_PATH=/u/home/r/ruthjohn/BEAVR
SCRIPT_DIR=${MASTER_PATH}/scripts
LOCI_DIR=${MASTER_PATH}/data
SRC_DIR=${MASTER_PATH}/src

Simply, run the test script:

./run_beavr.sh

Running with your own data

Data

As input, BEAVR requires a file with SNP-SNP correlations, or LD (no header). Additionally, we require a file with standardized effects from gwas. Each region requires its own set of files. For now, we require that the header of the standardized effect sizes be labeled as BETA_STD (we are currently working to allow users to specify their own header); all other columns will be ignored.

For details on how to standardize the effects from GWAS, please see: https://huwenboshi.github.io/data%20management/2017/11/23/tips-for-formatting-gwas-summary-stats.html

The first few lines of each file are shown below:

.ld file

1.000000000000000000e+00 0.200000000000000000e+00 0.300000000000000
0.300000000000000000e+00 1.000000000000000000e+00 0.100000000000000
0.500000000000000000e+00 0.000500000000000000e+00 1.000000000000000

.gwas file

BETA_STD BETA_TRUE n snp z BETA_STD_I
0.000692574347539 0.0 500000 rs116911124 0.489724017621 0.000692574347539
-0.000478672356416 -0.0 500000 rs131533 -0.338472469188 -0.000478672356416

Step 1: Inverse transform gwas effects and LD

To avoid additional computations, we do an inverse transformation on the GWAS effects and LD matrix. Since this only needs to be performed once, we do these computations in a separate step. Note that this will automatically append an additional column called BETA_STD_I onto your GWAS file.

We provide scripts below:

# transform LD
python ${SCRIPT_DIR}/helper/half_ld.py.py \
       --ld_file $LD_FILE
       --ld_out $LD_NEG_HALF_FILE

# transform betas
python ${SCRIPT_DIR}/helper/transform_betas.py  \
          --gwas_file $LOCUS_FILE  \
          --ld_neg_half $LD_NEG_HALF_FILE

Step 2: Estimate local polygenicity

Below is a summary of the current flags and options for running BEAVR:

--seed int for setting the seed (note that because this is an MCMC, there is stochasticity in the inference)
--N sample size from GWAS
--id string that will be the prefix for output files
--its how many MCMC iterations (we recommend 1000)
--ld_half_file full path to transformed LD file (see above for more details)
--gwas_file full path to GWAS file with transformed effects (see above for more details)
--H_gw pre-computed heritability for the region
--M_gw number of SNPs in the region
--outdir full path to designated output directory
--dp speedup flag (this enables speedup)

An example command follows:

        python ${SRC_DIR}/main_new.py \
              --seed $SEED \
              --N $N \
              --id $PREFIX \
              --its $ITS \
              --ld_half_file $LD_HALF_FILE \
              --gwas_file $LOCUS_FILE  \
              --outdir $LOCI_DIR \
  	          --H_gw $H2 \
              --M_gw $M_GW \
 	            --dp

Step 3: Interpreting the output

A log file will be created with the .log extension. An example output looks like the following:

- - - - - - - - - - UNITY v3.0 - - - - - - - - -
Run id: test
Heritability: 0.5000
Sample size: 500000
Iterations: 1000
Seed: 2019
Getting effect sizes from: /u/home/r/ruthjohn/BEAVR/data/p_0.01_h2_0.5_N_500000_ld_1_M_372_1.gwas
Using ld  from dir: /u/home/r/ruthjohn/BEAVR/data/identity_372.ld
Outputing simulated GWAS to dir: /u/home/r/ruthjohn/BEAVR/data
- - - - - - - - - - - - - - - - - - - - - - - -
Accept percent: 9.0000
Estimate p: 0.006885
SD p: 0.005308

The first set of information reports all of the options the user has specified. The second set of lines are described as:

Accept percent - acceptance percentage of the MCMC
Estimate p - posterior mean of the local polygenicity 
SD p - standard deviation from the posterior distribution

Bugs/Errors/Contact

The BEAVR software is still in beta version and we expect quite a few improvements in the workflow. For immediate questions/comments/concerns, please submit a Github issue so that all users can see the question/answer. For specific questions, feel free to email: [email protected]

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published