http://www.ensembl.org/info/docs/variation/vep/vep_script.html
Setup/update VEP:
1. delete the script/variant_effect_predictor dir
2. extract variant_effect_predictor.tar.gz into the script dir
3. run script/variant_effect_predictor/INSTALL.pl and skip the cache installation
The cache is installed/updated manually (its release must match the VEP version) from:
ftp://ftp.ensembl.org/pub/release-<xx>/variation/VEP/
See README.txt in the NGS/script dir for more details.
Cache directory structure:
a vep dir containing core & refseq dirs, each with a homo_sapiens/<xx> cache (<xx> = release)
* on dev box: symlink /home/raj/.vep -> /media/sf_WIN_DRIVE/vep
* on deployment: /home/raj/.vep
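Because the cache release must match the VEP version, it can help to confirm the layout before a run. A minimal sketch, assuming the core/refseq/homo_sapiens/<release> structure described above; check_vep_cache is a hypothetical name and the release number is passed in by hand.

```shell
# Hypothetical helper: check that a vep dir holds both caches for a release.
# $1 = vep dir (e.g. /home/raj/.vep), $2 = release the installed VEP expects
check_vep_cache() {
    cache_dir=$1
    release=$2
    status=0
    for cache_type in core refseq; do
        path="$cache_dir/$cache_type/homo_sapiens/$release"
        if [ -d "$path" ]; then
            echo "ok:      $path"
        else
            echo "missing: $path" >&2
            status=1
        fi
    done
    return $status
}
```

On the dev box the symlink itself would be created once with:
    ln -s /media/sf_WIN_DRIVE/vep /home/raj/.vep
after which check_vep_cache /home/raj/.vep <xx> reports on both caches.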
To run from the command line (relative paths assume the app root, e.g. /home/raj/apps/NGS):
  perl script/variant_effect_predictor.pl -config=script/vep.ini \
    -i=t/data/myeloid_variants.vep -o=output.txt --polyphen=b --sift=b \
    --check_existing --coding_only --regulatory --dir=<vep dir>/refseq
  (use --dir=<vep dir>/core instead for the core cache)
or, with absolute paths:
  perl script/variant_effect_predictor.pl -config=/home/raj/apps/NGS/script/vep.ini \
    -o=output.txt --dir=/home/raj/.vep/refseq --polyphen=b --sift=b --check_existing \
    --coding_only --regulatory -i=/tmp/20_06_14.vep
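The two command lines differ only in paths and in which cache dir is used, so a small wrapper can pick core or refseq by argument. A sketch only: run_vep, the VEP_CACHE variable (default $HOME/.vep) and the DRY_RUN switch are hypothetical, and it assumes it is run from the app root so the relative script/ paths resolve. Reading "--dir=refseq/core" in the first example as "refseq or core" is also an interpretation.

```shell
# Hypothetical wrapper around variant_effect_predictor.pl.
# $1 = input file, $2 = output file, $3 = core | refseq (default refseq)
# Set DRY_RUN=1 to print the command instead of running it.
run_vep() {
    input=$1
    output=$2
    cache_type=${3:-refseq}
    case $cache_type in
        core|refseq) ;;
        *) echo "cache type must be core or refseq" >&2; return 1 ;;
    esac
    vep_home=${VEP_CACHE:-$HOME/.vep}
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
    ${DRY_RUN:+echo} perl script/variant_effect_predictor.pl \
        -config=script/vep.ini \
        -i="$input" -o="$output" --dir="$vep_home/$cache_type" \
        --polyphen=b --sift=b --check_existing --coding_only --regulatory
}
```

For example, DRY_RUN=1 run_vep /tmp/20_06_14.vep output.txt core prints the full command so the options can be checked before a long run.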
PERFORMANCE:
a) 4055-row vep input, chromosomes unsorted, on 163.160.171.48:
     fork   time (sec)
     0      255
     2      580
     4      620
b) 1028-row vep input, chromosomes sorted alpha-numerically, on 163.160.171.48:
     fork   time (sec)
     0       95
     2       65  (optimal)
     3       68
     4       70
     5       75
     6       75
* performance is much WORSE on the dev server with fork enabled
* splitting a 1110-row vep input into separate per-chromosome input files
  REDUCED performance on dev (77 sec split vs 66 sec unsplit)
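Run (b) suggests sorting the input by chromosome before enabling fork. A minimal sketch, assuming VEP's default whitespace-separated input (chromosome, start, end, allele, strand[, id]); the README does not define "alpha-numerically sorted", so sorting chromosome names as plain strings (1, 10, 11, ..., 2, ..., X) and start positions numerically is an assumption, as is the name sort_vep_input.

```shell
# Hypothetical helper: sort a default-format VEP input file.
# -k1,1   chromosome, compared as text (C locale, so byte order)
# -k2,2n  start position, compared numerically within a chromosome
sort_vep_input() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}
```

For example:
    sort_vep_input /tmp/20_06_14.vep > /tmp/20_06_14.sorted.vep
then pass -i=/tmp/20_06_14.sorted.vep --fork=2 to variant_effect_predictor.pl. Given the dev-server result, the fork count is worth re-timing on each host.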