500k WGS FAQs
This FAQ addresses questions related to the new data dispensing functionality that allows users to select which elements of the data to dispense. If you would like more information on the new 500k WGS data release, visit the UK Biobank FAQ.
You can subscribe to status updates at https://status.dnanexus.com/
The refresh feature is currently unavailable so that as many users as possible can access the new data via dispensal as soon as possible.
We recommend that users dispense a new project to get the 500k WGS data, and migrate data analysis workflows from existing projects to the new project. We will enable the “refresh” feature again in the future and send notifications out once it is available.
We recommend that each research application dispense data to only one project to be considerate to other researchers who would like to access the data.
Each dispense request takes about 4-8 hours once your project starts dispensing. However, because many users are requesting the 500k WGS data and the data is very large, requests are queued, and you may wait a long time before your project starts dispensing. Please do not dispense more than one project.
Your request to dispense data may be queued behind those of other users. The system services requests in the order they were received. We appreciate your patience during that time. For more information, see https://dnanexus.gitbook.io/uk-biobank-rap/frequently-asked-questions#i-created-a-project-but-its-stuck-at-0-
You will need to create a new project to access the data; an existing project cannot be refreshed. The project creation screen now includes a new section listing the data types available to dispense. For a faster dispensal, select only the data you need. You can dispense the data at project creation or later, from the project settings of that new project.
If you are interested in accessing the updated phenotypic, health care and proteomics data, select structured tabular data. This option is selected by default, but can be unselected if the data is not necessary for your project.
If you are interested in accessing the updated imaging data or the population-level WGS pVCF data, select unstructured bulk data files. This option will dispense population-level WGS pVCF data (600,000 files), but not individual-level WGS data such as CRAM or gVCF files. This was decided in order to streamline the new project experience for all users. If your research requires access to the individual-level WGS data (18 million files), return to the project once the initial dispensing is completed and request an additional dispensing of these data files.
Due to the size of the dispensal, we recommend waiting until demand for the WGS data has decreased.
Due to the size of the data (18 million files), we recommend waiting until demand for the WGS data has decreased. If your research requires access to the individual-level WGS data, you will have to request "Additional Bulk Data Files" after your first request has completed. You can make the request in your project settings by selecting the “Dispense More Data” button.
You can create an empty project without dispensing data by deselecting both checkboxes on the project creation screen.
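The selection rules above can be summarized as a small lookup. The following is a minimal Python sketch, not part of any DNAnexus API: the function name `plan_dispensal`, the dictionary, and the exact option labels are illustrative assumptions based on the wording in this FAQ, and the labels shown in the user interface may differ.

```python
# Hypothetical helper: these names are illustrative, not a DNAnexus API.
STRUCTURED = "Structured tabular data"
BULK = "Unstructured bulk data files"
ADDITIONAL = "Additional Bulk Data Files"  # requested later via "Dispense More Data"

# Which dispensal option covers which kind of data, as described in this FAQ.
NEEDS_TO_OPTION = {
    "phenotypic": STRUCTURED,
    "health care": STRUCTURED,
    "proteomics": STRUCTURED,
    "imaging": BULK,
    "population-level WGS pVCF": BULK,
    "individual-level WGS": ADDITIONAL,
}


def plan_dispensal(needs):
    """Return (initial_options, follow_up_options) for a list of data needs."""
    initial, follow_up = [], []
    for need in needs:
        option = NEEDS_TO_OPTION[need]
        # Individual-level WGS is only available as a later, additional request.
        target = follow_up if option == ADDITIONAL else initial
        if option not in target:
            target.append(option)
    return initial, follow_up


if __name__ == "__main__":
    print(plan_dispensal(["phenotypic", "imaging", "individual-level WGS"]))
```

An empty list of needs yields no options at all, which corresponds to creating an empty project with both checkboxes deselected.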
See the listing below for details. Each category heading is followed by the data fields it contains.

Exome sequences
  - Exome OQFE variant call files (VCFs)
  - Exome OQFE variant call file (VCFs) indices
  - Exome OQFE CRAM files
  - Exome OQFE CRAM indices

Exome sequences - Previous exome releases
  - Exome OQFE variant call files (VCFs) - interim 200k release
  - Exome OQFE variant call file (VCFs) indices - interim 200k release
  - Exome OQFE CRAM files - interim 200k release
  - Exome OQFE CRAM indices - interim 200k release

Exome sequences - Alternative exome processing
  - Exome variant call files (DRAGEN) (VCFs)
  - Exome variant call file (DRAGEN) (VCFs) indices

Whole genome sequences - GATK and GraphTyper
  - Whole genome GATK variant call files (VCFs) and indices [500k release]
  - Whole genome GATK CRAM files and indices [500k release]
  - BQSR - GATK BaseRecalibrator [500k release]
  - Concatenated QC Metrics [500k release]
  - Genotype Concordance [500k release]
  - Genotype Concordance - Contingency Metrics [500k release]
  - Genotype Concordance - Detail Metrics [500k release]
  - Genotype Concordance - Summary Metrics (Picard) [500k release]
  - Sample Contamination (ReadHaps) [500k release]
  - Sample Contamination (verifyBamID) - depthSM [500k release]
  - Sample Contamination (verifyBamID) - selfSM [500k release]

Whole genome sequences - DRAGEN WGS
  - Whole genome CNV call files (DRAGEN) [500k release]
  - Whole genome CNV supplementary files (DRAGEN) [500k release]
  - Whole genome CRAM files (DRAGEN) [500k release]
  - Whole genome CYP2D6 genotype calls (DRAGEN) [500k release]
  - Whole genome STR call files (DRAGEN) [500k release]
  - Whole genome STR supplementary files (DRAGEN) [500k release]
  - Whole genome SV call files (DRAGEN) [500k release]
  - Whole genome SV supplementary files (DRAGEN) [500k release]
  - Whole genome diagnostics files (DRAGEN) [500k release]
  - Whole genome supplementary files (DRAGEN) [500k release]
  - Whole genome variant call files (GVCFs) (DRAGEN) [500k release]
  - Whole genome variant call files (VCFs) (DRAGEN) [500k release]

Genotypes - Genotyping process and sample
  - CEL files

Whole genome sequences - Previous WGS releases - WGS pilot studies
  - BGI WGS CRAM files
  - Broad WGS CRAM files
The population-level WGS pVCF files can be found at the two locations below:
/Bulk/GATK and GraphTyper WGS/GraphTyper population level WGS variants, pVCF format [500k release]/
/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]
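Both folder names contain spaces, commas, and square brackets, so they need quoting when used on a command line. The sketch below builds a shell-safe `dx ls` command for each folder with Python's standard `shlex.quote`. It assumes the `dx` command-line client is installed and that a project with the dispensed data is selected; the helper name `dx_ls_command` is hypothetical. It only covers shell quoting and does not address any additional escaping the `dx` client itself may require.

```python
import shlex

# Folder paths copied verbatim from this FAQ.
PVCF_FOLDERS = [
    "/Bulk/GATK and GraphTyper WGS/GraphTyper population level WGS variants, pVCF format [500k release]/",
    "/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]",
]


def dx_ls_command(folder):
    """Build a shell-safe `dx ls` command line for a project folder (hypothetical helper)."""
    # shlex.quote wraps the path so the shell treats it as a single argument.
    return "dx ls " + shlex.quote(folder)


if __name__ == "__main__":
    for folder in PVCF_FOLDERS:
        print(dx_ls_command(folder))
```

The printed lines can be pasted into a terminal as they are; splitting them with `shlex.split` recovers the original, unmodified path.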