500k WGS FAQ

This FAQ addresses questions related to the new data dispensing functionality that allows users to select which elements of the data to dispense. If you would like more information on the new 500k WGS data release, visit the UK Biobank FAQ.

How can I follow the status regarding platform maintenance?

You can subscribe at https://status.dnanexus.com/

Can I “refresh” existing projects to get the 500k WGS data?

The refresh feature is currently unavailable. This ensures that as many users as possible can access the new data quickly via dispensal.

We recommend that users dispense a new project to get the 500k WGS data, and migrate data analysis workflows from existing projects to the new project. We will enable the “refresh” feature again in the future and send notifications out once it is available.

How many projects can I dispense data to?

We recommend that each research application dispense data to only one project to be considerate to other researchers who would like to access the data.

How long will the dispensal process take?

Each dispense request takes about 4-8 hours once your project starts dispensing. However, because the 500k WGS data is large and in high demand, your request may wait in a queue for a long time before dispensal starts. Please do not dispense more than one project.

I created a project but it's stuck at "0%".

Your request to dispense data may be queued behind that of other users. The system will service your request in the order it was received. We appreciate your patience during that time. For more detail, see https://dnanexus.gitbook.io/uk-biobank-rap/frequently-asked-questions#i-created-a-project-but-its-stuck-at-0-.

How do I select what data I’d like to dispense?

You will need to create a new project to access the data; an existing project cannot be refreshed. On the project creation screen, you will now see a new section listing the data types available to dispense. For a faster dispensal, select only the data you need. You can dispense the data at project creation or later from the project settings of that new project.

What data should I select for dispensal?

  1. If you are interested in accessing the updated phenotypic, health care and proteomics data, select structured tabular data. This option is selected by default, but can be deselected if the data is not necessary for your project.

  2. If you are interested in accessing the updated imaging data or the population-level WGS pVCF data, select unstructured bulk data files. This option will dispense population-level WGS pVCF data (600,000 files), but not individual-level WGS data such as CRAM or gVCF files. This was decided in order to streamline the new project experience for all users. If your research requires access to the individual-level WGS data (18 million files), return to the project once the initial dispensing is completed and request an additional dispensing of these data files.

    1. Due to the size of this dispensal, we recommend waiting until demand for the WGS data has decreased.

How do I dispense the individual-level data?

Due to the size of the data (18 million files), we recommend waiting until demand for the WGS data has decreased. If your research requires access to the individual-level WGS data, you will have to request "Additional Bulk Data Files" after your first request has been completed. You can make the request in your project settings by selecting the “Dispense More Data” button.

Can I create a project without having to dispense data?

You can create an empty project without dispensing data by deselecting both checkboxes on the project creation screen.

What fields & data require the “Dispense More Data” step?

See the table below for details.

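| Field Category | Field ID | Field Title |
| --- | --- | --- |
| Exome sequences | | Exome OQFE variant call files (VCFs) |
| Exome sequences | | Exome OQFE variant call file (VCFs) indices |
| Exome sequences | | Exome OQFE CRAM files |
| Exome sequences | | Exome OQFE CRAM indices |
| Exome sequences - Previous exome releases | | Exome OQFE variant call files (VCFs) - interim 200k release |
| Exome sequences - Previous exome releases | | Exome OQFE variant call file (VCFs) indices - interim 200k release |
| Exome sequences - Previous exome releases | | Exome OQFE CRAM files - interim 200k release |
| Exome sequences - Previous exome releases | | Exome OQFE CRAM indices - interim 200k release |
| Exome sequences - Alternative exome processing | | Exome variant call files (DRAGEN) (VCFs) |
| Exome sequences - Alternative exome processing | | Exome variant call file (DRAGEN) (VCFs) indices |
| Whole genome sequences - GATK and GraphTyper | | Whole genome GATK variant call files (VCFs) and indices [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Whole genome GATK CRAM files and indices [500k release] |
| Whole genome sequences - GATK and GraphTyper | | BQSR - GATK BaseRecalibrator [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Concatenated QC Metrics [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Genotype Concordance [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Genotype Concordance - Contingency Metrics [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Genotype Concordance - Detail Metrics [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Genotype Concordance - Summary Metrics (Picard) [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Sample Contamination (ReadHaps) [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Sample Contamination (verifyBamID) - depthSM [500k release] |
| Whole genome sequences - GATK and GraphTyper | | Sample Contamination (verifyBamID) - selfSM [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome CNV call files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome CNV supplementary files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome CRAM files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome CYP2D6 genotype calls (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome STR call files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome STR supplementary files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome SV call files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome SV supplementary files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome diagnostics files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome supplementary files (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome variant call files (GVCFs) (DRAGEN) [500k release] |
| Whole genome sequences - Dragen WGS | | Whole genome variant call files (VCFs) (DRAGEN) [500k release] |
| Genotypes - Genotyping process and sample | | CEL files |
| Whole genome sequences - Previous WGS releases - WGS pilot studies | | BGI WGS CRAM files |
| Whole genome sequences - Previous WGS releases - WGS pilot studies | | Broad WGS CRAM files |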

Where can I find the population-level files from the 500k WGS release after my dispensal is completed?

They can be found at the two locations below:

  1. /Bulk/GATK and GraphTyper WGS/GraphTyper population level WGS variants, pVCF format [500k release]/

  2. /Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]
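
If you prefer to check these folders programmatically rather than in the file browser, the following is a minimal sketch, assuming the `dxpy` Python package is installed and you are already logged in to the platform. The function name `preview_pvcf_files`, its `find` parameter, and the project ID shown in the usage note are hypothetical names introduced for illustration; only the two folder paths come from this FAQ. It asks `dxpy.find_data_objects` for a handful of files under each folder, which is usually enough to confirm that the dispensal populated them.

```python
# Folder paths as listed in the FAQ answer above.
PVCF_FOLDERS = [
    "/Bulk/GATK and GraphTyper WGS/GraphTyper population level WGS variants, pVCF format [500k release]",
    "/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]",
]


def preview_pvcf_files(project_id, find=None, limit=5):
    """Return up to `limit` file names found under each pVCF folder.

    `find` defaults to dxpy.find_data_objects; it is a parameter only so the
    function can be exercised without network access (an illustrative choice).
    """
    if find is None:
        import dxpy  # imported lazily so this module loads without dxpy

        find = dxpy.find_data_objects

    preview = {}
    for folder in PVCF_FOLDERS:
        results = find(
            classname="file",    # only files, not records or applets
            project=project_id,
            folder=folder,
            recurse=True,        # files may sit in subfolders
            describe=True,       # include metadata such as the file name
            limit=limit,         # keep the query small
        )
        preview[folder] = [r["describe"]["name"] for r in results]
    return preview
```

For example, `preview_pvcf_files("project-xxxx")` (replace the placeholder with your own project ID) returns a dictionary keyed by folder path. An empty list for a folder suggests that the bulk data has not been dispensed to that project yet, or that dispensal is still in progress.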
