UK Biobank Data on the Research Analysis Platform
Learn how UK Biobank data is organized and named on the Research Analysis Platform. Learn how to find and access bulk files and tabular data.

Overview

This recorded webinar provides an in-depth overview of the UK Biobank dataset and its component elements.

How the Data is Organized

EIDs and Data-fields

UK Biobank contains data collected from approximately 500,000 volunteer participants. Within an access application, each participant is identified by a unique, 7-digit number, or EID. An EID is typically a number between 1,000,000 and 6,000,000.
Note that each access application receives a different set of randomized EIDs, unique to the application. This EID randomization process - also known as "pseudonymization" - is managed by UK Biobank and is automatically applied to the data by the Research Analysis Platform. (For a given access application, data on the Research Analysis Platform contain the same EIDs as data directly downloaded from UK Biobank's website.)
When you create a project on the Research Analysis Platform, the system contacts UK Biobank to get your application's EIDs, then uses them to pseudonymize your dataset. The pseudonymized EIDs are used, for example, to populate the "eid" column in the database, to name per-participant files, to generate the EID-specific content of FAM files for genotyping fields, and to adjust pVCF headers.
All data in the UK Biobank resource are organized into data-fields. Your access application is approved for a precise subset of those data-fields.
The UK Biobank Showcase provides an in-depth look into the types of data stored in the UK Biobank, how it's collected, and how it's organized. You can find more information about data-fields, broken down by type, on the UK Biobank Field Listing page.

Project Data

When you create a project on the UK Biobank Research Analysis Platform, the system dispenses the data corresponding to the data-fields listed in the access application associated with the project.
  • Bulk data-fields are dispensed as files. See Bulk Data Files below for more.
  • Tabular data-fields and linked health data are placed into a Spark SQL database and an associated dataset. See Tabular Data below for more.
The dispensed data correspond to a specific data release version. See Data Release Versions below for more.

Bulk Data Files

Overview

Within your project, the Bulk folder contains files associated with UK Biobank data-fields of type "bulk." These are particularly large and/or complex items, such as genotyping array data, genome sequencing data, imaging data, and fitness data.

Folder Conventions

The Bulk folder uses the following subfolder structure:
  • There is a subfolder for each UK Biobank bulk field category. For example, whole genome CRAM files are stored in the subfolder named Whole genome sequences. These categories are defined by the UK Biobank, specifically for the Research Analysis Platform.
  • Within each category subfolder, there is a subfolder for each bulk field (or group of related fields). For example, a subfolder named Whole genome CRAM files would contain files for that field.
  • Within each field folder, files related to an individual participant are grouped in subfolders named using the prefix of the participant's EID. Typically these are two-digit names, ranging between "10" and "60."
In certain cases, the system may dispense related files of different types into the same folder, to improve usability. For example, whole genome CRAM indices files (field ID #23194) would be dispensed into the same folder as whole genome CRAM files (field ID #23193), rather than into their own folder. Similarly, the folder /Bulk/Brain MRI/dMRI includes data from fields #20218 ("Multiband diffusion brain images - DICOM") and #20250 ("Multiband diffusion brain images - NIFTI").
For a full list of folders, see Bulk Fields in the Latest Release below.
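The folder convention above can be sketched in a few lines of Python. This is a minimal illustration, not Platform code: the helper name `participant_folder` and the EID `1234567` are hypothetical, and the sketch assumes the typical case of two-digit participant subfolders.

```python
from pathlib import PurePosixPath


def participant_folder(category: str, field_folder: str, eid: int) -> PurePosixPath:
    """Return the folder expected to hold one participant's files for a bulk field."""
    eid_text = str(eid)
    if len(eid_text) != 7 or not eid_text.isdigit():
        raise ValueError(f"expected a 7-digit EID, got {eid!r}")
    # Participant subfolders are named after the leading digits of the EID
    # (assumed here to be the first two digits, the typical case).
    return PurePosixPath("/Bulk") / category / field_folder / eid_text[:2]


# Hypothetical EID, used only for illustration.
folder = participant_folder("Whole genome sequences", "Whole genome CRAM files", 1234567)
print(folder)  # /Bulk/Whole genome sequences/Whole genome CRAM files/12
```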

Filename Conventions

The Research Analysis Platform uses the following naming conventions for bulk data files:
  • Files that contain data on an individual participant are named in this fashion:
    <EID>_<FIELD-ID>_<INSTANCE-ID>_<ARRAY-ID>.<SUFFIX>
    For example, whole genome CRAM files (field ID #23193) are named like so:
    <EID>_23193_0_0.cram
    Some exceptions apply to this rule. When a field is meant as a companion to a main field, such as a CRAI index accompanying a CRAM file, or a TBI index accompanying a VCF file, the system uses the prefix of the main field. For example, whole genome CRAM indices (field ID #23194) are named like so:
    <EID>_23193_0_0.cram.crai
  • Files that contain data on a cohort of participants (such as PLINK, BGEN or pVCF files) are named in this fashion:
    ukb<FIELD-ID>_c<CHROM>_b<BLOCK>_v<VERSION>.<SUFFIX>
    Here, <CHROM> represents the chromosome (such as "1", "2" or "X"), <BLOCK> represents an index (starting from "0") for datasets that have been split into multiple pieces, and <VERSION> represents a dataset version assigned by UK Biobank.
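As a rough sketch of these naming rules, the Python below builds a per-participant filename and parses a cohort-level one. The function names, the EID `1234567`, and the sample filename are hypothetical. The regular expression also assumes that the block and version are plain digits, which the text above does not guarantee.

```python
import re
from typing import Dict, Optional


def participant_filename(eid: int, field_id: int, instance_id: int, array_id: int,
                         suffix: str, main_field_id: Optional[int] = None) -> str:
    """Build <EID>_<FIELD-ID>_<INSTANCE-ID>_<ARRAY-ID>.<SUFFIX>.

    Companion fields (such as CRAI or TBI indices) reuse the main field's ID
    in the name, so pass that ID as main_field_id for them.
    """
    name_field = field_id if main_field_id is None else main_field_id
    return f"{eid}_{name_field}_{instance_id}_{array_id}.{suffix}"


# ukb<FIELD-ID>_c<CHROM>_b<BLOCK>_v<VERSION>.<SUFFIX>
# Assumption: block and version are numeric.
COHORT_NAME = re.compile(
    r"^ukb(?P<field_id>\d+)_c(?P<chrom>[0-9A-Za-z]+)_b(?P<block>\d+)"
    r"_v(?P<version>\d+)\.(?P<suffix>.+)$"
)


def parse_cohort_filename(name: str) -> Dict[str, str]:
    """Split a cohort-level filename into its named parts (all as strings)."""
    match = COHORT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a cohort-level filename: {name!r}")
    return match.groupdict()


print(participant_filename(1234567, 23193, 0, 0, "cram"))
print(participant_filename(1234567, 23194, 0, 0, "cram.crai", main_field_id=23193))
print(parse_cohort_filename("ukb23148_c1_b0_v1.vcf.gz"))
```

Passing `main_field_id` explicitly keeps the companion-field exception visible at the call site rather than hiding it in a lookup table.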

Pseudonymization of pVCF headers

The Research Analysis Platform pseudonymizes the content of pVCF headers for the following fields:
Field ID | Description
23146 | Population level exome OQFE variants, pVCF format - interim 300k release
23148 | Population level exome OQFE variants, pVCF format - interim 450k release
23156 | Population level exome OQFE variants, pVCF format - interim 200k release
23195 | Whole genome GraphTyper joint call pVCF (deprecated)
23196 | Whole genome GATK joint call pVCF
23352 | Whole genome GraphTyper joint call pVCF
23353 | Whole genome GraphTyper SV data
When accessing pVCF files in these fields, the header is pseudonymized. The sample ids in the header are EIDs that correspond to the access application. If a participant has withdrawn, the corresponding sample id is marked as "W000001" (for the first encountered sample that belongs to a withdrawn participant), "W000002" (for the second encountered sample that belongs to a withdrawn participant), etc. Overall, the non-withdrawn EIDs in the pVCF header are expected to match the set of application EIDs used elsewhere, such as in the "eid" column of the pheno data and the FAM files of genotyping array fields. This allows you to conduct analyses that combine phenotypic, genotyping array, and whole exome pVCF (or whole genome pVCF) data without having to translate any EIDs.
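To illustrate how the withdrawn-participant placeholders can be separated from application EIDs, here is a minimal Python sketch that works on the `#CHROM` header line of a pVCF. The function name and the sample IDs are hypothetical, and the placeholder pattern (a "W" followed by digits) is inferred from the examples above.

```python
import re
from typing import List, Tuple

# Placeholders such as "W000001" mark samples of withdrawn participants.
WITHDRAWN_ID = re.compile(r"^W\d+$")


def split_sample_ids(header_line: str) -> Tuple[List[str], List[str]]:
    """Split a '#CHROM ...' header line into (EIDs, withdrawn placeholders)."""
    columns = header_line.rstrip("\n").split("\t")
    # A VCF header line has nine fixed columns (#CHROM through FORMAT);
    # every column after those is a sample ID.
    sample_ids = columns[9:]
    eids = [s for s in sample_ids if not WITHDRAWN_ID.match(s)]
    withdrawn = [s for s in sample_ids if WITHDRAWN_ID.match(s)]
    return eids, withdrawn


# Hypothetical header with two EIDs and two withdrawn samples.
fixed = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
line = "\t".join(fixed + ["1234567", "W000001", "2345678", "W000002"])
eids, withdrawn = split_sample_ids(line)
print(eids, withdrawn)
```

The resulting `eids` list can then be intersected with the values of the "eid" column of the tabular data without any ID translation.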

File Properties

The Research Analysis Platform supports file properties. These are key-value pairs of strings that are attached to files. When bulk files are dispensed to a project, the Platform adds some initial file properties, as below:
Key | Value | Which files have this property?
eid | The corresponding participant EID. | Files that correspond to a single participant.
field_id | The corresponding data-field id. | All files.
instance_id | The corresponding instance id (typically a visit to an assessment centre). | Files that correspond to data-fields with multiple instances.
array_id | The corresponding array index. | Files that correspond to array data-fields.
resource_id | The corresponding UK Biobank resource id. | Auxiliary files to a resource on the UK Biobank Showcase.
These properties are searchable both via the Web UI and CLI. Refer to the following section for an example.
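As a hedged illustration of a CLI property search, the Python sketch below assembles a `dx find data` command that filters files by these properties. The helper name, the project ID `project-xxxx`, and the EID are hypothetical placeholders. The sketch only builds and prints the command, and does not run it.

```python
import shlex
from typing import Dict, List


def build_dx_find_command(project: str, folder: str, properties: Dict[str, str]) -> List[str]:
    """Build a `dx find data` argument list that filters files by property."""
    argv = ["dx", "find", "data", "--class", "file", "--path", f"{project}:{folder}"]
    # Each key-value pair becomes one "--property KEY=VALUE" filter.
    for key, value in sorted(properties.items()):
        argv += ["--property", f"{key}={value}"]
    return argv


# Hypothetical project ID and EID; field 23193 is whole genome CRAM files.
command = build_dx_find_command(
    "project-xxxx", "/Bulk", {"field_id": "23193", "eid": "1234567"}
)
print(shlex.join(command))
```

To execute the command, pass the list to `subprocess.run(command, check=True)` from an environment where the `dx` client is installed and logged in.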

Working with Bulk Data Files

See these instructions for in-depth guidance on searching and analyzing UK Biobank bulk data files.

Tabular Data

Database and Dataset

Tabular data-fields and linked health data are stored in a SQL database. This database is based on Spark SQL, which scales better than classic relational database systems (RDBMS). The database is located in the root folder of your project, and is typically named according to this pattern:
app<APPLICATION-ID>_<CREATION-TIME> (e.g. app68444_20210727225440)
In the same folder, there is an associated dataset named after the database but with the .dataset suffix appended. This dataset is a higher-level construct, using technology that is unique to the Research Analysis Platform. It combines the low-level SQL columns with field-level metadata from the UK Biobank Showcase, and presents a collection of rich fields that can be explored visually in the Cohort Browser, or programmatically in JupyterLab. For general information on the underlying technology, see the DNAnexus Platform documentation overview of Datasets.
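The naming pattern can be unpacked programmatically, as in the short Python sketch below. It assumes that the creation time is a 14-digit YYYYMMDDHHMMSS timestamp, which matches the example name but is not stated explicitly. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Tuple

# app<APPLICATION-ID>_<CREATION-TIME>, with an assumed 14-digit timestamp.
DATABASE_NAME = re.compile(r"^app(?P<application_id>\d+)_(?P<creation_time>\d{14})$")


def parse_database_name(name: str) -> Tuple[int, datetime]:
    """Return (application ID, creation time) parsed from a database name."""
    match = DATABASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected database name: {name!r}")
    created = datetime.strptime(match["creation_time"], "%Y%m%d%H%M%S")
    return int(match["application_id"]), created


def dataset_name(database_name: str) -> str:
    """The associated dataset carries the database name plus a .dataset suffix."""
    return f"{database_name}.dataset"


print(parse_database_name("app68444_20210727225440"))
print(dataset_name("app68444_20210727225440"))
```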

Browsing Dataset Fields Using the Cohort Browser

To launch the Cohort Browser, navigate to the project's root folder and click the dataset (or tick the dataset and click Explore Data).
To explore what fields are available in your dataset, click Add Tile. The system will present all available fields, organized in a folder structure inspired by the UK Biobank Showcase. You can search this list by folder name, field name, or field value (for categorical fields).
Click a field to see more information. The Data Field Details pane contains the field title (such as Type of accommodation lived in | Instance 0), and the Link label contains the field name (such as p670_i0). These field names and titles can be used to retrieve data programmatically using JupyterLab.
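When working with field names in code, it can help to split a name such as p670_i0 into its parts. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes that the instance suffix (`_i<N>`) is optional and that an optional array suffix (`_a<N>`) may follow. Only the `p670_i0` form appears in the text above.

```python
import re
from typing import Dict, Optional

# p<FIELD-ID>, then an optional _i<INSTANCE> and an optional _a<ARRAY>.
# The _a<ARRAY> part is an assumption made for this sketch.
FIELD_NAME = re.compile(
    r"^p(?P<field_id>\d+)(?:_i(?P<instance>\d+))?(?:_a(?P<array>\d+))?$"
)


def parse_field_name(name: str) -> Dict[str, Optional[int]]:
    """Split a dataset field name into field ID, instance, and array index."""
    match = FIELD_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected field name: {name!r}")
    return {
        key: (int(value) if value is not None else None)
        for key, value in match.groupdict().items()
    }


print(parse_field_name("p670_i0"))  # {'field_id': 670, 'instance': 0, 'array': None}
```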
Using the Cohort Browser features, including the "Export sample IDs" option or the "Download" option in the Data Preview tab, will not lead to any charges.
The Cohort Browser can be used to further explore the data, create charts, or define and compare cohorts. For details, refer to the DNAnexus Platform documentation on the Cohort Browser.
If your access application has been approved for field #23146 and/or #23148, the Cohort Browser will automatically include a "GENOMICS" section, where you can browse variants in your cohort. The data backing the section depends on the dataset version dispensed: field #23148 for version 7 and later, field #23146 for previous versions. The variants are sourced from the pVCF files of the applicable field, after annotation with snpEff GRCh38.92, dbSNP b154, and gnomAD r2.1.1. You can also use these variants to apply genomic filters. For details, refer to the DNAnexus Platform documentation.
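The version rule for the "GENOMICS" section can be written as a small lookup. The Python sketch below uses a hypothetical function name and assumes that "version 7" refers to the major number of a data release version string such as "v7.1".

```python
def genomics_source_field(release_version: str) -> int:
    """Return the field ID expected to back the GENOMICS section."""
    # Assumption: versions look like "v7.1"; only the major number matters.
    major = int(release_version.lstrip("v").split(".")[0])
    return 23148 if major >= 7 else 23146


print(genomics_source_field("v8.1"))  # 23148
print(genomics_source_field("v6.1"))  # 23146
```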

Analyzing Tabular Data as a File

If you are used to working with tabular data as a TSV file - a format used by UK Biobank in distributing tabular data directly via its website - see Accessing Phenotypic Data as a File.

Analyzing Tabular Data Using Spark in JupyterLab

Apache Spark is a modern, scalable framework for parallel processing of big data. Follow these instructions to analyze UK Biobank tabular data using Spark in JupyterLab.

Data Release Versions

The Research Analysis Platform holds a copy of all UK Biobank data. All projects are created using this copy.
As UK Biobank updates the data on its end, the copy held by the Research Analysis Platform is periodically updated to reflect these upstream updates. Whenever this happens, this change will be indicated by a new data release version.
Data release version | Tabular participant data | Bulk data | Released on the Research Analysis Platform
v8.1 | Same as v6.1 | This release includes all fields from previous releases, with these updates: For fields 23191, 23193, 23194, 23197, 23346, 23348, 23349, 23350, and 23351, data were added for an additional 50k participants, bringing the total for these fields to 200k participants. Fields 23370, 23371, 23372, 23373, 23376, 23377, 23378, 23379, 23380, 23381, 23382, 23383, and 23384 were updated to contain data for an additional 200k participants (over and above the 200k participants of fields 23191 etc.). These fields are reserved for the WGS consortium to house data prior to becoming public. | Nov 15 2021
v7.1 | Same as v6.1 | This release includes all fields from previous releases, plus these additional fields: 23148, 23149, 23150. Note: Starting from this release, FAM files for field 23145 and SAMPLE files for field 23147 contain gender info. Note: As part of this release, for fields 23141, 23142, 23143, and 23144, data were added for an additional 150k participants, and data were updated for 44 existing participants. | Oct 29 2021
v6.1 | | All fields from previous releases | Sept 22 2021
v5.0 | Same as v4.0 | This release includes all fields from previous releases, plus these additional fields: | Sept 3 2021
v4.0 | | This release includes all fields from previous releases, plus these additional fields: 23352, 23353 | July 27 2021
v3.0 | | This release includes all fields from v1.0 and v2.0, plus these additional fields, for participants for which there were data, as of March 2021: | June 4 2021
v2.0 | | This release includes all fields from v1.0, as well as these additional fields: | Jan 28 2021
v1.0 | | This release includes only the following fields: | Nov 19 2020

About Showcase Releases

Each Showcase release includes both newly released data, and all Showcase data included in previous releases. So the June 2021 Showcase release, for example, includes all data released in all previous Showcase releases, as detailed on the UK Biobank website.

Bulk Fields in the Latest Release

The following table lists all the bulk fields, along with their folders and suffixes, as of the latest release (v8.1):
Field ID | Data Type | Subfolder | Suffixes
90004 | Acceleration intensity time-series | /Bulk/Activity/Epoch/ | .csv
90001 | Acceleration data - cwa format | /Bulk/Activity/Raw/ | .cwa
20266 | Arterial spin labelling brain images - DICOM | /Bulk/Brain MRI/ASL/ | .zip
20218 | Multiband diffusion brain images - DICOM | /Bulk/Brain MRI/dMRI/ | .zip
20250 | Multiband diffusion brain images - NIFTI | /Bulk/Brain MRI/dMRI/ | .zip
20225 | Functional brain images - resting - DICOM | /Bulk/Brain MRI/rfMRI/ | .zip
20227 | Functional brain images - resting - NIFTI | /Bulk/Brain MRI/rfMRI/ | .zip
25750 | rfMRI full correlation matrix, dimension 25 | /Bulk/Brain MRI/rfMRI/ | .txt
25751 | rfMRI full correlation matrix, dimension 100 | /Bulk/Brain MRI/rfMRI/ | .txt
25752 | rfMRI partial correlation matrix, dimension 25 | /Bulk/Brain MRI/rfMRI/ | .txt
25753 | rfMRI partial correlation matrix, dimension 100 | /Bulk/Brain MRI/rfMRI/ | .txt
25754 | rfMRI component amplitudes, dimension 25 | /Bulk/Brain MRI/rfMRI/ | .txt
25755 | rfMRI component amplitudes, dimension 100 | /Bulk/Brain MRI/rfMRI/ | .txt
20215 | Scout images for brain scans - DICOM | /Bulk/Brain MRI/Scout/ | .zip
20224 | Phoenix - DICOM | /Bulk/Brain MRI/Scout/ | .zip
20219 | Susceptibility weighted brain images - DICOM | /Bulk/Brain MRI/SWI/ | .zip
20251 | Susceptibility weighted brain images - NIFTI | /Bulk/Brain MRI/SWI/ | .zip
20216 | T1 structural brain images - DICOM | /Bulk/Brain MRI/T1/ | .zip
20252 | T1 structural brain images - NIFTI | /Bulk/Brain MRI/T1/ | .zip
20263 | T1 surface model files and additional structural segmentations | /Bulk/Brain MRI/T1/ | .zip
20220 | T2 FLAIR structural brain images - DICOM | /Bulk/Brain MRI/T2 FLAIR/ | .zip
20221 | T2/PD brain images - DICOM | /Bulk/Brain MRI/T2 FLAIR/ | .zip
20253 | T2 FLAIR structural brain images - NIFTI | /Bulk/Brain MRI/T2 FLAIR/ | .zip
20217 | Functional brain images - task - DICOM | /Bulk/Brain MRI/tfMRI/ | .zip
20249 | Functional brain images - task - NIFTI | /Bulk/Brain MRI/tfMRI/ | .zip
25747 | Eprime advisor file | /Bulk/Brain MRI/tfMRI/ | .adv
25748 | Eprime txt file | /Bulk/Brain MRI/tfMRI/ | .txt
25749 | Eprime ed2 file | /Bulk/Brain MRI/tfMRI/ | .ed2
20222 | Carotid artery ultrasound image (left) | /Bulk/Carotid Ultrasound/Carotid artery (left)/ | .zip
20223 | Carotid artery ultrasound image (right) | /Bulk/Carotid Ultrasound/Carotid artery (right)/ | .zip
20241 | Raw carotid device data | /Bulk/Carotid Ultrasound/Raw data/ | .zip
20226 | Carotid artery ultrasound report | /Bulk/Carotid Ultrasound/Report/ | .zip
6025 | Fitness test results, including ECG data | /Bulk/Electrocardiogram/Fitness/ | .xml
20205 | ECG datasets | /Bulk/Electrocardiogram/Resting/ | .xml
23143 | Exome OQFE CRAM files | /Bulk/Exome sequences/Exome OQFE CRAM files/ | .cram
23144 | Exome OQFE CRAM indices | /Bulk/Exome sequences/Exome OQFE CRAM files/ | .cram.crai
23153 | Exome OQFE CRAM files | /Bulk/Previous exome releases/Exome OQFE CRAM files - interim 200k release/ | .cram
23154 | Exome OQFE CRAM indices | /Bulk/Previous exome releases/Exome OQFE CRAM files - interim 200k release/ | .cram.crai
23141 | Exome OQFE variant call files (VCFs) | /Bulk/Exome sequences/Exome OQFE variant call files (VCFs)/ | .g.vcf.gz
23142 | Exome OQFE variant call file (VCF) indices | /Bulk/Exome sequences/Exome OQFE variant call files (VCFs)/ | .g.vcf.gz.tbi
23151 | Exome OQFE variant call files (VCFs) | /Bulk/Previous exome releases/Exome OQFE variant call files (VCFs) - interim 200k release/ | .g.vcf.gz
23152 | Exome OQFE variant call file (VCF) indices | /Bulk/Previous exome releases/Exome OQFE variant call files (VCFs) - interim 200k release/ | .g.vcf.gz.tbi
23147 | Population level exome OQFE variants, BGEN format - interim 300k release | /Bulk/Previous exome releases/Population level exome OQFE variants, BGEN format - interim 300k release/ | .bgen, .bgi, .sample
23150 | Population level exome OQFE variants, BGEN format - interim 450k release | /Bulk/Exome sequences/Population level exome OQFE variants, BGEN format - interim 450k release/ | .bgen, .bgi, .sample
23155 | Population level exome OQFE variants, PLINK format | /Bulk/Previous exome releases/Population level exome OQFE variants, PLINK format - interim 200k release/ | .bed, .bim, .fam
23145 | Population level exome OQFE variants, PLINK format - interim 300k release | /Bulk/Previous exome releases/Population level exome OQFE variants, PLINK format - interim 300k release/ | .bed, .bim, .fam, .masks, .txt, .txt.gz
23149 | Population level exome OQFE variants, PLINK format - interim 450k release | /Bulk/Exome sequences/Population level exome OQFE variants, PLINK format - interim 450k release/ | .bed, .bim, .fam, .masks, .txt, .txt.gz
23156 | Population level exome OQFE variants, pVCF format | /Bulk/Previous exome releases/Population level exome OQFE variants, pVCF format - interim 200k release/ | .vcf.gz, .vcf.gz.tbi
23146 | Population level exome OQFE variants, pVCF format - interim 300k release | /Bulk/Previous exome releases/Population level exome OQFE variants, pVCF format - interim 300k release/ | .vcf.gz, .vcf.gz.tbi
23148 | Population level exome OQFE variants, pVCF format - interim 450k release | /Bulk/Exome sequences/Population level exome OQFE variants, pVCF format - interim 450k release/ | .vcf.gz, .vcf.gz.tbi
22002 | CEL files | /Bulk/Genotype Results/Genotype CEL files/ | .cel
22418 | Genotype calls | /Bulk/Genotype Results/Genotype calls/ | .bed, .bim, .dat, .fam, .txt
22418 | Genotype calls | /Bulk/Genotype Results/Genotype calls/posteriors/ | .batch, .bim, .bin
22419 | Genotype confidences | /Bulk/Genotype Results/Genotype confidences/ | .txt
22431 | Genotype copy number variants, log2ratios | /Bulk/Genotype Results/Genotype copy number variants, log2ratios/ | .txt
22437 | Genotype copy number variants B-allele frequencies | /Bulk/Genotype Results/Genotype copy number variants B-allele frequencies/ | .txt
22430 | Genotype intensities | /Bulk/Genotype Results/Genotype intensities/ | .bin
20210 | Aortic distensibility images - DICOM | /Bulk/Heart MRI/Aortic distensibility/ | .zip
20213 | Blood flow images - DICOM | /Bulk/Heart MRI/Blood flow/ | .zip
20211 | Cine tagging images - DICOM | /Bulk/Heart MRI/CINE tagging/ | .zip
20212 | Left ventricular outflow tract images - DICOM | /Bulk/Heart MRI/Left ventricular outflow tract/ | .zip
20208 | Long axis heart images - DICOM | /Bulk/Heart MRI/Long axis/ | .zip
20207 | Scout images for heart MRI - DICOM | /Bulk/Heart MRI/Scout/ | .zip
20214 | Experimental shMOLLI sequence images - DICOM | /Bulk/Heart MRI/ShMOLLI/ | .zip
20209 | Short axis heart images - DICOM | /Bulk/Heart MRI/Short axis/ | .zip
22438 | Haplotypes (WTCHG) | /Bulk/Imputation/Haplotypes/ | .bgen, .bgi
22828 | Imputation from genotype (WTCHG) | /Bulk/Imputation/UKB imputation from genotype/ | .bgen, .bgi, .sample, .txt
20264 | Kidney Imaging - gradient echo - DICOM | /Bulk/Kidney MRI/Gradient echo/ | .zip
20243 | Kidney Imaging - T1 ShMOLLI - DICOM | /Bulk/Kidney MRI/ShMOLLI/ | .zip
20265 | Kidney Imaging - T2 haste - DICOM | /Bulk/Kidney MRI/T2 HASTE/ | .zip
20267 | Kidney imaging - T2 Vibe - DICOM | /Bulk/Kidney MRI/T2 VIBE/ | .zip
20203 | Liver images - gradient echo - DICOM | /Bulk/Liver MRI/Gradient echo/ | .zip
20254 | Liver images - IDEAL protocol - DICOM | /Bulk/Liver MRI/IDEAL/ | .zip
20204 | Liver Imaging - T1 ShMoLLI - DICOM | /Bulk/Liver MRI/ShMOLLI/ | .zip
20260 | Pancreas Images - gradient echo - DICOM | /Bulk/Pancreas MRI/Gradient echo/ | .zip
20202 | Pancreatic fat - DICOM | /Bulk/Pancreas MRI/Pancreatic fat/ | .zip
20206 | Measurements of pancreas volume - DICOM | /Bulk/Pancreas MRI/Pancreatic volume/ | .zip
20259 | Pancreas Images - ShMoLLI - DICOM | /Bulk/Pancreas MRI/ShMOLLI/ | .zip
21011 | FDA data file (left) | /Bulk/Retinal Optical Coherence Tomography/FDA (left)/ | .fda
21013 | FDA data file (right) | /Bulk/Retinal Optical Coherence Tomography/FDA (right)/ | .fda
21012 | FDS data file (left) | /Bulk/Retinal Optical Coherence Tomography/FDS (left)/ | .fds
21014 | FDS data file (right) | /Bulk/Retinal Optical Coherence Tomography/FDS (right)/ | .fds
21015 | Fundus retinal eye image (left) | /Bulk/Retinal Optical Coherence Tomography/Fundus (left)/ | .png
21016 | Fundus retinal eye image (right) | /Bulk/Retinal Optical Coherence Tomography/Fundus (right)/ | .png
21017 | OCT image slices (left) | /Bulk/Retinal Optical Coherence Tomography/Slices (left)/ | .zip
21018 | OCT image slices (right) | /Bulk/Retinal Optical Coherence Tomography/Slices (right)/ | .zip
20158 | DXA images | /Bulk/Whole Body DXA/DXA/ | .zip
20201 | Dixon technique for internal fat - DICOM | /Bulk/Whole Body MRI/Dixon/ | .zip
23181 | BGI WGS CRAM files | /Bulk/Whole genome sequences/BGI WGS CRAM files/ | .cram
23182 | BGI WGS CRAM indices | /Bulk/Whole genome sequences/BGI WGS CRAM files/ | .cram.crai
23197 | BQSR - GATK BaseRecalibrator | /Bulk/Whole genome sequences/BQSR - GATK BaseRecalibrator/ | .recal_table
23183 | Broad WGS CRAM files | /Bulk/Whole genome sequences/Broad WGS CRAM files/ | .cram
23184 | Broad WGS CRAM indices | /Bulk/Whole genome sequences/Broad WGS CRAM files/ | .cram.crai
23347 | Concatenated QC Metrics | /Bulk/Whole genome sequences/Concatenated QC Metrics/ | .qaqc_metrics
23199 | Genotype Concordance - Contingency Metrics | /Bulk/Whole genome sequences/Genotype Concordance - Contingency Metrics/ | .genotype_concordance_contingency_metrics
23319 | Genotype Concordance - Detail Metrics | /Bulk/Whole genome sequences/Genotype Concordance - Detail Metrics/ | .genotype_concordance_detail_metrics
23345 | Genotype Concordance - Summary Metrics (Picard) | /Bulk/Whole genome sequences/Genotype Concordance - Summary Metrics (Picard)/ | .genotype_concordance_summary_metrics
23346 | Genotype Concordance | /Bulk/Whole genome sequences/Genotype Concordance/ | .nrd.stats
23350 | Manta-called scored structural variant and indel candidates | /Bulk/Whole genome sequences/Manta-called scored structural variant and indel candidates/ | .diploidSV.vcf.gz, .diploidSV.vcf.gz.tbi
23351 | Manta-called unscored structural variant and indel candidates | /Bulk/Whole genome sequences/Manta-called unscored structural variant and indel candidates/ | .candidateSV.vcf.gz, .candidateSV.vcf.gz.tbi
23198 | Sample Contamination (ReadHaps) | /Bulk/Whole genome sequences/Sample Contamination (ReadHaps)/ | .contamination
23348 | Sample Contamination (verifyBamID) - depthSM | /Bulk/Whole genome sequences/Sample Contamination (verifyBamID) - depthSM/ | .verifyBamID.depthSM
23349 | Sample Contamination (verifyBamID) - selfSM | /Bulk/Whole genome sequences/Sample Contamination (verifyBamID) - selfSM/ | .verifyBamID.selfSM
23193 | Whole genome CRAM files | /Bulk/Whole genome sequences/Whole genome CRAM files/ | .cram
23194 | Whole genome CRAM indices | /Bulk/Whole genome sequences/Whole genome CRAM files/ | .cram.crai
23196 | Whole genome GATK joint call pVCF | /Bulk/Whole genome sequences/Whole genome GATK joint call pVCF/ | .vcf.gz, .vcf.gz.tbi, qc_metrics_gatk_variant_qc.tab.gz, qc_metrics_gatk_variant_qc.tab.gz.tbi, qc_metrics_GATK_version.txt, qc_metrics_README.pdf
23195 | Whole genome GraphTyper joint call pVCF (deprecated) | /Bulk/Whole genome sequences/Whole genome GraphTyper joint call pVCF (deprecated)/ | .vcf.gz, .vcf.gz.tbi, qc_metrics_graphtyper_variant_qc.tab.gz, qc_metrics_graphtyper_variant_qc.tab.gz.tbi, qc_metrics_README.pdf
23352 | Whole genome GraphTyper joint call pVCF | /Bulk/Whole genome sequences/Whole genome GraphTyper joint call pVCF/ | .vcf.gz, .vcf.gz.tbi, qc_metrics_graphtyper_v2.7.1_qc.tab.gz, qc_metrics_graphtyper_v2.7.1_qc.tab.gz.tbi, qc_metrics_graphtyper_v2.7.1_README.pdf
23353 | Whole genome GraphTyper SV data | /Bulk/Whole genome sequences/Whole genome GraphTyper SV data/ | .vcf.gz, .vcf.gz.tbi
23191 | Whole genome variant call files (VCFs) | /Bulk/Whole genome sequences/Whole genome variant call files (VCFs)/ | .g.vcf.gz
23192 | Whole genome variant call files (VCFs) | /Bulk/Whole genome sequences/Whole genome variant call files (VCFs)/ | .g.vcf.gz.tbi
23370 | Whole genome variant call files (VCFs) | /Bulk/Whole genome sequences/Whole genome variant call files (VCFs) (reserved)/ | .g.vcf.gz
23371 | Whole genome variant call files (VCFs) | /Bulk/Whole genome sequences/Whole genome variant call files (VCFs) (reserved)/ | .g.vcf.gz.tbi
23372 | Whole genome CRAM files | /Bulk/Whole genome sequences/Whole genome CRAM files (reserved)/ | .cram
23373 | Whole genome CRAM files | /Bulk/Whole genome sequences/Whole genome CRAM files (reserved)/ | .cram.crai
23376 | BQSR - GATK BaseRecalibrator | /Bulk/Whole genome sequences/BQSR - GATK BaseRecalibrator (reserved)/ | .recal_table
23377 | Sample Contamination (ReadHaps) | /Bulk/Whole genome sequences/Sample Contamination (ReadHaps) (reserved)/ | .contamination
23378 | Genotype Concordance - Contingency Metrics | /Bulk/Whole genome sequences/Genotype Concordance - Contingency Metrics (reserved)/ | .genotype_concordance_contingency_metrics
23379 | Genotype Concordance - Detail Metrics | /Bulk/Whole genome sequences/Genotype Concordance - Detail Metrics (reserved)/ | .genotype_concordance_detail_metrics
23380 | Genotype Concordance - Summary Metrics (Picard) | /Bulk/Whole genome sequences/Genotype Concordance - Summary Metrics (Picard) (reserved)/ | .genotype_concordance_summary_metrics
23381 | Genotype Concordance | /Bulk/Whole genome sequences/Genotype Concordance (reserved)/ | .nrd.stats
23382 | Concatenated QC Metrics | /Bulk/Whole genome sequences/Concatenated QC Metrics (reserved)/ | .qaqc_metrics
23383 | Sample Contamination (verifyBamID) - depthSM | /Bulk/Whole genome sequences/Sample Contamination (verifyBamID) - depthSM (reserved)/ | .verifyBamID.depthSM
23384 | Sample Contamination (verifyBamID) - selfSM | /Bulk/Whole genome sequences/Sample Contamination (verifyBamID) - selfSM (reserved)/ | .verifyBamID.selfSM
23385 | Manta-called scored structural variant and indel candidates (Vanguard) | /Bulk/Whole genome sequences/Manta-called scored structural variant and indel candidates (Vanguard)/ | .diploidSV.vcf.gz.tbi, .diploidSV.vcf.gz
23386 | Manta-called unscored structural variant and indel candidates (Vanguard) | /Bulk/Whole genome sequences/Manta-called unscored structural variant and indel candidates (Vanguard)/ | .candidateSV.vcf.gz.tbi, .candidateSV.vcf.gz