Working with Bulk Data Files
Learn how to search and analyze UK Biobank bulk data files.
This section explains how to find participant-specific files, such as individual VCF and CRAM files, by searching for the participant's EID. These methods won't work for cohort-wide files, such as PLINK and pVCF files.

Web UI

  1. Turn on the filters in your project, by clicking on the filter icon.
  2. Use the filter picker to open the Properties filter.
  3. Select Any Properties and type "eid" (without the quotes, in lower-case letters) in the Any Key textbox.
  4. In the Any Value textbox, enter the 7-digit EID you're trying to locate.
  5. Select Apply.
  6. To search across all folders, set the search scope to Entire Project.

CLI

Search for an EID as follows, replacing "1234567" with the EID you're trying to find:

    dx find data --property eid=1234567
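
The same search can be scripted. The following is a minimal sketch, assuming the dx command-line client is installed and you are logged in to the right project. The helper names build_eid_search and find_files_for_eid are hypothetical, and the 7-digit check only mirrors the EID format described in the Web UI steps.

```python
import re
import subprocess


def build_eid_search(eid):
    """Return the argument list for `dx find data --property eid=<EID>`."""
    eid = str(eid)
    # EIDs are described above as 7-digit values; reject anything else early.
    if not re.fullmatch(r"[0-9]{7}", eid):
        raise ValueError(f"expected a 7-digit EID, got {eid!r}")
    return ["dx", "find", "data", "--property", f"eid={eid}"]


def find_files_for_eid(eid):
    """Run the search and return the text printed by dx (needs dx on PATH)."""
    result = subprocess.run(
        build_eid_search(eid), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling find_files_for_eid("1234567") runs the command shown above and returns its output as a string. Passing the arguments as a list, rather than a single shell string, avoids quoting problems.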

Visualizing a CRAM or VCF file using IGV.js

To visualize a CRAM or VCF file in the IGV.js genome browser, follow these steps:
  • Navigate to the project containing the files you want to visualize.
  • Select the Visualize tab.
  • Select the option "IGV.js Genome Browser v2.6.6 (*.bam+bai, *.cram+crai, *vcf.gz+tbi)".
  • Select the files you want to visualize.
    • If you are looking for a specific participant, enter the EID in the Search Project textbox, to quickly locate any CRAM or VCF files related to that participant.
    • For CRAM files, you must select both the CRAM and the associated CRAI file.
    • For VCF files, you must select both the VCF and the associated TBI file.
    • IMPORTANT: IGV.js cannot visualize extremely large pVCF files, such as those provided for the 200k WES, 300k WES or 150k WGS releases. If you want to visualize variants in the 150k WGS cohort, type either "qc_metrics_graphtyper" or "qc_metrics_gatk" into the Search Project textbox and select the resulting pair of *.tab.gz and *.tab.gz.tbi files.
  • Select Launch Viewer.
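
Because the viewer needs each data file together with its index, it can help to check a selection before launching. The sketch below is illustrative only: the function name missing_indexes is hypothetical, and it assumes that an index file is named by appending .crai or .tbi to the data file name, as in the *.tab.gz and *.tab.gz.tbi pair mentioned above.

```python
# Index suffix expected next to each data file type (assumed naming convention).
INDEX_SUFFIX = {".cram": ".crai", ".vcf.gz": ".tbi", ".tab.gz": ".tbi"}


def missing_indexes(selected):
    """Return the index file names that are required but not in the selection."""
    chosen = set(selected)
    missing = []
    for name in selected:
        for data_suffix, index_suffix in INDEX_SUFFIX.items():
            # A data file is complete only if its index is selected as well.
            if name.endswith(data_suffix) and name + index_suffix not in chosen:
                missing.append(name + index_suffix)
    return missing
```

For example, a selection containing sample.cram and sample.cram.crai (hypothetical file names) yields an empty list, while a selection containing only sample.vcf.gz reports sample.vcf.gz.tbi as missing.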

Analyzing Files with Swiss Army Knife

The Research Analysis Platform provides many different tools for analyzing files. The Swiss Army Knife app is a simple starting point for many common bioinformatics manipulations. Launching the app will instantiate a Linux VM on the cloud with several preinstalled tools, and run a user-provided command. For more information about this app and its possibilities, visit its entry in the Tools Library.
To launch Swiss Army Knife, navigate to your project and click Start Analysis. Select Swiss Army Knife and click Run Selected. Select the Analysis Inputs tab. You can choose between specifying explicit inputs or using a mounted project folder.
  • Explicit Inputs. Use this strategy to analyze files that are first downloaded to the local disk of the cloud VM.
    • Click Input files. Navigate to a folder of interest (for example, Bulk > Genotype Results > Genotype calls), and tick the files of interest (for example, <Chromosome 21 file>.bed, <Chromosome 21 file>.bim and <Chromosome 21 file>.fam). Click Select as Input.
    • In the Command line textbox, enter a command, referring to files directly by their names (for example, plink --bfile <Chromosome 21 file> --maf 0.1 --out filtered_chr21).
  • Mounted project folder. Use this strategy to analyze files that are streamed directly, without first being written to disk.
    • In the Command line textbox, enter a command, referring to any files in the project using the prefix /mnt/project (for example, plink --bfile "/mnt/project/Bulk/<Path to chromosome calls>" --maf 0.1 --out filtered_chr21).
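
The two strategies differ only in how the command refers to its files. The sketch below builds the text for the Command line textbox in both cases. The helper names mounted_path and plink_maf_command are hypothetical, as are the file and folder names in the usage note; only the /mnt/project prefix and the plink options come from the examples above.

```python
import shlex

MOUNT_ROOT = "/mnt/project"


def mounted_path(project_path):
    """Map a path inside the project to its location under /mnt/project."""
    return MOUNT_ROOT + "/" + project_path.strip("/")


def plink_maf_command(bfile, maf, out):
    """Build a plink filtering command, shell-quoting each argument."""
    args = ["plink", "--bfile", bfile, "--maf", str(maf), "--out", out]
    # shlex.quote adds quotes only where needed, e.g. for paths with spaces.
    return " ".join(shlex.quote(arg) for arg in args)
```

With explicit inputs, plink_maf_command("chr21_calls", 0.1, "filtered_chr21") refers to the file by name alone. With the mounted folder, pass mounted_path("Bulk/Some Folder/chr21_calls") as the first argument instead; the resulting path is quoted because it contains a space.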
It is also possible to combine the two strategies. For example, you can provide an R script as explicit input (such as statistics.r) and a command to run the script (such as Rscript statistics.r); inside the script, you can read any project file by opening it from the /mnt/project folder (such as fields <- read.csv("/mnt/project/<Path to project files>", sep="\t")).
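
The combined strategy is not specific to R. As a rough Python counterpart to the read.csv call, and assuming a Python interpreter is available in the VM, a script supplied as explicit input could read a tab-separated project file like this. The function name read_tab_separated is hypothetical, and the sketch assumes the file has a header row.

```python
import csv


def read_tab_separated(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Inside Swiss Army Knife, the path argument would be a location under /mnt/project, so the file is streamed from the project rather than downloaded first.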