Analysis & data types
Learn about using different data types and how to perform different analyses.
Proteomics
Integrative analysis of UKB proteomics data
The UK Biobank Pharma Proteomics Project (UKB-PPP) was launched by a consortium of thirteen pharmaceutical companies in November 2020 with the aim of measuring circulating concentrations of plasma proteins in approximately 55,000 UK Biobank participants. This project has now resulted in a first tranche of proteomics data for ~1,500 proteins that will soon be released to the broader UK Biobank research community.
Representatives from UK Biobank, Janssen, Biogen, Olink, Weill Cornell Medicine - Qatar & DNAnexus walk through ways to access and analyze the data on the UK Biobank Research Analysis Platform. They discuss the new dataset, the collection and sequencing process, and helpful analysis tips to get you started.
Video chapters & slides
00:00 Introduction
06:04 Naomi Allen - Intro to UK Biobank
14:22 Cindy Lawley - Olink Overview
21:01 Chris Whelan - Intro to UK Biobank Pharma Proteomics Project
30:06 Ben Sun - Working with Proteomics Data
42:50 Karsten Suhre - Crossing Proteomics with Other Data Types
52:12 Ben Busby - Using Proteomics as Features for Disease Subtyping
59:25 Q&A
Slides are available here
Analyzing UKB proteomics data
The UK Biobank Pharma Proteomics Project (UKB-PPP) was launched by a consortium of thirteen pharmaceutical companies in November 2020 with the aim of measuring circulating concentrations of almost 1,500 plasma proteins in approximately 55,000 UK Biobank participants. The first tranche of proteomics data from this project is now available to the broader UK Biobank research community.
Learn how to analyze the new proteomics data on the UK Biobank Research Analysis Platform (UKB-RAP). Bioinformatics expert Alexandra Lee walks attendees through accessing the data and demonstrates a couple of use cases for working with the data.
Topics covered include:
Introduction to the new proteomics data that is now available on the UKB-RAP
Walking users through how they can access this data on the platform
Demonstrating a couple of example use cases for how users can analyze this new proteomics data including pQTL and differential expression
Video chapters
00:00 Introduction
04:58 Webinar Agenda
06:42 Helpful Resources
07:18 Learning Objectives
08:07 Introduction to UK Biobank Proteomics Data
09:13 Olink Technology
10:40 How to Access Proteomics Data on UKB-RAP
11:44 Get Phenotype Data & Protein Expression Data from the Cohort Browser
13:07 Get Protein Expression Data from Table Exporter
14:32 Get Protein Expression Data using dx extract_dataset
15:14 Generating List of Field Names for All Proteins
16:48 Sample Protein Expression Data
17:14 Protein Expression Datasets in UKB-RAP
18:57 Metadata Available in the Bulk Folder
19:44 Sample Code Used in Webinar Available on Github
20:08 Adding Analysis Scripts to UKB-RAP
20:38 Differential Expression Analysis Introduction
21:44 Approach for Differential Expression Analysis
22:40 Collect Input Data for Differential Expression Analysis
23:14 QC Input Data for Differential Expression Analysis
23:39 Perform Differential Expression Analysis Using Limma
24:23 Run Differential Expression Analysis Using JupyterLab
25:19 Resource to Re-Run Sample Differential Expression Analysis
25:59 pQTL Analysis Introduction
26:13 Performing GWAS to Identify SNPs to Compare pQTL SNPs
28:05 Introduction to REGENIE
29:46 Approach for pQTL Analysis
30:44 Matched Genotype & Protein Expression Data
31:02 QC pQTL Analysis Input Data
31:42 Run GWAS using REGENIE
32:50 Results
33:26 pQTL Resources
34:17 Conclusion & Helpful Resources
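The "Generating List of Field Names for All Proteins" and "Get Protein Expression Data using dx extract_dataset" chapters could be sketched roughly as below. This is a minimal illustration, not the code shown in the webinar: the entity name `olink_instance_0`, the protein names, and the dataset ID `project-xxxx:record-yyyy` are assumed placeholders, and the helper functions are hypothetical. Check your own dataset's data dictionary for the real entity and field names before using it.

```python
def build_field_names(entity, proteins):
    """Return 'entity.field' names: the participant ID first, then one per protein.

    Both the entity name and the lowercase protein field names are assumptions;
    confirm them against the dataset's data dictionary.
    """
    fields = [f"{entity}.eid"]
    fields += [f"{entity}.{protein.lower()}" for protein in proteins]
    return fields


def build_extract_command(dataset, fields, output):
    """Assemble a `dx extract_dataset` call as an argument list (nothing is executed)."""
    return [
        "dx", "extract_dataset", dataset,
        "--fields", ",".join(fields),
        "--delimiter", ",",
        "--output", output,
    ]


if __name__ == "__main__":
    # Placeholder dataset ID and protein names, for illustration only.
    fields = build_field_names("olink_instance_0", ["A1BG", "IL6"])
    command = build_extract_command("project-xxxx:record-yyyy", fields, "proteins.csv")
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` in a JupyterLab session on the platform, or printed and pasted into a terminal where the `dx` toolkit is installed and logged in.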
Metabolomics
Analysis of accelerometer and metabolomic data in the UKB
The rich UK Biobank dataset has extensive accelerometry and metabolic data on 100,000 participants. This lifestyle and health data, matched with the large amount of genetic data, presents an exciting opportunity to perform more complex analyses combining these disparate data types.
Learn how to analyze accelerometer and metabolomic data within the UK Biobank Dataset from expert speakers Rosemary Walmsley, Researcher in Reproducible Machine Learning at the University of Oxford, and Dr. Karsten Suhre, Professor of Physiology and Biophysics, Director Bioinformatics Core at Weill Cornell Medicine - Qatar. First, Rosemary discusses accelerometer data in the UK Biobank dataset, strategies for analysis, and points to repositories to get researchers started. Then Karsten walks through analyzing metabolic data and the necessary scripts and notebooks used in his workflows.
Images
Image analysis on UKB-RAP
Learn about the data formats available to start running your image analysis on the Research Analysis Platform. Ondrej Klempir, Sr. Community Engagement Scientist at DNAnexus, walks you through how to access the data, reviews basic image (pre)processing steps & shows examples of image visualization on the cloud.
Video chapters
0:00 Introduction
1:28 Learning Objectives
4:58 The World of Biomedical Informatics and UKB RAP data
6:44 Image Processing vs. Bioinformatics
8:55 Running ML with GPU on UKB RAP - JupyterLab
9:34 RAP Library of Scientific/Image Analysis Tools
10:36 ML tools
11:28 Analyze Image Derived Phenotypes via Cohort Browser
13:43 Quick facts about DICOM
15:25 Related filetype: NIFTI - stores 3D voxel data
16:31 Visualization "May Be Essential"
17:30 Visualization Options
19:27 Widget based interactive visualization
20:28 Papaya viewer
22:38 What steps people do in (neuro)imaging?
24:11 Pipeline development
27:38 Cloud-based Analysis
30:10 dicom2nifti conversion
32:14 Running FSL via docker image
34:57 FSL: brain extraction
36:39 Pydicom - Python package for working with DICOM files
38:14 Nibabel - Python package for working with NIFTI files
39:08 Tip: Create a WDL workflow/ pipeline
40:46 Q&A
Advanced image management
Interested in analyzing the extensive imaging data available on the UK Biobank Research Analysis Platform (UKB RAP)? Do you have your own data to analyze and manage? Experts from DNAnexus (UKB RAP), the FSL group at Oxford and MathWorks (makers of MATLAB and Simulink) will walk you through how to access or transfer the data, as well as label and process bulk data.
Agenda
Introduction and Short Discussion about Integrating Image data with Genomic Data (Ben Busby)
Advanced Imaging and the UKB RAP (Fidel Alfaro-Almagro)
Advanced Imaging on the DNAnexus Commercial Platform (Ondrej Klempir)
Image analysis on DNAnexus and RAP with Matlab (Rob Holt & Renee Qian)
Budgetary considerations around large scale data integration (Asha Collins)
Dementia and multimorbidity in late-life disease
Population-scale datasets offer researchers the ability to derive deeper insights into the genetic mechanisms of late-life diseases like dementia, but analyzing the sheer amount of data remains a challenge.
Our panel of research experts review a broad range of multimodal data science approaches that researchers can use to explore the UK Biobank dataset for new discovery. They discuss linking different data types including proteomics, imaging, wearables & genetics and also describe tools powering their analysis and how to access the tools.
GWAS
GWAS on UKB-RAP using regenie
An introduction to running GWAS using regenie on the Research Analysis Platform. DNAnexus experts demonstrate how to run the analysis using a diabetes phenotype on the 300k data.
Topics include:
Genomic file preprocessing and filtering using the Swiss Army Knife App
Building phenotype/covariate files for cohorts using Spark JupyterLab
Running regenie using the Swiss Army Knife App
You can also learn more about regenie by viewing this paper referenced in the presentation
Video chapters
00:00 Introduction
02:00 Review of GWAS
05:43 What is Regenie?
12:54 Creating Cohorts Using Cohort Browser
14:03 Building Pheno and Covariate File with Spark JupyterLab
21:57 Using the Swiss Army Knife App
27:08 Preprocessing Genotype Data from Bulk Files
30:12 Running regenie (with preprocessing steps)
48:09 Downstream Analysis of GWAS Results
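As a rough illustration of the "Building Pheno and Covariate File" step, the sketch below writes a table in the whitespace-delimited layout regenie reads: a header starting with `FID IID`, one row per sample, and `NA` for missing values. It is not the webinar's Spark code. The helper name, the `eid` key, the `diabetes` column, and the choice to reuse the participant ID for both FID and IID are assumptions made for the example.

```python
import io


def write_regenie_table(rows, columns, handle):
    """Write rows (dicts with an 'eid' key plus the given columns) in FID/IID layout.

    Missing values (None or absent keys) are written as NA.
    """
    handle.write(" ".join(["FID", "IID"] + columns) + "\n")
    for row in rows:
        sample_id = str(row["eid"])
        values = ["NA" if row.get(col) is None else str(row[col]) for col in columns]
        handle.write(" ".join([sample_id, sample_id] + values) + "\n")


if __name__ == "__main__":
    # Made-up sample IDs and phenotype values, for illustration only.
    example_rows = [
        {"eid": 1000001, "diabetes": 1},
        {"eid": 1000002, "diabetes": None},
    ]
    buffer = io.StringIO()
    write_regenie_table(example_rows, ["diabetes"], buffer)
    print(buffer.getvalue(), end="")
```

The same function can write the covariate file by passing covariate column names instead of phenotype names.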
Visualizing and annotating GWAS results
So you've run a GWAS analysis on RAP and obtained p-values. Now what? This session details visualizing GWAS results using the Research Analysis Platform. The webinar covers interactive exploration of GWAS results using the LocusZoom App and then dives deeper into variant annotation and visualization using both R and Python Jupyter Notebooks.
End to end target discovery with GWAS and PheWAS
Learn how to create and run your own target discovery pipeline with GWAS and PheWAS. This webinar shows an end-to-end target discovery pipeline run entirely on UKB RAP that you will be able to replicate for your own research projects on the platform.
Learn how to:
Extract data using dx extract_dataset
Perform sample and variant QC
Run GWAS analysis using the REGENIE app
Perform LD clumping using PLINK to cluster significant GWAS variants
Run PheWAS analysis
As this is an advanced webinar, we expect the audience to have experience with:
Performing analysis on UKB RAP
Running JupyterLab on UKB RAP
Compiling and running a WDL workflow
Running Swiss Army Knife (SAK) using the GUI or CLI
Video chapters
00:00 Introduction to Genomics-based Target Discovery
06:20 Step 1: Creating Cohorts
08:22 Step 2: Extract sample phenotype data
10:47 Step 3: Lift over array data
13:46 Step 4: QC sample phenotype data
16:20 Step 5: QC array and imputed data
19:49 Step 6: GWAS
24:56 Step 7: LD Clumping
29:16 Step 8: PheWAS
35:31 Conclusion
38:12 Q&A
Code is available on GitHub.
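One detail that connects the GWAS and LD clumping steps: regenie reports significance as `LOG10P` (that is, -log10 of the p-value), while PLINK's `--clump` reads plain p-values from a `P` column by default. The sketch below shows one possible way to bridge the two. It is not the webinar's pipeline; the helper names, file names, and clumping thresholds are illustrative assumptions.

```python
def to_clump_rows(regenie_rows):
    """Convert regenie result rows (dicts with ID and LOG10P) to SNP/P rows."""
    return [
        {"SNP": row["ID"], "P": 10 ** (-float(row["LOG10P"]))}
        for row in regenie_rows
    ]


def build_clump_command(bfile, assoc_file, out_prefix, p1=5e-8, r2=0.1, kb=250):
    """Assemble a PLINK 1.9 clumping call as an argument list (nothing is executed).

    The default thresholds are example values, not recommendations.
    """
    return [
        "plink",
        "--bfile", bfile,
        "--clump", assoc_file,
        "--clump-p1", str(p1),
        "--clump-r2", str(r2),
        "--clump-kb", str(kb),
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Made-up variant IDs and LOG10P values, for illustration only.
    rows = to_clump_rows([
        {"ID": "rs0000001", "LOG10P": "8.0"},
        {"ID": "rs0000002", "LOG10P": "2.5"},
    ])
    for row in rows:
        print(row["SNP"], row["P"])
    print(" ".join(build_clump_command("genotypes", "gwas_for_clump.txt", "clumped")))
```

Writing the converted rows to a whitespace-delimited file with a `SNP P` header gives PLINK an input it can read without extra column-name options.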
Variant annotation
Flexible variant annotation
Learn how to perform large-scale variant annotation with tools that are easy to run in Jupyter Notebooks. This session covers strategies for implementing these tools and sample code for you to get up and running on your projects.
Large-scale analysis
Efficiently analyzing large-scale WGS data
Learn tips and tricks for faster and more cost-effective large-scale WGS analysis. This webinar covers how to set up reproducible analysis pipelines, which apps & tools you should incorporate, and how to build your own custom workflows.
