Analysis & data types

Learn about using different data types and how to perform different analyses.

Proteomics

Integrative analysis of UKB proteomics data

The UK Biobank Pharma Proteomics Project (UKB-PPP) was launched by a consortium of thirteen pharmaceutical companies in November 2020 with the aim of measuring circulating concentrations of plasma proteins in approximately 55,000 UK Biobank participants. The project has now produced a first tranche of proteomics data for ~1,500 proteins that will soon be released to the broader UK Biobank research community.

Representatives from UK Biobank, Janssen, Biogen, Olink, Weill Cornell Medicine - Qatar & DNAnexus walk through ways to access and analyze the data on the UK Biobank Research Analysis Platform. They discuss the new dataset, the collection and sequencing process, and helpful analysis tips to get you started.

Video chapters & slides
  • 00:00 Introduction

  • 06:04 Naomi Allen - Intro to UK Biobank

  • 14:22 Cindy Lawley - Olink Overview

  • 21:01 Chris Whelan - Intro to UK Biobank Pharma Proteomics Project

  • 30:06 Ben Sun - Working with Proteomics Data

  • 42:50 Karsten Suhre - Crossing Proteomics with Other Data Types

  • 52:12 Ben Busby - Using Proteomics as Features for Disease Subtyping

  • 59:25 Q&A

Slides are available here

Analyzing UKB proteomics data

The UK Biobank Pharma Proteomics Project (UKB-PPP) was launched by a consortium of thirteen pharmaceutical companies in November 2020 with the aim of measuring circulating concentrations of almost 1,500 plasma proteins in approximately 55,000 UK Biobank participants. The first tranche of proteomics data from this project is now available to the broader UK Biobank research community.

Learn how to analyze the new proteomics data on the UK Biobank Research Analysis Platform (UKB-RAP). Bioinformatics expert Alexandra Lee walks attendees through accessing the data and demonstrates a couple of use cases for working with the data.

Topics covered include:

  • Introduction to the new proteomics data that is now available on the UKB-RAP

  • Walking users through how they can access this data on the platform

  • Demonstrating a couple of example use cases for how users can analyze this new proteomics data including pQTL and differential expression
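
As a rough illustration of the data access step, the sketch below builds a list of field names for a few proteins and assembles a dx extract_dataset call without running it. It is a minimal sketch under stated assumptions: the entity name olink_instance_0, the lowercase protein field names, the dataset ID project-xxxx:record-yyyy and the output file name are placeholders for illustration, so check them against the data dictionary of your own dataset before use.

```python
import shlex


def olink_field_names(proteins, entity="olink_instance_0"):
    """Return 'entity.field' names: the participant ID first, then one per protein.

    The entity name and the lowercase field naming are assumptions.
    """
    fields = [f"{entity}.eid"]
    fields += [f"{entity}.{p.strip().lower()}" for p in proteins]
    return fields


def extract_command(dataset, fields, output="proteins.csv"):
    """Build (but do not run) a dx extract_dataset call as an argument list."""
    return ["dx", "extract_dataset", dataset,
            "--fields", ",".join(fields), "-o", output]


# Hypothetical protein selection and placeholder dataset ID.
fields = olink_field_names(["A1BG", "IL6", "TNF"])
cmd = extract_command("project-xxxx:record-yyyy", fields)

# Print the command so it can be reviewed before running it in a terminal.
print(shlex.join(cmd))
```

Building the command as a list keeps the field names easy to inspect and makes it straightforward to pass to subprocess.run once the placeholders are replaced with real values.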

Video chapters
  • 00:00 Introduction

  • 04:58 Webinar Agenda

  • 06:42 Helpful Resources

  • 07:18 Learning Objectives

  • 08:07 Introduction to UK Biobank Proteomics Data

  • 09:13 Olink Technology

  • 10:40 How to Access Proteomics Data on UKB-RAP

  • 11:44 Get Phenotype Data & Protein Expression Data from the Cohort Browser

  • 13:07 Get Protein Expression Data from Table Exporter

  • 14:32 Get Protein Expression Data using dx extract_dataset

  • 15:14 Generating List of Field Names for All Proteins

  • 16:48 Sample Protein Expression Data

  • 17:14 Protein Expression Datasets in UKB-RAP

  • 18:57 Metadata Available in the Bulk Folder

  • 19:44 Sample Code Used in Webinar Available on Github

  • 20:08 Adding Analysis Scripts to UKB-RAP

  • 20:38 Differential Expression Analysis Introduction

  • 21:44 Approach for Differential Expression Analysis

  • 22:40 Collect Input Data for Differential Expression Analysis

  • 23:14 QC Input Data for Differential Expression Analysis

  • 23:39 Perform Differential Expression Analysis Using Limma

  • 24:23 Run Differential Expression Analysis Using JupyterLab

  • 25:19 Resource to Re-Run Sample Differential Expression Analysis

  • 25:59 pQTL Analysis Introduction

  • 26:13 Performing GWAS to Identify SNPs to Compare pQTL SNPs

  • 28:05 Introduction to REGENIE

  • 29:46 Approach for pQTL Analysis

  • 30:44 Matched Genotype & Protein Expression Data

  • 31:02 QC pQTL Analysis Input Data

  • 31:42 Run GWAS using REGENIE

  • 32:50 Results

  • 33:26 pQTL Resources

  • 34:17 Conclusion & Helpful Resources

Metabolomics

Analysis of accelerometer and metabolomic data in the UKB

The UK Biobank dataset includes extensive accelerometry and metabolic data on 100,000 participants. Matched with the large amount of genetic data, this lifestyle and health data presents an exciting opportunity to perform more complex analyses that combine these disparate data types.

Learn how to analyze accelerometer and metabolomic data within the UK Biobank Dataset from expert speakers Rosemary Walmsley, Researcher in Reproducible Machine Learning at the University of Oxford, and Dr. Karsten Suhre, Professor of Physiology and Biophysics, Director Bioinformatics Core at Weill Cornell Medicine - Qatar. First, Rosemary discusses accelerometer data in the UK Biobank dataset, strategies for analysis, and points to repositories to get researchers started. Then Karsten walks through analyzing metabolic data and the necessary scripts and notebooks used in his workflows.

Video chapters
  • 00:00 Introduction

  • 03:45 Rosemary Walmsley - Accelerometer Data in UK Biobank

  • 24:02 Rosemary Q&A

  • 30:36 Dr. Karsten Suhre - Analysis of Metabolic Data in the UK Biobank Dataset

  • 52:22 Karsten Q&A

Images

Image analysis on UKB-RAP

Learn about the data formats available to start running your image analysis on the Research Analysis Platform. Ondrej Klempir, Sr. Community Engagement Scientist at DNAnexus, walks you through how to access the data, reviews basic image (pre)processing steps & shows examples of image visualization on the cloud.

Video chapters
  • 0:00 Introduction

  • 1:28 Learning Objectives

  • 4:58 The World of Biomedical Informatics and UKB RAP data

  • 6:44 Image Processing vs. Bioinformatics

  • 8:55 Running ML with GPU on UKB RAP - JupyterLab

  • 9:34 RAP Library of Scientific/Image Analysis Tools

  • 10:36 ML tools

  • 11:28 Analyze Image Derived Phenotypes via Cohort Browser

  • 13:43 Quick facts about DICOM

  • 15:25 Related filetype: NIFTI - stores 3D voxel data

  • 16:31 Visualization "May Be Essential"

  • 17:30 Visualization Options

  • 19:27 Widget based interactive visualization

  • 20:28 Papaya viewer

  • 22:38 What steps people do in (neuro)imaging?

  • 24:11 Pipeline development

  • 27:38 Cloud-based Analysis

  • 30:10 dicom2nifti conversion

  • 32:14 Running FSL via docker image

  • 34:57 FSL: brain extraction

  • 36:39 Pydicom - Python package for working with DICOM files

  • 38:14 Nibabel - Python package for working with NIFTI files

  • 39:08 Tip: Create a WDL workflow/ pipeline

  • 40:46 Q&A

Advanced image management

Interested in analyzing the extensive imaging data available on the UK Biobank Research Analysis Platform (UKB RAP)? Do you have your own data to analyze and manage? Experts from DNAnexus (UKB RAP), the FSL group at Oxford and MathWorks (makers of MATLAB and Simulink) will walk you through how to access or transfer the data, as well as label and process bulk data.

Agenda

  • Introduction and Short Discussion about Integrating Image data with Genomic Data (Ben Busby)

  • Advanced Imaging and the UKB RAP (Fidel Alfaro-Almagro)

  • Advanced Imaging on the DNAnexus Commercial Platform (Ondrej Klempir)

  • Image analysis on DNAnexus and RAP with Matlab (Rob Holt & Renee Qian)

  • Budgetary considerations around large scale data integration (Asha Collins)

Video chapters
  • 00:00 Introduction & Agenda

  • 05:21 Advanced Imaging & the UKB RAP

  • 20:30 Image Analysis Tools & Features on RAP

  • 39:07 Nipype Demo

  • 47:35 Options for Working with Imaging Data on RAP

  • 52:06 Medical Image Analysis with MATLAB

  • 1:16:28 MATLAB Demo

  • 1:33:00 Managing Costs on RAP

Dementia and multimorbidity in late-life disease

Population-scale datasets offer researchers the ability to derive deeper insights into the genetic mechanisms of late-life diseases like dementia, but analyzing the sheer amount of data remains a challenge.

Our panel of research experts reviews a broad range of multimodal data science approaches that researchers can use to explore the UK Biobank dataset for new discoveries. They discuss linking different data types, including proteomics, imaging, wearables & genetics, and describe the tools powering their analyses and how to access them.

GWAS

GWAS on UKB-RAP using regenie

An introduction to running GWAS using Regenie on the Research Analysis Platform. DNAnexus experts demonstrate how to run the analysis using a diabetes phenotype on the 300k data.

Topics include:

  • Genomic file preprocessing and filtering using the Swiss Army Knife App

  • Building phenotype/covariate files for cohorts using Spark JupyterLab

  • Running regenie using the Swiss Army Knife App
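
To make the phenotype/covariate step more concrete, here is a minimal sketch of the file layout regenie reads: a whitespace-delimited table whose header starts with FID and IID, with missing values written as NA. The webinar builds these files with Spark JupyterLab; this sketch uses only the Python standard library instead, and the participant IDs, column names and values are made up. Using the participant ID for both FID and IID is also an assumption.

```python
import csv
import io


def write_regenie_table(rows, columns, handle):
    """Write rows as a space-delimited table with leading FID and IID columns.

    rows: iterable of dicts with an 'eid' key plus one key per column.
    Missing values (None or empty string) are written as NA.
    """
    writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
    writer.writerow(["FID", "IID"] + list(columns))
    for row in rows:
        values = [row.get(c) for c in columns]
        values = ["NA" if v is None or v == "" else v for v in values]
        # Assumption: the participant ID is used for both FID and IID.
        writer.writerow([row["eid"], row["eid"]] + values)


# Hypothetical participants; the IDs and values are made up.
participants = [
    {"eid": 1000001, "diabetes": 1, "age": 61, "sex": 0},
    {"eid": 1000002, "diabetes": 0, "age": 54, "sex": 1},
    {"eid": 1000003, "diabetes": None, "age": 47, "sex": 1},
]

pheno = io.StringIO()
write_regenie_table(participants, ["diabetes"], pheno)

covar = io.StringIO()
write_regenie_table(participants, ["age", "sex"], covar)

print(pheno.getvalue())
print(covar.getvalue())
```

In practice the in-memory buffers would be replaced with files opened for writing, and the rows would come from the cohort data rather than a hand-written list.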

You can also learn more about regenie by viewing this paper referenced in the presentation

Video chapters
  • 00:00 Introduction

  • 02:00 Review of GWAS

  • 05:43 What is Regenie?

  • 12:54 Creating Cohorts Using Cohort Browser

  • 14:03 Building Pheno and Covariate File with Spark JupyterLab

  • 21:57 Using the Swiss Army Knife App

  • 27:08 Preprocessing Genotype Data from Bulk Files

  • 30:12 Running regenie (with preprocessing steps)

  • 48:09 Downstream Analysis of GWAS Results

Visualizing and annotating GWAS results

So you've run a GWAS analysis on RAP and obtained p-values. Now what? This session details visualizing GWAS results using the Research Analysis Platform. The webinar covers interactive exploration of GWAS results using the LocusZoom App and then dives deeper into variant annotation and visualization using both R and Python Jupyter Notebooks.
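
Before plotting or annotating, a common first step is to pull out the variants that pass the conventional genome-wide significance threshold of p < 5e-8. The sketch below shows one way this could look, assuming a space-delimited results table with regenie-style column names (CHROM, GENPOS, ID, LOG10P, where LOG10P holds -log10 of the p-value). The example rows are made up and most output columns are omitted for brevity.

```python
import csv
import io
import math

GENOME_WIDE_P = 5e-8
THRESHOLD = -math.log10(GENOME_WIDE_P)  # about 7.3 on the -log10 scale


def significant_hits(handle, threshold=THRESHOLD):
    """Return rows with LOG10P at or above the threshold, strongest first."""
    hits = []
    for row in csv.DictReader(handle, delimiter=" "):
        if row["LOG10P"] == "NA":
            continue  # skip variants without a test result
        if float(row["LOG10P"]) >= threshold:
            hits.append(row)
    return sorted(hits, key=lambda r: float(r["LOG10P"]), reverse=True)


# Hypothetical results table with a reduced set of columns.
results = io.StringIO(
    "CHROM GENPOS ID LOG10P\n"
    "1 100000 rs1 12.0\n"
    "1 150000 rs2 3.1\n"
    "2 120000 rs3 8.4\n"
    "2 130000 rs4 NA\n"
)

hits = significant_hits(results)
for hit in hits:
    print(hit["CHROM"], hit["GENPOS"], hit["ID"], hit["LOG10P"])
```

The resulting list is a natural input for downstream annotation or for highlighting loci in a plot.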

Video chapters
  • 00:00 Introduction to GWAS

  • 05:20 Using LocusZoom

  • 16:07 LocusZoom Demo

  • 22:42 GWAS Visualization/Annotation with JupyterLab & R

  • 26:23 JupyterLab & R Demo

  • 46:54 Q&A and Helpful Resources

End to end target discovery with GWAS and PheWAS

Learn how to create and run your own target discovery pipeline with GWAS and PheWAS. This webinar shows an end-to-end target discovery pipeline run entirely on UKB RAP that you will be able to replicate for your own research projects on the platform.

Learn how to:

  • Extract data using dx extract_dataset

  • Perform sample and variant QC

  • Run GWAS analysis using REGENIE app

  • Perform LD clumping using PLINK to cluster significant GWAS variants

  • Run PheWAS analysis
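
To illustrate the idea behind the LD clumping step, the sketch below implements a simplified greedy clumping in plain Python. It is not PLINK's implementation and it omits options such as a secondary p-value threshold. The default thresholds, the variant records and the r2 lookup table are hypothetical and chosen only for illustration.

```python
def clump(variants, r2, p_index=5e-8, r2_min=0.1, window_bp=250_000):
    """Greedy clumping sketch.

    variants: list of dicts with 'id', 'chrom', 'pos' and 'p'.
    r2: function (id_a, id_b) -> LD r^2 between two variants.
    Returns a dict mapping each index variant ID to the IDs in its clump.
    """
    ordered = sorted(variants, key=lambda v: v["p"])
    assigned = set()
    clumps = {}
    for lead in ordered:
        # Only unassigned, significant variants can start a new clump.
        if lead["id"] in assigned or lead["p"] > p_index:
            continue
        assigned.add(lead["id"])
        members = []
        for other in ordered:
            if other["id"] in assigned:
                continue
            near = (other["chrom"] == lead["chrom"]
                    and abs(other["pos"] - lead["pos"]) <= window_bp)
            if near and r2(lead["id"], other["id"]) >= r2_min:
                members.append(other["id"])
                assigned.add(other["id"])
        clumps[lead["id"]] = members
    return clumps


# Hypothetical association results and pairwise LD values.
variants = [
    {"id": "rs1", "chrom": "1", "pos": 100_000, "p": 1e-12},
    {"id": "rs2", "chrom": "1", "pos": 150_000, "p": 1e-9},
    {"id": "rs3", "chrom": "1", "pos": 900_000, "p": 2e-8},
    {"id": "rs4", "chrom": "2", "pos": 120_000, "p": 0.2},
]
ld = {frozenset(("rs1", "rs2")): 0.8}


def lookup_r2(a, b):
    """Return the tabulated r^2 for a pair, or 0.0 if the pair is not listed."""
    return ld.get(frozenset((a, b)), 0.0)


result = clump(variants, lookup_r2)
print(result)
```

Here rs2 is folded into the clump led by rs1 because it is nearby and in high LD, while rs3 is far enough away to form its own clump and rs4 is not significant enough to lead one.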

As this is an advanced webinar, we expect that the audience has experience with:

  • Performing analysis on UKB RAP

  • Running JupyterLab on UKB RAP

  • Compiling and running a WDL workflow

  • Running SAK using GUI or CLI

Video chapters
  • 00:00 Introduction to Genomics-based Target Discovery

  • 06:20 Step 1: Creating Cohorts

  • 08:22 Step 2: Extract sample phenotype data

  • 10:47 Step 3: Lift over array data

  • 13:46 Step 4: QC sample phenotype data

  • 16:20 Step 5: QC array and imputed data

  • 19:49 Step 6: GWAS

  • 24:56 Step 7: LD Clumping

  • 29:16 Step 8: PheWAS

  • 35:31 Conclusion

  • 38:12 Q&A

Code is available on GitHub.

Variant annotation

Flexible variant annotation

Learn how to perform large-scale variant annotation with tools that are easy to run in Jupyter Notebooks. This session covers strategies for implementing these tools and sample code for you to get up and running on your projects.

Video chapters
  • 00:00 Introduction

  • 04:16 Intro to Variant Annotation - Rachel Karchin

  • 27:25 STAAR Pipeline: Background & Pipeline - Zilin Li

  • 43:20 STAAR Pipeline on UKB-RAP - Xihao Li

  • 49:05 Variant Annotation on UKB-RAP - Ben Busby

  • 55:07 Q&A

Large-scale analysis

Efficiently analyzing large-scale WGS data

Learn tips and tricks for faster and more cost-effective large-scale WGS analysis. This webinar covers how to set up reproducible analysis pipelines, which apps & tools you should incorporate and how to build your own custom workflows.

Video chapters
  • 00:00 Introduction & Helpful Links

  • 03:37 Overview of UKB-RAP

  • 06:10 Large-scale data on UKB-RAP

  • 08:43 Running regenie on large-scale WGS data

  • 23:27 Tips for how to work with large number of files on UKB-RAP

  • 42:43 Conclusion
