Frequently Asked Questions

Get answers to common questions about the Research Analysis Platform, and about UK Biobank data and systems.

Registration, Login, and Linking with the UK Biobank Access Management System

Who can sign up for the UK Biobank Research Analysis Platform?

The Research Analysis Platform is open to researchers who are listed as collaborators on UK Biobank-approved access applications.

How do I register for the Research Analysis Platform?

Registration is a two-step process. You must first create a Research Analysis Platform user account, and then you must link it to your UK Biobank Access Management System (AMS) account.

How do I create a Research Analysis Platform account?

If your organization has been set up for Single Sign On (SSO), follow internal procedures specific to your organization. Otherwise:

  • If you already have a DNAnexus account, you do not need to create a separate Research Analysis Platform account. You can use your existing DNAnexus account on the Research Analysis Platform.

  • If you do not have an account, visit the Research Analysis Platform homepage and select Create an account. You will need to provide your full name and email, as well as a username and password that you want to use.

Does my Research Analysis Platform info (username and email address) need to match my AMS info?

No, your username and email address on Research Analysis Platform can be different from those you use on the AMS.

I tried creating a Research Analysis Platform account and got an error "Email Already Registered."

It looks like you already have a DNAnexus account. If your organization has been set up for SSO, follow your organization's internal procedures. Otherwise, visit the Research Analysis Platform Set Password page and enter your email address. You will receive an email with a password reset link, which you can use to reset your account password.

How do I log in to the Research Analysis Platform?

To log in:

  • If your organization has been set up for SSO, follow internal procedures.

  • Otherwise, visit the Research Analysis Platform homepage and select Log In to log in with your Research Analysis Platform account.

How do I link my Research Analysis Platform account to my AMS account?

This process happens automatically upon first login (see "How do I log in to the Research Analysis Platform?"). You will be presented with the Research Analysis Platform Terms of Service. Once you have read them (by scrolling down) and accepted them, you will be taken to the AMS website, where you must enter your UK Biobank credentials.

What are my AMS credentials?

If you have forgotten your AMS username or password, you can retrieve them via the AMS Login page.

Can I access the Research Analysis Platform without an AMS account?

No. To access the Research Analysis Platform you must have an AMS account, and you must be listed as a collaborator in one of the Research Analysis Platform-approved applications.

How do I obtain an AMS account?

Create an AMS account via the AMS signup page.

I entered my valid AMS credentials but got an approval error.

You must finish the AMS registration process, and be approved by UK Biobank. For more information, see the UK Biobank User Guide.

Can I link my AMS account to more than one Research Analysis Platform account?

No, an AMS account may be linked to only one Research Analysis Platform account.

Can I link my Research Analysis Platform account to more than one AMS account?

No, a Research Analysis Platform account may be linked to only one AMS account.

No, this operation is not supported.

I previously linked my Research Analysis Platform account to my AMS account, but during a subsequent Research Analysis Platform login, I was asked to do so again.

Occasionally the platform may ask you to refresh your link, for security reasons. Among other causes, this can happen if your state on the AMS changes for any reason (e.g. if you update your contact details on the AMS).

Data Removal

How many projects should I create?

Currently, there is no limit on the number of projects a user can create. However, we recommend that everyone working under the same research application use a single shared project. This allows better coordination when there is a new data release, and better reuse of tools, workflows, and user-generated data.

How long can each project live? Would it be removed if not used?

Projects on UKB-RAP that are considered inactive or unfunded are eligible for deletion, and will be removed if either of the following criteria is met. This helps ensure the best user experience for active projects and helps optimize use of the platform.

  1. The project has not been accessed for the last 60 days, meaning that no one with access to the project has made any request to browse its folders and files. In addition, the project contains only the dispensed UK Biobank data and does not have any derived data generated by the user or others.

OR

  2. The project is billed to a wallet that has no funds available. In addition, the project contains generated data resulting in ongoing storage charges.
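
The two criteria above can be restated as a small decision rule. The following Python sketch is illustrative only: the `ProjectState` fields and the `deletion_reason` function are hypothetical names that do not correspond to any Platform API, and the Platform's own evaluation may differ in detail.

```python
from dataclasses import dataclass

INACTIVITY_LIMIT_DAYS = 60  # threshold stated in the first criterion


@dataclass
class ProjectState:
    """Hypothetical summary of a project; not a Platform data structure."""
    days_since_last_access: int
    has_generated_data: bool  # any data beyond the dispensed UK Biobank data
    wallet_has_funds: bool


def deletion_reason(p: ProjectState):
    """Return 'inactive', 'unfunded', or None if neither criterion is met."""
    # Criterion 1: no access for 60 days AND only dispensed data in the project.
    if p.days_since_last_access >= INACTIVITY_LIMIT_DAYS and not p.has_generated_data:
        return "inactive"
    # Criterion 2: no funds in the wallet AND generated data incurring storage charges.
    if not p.wallet_has_funds and p.has_generated_data:
        return "unfunded"
    return None
```

Under this reading, a project that holds generated data is never treated as inactive, however long it goes unaccessed. It can only become eligible for deletion through the unfunded criterion.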

How do I reactivate a project if I receive a warning email?

If your project is considered inactive and you would like to keep it, please access the project again. Once the project is no longer inactive, it will not be deleted. If your project is considered unfunded and you would like to keep it, take one of the following actions:

  • Add funds to the wallet that the project is billed to.

  • Transfer the project to another wallet that has funds.

  • Delete all generated data, ensuring that no user-generated data remains in the project.

Projects and Files

What is a project?

On the UK Biobank Research Analysis Platform, all work takes place in the context of a project. Projects allow a defined set of users to:

  • Access specific datasets

  • Conduct analyses on these datasets

How do I create a project?

See detailed instructions on the Creating a Project page.

What is an Access Application?

An access application is a research application submitted to UK Biobank by a Principal Investigator. It includes a written research proposal and a set of UK Biobank data-fields to which access is requested. UK Biobank assigns a unique numeric identifier to each application. All activity on the Research Analysis Platform needs to be done within the context of such an access application.

For more information on your access application, log into the UK Biobank Access Management System (AMS) and select the Applications tab.

When making a project, I get the error "Application does not belong to this user or is not an approved application".

Please ensure that you are listed as a collaborator in the access application on the AMS.

Can I have a project associated with multiple applications?

No, each project is tied to one application only.

After a project has been created, can I change its application?

No, the application is set at project creation time and cannot be changed.

Can I have multiple projects associated with the same application?

Yes, you can make multiple projects using the same application id.

Data has been dispensed to my project, but not all the data I am expecting is there.

The most common reason for data not showing in the Research Analysis Platform is that your UKB access application has not been fully completed.

New Applications

For new applications, please check your application in the AMS. If the project's status is “Underway,” then your data should be ready to be dispensed to your project on the Platform.

Upgraded / Additional Data Requested

If you have applied to move your application to a tiered application, or requested further data, the request will need to go through a number of steps: quotation, new MTAs, payment, etc. You will receive an email notifying you that the MTA has been executed. MTA execution is the final step in the process. Once you have received this email, your data will be ready for dispensal. If you have already had data dispensed to a project on the Platform, you will need to have data dispensed to a new project in order to receive any new data.

I created a project and chose to dispense data, but I don't see any data.

The process of dispensing data happens over a short period of time. When you first select the Create Project button to submit the New Project dialog, the new project will appear empty. Subsequently, it will begin to get populated with files and other data. You can monitor the process by going back to the project list, where you can see the project status, including what percentage of the data has been populated.

How long does it take for the data to appear in a new project?

The process can take as little as 20 minutes or as long as a full day, depending on the scope of the access application.

Do I need to remain logged in while the data is being dispensed in a new project?

No, the process happens in the background, even if you are logged out.

I created a project but it's stuck at "0%".

Your request to dispense data may be queued behind that of other users. The system will service your request in the order it was received. We appreciate your patience during that time.

What kind of data is dispensed when I create a project?

The system dispenses the data that correspond to the approved data-fields of the access application associated with the project. Tabular data-fields and linked health data are dispensed into a SQL database, and bulk data-fields are dispensed as files.

Can I access and use Research Analysis Platform projects on the DNAnexus Platform?

If you use the same account on both platforms, you will be able to access and use Research Analysis Platform projects on the DNAnexus Platform. Note, however, that:

  • You will only be able to access and use tools that are hosted in the London AWS region.

  • All sharing, download, and other data-use restrictions apply fully to UK Biobank data, on both platforms.

What is a data-field?

All data in the UK Biobank resource are organized into data-fields. Your access application is approved for a precise subset of those data-fields. You can find more information about data-fields, broken down by type, on the UK Biobank Field Listing page.

What data-fields is my access application approved for?

You can get more information on your access application by logging into the AMS and selecting the Applications tab on the left.

What is in the "Bulk" folder?

The "Bulk" folder contains files associated with data-fields of type "bulk". These are data items that are particularly large and/or complex and are therefore made available as files, such as genome sequencing files.

How are folders determined for bulk fields?

See this article on folder conventions, for data within the "Bulk" folder.

How are filenames determined for bulk fields?

See this article on filename conventions, for data within the "Bulk" folder.

What is a participant EID?

UK Biobank is a resource compiled from approximately 500,000 volunteer participants. Each participant is uniquely identified by a 7-digit numeric identifier (EID), typically in the 1,000,000 - 6,000,000 range. These identifiers are scrambled for each access application, hence the EIDs will not match across applications. For more information, refer to https://www.ukbiobank.ac.uk/media/5bvp0vqw/de-identification-protocol.pdf.

How are EIDs used on the Research Analysis Platform?

When you create a project on the Research Analysis Platform, the system contacts UK Biobank to get your application's EIDs, then uses them to pseudonymize your dataset. The pseudonymized EIDs are used, for example, to populate the "eid" column in the database, to name per-participant files, to generate the EID-specific content of FAM files for genotyping fields, and to adjust pVCF headers.

For a given application, is the Research Analysis Platform using the same EIDs as data on UK Biobank's website?

Yes, for a given application, data on the Research Analysis Platform contain the same EIDs as data directly downloaded from UK Biobank's website.

I am interested in a specific participant EID. What files are available?

You can also do the same using the CLI. Type:

dx find data --property eid=1234567

Note that these methods find participant-specific files (like individual VCF or CRAM files) and not cohort-wide files (like PLINK or pVCF files).
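
If you script this lookup, it can help to validate the EID before calling the CLI. The Python sketch below only builds the argument list for the command shown above; the helper name `dx_find_by_eid` is hypothetical, and the 7-digit check follows the description of EIDs in "What is a participant EID?".

```python
import re


def dx_find_by_eid(eid):
    """Build the argument list for `dx find data --property eid=<EID>`."""
    eid = str(eid)
    # EIDs are described in this FAQ as 7-digit numeric identifiers.
    if not re.fullmatch(r"\d{7}", eid):
        raise ValueError(f"expected a 7-digit EID, got {eid!r}")
    return ["dx", "find", "data", "--property", f"eid={eid}"]
```

The returned list can be passed to `subprocess.run` in an environment where the `dx` command-line client is installed and logged in.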

In the header of pVCF files, why are there samples named "W000001", "W000002", etc.?

These samples correspond to participants that have withdrawn, and the Research Analysis Platform uses this convention to denote them in the header, to help you exclude them from your research.
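
As an illustration, the following Python sketch separates withdrawn samples from the rest, given the "#CHROM" header line of a pVCF file. It assumes that withdrawn samples are named "W" followed by digits, as in the examples above, and that the line is tab-separated with nine fixed columns before the sample names, as in the VCF format. The function name is hypothetical.

```python
import re

# Assumed pattern for withdrawn participants: "W" followed by digits.
WITHDRAWN = re.compile(r"W\d+")


def split_samples(chrom_line):
    """Split the samples of a VCF '#CHROM' header line into (kept, withdrawn)."""
    fields = chrom_line.rstrip("\n").split("\t")
    samples = fields[9:]  # the first nine columns are fixed by the VCF format
    kept = [s for s in samples if not WITHDRAWN.fullmatch(s)]
    withdrawn = [s for s in samples if WITHDRAWN.fullmatch(s)]
    return kept, withdrawn
```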

Since the Research Analysis Platform pseudonymizes pVCF headers, does that mean that different researchers see different content when accessing the same file?

The pseudonymized pVCF headers are specific to each access application. Researchers who work on different applications will encounter different headers in each, just as they encounter different content in the FAM files of PLINK fields.

Are the headers of gVCF or CRAM files pseudonymized?

No, the content of these files is not pseudonymized. However, the names of these files are pseudonymized accordingly. Therefore, we recommend relying on the filename prefix for determining the EID corresponding to a gVCF or CRAM file, and discarding any identifiers found in the gVCF or CRAM header.
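
A minimal Python sketch of this recommendation follows. It assumes that the filename begins with the 7-digit EID, immediately followed by a non-digit character. The exact filename layout is described in the article on filename conventions, so treat the pattern and the function name here as assumptions.

```python
import re
from pathlib import PurePosixPath


def eid_from_filename(path):
    """Return the leading 7-digit EID of a per-participant filename, or None."""
    name = PurePosixPath(path).name
    # Assumption: the name starts with exactly seven digits (the EID).
    m = re.match(r"(\d{7})(?!\d)", name)
    return m.group(1) if m else None
```

For a hypothetical file named "1234567_example.cram", this returns "1234567" regardless of any identifier stored inside the CRAM header.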

What is in the "Showcase metadata" folder?

This folder contains all the files published by UK Biobank, as described on the UK Biobank Showcase Schema page. These files describe aspects of the UK Biobank Showcase, including all fields available in the UK Biobank resource.

The files under "Showcase metadata" are different from what's on the UK Biobank showcase website.

The files in the "Showcase metadata" folder represent the showcase metadata at the time that the data was ingested in the system, and may not reflect the latest showcase updates.

How did UK Biobank generate the bulk files? What instruments, assays or scientific workflows were used?

For information about data provenance, please consult the UK Biobank website or contact UK Biobank directly.

I found my bulk data-fields under the "Bulk" folder. Where are the rest of the non-bulk data-fields?

All other non-bulk data-fields (for which UK Biobank defines the item type as "data", "sample", or "record") are dispensed into a SQL database and associated Research Analysis Platform dataset.

Can data be downloaded, exported or otherwise egressed out of the Research Analysis Platform?

From a policy standpoint, you are responsible for complying with the Material Transfer Agreement (MTA) and with any other rules set forth by UK Biobank. As of June 2021, Annex 1 of the MTA states that "any WGS (whole genome sequence) or WES (whole exome sequence) files [..] must not be transmitted or downloaded from the research analysis platform". In addition, depending on the tier of your access application, you may or may not be allowed to egress certain other data.

To help you comply, the Platform may restrict external downloads of certain original files, using rules specific to your application tier. These restrictions are not comprehensive, and it is your responsibility to refrain from actions that would violate the MTA even if the Platform does not technically restrict those actions.

Project Sharing and Cross-Project Collaboration

How do I see who has access to a project?

In the projects list, under the "Members" column, select the number corresponding to the row of interest. Alternatively, from inside a project, select the Share icon on the upper right (next to the "Access:" label).

I just created a project, and the system says there are two users. Who is the other user?

While the project is being populated with data, the system adds a service user called "UK Biobank Robot" (username: ukb.robot).

Why is the user "ukb.robot" in my project?

The system automatically adds this service user to a project whenever the project is being edited or updated, such as when data is being dispensed in a newly created project. The system uses that user to perform any necessary data manipulations in an automated manner.

Can I remove or alter the access of the user "ukb.robot" in my project?

No, but the system will automatically remove that user once any necessary data manipulations are completed.

How do I share a project with other users?

If you are a project administrator, from inside a project select the Share icon on the upper right to launch the sharing dialog. Enter the username or email of the user you want to share the project with, select their access level, then select Add User.

I tried sharing a project with someone and got an error.

You can only share a project with Research Analysis Platform users who are listed as collaborators in the project's access application on the AMS. If you receive an error, please ensure the following:

  • The username or email you are entering exists. You cannot share a project with someone if they have not yet signed up for an account.

  • You are sharing with a linked Research Analysis Platform account. You cannot share a project with an account if they have not yet logged into the Research Analysis Platform and linked their account to the AMS (or if their link needs to be refreshed).

  • You are sharing with someone on the same application. You cannot share a project with a linked Research Analysis Platform account unless they are listed as collaborators in the project's access application on the AMS.
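
The checklist above can be read as an ordered series of checks. The Python sketch below mirrors that order. The `Account` class and the `sharing_problem` function are hypothetical illustrations, not Platform objects or API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical view of a recipient account; not a Platform object."""
    exists: bool
    linked_to_ams: bool  # linked, and the link does not need refreshing
    application_ids: set = field(default_factory=set)


def sharing_problem(recipient, project_application_id):
    """Return the first reason sharing would fail, or None if all checks pass."""
    if not recipient.exists:
        return "the username or email does not belong to an existing account"
    if not recipient.linked_to_ams:
        return "the account is not linked to the AMS, or its link needs a refresh"
    if project_application_id not in recipient.application_ids:
        return "the user is not a collaborator on the project's access application"
    return None
```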

Can I share a project with a group of people at once?

No, you must share with each person individually, as the platform needs to enforce AMS permissions at the user level.

Can I share a project with Customer Support?

Yes. By default, Customer Support does not have access to any projects, unless you explicitly share a project with them. To do that, in the project sharing dialog enter "org-support" (without the quotes) as the username, select Viewer as the access level, and select Add User.

Can I share a project with UK Biobank staff?

Yes. The system supports a special alias that you can use to share a project with UK Biobank. In the project sharing dialog enter "org-ukb_reviewers" (without the quotes) as the username, select Viewer as the access level, and select Add User. This action shares your project with a specific UK Biobank team, managed by UK Biobank themselves. The purpose of this team is to receive your research results.

See the UK Biobank site for details on researchers' obligation to return research results to UK Biobank.

Can I share just a subset of my data, instead of the whole project?

Sharing is on a project basis. If you need to share a subset of data, such as the files in one folder, we recommend copying them to a new project and then sharing that project, as follows:

  • In the project list page, select New Project. Enter the same application id as your existing project, and deselect the option Dispense data to the project. Select Create Project. This will create a new empty project, associated with the same application as your existing project.

  • In your existing project, tick the items you want to share, and select Copy. Select the new project, then select Copy Selected.

  • Share the new project.

Are there any restrictions in copying data across projects?

You may only copy data across projects associated with the same access application. If you have uploaded a file in a project associated with one application, and you need to use it in a second project associated with a different application, you must re-upload it in the second project.

Running Analyses

Which job priority do I choose for my analysis?

You can assign each job a different priority, depending on whether you want to prioritize job execution speed or cost control. See the page Managing Job Priority.

What compute instance types are available for running my analysis?

See the Platform rate card for a full list of available AWS instance types, including detailed specs for each on number of cores, amount of RAM, storage memory type and size, and cost.

File Visualizations

How do I visualize a CRAM or VCF file using IGV.js?

To visualize a CRAM or VCF file in the IGV.js genome browser, follow these steps:

  • Navigate to the project containing the files you want to visualize.

  • Select the VISUALIZE tab.

  • Select the option "IGV.js Genome Browser v2.6.6 (*.bam+bai, *.cram+crai, *vcf.gz+tbi)".

  • Select the files you want to visualize.

    • If you are looking for a specific participant, enter the EID in the Search Project textbox, to quickly locate any CRAM or VCF files related to that participant.

    • For CRAM files, you must select both the CRAM and the associated CRAI file.

    • For VCF files, you must select both the VCF and the associated TBI file.

    • IMPORTANT: Note that IGV.js cannot visualize extremely large pVCF files, such as those provided for the 200k WES, 300k WES or 150k WGS releases. If you want to visualize variants in the 150k WGS cohort, type either "qc_metrics_graphtyper" or "qc_metrics_gatk" into the Search Project textbox and select the resulting pair of *.tab.gz and *.tab.gz.tbi files.

  • Select Launch Viewer.

Databases and Datasets

What is the database found in the root folder of the project?

This is a database containing tables, columns, and rows that correspond to the approved data-fields of the access application associated with the project. It is a SQL database based on Spark SQL, a modern technology that is more scalable than classic relational database management systems (RDBMS).

See this page for more information about databases and datasets.

What tables are included in the dispensed database?

The database contains the following tables:

How are column names determined for the dispensed database?

For the main UK Biobank participant tables, the column naming convention is generally as follows:

p<FIELD-ID>_i<INSTANCE-ID>_a<ARRAY-ID>

However, the following additional rules apply:

  • If a field is not instanced, the _i<INSTANCE-ID> piece is skipped altogether.

  • If a field is not arrayed, the _a<ARRAY-ID> piece is skipped altogether.

  • If a field is arrayed due to being multi-select, the field is converted into a single column of type "embedded array", and the _a<ARRAY-ID> piece is skipped altogether.

Examples:

  • Age at recruitment: p21022

  • Date of attending assessment centre: p53_i0, p53_i1, ...

  • Diagnoses - ICD10 (converted into embedded array): p41270
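
The naming rules above can be sketched in Python as follows. The helper names `column_name` and `parse_column` are hypothetical, and the sketch assumes that field, instance, and array IDs are non-negative integers. Pass `None` for the instance or array when the field is not instanced or not arrayed, or when a multi-select field has been converted into an embedded array.

```python
import re


def column_name(field_id, instance=None, array=None):
    """Build a column name of the form p<FIELD-ID>_i<INSTANCE-ID>_a<ARRAY-ID>."""
    name = f"p{field_id}"
    if instance is not None:  # skipped when the field is not instanced
        name += f"_i{instance}"
    if array is not None:     # skipped when not arrayed, or for embedded arrays
        name += f"_a{array}"
    return name


_COLUMN = re.compile(r"p(\d+)(?:_i(\d+))?(?:_a(\d+))?")


def parse_column(name):
    """Split a column name into (field_id, instance, array); absent parts are None."""
    m = _COLUMN.fullmatch(name)
    if m is None:
        raise ValueError(f"not a participant-table column name: {name!r}")
    field_id, instance, array = (int(g) if g is not None else None for g in m.groups())
    return field_id, instance, array
```

For example, `column_name(53, instance=0)` gives "p53_i0", and `parse_column("p21022")` gives `(21022, None, None)`.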

For all other tables (such as hospital records, GP records, death records, or COVID-19 records), the column names are identical to what UK Biobank provides in the data showcase. For more information on the columns of these tables, consult Resource #138483 (hospital records), Resource #591 (GP records), Resource #115559 (death records), Resource #3151 (COVID-19 GP records), or Resource #1758 (COVID-19 test results).

Data Releases

What is a data release version?

The Research Analysis Platform holds a copy of all UK Biobank data. All projects are created using this copy of UK Biobank data. As UK Biobank updates the data on their end, the copy held by the Research Analysis Platform is periodically updated to reflect these upstream updates. Whenever the Research Analysis Platform updates its copy of the data, it will be indicated by a new data release version.

Where can I get full detail on data included in each data release?

See Data Release Versions.

I am about to make a new project and choose the option to dispense data. What data release version will my data correspond to?

The data in your project will be dispensed out of whatever copy is held by the Research Analysis Platform at the time that you create the project. Therefore, your data will correspond to the latest data release version at that time.

I previously created a project. How is my project affected by new data releases?

Your existing project is not affected, and will continue to reflect the data release version from the time that the project was created. Data updates do not happen automatically; you decide whether to update your project data. If you choose to update a dispensed project, the files and tabular data in that project will be updated. See the details here.

The data in my project is not up to date. What can I do?

To learn how to get the most recent data update, see the page Updating Dispensed Data.

How can I find out what data release version was used, for an existing project?

In a project, locate and tick the dataset that was dispensed in the root folder. Click the info icon on the upper right to open the info panel. Scroll to the bottom to reveal the "Details" section. The value of the "Description" key contains the original version, e.g.

"Description" = "Dataset: app68444_202101290057.dataset Original Version: v3.0+ae7924f"

I refreshed my project after a data release, but before the new data has been unrestricted. Could I update my data by doing the data dispensal again?

After each data release, the data must be unrestricted by UKB before it becomes available to researchers. If you begin the process of data dispensal during this time, the project will be set to the latest version, but the restricted data will still not be available.

You can re-dispense data after the data has been unrestricted by UKB. UKB will typically send out an email after the initial data release to notify users when data has been unrestricted. The version number of the data release can stay the same, but the version signature changes when restricted data becomes available. For example, a user whose current data is "v3.0+ae7924f" could re-dispense data to version "v3.0+ae9999f". Both versions start with “v3.0”, but the version signatures, “ae7924f” and “ae9999f”, differ between these two data dispensal batches.
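
To compare versions programmatically, you can extract the version string from the dataset's "Description" value and split it at the "+" sign. The Python sketch below assumes that the description follows the layout of the example shown earlier and that a version has the form "v<release>+<signature>". The function names are hypothetical.

```python
import re

_DESCRIPTION = re.compile(
    r"Dataset:\s*(?P<dataset>\S+)\s+Original Version:\s*(?P<version>\S+)"
)


def parse_description(description):
    """Extract (dataset name, version string) from a dataset description."""
    m = _DESCRIPTION.search(description)
    if m is None:
        raise ValueError(f"unrecognized description: {description!r}")
    return m.group("dataset"), m.group("version")


def split_version(version):
    """Split 'v3.0+ae7924f' into ('3.0', 'ae7924f'); a leading 'v' is optional."""
    release, sep, signature = version.partition("+")
    if release.startswith("v"):
        release = release[1:]
    if not sep or not release or not signature:
        raise ValueError(f"unrecognized version: {version!r}")
    return release, signature


def signature_changed(current, candidate):
    """True if both versions share a release number but differ in signature."""
    cur_release, cur_sig = split_version(current)
    new_release, new_sig = split_version(candidate)
    return cur_release == new_release and cur_sig != new_sig
```

With the values from the example, `signature_changed("v3.0+ae7924f", "v3.0+ae9999f")` is true, which corresponds to the case where re-dispensing picks up newly unrestricted data within the same release.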

The scope of my UK Biobank access application has been expanded. Will new data automatically appear in my projects?

No. If you have been approved for new fields, this change will not apply to existing projects automatically. To get access to the new data you are approved for, you will need to perform a data update. To learn how to get the most recent data update, see the page Updating Dispensed Data.

Can I just update selected data fields using the data update feature?

No. The data update process will update all the data fields that your application is eligible for.

I just updated my project data. Why is my update progress stuck at 0%?

The update process will take some time to complete. Your request to update the data may be queued behind that of other users. The system will service your request in the order it was received. We appreciate your patience during that time.

Will the data update affect ongoing jobs?

If ongoing jobs use files that need to be removed as part of the data update, these jobs may fail. We recommend starting the update process when there are no jobs running and waiting for the update process to complete before starting any new jobs.

When doing a data update, how are previous artifacts (previously generated results, previously saved cohorts, dashboards, etc) affected?

Previously-generated result files are not affected. Cohorts and dashboards that point to the previous dataset will be evaluated against the updated database. To migrate these to the latest dataset, run the "Rebase Cohorts and Dashboards" app.

AWS Credits Program

UK Biobank has approved my credits application. What should I do to receive my approved credits?

If you have received confirmation from UK Biobank that your grant application has been approved, the next step is to create a new grant org on the Research Analysis Platform. This will enable you to receive the grant. See the next question for more information.

How do I create a grant org?

  1. Log onto the Research Analysis Platform.

  2. From the main Platform menu, select Org Admin.

  3. From the dropdown menu, select All Orgs.

  4. On the Organizations list page, click the New Organization button.

  5. A New Organization form will open in a modal window. Enter the following information in the form:

    • Org Name: Enter a name of your choice for this organization.

    • Org ID: You can edit the default value, as long as you preserve the prefix "ukbgrant_yymm_". Note that this field has a character limit. A valid org ID would be, for example, "ukbgrant_yymm_FirstNameLastName".

  6. Click Create Organization.

  7. The Set Up Billing modal window will open, and you will be prompted to set up billing for your new organization. Do not set up billing. Instead, click Exit to close the modal window.

What should I do if I do not see an option to create a new org?

Make sure that you've logged into your Research Analysis Platform account. Once you've done so, you should be able to create a new org, following the instructions just above.

How do I use the grant org as a project's billing account?

  1. From the main Platform menu, select Projects, then All Projects.

  2. In the projects list, find the row for the project in question.

  3. Click the vertical "..." icon at the end of the row, then select Project Settings from the More Actions menu.

  4. Locate the Billed To field, in the Billing section of the Settings screen.

  5. Click on the downward-facing caret at the right end of the Billed To field.

  6. Select the grant org from the list of available billing accounts.

When will I see credits in my Research Analysis Platform account?

Credits are issued quarterly at the beginning of March, June, September, and December. Your approval email from UK Biobank will specify when your credits will be issued, pending the creation of an org to receive the credits.

I am running out of credits in the grant org. Can I set up billing on this account and add my credit card?

Do not upgrade your grant org to a billable account, or add your credit card to a grant org account. Grant orgs are only for receiving and using grants. If funds in the grant org account are running low, change the billing account used by any affected project, so that it uses a personal billing account linked to a credit card. You can also apply for additional credits. Find out more about applying for credits.

I have been approved for enhanced credits. Do I have to create a new org to receive these credits?

Yes, you must create a new grant org to receive enhanced credits.

Can I transfer funds from the grant org to my personal billing account?

No, funds from the grant org cannot be transferred to other accounts.

Why is my personal billing account still being charged after I received grant funds and created a grant org?

Check the billing account used by your project or projects. Be sure that the grant org is set as the "Billed To" account for each.
