Managing Usage and Storage Costs

Learn how to work cost-effectively on the Research Analysis Platform

Costs on the Research Analysis Platform

Platform users incur costs for:

  • Using compute resources

  • Storing data other than that dispensed to a project by UK Biobank. This includes data you upload and data generated in the course of work on the Platform.

  • Data egress

Users are not charged for the cost of storing UK Biobank data that has been dispensed to a project. The cost of storing this data is sponsored by Amazon Web Services.

Running Jobs in Batches

As detailed in this Science Corner article, when your analysis is particularly complex, submit jobs in small batches first. This lets you check for errors and confirm that you’re striking the balance you want between speed and cost-effectiveness.

Smart Reuse: Testing Workflows Efficiently in Advance

Smart Reuse is a feature that can enable significant cost savings. It lets you test complex workflows in a resource-efficient fashion before running them in a production environment. For a full description of Smart Reuse and how to use it, refer to the DNAnexus Platform documentation.

Smart Reuse is available to all Research Analysis Platform users.

Choosing and Testing Compute Instances

When running a job on the Platform, you must select a compute instance on which to execute the job. It’s difficult to make general recommendations about which instances are best in each situation, or about how to balance speed and cost-efficiency.

When first running a workflow, one useful approach is to select an instance that meets your cost standard, then set a timeout for the job. The timeout prevents the job from running too long and thus incurring too high a usage charge.
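The relationship between an instance's hourly rate, a timeout, and the largest compute charge a job can run up can be sketched as simple arithmetic. This is a minimal illustration, not Platform code: the function names and the rates used are hypothetical, and it assumes that a job is billed in proportion to its runtime and is stopped when it reaches its timeout. Consult the Platform rate card for actual rates.

```python
def max_job_charge(hourly_rate: float, timeout_hours: float, instance_count: int = 1) -> float:
    """Upper bound on the compute charge for a job stopped at its timeout.

    Assumes billing proportional to runtime (an assumption, not a documented rule).
    """
    if hourly_rate < 0 or timeout_hours < 0 or instance_count < 1:
        raise ValueError("rate and timeout must be non-negative; instance_count must be >= 1")
    return hourly_rate * timeout_hours * instance_count


def timeout_for_budget(budget: float, hourly_rate: float, instance_count: int = 1) -> float:
    """Longest timeout (in hours) that keeps the compute charge within the budget."""
    if budget < 0 or hourly_rate <= 0 or instance_count < 1:
        raise ValueError("budget must be non-negative; rate and instance_count must be positive")
    return budget / (hourly_rate * instance_count)


# Hypothetical rate of 0.50 per hour with a 4-hour timeout on one instance.
print(max_job_charge(0.50, 4))       # 2.0
# Hypothetical budget of 3.00 at the same rate.
print(timeout_for_budget(3.00, 0.50))  # 6.0
```

Working backward from a budget in this way gives a starting timeout for a first run, which can be loosened once the job's typical runtime is known.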

It can be helpful to log into a running instance using dx ssh and check how the machine's CPU and memory are being utilized, using a utility such as htop. If CPUs are idle or memory is under-utilized, you may be able to save money by selecting a smaller instance type for that job, or by changing the configuration of the tool being run to use more threads (if applicable to that specific tool).

Using Job Priority Settings to Balance Cost and Speed

Each analysis job is run with a priority setting.

High priority jobs use on-demand virtual machines, i.e. compute instances that are immediately available. This costs more than running a job at low priority, which uses spot instances, i.e. virtual machines that may or may not be immediately available. Running a job at normal priority, meanwhile, means that the system will first try, for 15 minutes, to secure a spot instance or instances, and will use more expensive on-demand instances only if spot instances are unavailable.
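The three priority levels described above can be summarized as a small decision function. This is a simplified model of the description only, not the Platform's scheduler: the function name, the argument names, and the treatment of the exact 15-minute boundary are assumptions made for illustration.

```python
from typing import Optional

# Wait limit for normal priority, as stated in the text above.
SPOT_WAIT_LIMIT_MINUTES = 15


def instance_kind(priority: str, minutes_until_spot: Optional[float]) -> str:
    """Return which kind of instance a job ends up on under this simplified model.

    minutes_until_spot is how long until a spot instance becomes available,
    or None if no spot instance becomes available at all.
    """
    if priority == "high":
        # High priority always uses on-demand instances.
        return "on-demand"
    if priority == "low":
        # Low priority uses spot instances, however long the wait.
        return "spot"
    if priority == "normal":
        # Normal priority tries for a spot instance, then falls back to on-demand.
        if minutes_until_spot is not None and minutes_until_spot <= SPOT_WAIT_LIMIT_MINUTES:
            return "spot"
        return "on-demand"
    raise ValueError(f"unknown priority: {priority!r}")


print(instance_kind("normal", 5))     # spot
print(instance_kind("normal", 20))    # on-demand
print(instance_kind("low", None))     # spot
```

The model makes the trade-off visible: normal priority pays the on-demand rate only when spot capacity is scarce, at the price of a possible delay before the job starts.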

See the “Managing Job Priority” page for more information on priority levels and how to choose the right one for your purposes.

Consult the Platform rate card for details on rates for using different types of instances.

Managing Storage Costs

Storage costs can add up if you create or upload large files, particularly if you store them for long periods of time. For this reason, proper file management is essential to using RAP in a cost-efficient fashion. For example, if you generate intermediate files in the course of running an analysis, consider carefully whether they are worth saving. They may be useful for future analyses. But if the compute cost and effort needed to generate them are low, you might consider re-creating them when you need them again, rather than storing them in the meantime.
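The store-or-recreate decision for intermediate files comes down to comparing two estimates. The sketch below is illustrative only: the function names, the per-GB monthly storage rate, and the regeneration cost are hypothetical placeholders, and it considers cost alone, ignoring the effort and time needed to regenerate a file.

```python
def storage_cost(file_size_gb: float, rate_per_gb_month: float, months_stored: float) -> float:
    """Estimated cost of keeping a file for the given number of months."""
    return file_size_gb * rate_per_gb_month * months_stored


def cheaper_to_store(file_size_gb: float, rate_per_gb_month: float,
                     months_stored: float, recreate_compute_cost: float) -> bool:
    """True if storing the file costs less than regenerating it once when needed."""
    return storage_cost(file_size_gb, rate_per_gb_month, months_stored) < recreate_compute_cost


# Hypothetical numbers: a 500 GB intermediate file kept for 6 months at 0.02 per
# GB-month, versus a regeneration job estimated at 5.00 in compute charges.
if cheaper_to_store(500, 0.02, 6, 5.00):
    print("Keep the file")
else:
    print("Delete it and re-create it when needed")
```

With these placeholder numbers, storage costs far more than regeneration, so the file is a candidate for deletion. A small file that is expensive to regenerate tips the comparison the other way.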

Managing Egress Costs

In some situations, you might prefer to egress data from the Platform, then analyze it on your computer or local cluster. But be aware of data egress charges. It’s almost always more cost-efficient to use the Platform for all data processing, relying on local resources only for post-processing.

App and Workflow Cost Limits

When creating an app or a global workflow for use on the Platform, you can set a cost limit to ensure that running the app or workflow does not incur charges above a set amount. See the DNAnexus Platform documentation for details on how to set app cost limits and workflow cost limits.

Billing Account Access and Shared Costs

Group billing can be enabled on the Platform in two ways: a user can add others to their “wallet,” i.e. personal billing account, or a user can set up a new organization with billing and add users to it.

Be aware that when users are added to a billing account, they can incur costs that must be paid by the person or entity responsible for that account. Note as well that incurring such costs will affect the billing account’s spending limit, and thus other users’ ability to run jobs billed to that account.

Spending Limits

Spending limits can be used to control spending by users on a common billing account. See this documentation on Platform spending limits, which covers the default limit, how to raise it as needed, and the usage restrictions that apply when an organization exceeds its limit.

When a Spending Limit is Exceeded

Note that when a billing account’s spending limit is exceeded, an email notification is sent to the account owner, and functionality is restricted for all those whose usage is billed to the account. They temporarily lose the ability to launch new analyses, upload data, or egress data. To restore full functionality, payment must be made to cover incurred charges or the billing account’s spending limit must be raised. For users working in a project linked to the restricted billing account, the project admin may also link the project to a different billing account, whose spending limit has not been exceeded.
