Using RStudio on the Research Analysis Platform

Learn how to use RStudio Workbench, an interactive R development environment, on the DNAnexus Platform.

Introduction

The RStudio app runs the commercial edition of RStudio Workbench, an interactive R development environment, on the DNAnexus Platform. Users of this app can analyze, visualize, and gain insights into data, and run commands interactively in a cloud-based terminal.

Starting a Session

  1. From the "Tools" tab in the upper menu, click “RStudio”. The RStudio Sessions page lists your previously launched RStudio Workbench sessions. From this page you can stop a running session, relaunch an ended session, or start a new session by clicking the "New RStudio" button in the upper right.

  2. Clicking the "New RStudio" button opens the RStudio Workbench setup modal. Optionally fill in the "Environment Name" field, and override the instance type if you need a smaller or larger instance. It is strongly recommended to keep the default "High" priority so that spot instance interruptions do not cost you your interactive work. Then click “Start Environment”.

  3. Your new session then appears on the RStudio Sessions page with the status "Initializing".

  4. Once the status of the session changes to "Ready", click the session name or the "Open" link next to the status to open the RStudio environment.

  5. To stop a session from the Sessions page, hover over the three-dot icon on the right side of the session row. The same icon also lets you launch a new session with the same settings as a previously ended session (Figure 1).

  6. Clicking the "i" next to the "New RStudio" button displays more information about each session (Figure 2).

Using RStudio Workbench

Inside the working environment, use the Terminal tab to download DNAnexus project files to the RStudio environment with dx download, and to upload files from RStudio to the DNAnexus project with dx upload.

You can also run commands with root privileges by prefixing them with sudo. For example, to install the wget package, run the following command in the RStudio Terminal:

sudo apt-get update && sudo apt-get install wget

Any changes you make to the RStudio environment (e.g. adding files, installing packages, building projects) are limited to the DNAnexus worker on which the current job is running, and will therefore be lost when the job (and hence the worker) is terminated. Always save scripts and any data you want to keep by uploading them to the platform. See the “Working With Data” section for details.

Working With Data

Accessing Project Data: Downloading Project Files

To process data from a DNAnexus project in RStudio, you first need to download it into the RStudio worker's execution environment. In the Terminal tab, run:

dx download <dnanexus_platform_file> 

where <dnanexus_platform_file> is the name or ID of a file in a DNAnexus project. The file is downloaded to your current working directory. You can download multiple files and whole folders at once; for details, run dx download -h. To list the project's files, use dx ls.
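The helpers below are a minimal sketch for pulling several inputs into a dedicated local folder rather than the working directory. The function names fetch_inputs and fetch_folder are hypothetical; -o (local output file or directory) and -r (download folders recursively) are standard dx download options, but confirm them with dx download -h on your worker.

```shell
# Hypothetical helper: download one or more project files (by path or file ID)
# into a local directory, creating the directory first if it is missing.
fetch_inputs() {
  local dest="$1"
  shift
  mkdir -p "$dest"
  # When -o names an existing directory, each file keeps its platform name.
  dx download -o "$dest/" "$@"
}

# Hypothetical helper: download a whole project folder recursively (-r).
fetch_folder() {
  local dest="$1" folder="$2"
  mkdir -p "$dest"
  dx download -r -o "$dest/" "$folder"
}
```

For example, fetch_inputs inputs /data/sample1.csv file-xxxx would download one file by path and one by ID into ./inputs/, and fetch_folder reference /reference would download the whole /reference folder (both paths are placeholders).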

Advanced Use Case: Reading From /mnt/project

If your input files are large and you only need to scan their content once, or to read just a small fraction of a file, consider reading them directly from the /mnt/project folder. The project in which the app is running is mounted read-only at /mnt/project. Reading files under /mnt/project fetches their content from the DNAnexus Platform on demand, so this method uses minimal disk space in the RStudio execution environment, at the cost of more API calls.
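As a minimal sketch, the helpers below read only what they need from the mount: head stops after the first few lines, and grep -c makes a single streaming pass without writing a local copy. The function names and the PROJECT_MOUNT variable are hypothetical, and the sketch assumes that a project file at /folder/file appears at /mnt/project/folder/file.

```shell
# Hypothetical setting: root of the read-only project mount. Overridable only so
# the sketch can be tried outside a DNAnexus worker.
PROJECT_MOUNT="${PROJECT_MOUNT:-/mnt/project}"

# Print the first N lines (default 5) of a project file without downloading it.
peek_project_file() {
  local path="$1" n="${2:-5}"
  head -n "$n" "$PROJECT_MOUNT/$path"
}

# Count lines matching a regular expression in one streaming pass over the file.
count_matches() {
  local path="$1" pattern="$2"
  grep -c -- "$pattern" "$PROJECT_MOUNT/$path"
}
```

For example, peek_project_file data/pheno.tsv 10 would print the first ten lines of a placeholder file /data/pheno.tsv while fetching only the beginning of it.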

Accessing Phenotypic Information Using Table Exporter App

You can export selected phenotypic fields for your UKB study to a TSV or CSV file using the Table Exporter app, as described in the Table Exporter documentation. You can then dx download the file to the RStudio worker's execution environment and read it into R with the read.csv() function (or read.delim() for a TSV file).
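Before loading a wide export in R, it can help to check which columns it contains. The hypothetical helper below downloads the file with dx download (-f overwrites an existing local copy) and prints the header row one column name per line. It assumes a comma-separated file whose column names contain no quoted commas.

```shell
# Hypothetical helper: fetch a Table Exporter CSV into the current folder and
# list its column names, one per line.
# Assumption: simple CSV header with no commas inside quoted names.
list_csv_columns() {
  local platform_path="$1"
  local local_file
  local_file="$(basename "$platform_path")"
  dx download -f -o "$local_file" "$platform_path" || return 1
  head -n 1 "$local_file" | tr ',' '\n'
}
```

After running list_csv_columns /exports/pheno.csv (a placeholder path), the local pheno.csv can be read in the RStudio console with read.csv("pheno.csv").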

Saving Local Data to the Project

Uploading local files to the project

The app runs in a temporary worker execution environment, so any outputs generated in an RStudio session are lost when the job running the app stops. To save individual result files to a DNAnexus project, upload them from the RStudio Terminal with the dx upload command, for example dx upload FILE, where FILE is the file to upload to the current project. You can upload multiple files and whole folders at once; for details, run dx upload -h. Uploading is possible because the app has VIEW access to all projects accessible to the launching user and CONTRIBUTE access to the project in which the app is running.
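A minimal sketch for saving a batch of results into a dedicated project folder. The helper name save_results is hypothetical; --destination, -r (recursive) and --parents (create missing folders) are dx upload options, which you can confirm with dx upload -h. End the destination with / so that it is treated as a folder.

```shell
# Hypothetical helper: upload local files and folders into a project folder.
# --destination: target folder in the current project (end it with /)
# -r:            upload directories recursively
# --parents:     create the destination folder if it does not exist yet
save_results() {
  local dest_folder="$1"
  shift
  dx upload -r --parents --destination "$dest_folder" "$@"
}
```

For example, save_results /results/run1/ plots summary.csv would upload the local plots folder and summary.csv into /results/run1/ in the current project.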

Backing up workspace to the project

To back up a folder to your project, use the dx-backup-folder command.

For example, to back up the current folder to /.Backups/rstudio_workbench_ukbrap.testuser.2022-03-21T16-32-59.tar.gz, use the following command:

dx-backup-folder

By default, the backup file is saved in the current DNAnexus project as .Backups/<rstudio_workbench_ukbrap>.<dnanexus-username>.<timestamp>.tar.gz, as in the example above.

To back up the workspace folder to the platform file /.Backups/workspace1.tar.gz, use:

dx-backup-folder -d /.Backups/workspace1.tar.gz workspace/

To back up the current folder, excluding the R subfolder and any .RData files in any subfolder, to the platform file /small_backup/rstudio_workbench_ukbrap.testuser.2022-03-14T20-47-03.tar.gz, use:

dx-backup-folder --exclude 'R' --exclude '*/.RData' -d /small_backup/

The optional --exclude and --exclude-from arguments work the same way as in the GNU tar command, excluding the specified files and directories from the backup.
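Because the exclusion patterns follow GNU tar, you can preview their effect locally before creating a real backup. The sketch below is illustrative only: it does not call dx-backup-folder or upload anything, the function name preview_exclusions is hypothetical, and it assumes GNU tar is available on the worker.

```shell
# Illustration only: archive a folder locally with the given --exclude options
# and list the members that would be kept. Nothing is uploaded.
# Write the archive outside the source folder so it does not include itself.
preview_exclusions() {
  local src="$1" archive="$2"
  shift 2
  # Remaining arguments (e.g. --exclude 'R') go to tar before the file list,
  # since GNU tar applies exclusion options only to names that follow them.
  tar "$@" -czf "$archive" -C "$src" .
  tar -tzf "$archive"
}
```

For example, preview_exclusions . /tmp/preview.tar.gz --exclude 'R' --exclude '*/.RData' lists what the /small_backup/ example above would keep, assuming dx-backup-folder passes the patterns to tar unchanged. With GNU tar's default matching, 'R' excludes any file or folder named exactly R at any depth, not only the top-level one.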

Restoring a workspace

To restore the contents of a previously created backup file into the current folder without overwriting any local files:

dx-restore-folder /.Backups/rstudio_workbench_ukbrap.testuser.2022-03-14T20-47-03.tar.gz

To overwrite local files with the restored files, add the optional --overwrite flag; to restore the backup into a different local folder, use the --output argument. For more information, run dx-restore-folder -h.
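One cautious workflow is to restore into a scratch folder first and compare it with your workspace before using --overwrite. The sketch below uses only the --output argument described above; the function name restore_and_compare is hypothetical, and it assumes dx-restore-folder accepts an existing, empty output folder.

```shell
# Hypothetical helper: restore a backup into a scratch folder via --output, then
# report files that differ from the current workspace (default: current folder).
restore_and_compare() {
  local backup="$1" scratch="$2" workspace="${3:-.}"
  mkdir -p "$scratch"   # assumption: an existing empty folder is accepted
  dx-restore-folder --output "$scratch" "$backup" || return 1
  # diff -rq exits with status 1 when differences exist, which is expected here.
  diff -rq "$workspace" "$scratch" || true
}
```

For example, restore_and_compare /.Backups/rstudio_workbench_ukbrap.testuser.2022-03-14T20-47-03.tar.gz /tmp/restored lists the files that differ between the current folder and the restored copy.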

Terminating a Session

To terminate the session, click the “Terminate” button on the upper right side of the RStudio environment header.

To view your RStudio sessions, select “RStudio” from the global Tools menu. You can also end a running session from this page by right-clicking the three vertical dots next to its “Open” option and selecting “End environment”.

Note that closing the browser does not stop the app: a running app continues to accrue compute charges until it is terminated. As long as the job is running, you can return to the RStudio environment by loading its job-xxxx.dnanexus.cloud URL.
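If you prefer the command line, the job can also be stopped from its own RStudio Terminal with dx terminate. This is a sketch under two assumptions: the helper name stop_session is hypothetical, and it relies on the DX_JOB_ID environment variable that DNAnexus sets inside running jobs; if that variable is not visible in the Terminal, pass the job ID explicitly. Upload or back up anything you want to keep first, because the worker and its local files are discarded.

```shell
# Hypothetical helper: terminate the job running this session.
# Uses the job ID given as an argument, else DX_JOB_ID (assumed to be set
# inside DNAnexus jobs).
stop_session() {
  local job_id="${1:-$DX_JOB_ID}"
  if [ -z "$job_id" ]; then
    echo "No job ID given and DX_JOB_ID is not set" >&2
    return 1
  fi
  dx terminate "$job_id"
}
```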
