Using JupyterLab on the Research Analysis Platform

These in-depth recorded webinars provide a detailed guide to using JupyterLab on the Research Analysis Platform.

For information on how to use Hail with JupyterLab, see the example notebooks.

Troubleshooting

For general troubleshooting tips, see the troubleshooting guide.

Common issues, with example error messages and what to do:

Issue: Cannot open or launch a JupyterLab session

Example error message: "You cannot open DNAnexus notebooks from a protected project. Please download the notebook and open the local version"

What to do: Navigate to the Settings tab and check that the Delete Access policy is set to 'Contributors & Admins'. If it is set to 'Admins only', the project is considered protected. Information on protected projects can be found in the documentation.

Issue: 502 Bad Gateway

Example error message: "Something went wrong"

What to do: After the job is launched, it may take around 10-15 minutes before the JupyterLab server is accessible. Note that this wait time applies to all cloud applications, including RStudio.

Alternatively, if waiting doesn't work, you can try adding the port number to the address. Currently this helps for ports 8080 and 8081.
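For example, with a hypothetical job address, appending the port looks like:

https://job-xxxx.dnanexus.cloud:8080/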

Issue: Timeout error when working with a Spark object

Example error message:

Py4JJavaError: An error occurred while calling o1283.collectToPython.
: org.apache.spark.SparkException: Could not execute broadcast in 1000 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1

What to do: Try using the latest version of JupyterLab. The latest version is available via the Tools > JupyterLab tab, where you can launch a new environment using the New JupyterLab button or by re-launching an old JupyterLab session. By default, the JupyterLab environment uses the latest available version.
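As the error message itself suggests, you can also raise the broadcast timeout or disable broadcast joins. A minimal sketch, assuming a PySpark session in the notebook (the 3600-second timeout is an illustrative value):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raise the broadcast timeout (in seconds) well above the 1000 s in the error...
spark.conf.set("spark.sql.broadcastTimeout", "3600")

# ...or disable broadcast joins entirely
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")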

Issue: Connection pool timeout

Example error message: FatalError: ConnectionPoolTimeoutException: Timeout waiting for connection from pool

What to do: Try monitoring the job using the Spark UI: https://job-xxxx.dnanexus.cloud:8081/jobs/

If the issue is limited memory, you may need to relaunch on an instance type with a larger memory allocation.
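One way to relaunch on a larger instance is via the dxpy Python bindings; this is only a sketch, and the app name "dxjupyterlab" and the instance type shown are assumptions to verify against the apps and instance types available on your platform:

import dxpy

# Hypothetical sketch: relaunch the JupyterLab app on a larger instance type.
# "dxjupyterlab" and "mem3_ssd1_v2_x8" are assumed names; check your platform.
app = dxpy.DXApp(name="dxjupyterlab")
job = app.run({}, instance_type="mem3_ssd1_v2_x8")
print(job.get_id())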

Issue: Issue accessing a large dataset using Spark

Example error message: SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 8975519.

What to do: You have exceeded the allowable buffer size for Kryo serialization. Adjust the buffer using the following code:

import pyspark

# Raise the maximum Kryo serialization buffer (value interpreted in MiB)
config = pyspark.SparkConf().setAll([('spark.kryoserializer.buffer.max', '128')])

sc = pyspark.SparkContext(conf=config)
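Note that only one SparkContext can be active per notebook session, so if a context is already running you must stop it before creating a new one with the larger buffer. A minimal sketch:

import pyspark

# Stop the currently active context (creating one if none exists),
# then re-run the configuration code above
pyspark.SparkContext.getOrCreate().stop()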
