ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
using the url_fetcher app and stored in genome_reference
folder in your project.

.genotype.json
extension and HLA expression level in a file with .gene.json
extension) of the analysis in the "/HLA_process" folder of the RAP-dispensed project.

docker
folder, which contains a Dockerfile whose content is available for download in this GitHub repository; it describes the installation of the samtools, bedtools, kallisto and arcasHLA tools in the Docker image.

samtools view
line.

restartOn
field of the executionPolicy
argument in the DNAnexus documentation. You can adjust the restart policy by providing extras.json
input to dxCompiler below, as shown in the dxCompiler documentation.

sort:

dx find data
, you can use --json
with dx find data
and use jq
to extract the fields you need.

200K_exome_HLA_analysis
represents the name of the study and will help us distinguish jobs from this analysis from other work you may be doing in the same project.

original
indicates that this is the first (original) attempt at running a job. Subsequent reruns of failed jobs will be tagged with rerun{rerun_attempt}
.

batch_n_{batch_number}
records a particular batch of 100 jobs.

dx run
invocation for the first job, then submits it:

dx watch
.

200K_exome_HLA_analysis
analysis using the dx command line tool to search for jobs tagged with 200K_exome_HLA_analysis
and display only the last n
jobs that we've submitted:

200K_exome_HLA_analysis
tag value from the Monitor page in your project.

failed
in the Monitor tab in the web browser UI or using the dx command line tool as shown below:

200K_exome_HLA_analysis, batch_n_0, original
, you may resubmit the job using tag --name 200K_exome_HLA_analysis --tag batch_n_0 --tag rerun1
.

<sample ID>.<type of file or processing>.<file format extension>
for ease of reviewing or troubleshooting your work. For example, in HLA typing, we name the output as 12345_6789_0.genotype.json
which represents <sample ID>.<type of file>.<file format extension>
.

WDL_APPLET
code snippet above (i.e. the second sample was analyzed after the analysis of the first sample had finished). Since the samtools and arcasHLA tools are multi-threaded, the sequential processing of samples still resulted in high CPU utilization. If the analysis tools are not multi-threaded, you may consider processing multiple units in parallel (e.g. using xargs) for better CPU utilization.

WDL_APPLET
code snippet shown above. We also recommend removing unnecessary intermediate files to save disk space, as shown in line 18 of WDL_APPLET
.
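The suggestion above to process multiple units in parallel when the tools are single-threaded can be illustrated outside of xargs as well. The following is a minimal Python sketch, not the method used in this analysis: the function names `process_unit` and `process_all` are hypothetical, and the command being run is a stand-in one-liner that only echoes the unit name. In practice it would be replaced by the real single-threaded tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def process_unit(unit: str) -> str:
    """Run one single-threaded command for one unit and return its stdout.

    The command is a placeholder (a Python one-liner that echoes the unit
    name); swap in the real tool invocation for actual work.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('processed ' + sys.argv[1])", unit],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.strip()


def process_all(units: list[str], max_workers: int = 4) -> list[str]:
    """Process units concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_unit, units))


if __name__ == "__main__":
    for line in process_all(["sample_1", "sample_2", "sample_3"]):
        print(line)
```

Threads are sufficient here because each worker spends its time waiting on a child process. `max_workers` would typically be set close to the number of CPU cores available on the instance.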
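The job-tagging and output-naming conventions described earlier in this section can also be captured in a few helper functions. This is a hedged sketch rather than code from the original workflow: the helper names are hypothetical, and the mapping from a job's position to its batch number (integer division by 100, starting at 0) is an assumption based on the `batch_n_0` example and the batch size of 100 mentioned above. The helpers only build strings and argument lists; they do not call dx.

```python
STUDY_TAG = "200K_exome_HLA_analysis"  # study name used as a job tag
BATCH_SIZE = 100  # jobs per batch, as described in the text


def batch_number_for(job_index: int) -> int:
    """Assumed mapping: jobs 0-99 are batch 0, jobs 100-199 are batch 1, etc."""
    return job_index // BATCH_SIZE


def job_tags(batch_number: int, rerun_attempt: int = 0) -> list[str]:
    """Return the three tags for a job: study, batch, and attempt."""
    attempt_tag = "original" if rerun_attempt == 0 else f"rerun{rerun_attempt}"
    return [STUDY_TAG, f"batch_n_{batch_number}", attempt_tag]


def tag_args(tags: list[str]) -> list[str]:
    """Flatten tags into repeated --tag arguments for a dx run command line."""
    args: list[str] = []
    for tag in tags:
        args.extend(["--tag", tag])
    return args


def output_name(sample_id: str, kind: str, extension: str) -> str:
    """Build <sample ID>.<type of file or processing>.<file format extension>."""
    return f"{sample_id}.{kind}.{extension}"


if __name__ == "__main__":
    print(job_tags(batch_number_for(42)))
    print(tag_args(job_tags(0, rerun_attempt=1)))
    print(output_name("12345_6789_0", "genotype", "json"))
```

Keeping the tag construction in one place makes it harder for a rerun to be submitted with a mismatched batch or attempt tag, which matters when jobs are later filtered by tag.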