Untitled_<DATE>.ipynb).
spark.sql("...") function, which returns a Spark DataFrame.
.show(truncate=False) on it.
participant_0001, ..., participant_9999
hesin
hesin_critical
hesin_delivery
hesin_diag
hesin_maternity
hesin_oper
hesin_psych
death
death_cause
gp_clinical
gp_registrations
gp_scripts
covid19_tpp_gp_clinical
covid19_tpp_gp_scripts
covid19_emis_gp_clinical
covid19_emis_gp_scripts
covid19_result_england
covid19_result_scotland
"gp_clinical"
and "gp_clinical_v4_0_9b7a7f3". This naming scheme is part of the system's architecture, supporting data refreshes and participant withdrawals.
"gp_clinical" - because the versioned tables do not persist over time.
allele_23146 (allele_23148), annotation_23146 (annotation_23148), assay_eid_map_23146 (assay_eid_map_23148), genotype_23146 (genotype_23148), pheno_assay_23146_link, rsid_lookup_r81_23146 (pheno_assay_23146_link, rsid_lookup_r81_23148)
p<FIELD-ID>_i<INSTANCE-ID>_a<ARRAY-ID>
_i<INSTANCE-ID> piece is skipped altogether.
_a<ARRAY-ID> piece is skipped altogether.
_a<ARRAY-ID> piece is skipped altogether.
p21022
p53_i0, p53_i1, ...
p41270
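As a rough illustration of the naming convention above, the following sketch assembles a column name from a field ID and optional instance and array indices. The helper name `field_column_name` is hypothetical and not part of any library; it only mirrors the pattern p<FIELD-ID>_i<INSTANCE-ID>_a<ARRAY-ID>, dropping a piece when the corresponding index is absent.

```python
from typing import Optional


def field_column_name(field_id: int,
                      instance: Optional[int] = None,
                      array: Optional[int] = None) -> str:
    """Build a column name of the form p<FIELD-ID>_i<INSTANCE-ID>_a<ARRAY-ID>.

    Hypothetical helper for illustration only. Pass None for any piece
    that should be skipped.
    """
    name = f"p{field_id}"
    if instance is not None:
        name += f"_i{instance}"
    if array is not None:
        name += f"_a{array}"
    return name


print(field_column_name(21022))            # p21022
print(field_column_name(53, instance=0))   # p53_i0
print(field_column_name(53, instance=1))   # p53_i1
```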
df without executing the query. The query is only evaluated when needed, potentially with additional transformations.
df.count() later will evaluate an equivalent SELECT COUNT(*) ...
participant.
p21002_i0 through p21002_i3.
p21002_i0, shown next to the Link label).
participant.fields array, or by using the function participant.find_fields. Refer to the dxdata documentation for more information. The following example finds all fields with a matching case-insensitive keyword "weight" in their titles:
retrieve_fields function:
".dnanexus.cloud/lab?"), open a new browser tab, and paste the URL. Replace "/lab?" with ":8081/jobs/" and press Enter.
.toPandas(). This will return a Pandas DataFrame in memory, which you can manipulate further using other pandas functions. Pandas functionality runs in the same VM as JupyterLab and does not leverage the Spark cluster.
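Returning to the URL edit described earlier (replacing "/lab?" with ":8081/jobs/"), the same substitution can be sketched in a few lines of Python. This is only an illustration: `jobs_url` is a hypothetical helper, and the hostname in the example is made up to show the transformation.

```python
def jobs_url(lab_url: str) -> str:
    """Swap the first "/lab?" in a JupyterLab URL for ":8081/jobs/"."""
    marker = "/lab?"
    if marker not in lab_url:
        raise ValueError(f'expected "{marker}" in URL: {lab_url!r}')
    # Replace only the first occurrence, as in the manual edit.
    return lab_url.replace(marker, ":8081/jobs/", 1)


# Hypothetical hostname, used only to demonstrate the substitution.
example = "https://job-example.dnanexus.cloud/lab?"
print(jobs_url(example))  # https://job-example.dnanexus.cloud:8081/jobs/
```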
"eid"
) as the first field name, so that it is returned as the first column. If you don't include it, the system will not return the value.
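One way to avoid forgetting the participant ID is to normalize the list of field names before passing it on. The sketch below is a minimal example under that assumption; `with_eid_first` is a hypothetical helper, not part of dxdata, and the field names are taken from examples elsewhere on this page.

```python
from typing import Iterable, List


def with_eid_first(field_names: Iterable[str]) -> List[str]:
    """Return the field names with "eid" first and no duplicate "eid"."""
    rest = [name for name in field_names if name != "eid"]
    return ["eid"] + rest


field_names = with_eid_first(["p31", "p21022", "p21002_i0"])
print(field_names)  # ['eid', 'p31', 'p21022', 'p21002_i0']
```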
p31 (participant sex) will be returned as an integer column with values of 0 and 1. To receive decoded values, supply the coding_values="replace" argument.
column_aliases={"p21002_i0": "weight", ...} argument.
filter_sql=cohort.sql argument: