ehrQL output formats

Supported output formats

The following output formats are supported:

Recommended

.arrow — Apache Arrow format
.csv.gz — compressed CSV format

Not recommended

.csv — uncompressed CSV format

The uncompressed CSV format is not recommended because it produces much larger files than the alternative formats.
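As a rough illustration of the size difference, the sketch below builds a small, made-up table in memory and compares its size as plain CSV with its size after gzip compression. The column names and values are invented for the example, and the exact ratio will depend on the real data.

```python
import csv
import gzip
import io

# Build a small, made-up dataset in memory (hypothetical columns and values).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["patient_id", "age", "sex"])
for patient_id in range(1, 10001):
    sex = "female" if patient_id % 2 else "male"
    writer.writerow([patient_id, 20 + patient_id % 60, sex])

# The same bytes, once as plain CSV and once gzip-compressed.
raw_bytes = buffer.getvalue().encode("utf-8")
compressed_bytes = gzip.compress(raw_bytes)

print(f"uncompressed .csv:  {len(raw_bytes)} bytes")
print(f"compressed .csv.gz: {len(compressed_bytes)} bytes")
```

Repetitive tabular data like this usually compresses well, which is why the compressed variant is the preferred CSV option.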

Unsupported output formats

These formats were supported by cohort-extractor, but are not supported by ehrQL:

.dta and .dta.gz — Stata formats

`arrowload` for Stata users

Stata itself does not directly support .arrow. However, OpenSAFELY's Stata Docker image includes the arrowload library, which can load .arrow files into Stata.

Use arrowload as follows:

. arrowload /path/to/arrow/file

To see the full documentation, start command-line Stata through OpenSAFELY:

opensafely exec stata-mp stata

and then run:

. help arrowload

Selecting an output format

You select an output format when you use the --output option to specify an output filename for ehrQL. The filename extension that you provide (for example, .arrow) determines the output file format.

If you specify a filename extension that is not supported, you will get an error telling you so.

If you omit the --output option, the output is not saved to a file. Instead, the output is displayed at the command line.
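The selection rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not ehrQL's actual implementation: the function name `describe_output_format`, the mapping, and the error message are all hypothetical.

```python
from pathlib import Path

# Extensions taken from the lists above; the mapping itself is illustrative.
SUPPORTED_EXTENSIONS = {
    ".arrow": "Apache Arrow",
    ".csv.gz": "compressed CSV",
    ".csv": "uncompressed CSV",
}


def describe_output_format(filename):
    """Return a description of the format implied by the filename extension."""
    name = Path(filename).name
    # Check longer extensions first so ".csv.gz" is tested before ".csv".
    for extension in sorted(SUPPORTED_EXTENSIONS, key=len, reverse=True):
        if name.endswith(extension):
            return SUPPORTED_EXTENSIONS[extension]
    # Hypothetical error; the real message produced by ehrQL may differ.
    raise ValueError(f"Unsupported output file extension: {filename}")


print(describe_output_format("./outputs/data_extract.csv.gz"))
```

Matching on the end of the filename, rather than splitting on the last dot, keeps two-part extensions such as .csv.gz intact.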

Examples with `opensafely exec`

`.arrow`

opensafely exec ehrql:v0 generate-dataset "./dataset_definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.arrow"

`.csv.gz`

opensafely exec ehrql:v0 generate-dataset "./dataset_definition.py" --dummy-tables "example-data/" --output "./outputs/data_extract.csv.gz"
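A .csv.gz file does not need to be decompressed by hand before use. As a minimal sketch using only the Python standard library, the helper below (the name `read_csv_gz` is hypothetical) reads a compressed CSV file into a list of rows. So that the example runs on its own, it first writes a tiny stand-in file with invented columns rather than relying on real ehrQL output.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_csv_gz(path):
    """Read a gzip-compressed CSV file into a list of dicts, one per row."""
    # Text mode ("rt") lets the csv module consume the decompressed lines.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Write a tiny stand-in file so the example runs without real output data.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data_extract.csv.gz"
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["patient_id", "age"])
        writer.writerow([1, 42])
        writer.writerow([2, 67])
    rows = read_csv_gz(path)

print(rows)
```

All values come back as strings, because CSV carries no type information.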

Example `project.yaml`

version: "3.0"

expectations:
  population_size: 1000

actions:
  extract_data:
    run: ehrql:v0 generate-dataset "./dataset_definition.py" --output "outputs/data_extract.arrow"
    outputs:
      highly_sensitive:
        population: outputs/data_extract.arrow

The filename listed under population must be identical to the filename passed to --output. Otherwise, you will see the following error when you use opensafely run to run the project actions:

$ opensafely run run_all
=> ProjectValidationError
   Invalid project:
   1 validation error for Pipeline
   __root__
     --output in run command and outputs must match (type=value_error)
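The matching rule can be approximated with a short check. The sketch below is a simplification for illustration, not the validator that opensafely uses: the function names are hypothetical, and it only recognises the `--output <path>` form shown in the example.

```python
import shlex


def output_path_from_run(run_command):
    """Return the value that follows --output in a run command, or None."""
    # shlex.split handles the quoted paths used in the run command.
    tokens = shlex.split(run_command)
    for index, token in enumerate(tokens[:-1]):
        if token == "--output":
            return tokens[index + 1]
    return None


def outputs_match(run_command, declared_outputs):
    """Check that the --output path appears among the declared output paths."""
    return output_path_from_run(run_command) in declared_outputs


run = 'ehrql:v0 generate-dataset "./dataset_definition.py" --output "outputs/data_extract.arrow"'
print(outputs_match(run, ["outputs/data_extract.arrow"]))
print(outputs_match(run, ["outputs/dataset.arrow"]))
```

The comparison here is a plain string comparison, so "outputs/data_extract.arrow" and "./outputs/data_extract.arrow" would not be treated as equal in this sketch.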
