beta.tpp schema🔗
Available on backends: TPP
This defines all the data (both primary care and externally linked) available in the OpenSAFELY-TPP backend. For more information about this backend, see the SystmOne Primary Care section.
from ehrql.tables.beta.tpp import (
addresses,
apcs,
apcs_cost,
appointments,
clinical_events,
ec,
ec_cost,
emergency_care_attendances,
hospital_admissions,
household_memberships_2020,
isaric_raw,
medications,
occupation_on_covid_vaccine_record,
ons_deaths,
opa,
opa_cost,
opa_diag,
opa_proc,
open_prompt,
patients,
practice_registrations,
sgss_covid_all_tests,
vaccinations,
wl_clockstops,
wl_clockstops_raw,
wl_openpathways,
wl_openpathways_raw,
)
many rows per patient
addresses🔗
-
address_id
🔗
integer
-
start_date
🔗
date
-
end_date
🔗
date
-
address_type
🔗
integer
-
rural_urban_classification
🔗
integer
-
imd_rounded
🔗
integer
-
- Always >= 0, <= 32800, and a multiple of 100
-
msoa_code
🔗
string
-
- Matches regular expression:
E020[0-9]{5}
- Matches regular expression:
-
has_postcode
🔗
boolean
-
care_home_is_potential_match
🔗
boolean
-
care_home_requires_nursing
🔗
boolean
-
care_home_does_not_require_nursing
🔗
boolean
-
for_patient_on(date)
🔗
-
Return each patient's registered address as it was on the supplied date.
Where there are multiple registered addresses we prefer any which have a known postcode (though we never have access to this postcode) as this is used by TPP to cross-reference other data associated with the address, such as the MSOA or index of multiple deprevation.
Where there are multiple of these we prefer the most recently registered address and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the address ID.
View method definition
spanning_addrs = addresses.where(addresses.start_date <= date).except_where( addresses.end_date < date ) ordered_addrs = spanning_addrs.sort_by( case(when(addresses.has_postcode).then(1), default=0), addresses.start_date, addresses.end_date, addresses.address_id, ) return ordered_addrs.last_for_patient()
many rows per patient
apcs🔗
many rows per patient
apcs_cost🔗
many rows per patient
appointments🔗
Appointments in primary care.
You can find out more about this table in the short data report. To view it, you will need a login for OpenSAFELY Jobs and the Project Collaborator or Project Developer role for the project. The workspace shows when the code that comprises the report was run; the code itself is in the appointments-short-data-report repository on GitHub.
Tip
Querying this table is similar to using Cohort Extractor's
patients.with_gp_consultations
function. However, that function filters by
the status of the appointment. To achieve a similar result with this table:
appointments.where(
appointments.status.is_in([
"Arrived",
"In Progress",
"Finished",
"Visit",
"Waiting",
"Patient Walked Out",
])
)
-
booked_date
🔗
date
-
The date the appointment was booked
-
start_date
🔗
date
-
The date the appointment was due to start
-
status
🔗
string
-
The status of the appointment
- Possible values:
Booked
,Arrived
,Did Not Attend
,In Progress
,Finished
,Requested
,Blocked
,Visit
,Waiting
,Cancelled by Patient
,Cancelled by Unit
,Cancelled by Other Service
,No Access Visit
,Cancelled Due To Death
,Patient Walked Out
- Possible values:
many rows per patient
clinical_events🔗
Each record corresponds to a single clinical or consultation event for a patient.
Each event is recorded twice: once with a CTv3 code, and again with the equivalent SNOMED-CT code. Each record will have only one of the ctv3_code or snomedct_code columns set and the other will be null. This allows you to query the table using either a CTv3 codelist or SNOMED-CT codelist and all records using the other coding system will be effectively ignored.
Note that event codes do not change in this table. If an event code in the coding system becomes inactive, the event will still be coded to the inactive code. As such, codelists should include all relevant inactive codes.
Detailed information on onward referrals is not currently available. A subset of referrals are recorded in the clinical events table but this data will be incomplete.
many rows per patient
ec🔗
many rows per patient
ec_cost🔗
many rows per patient
emergency_care_attendances🔗
-
id
🔗
integer
-
arrival_date
🔗
date
-
discharge_destination
🔗
SNOMED-CT code
-
diagnosis_01
🔗
SNOMED-CT code
-
diagnosis_02
🔗
SNOMED-CT code
-
diagnosis_03
🔗
SNOMED-CT code
-
diagnosis_04
🔗
SNOMED-CT code
-
diagnosis_05
🔗
SNOMED-CT code
-
diagnosis_06
🔗
SNOMED-CT code
-
diagnosis_07
🔗
SNOMED-CT code
-
diagnosis_08
🔗
SNOMED-CT code
-
diagnosis_09
🔗
SNOMED-CT code
-
diagnosis_10
🔗
SNOMED-CT code
-
diagnosis_11
🔗
SNOMED-CT code
-
diagnosis_12
🔗
SNOMED-CT code
-
diagnosis_13
🔗
SNOMED-CT code
-
diagnosis_14
🔗
SNOMED-CT code
-
diagnosis_15
🔗
SNOMED-CT code
-
diagnosis_16
🔗
SNOMED-CT code
-
diagnosis_17
🔗
SNOMED-CT code
-
diagnosis_18
🔗
SNOMED-CT code
-
diagnosis_19
🔗
SNOMED-CT code
-
diagnosis_20
🔗
SNOMED-CT code
-
diagnosis_21
🔗
SNOMED-CT code
-
diagnosis_22
🔗
SNOMED-CT code
-
diagnosis_23
🔗
SNOMED-CT code
-
diagnosis_24
🔗
SNOMED-CT code
many rows per patient
hospital_admissions🔗
one row per patient
household_memberships_2020🔗
Inferred household membership as of 2020-02-01, as determined by TPP using an as yet undocumented algorithm.
many rows per patient
isaric_raw🔗
A subset of the ISARIC data.
These columns are deliberately all taken as strings while in a preliminary phase. They will later change to more appropriate data types.
Descriptions taken from: CCP_REDCap_ISARIC_data_dictionary_codebook.pdf
-
age
🔗
string
-
Age
-
age_factor
🔗
string
-
TODO
-
calc_age
🔗
string
-
Calculated age (comparing date of birth with date of enrolment). May be inaccurate if a date of February 29 is used.
-
sex
🔗
string
-
Sex at birth.
-
ethnic___1
🔗
string
-
Ethnic group: Arab.
-
ethnic___2
🔗
string
-
Ethnic group: Black.
-
ethnic___3
🔗
string
-
Ethnic group: East Asian.
-
ethnic___4
🔗
string
-
Ethnic group: South Asian.
-
ethnic___5
🔗
string
-
Ethnic group: West Asian.
-
ethnic___6
🔗
string
-
Ethnic group: Latin American.
-
ethnic___7
🔗
string
-
Ethnic group: White.
-
ethnic___8
🔗
string
-
Ethnic group: Aboriginal/First Nations.
-
ethnic___9
🔗
string
-
Ethnic group: Other.
-
ethnic___10
🔗
string
-
Ethnic group: N/A.
-
covid19_vaccine
🔗
string
-
Has the patient received a Covid-19 vaccine (open label licenced product)?
-
covid19_vaccined
🔗
date
-
Date first vaccine given (Covid-19) if known.
-
covid19_vaccine2d
🔗
date
-
Date second vaccine given (Covid-19) if known.
-
covid19_vaccined_nk
🔗
string
-
First vaccine given (Covid-19) but date not known.
-
corona_ieorres
🔗
string
-
Suspected or proven infection with pathogen of public health interest.
-
coriona_ieorres2
🔗
string
-
Proven or high likelihood of infection with pathogen of public health interest.
-
coriona_ieorres3
🔗
string
-
Proven infection with pathogen of public health interest.
-
inflammatory_mss
🔗
string
-
Adult or child who meets case definition for inflammatory multi-system syndrome (MIS-C/MIS-A).
-
cestdat
🔗
date
-
Onset date of first/earliest symptom.
-
chrincard
🔗
string
-
Chronic cardiac disease, including congenital heart disease (not hypertension).
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
hypertension_mhyn
🔗
string
-
Hypertension (physician diagnosed).
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
chronicpul_mhyn
🔗
string
-
Chronic pulmonary disease (not asthma).
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
asthma_mhyn
🔗
string
-
Asthma (physician diagnosed).
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
renal_mhyn
🔗
string
-
Chronic kidney disease.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
mildliver
🔗
string
-
Mild liver disease.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
modliv
🔗
string
-
Moderate or severe liver disease
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
chronicneu_mhyn
🔗
string
-
Chronic neurological disorder.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
malignantneo_mhyn
🔗
string
-
Malignant neoplasm.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
chronichaemo_mhyn
🔗
string
-
Chronic haematologic disease.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
aidshiv_mhyn
🔗
string
-
AIDS/HIV.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
obesity_mhyn
🔗
string
-
Obesity (as defined by clinical staff).
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
diabetes_type_mhyn
🔗
string
-
Diabetes and type.
- Possible values:
NO
,1
,2
,N/K
- Possible values:
-
diabetescom_mhyn
🔗
string
-
Diabetes with complications.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
diabetes_mhyn
🔗
string
-
Diabetes without complications.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
rheumatologic_mhyn
🔗
string
-
Rheumatologic disorder.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
dementia_mhyn
🔗
string
-
Dementia.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
malnutrition_mhyn
🔗
string
-
Malnutrition.
- Possible values:
YES
,NO
,Unknown
- Possible values:
-
smoking_mhyn
🔗
string
-
Smoking.
- Possible values:
Yes
,Never Smoked
,Former Smoker
,N/K
- Possible values:
-
hostdat
🔗
date
-
Admission date at this facility.
-
hooccur
🔗
string
-
Transfer from other facility?
-
hostdat_transfer
🔗
date
-
Admission date at previous facility.
-
hostdat_transfernk
🔗
string
-
Admission date at previous facility not known.
-
readm_cov19
🔗
string
-
Is the patient being readmitted with Covid-19?
-
dsstdat
🔗
date
-
Date of enrolment.
-
dsstdtc
🔗
date
-
Outcome date.
many rows per patient
medications🔗
many rows per patient
occupation_on_covid_vaccine_record🔗
-
is_healthcare_worker
🔗
boolean
many rows per patient
ons_deaths🔗
-
date
🔗
date
-
Patient's date of death. Only deaths registered from February 2019 are recorded.
-
place
🔗
string
-
- Possible values:
Care Home
,Elsewhere
,Home
,Hospice
,Hospital
,Other communal establishment
- Possible values:
-
underlying_cause_of_death
🔗
ICD-10 code
-
cause_of_death_01
🔗
ICD-10 code
-
cause_of_death_02
🔗
ICD-10 code
-
cause_of_death_03
🔗
ICD-10 code
-
cause_of_death_04
🔗
ICD-10 code
-
cause_of_death_05
🔗
ICD-10 code
-
cause_of_death_06
🔗
ICD-10 code
-
cause_of_death_07
🔗
ICD-10 code
-
cause_of_death_08
🔗
ICD-10 code
-
cause_of_death_09
🔗
ICD-10 code
-
cause_of_death_10
🔗
ICD-10 code
-
cause_of_death_11
🔗
ICD-10 code
-
cause_of_death_12
🔗
ICD-10 code
-
cause_of_death_13
🔗
ICD-10 code
-
cause_of_death_14
🔗
ICD-10 code
-
cause_of_death_15
🔗
ICD-10 code
many rows per patient
opa🔗
many rows per patient
opa_cost🔗
many rows per patient
opa_diag🔗
-
opa_ident
🔗
integer
-
TODO
- Never
NULL
- Never
-
primary_diagnosis_code
🔗
ICD-10 code
-
TODO
-
primary_diagnosis_code_read
🔗
CTV3 (Read v3) code
-
TODO
-
secondary_diagnosis_code_1
🔗
ICD-10 code
-
TODO
-
secondary_diagnosis_code_1_read
🔗
CTV3 (Read v3) code
-
TODO
-
appointment_date
🔗
date
-
TODO
-
referral_request_received_date
🔗
date
-
TODO
many rows per patient
opa_proc🔗
-
opa_ident
🔗
integer
-
TODO
- Never
NULL
- Never
-
primary_procedure_code
🔗
OPCS-4 code
-
TODO
-
primary_procedure_code_read
🔗
CTV3 (Read v3) code
-
TODO
-
procedure_code_1
🔗
OPCS-4 code
-
TODO
-
procedure_code_2_read
🔗
CTV3 (Read v3) code
-
TODO
-
appointment_date
🔗
date
-
TODO
-
referral_request_received_date
🔗
date
-
TODO
many rows per patient
open_prompt🔗
This table contains responses to questions from the OpenPROMPT project.
You can find out more about this table in the associated short data report. To view it, you will need a login for Level 4. The workspace shows when the code that comprises the report was run; the code itself is in the airmid-short-data-report repository on GitHub.
-
ctv3_code
🔗
CTV3 (Read v3) code
-
The response to the question, as a CTV3 code. Alternatively, if the question does not admit a CTV3 code as the response, then the question, as a CTV3 code.
- Never
NULL
- Never
-
snomedct_code
🔗
SNOMED-CT code
-
The response to the question, as a SNOMED CT code. Alternatively, if the question does not admit a SNOMED CT code as the response, then the question, as a SNOMED CT code.
-
creation_date
🔗
date
-
The date the survey was administered
- Never
NULL
- Never
-
consultation_date
🔗
date
-
The response to the question, as a date, if the question admits a date as the response. Alternatively, the date the survey was administered.
- Never
NULL
- Never
-
consultation_id
🔗
integer
-
The ID of the survey
- Never
NULL
- Never
-
numeric_value
🔗
float
-
The response to the question, as a number
one row per patient
patients🔗
Patients in primary care.
Representativeness🔗
You can find out more about the representativeness of these data in the OpenSAFELY-TPP backend in:
The OpenSAFELY Collaborative, Colm D. Andrews, Anna Schultze, Helen J. Curtis, William J. Hulme, John Tazare, Stephen J. W. Evans, et al. 2022. "OpenSAFELY: Representativeness of Electronic Health Record Platform OpenSAFELY-TPP Data Compared to the Population of England." Wellcome Open Res 2022, 7:191. https://doi.org/10.12688/wellcomeopenres.18010.1
Orphan records🔗
If a practice becomes aware that a patient has moved house, then the practice deducts, or removes, the patient's records from their register. If the patient doesn't register with a new practice within a given amount of time (normally from four to eight weeks), then the patient's records are permanently deducted and are orphan records. There are roughly 1.6 million orphan records.
-
age_on(date)
🔗
-
Patient's age as an integer, in whole elapsed calendar years, as it would be on the given date.
This method takes no account of whether the patient is alive on the given date. In particular, it may return negative values if the given date is before the patient's date of birth.
View method definition
return (date - patients.date_of_birth).years
many rows per patient
practice_registrations🔗
Each record corresponds to a patient's registration with a practice.
Only patients with a full GMS (General Medical Services) registration are included.
We have registration history for:
- all patients currently registered at a TPP practice
- all patients registered at a TPP practice any time from 1 Jan 2009 onwards:
- who have since de-registered
- who have since died
A patient can be registered with zero, one, or more than one practices at a given time. For instance, students are often registered with a practice at home and a practice at university.
-
start_date
🔗
date
-
Date patient joined practice.
- Never
NULL
- Never
-
end_date
🔗
date
-
Date patient left practice.
-
practice_pseudo_id
🔗
integer
-
Pseudonymised practice identifier.
- Never
NULL
- Never
-
practice_stp
🔗
string
-
ONS code of practice's STP (Sustainability and Transformation Partnership). STPs have been replaced by ICBs (Integrated Care Boards), and ICB codes will be available soon.
- Matches regular expression:
E540000[0-9]{2}
- Matches regular expression:
-
practice_nuts1_region_name
🔗
string
-
Name of the NUTS level 1 region of England to which the practice belongs. For more information see: https://www.ons.gov.uk/methodology/geography/ukgeographies/eurostat
- Possible values:
North East
,North West
,Yorkshire and The Humber
,East Midlands
,West Midlands
,East
,London
,South East
,South West
- Possible values:
-
for_patient_on(date)
🔗
-
Return each patient's practice registration as it was on the supplied date.
Where a patient is registered with multiple practices we prefer the most recent registration and then, if there are multiple of these, the one with the longest duration. If there's stil an exact tie we choose arbitrarily based on the practice ID.
View method definition
spanning_regs = practice_registrations.where(practice_registrations.start_date <= date).except_where( practice_registrations.end_date < date ) ordered_regs = spanning_regs.sort_by( practice_registrations.start_date, practice_registrations.end_date, practice_registrations.practice_pseudo_id, ) return ordered_regs.last_for_patient()
many rows per patient
sgss_covid_all_tests🔗
many rows per patient
vaccinations🔗
many rows per patient
wl_clockstops🔗
National Waiting List Clock Stops
This dataset contains all completed referral-to-treatment (RTT) pathways with a "clock stop" date between May 2021 and May 2022. Patients referred for non-emergency consultant-led treatment are on RTT pathways. The "clock start" date is the date of the first referral that starts the pathway. The "clock stop" date is when the patient either: receives treatment; declines treatment; enters a period of active monitoring; no longer requires treatment; or dies. The time spent waiting is the difference in these two dates.
A patient may have multiple rows if they have multiple completed RTT pathways; however, there is only one row per unique pathway. Because referral identifiers aren't necessarily unique between hospitals, unique RTT pathways can be identified using a combination of:
pseudo_organisation_code_patient_pathway_identifier_issuer
pseudo_patient_pathway_identifier
pseudo_referral_identifier
referral_to_treatment_period_start_date
For more information, see "Consultant-led Referral to Treatment Waiting Times Rules and Guidance".
-
activity_treatment_function_code
🔗
string
-
The treatment function
- Matches regular expression:
[a-zA-Z0-9]{3}
- Matches regular expression:
-
priority_type_code
🔗
string
-
The priority type
- Possible values:
routine
,urgent
,two week wait
- Possible values:
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
date
-
The date the referral was received, for the referral that started the original pathway
-
referral_to_treatment_period_end_date
🔗
date
-
Clock stop for the completed pathway
-
referral_to_treatment_period_start_date
🔗
date
-
Clock start for the completed pathway
-
source_of_referral_for_outpatients
🔗
string
-
waiting_list_type
🔗
string
-
The waiting list type on completion of the pathway
- Possible values:
ORTT
,IRTT
- Possible values:
-
week_ending_date
🔗
date
-
The Sunday of the week that the pathway relates to
many rows per patient
wl_clockstops_raw🔗
National Waiting List Clock Stops
Unlike wl_clockstops
,
the columns in this table have the same data types as the columns in the associated
database table. The three "pseudo" columns are small
exceptions, as they are converted from binary columns to string columns.
-
activity_treatment_function_code
🔗
string
-
priority_type_code
🔗
string
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
string
-
referral_to_treatment_period_end_date
🔗
string
-
referral_to_treatment_period_start_date
🔗
string
-
source_of_referral_for_outpatients
🔗
string
-
waiting_list_type
🔗
string
-
week_ending_date
🔗
string
many rows per patient
wl_openpathways🔗
National Waiting List Open Pathways
This dataset contains all people on open (incomplete) RTT or not current RTT (non-RTT) pathways as of May 2022. It is a snapshot of everyone still awaiting treatment as of May 2022 (i.e., the clock hasn't stopped). Patients referred for non-emergency consultant-led treatment are on RTT pathways, while patients referred for non-consultant-led treatment are on non-RTT pathways. For each pathway, there is one row for every week that the patient is still waiting. Because referral identifiers aren't necessarily unique between hospitals, unique RTT pathways can be identified using a combination of:
pseudo_organisation_code_patient_pathway_identifier_issuer
pseudo_patient_pathway_identifier
pseudo_referral_identifier
referral_to_treatment_period_start_date
For more information, see "Consultant-led Referral to Treatment Waiting Times Rules and Guidance".
-
activity_treatment_function_code
🔗
string
-
The treatment function
- Matches regular expression:
[a-zA-Z0-9]{3}
- Matches regular expression:
-
current_pathway_period_start_date
🔗
date
-
Latest clock start for this pathway period
-
priority_type_code
🔗
string
-
The priority type
- Possible values:
routine
,urgent
,two week wait
- Possible values:
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
date
-
The date the referral was received, for the referral that started the original pathway
-
referral_to_treatment_period_end_date
🔗
date
-
If the pathway is open, then
NULL
-
referral_to_treatment_period_start_date
🔗
date
-
Latest clock start for this pathway. If the pathway is not a current pathway, then
NULL
. -
source_of_referral
🔗
string
-
National referral source code for the referral that created the original pathway
- Matches regular expression:
[a-zA-Z0-9]{2}
- Matches regular expression:
-
waiting_list_type
🔗
string
-
- Possible values:
ORTT
,IRTT
,ONON
,INON
- Possible values:
-
week_ending_date
🔗
date
-
The Sunday of the week that the pathway relates to
many rows per patient
wl_openpathways_raw🔗
National Waiting List Open Pathways
Unlike wl_openpathways
,
the columns in this table have the same data types as the columns in the associated
database table. The three "pseudo" columns are small
exceptions, as they are converted from binary columns to string columns.
-
activity_treatment_function_code
🔗
string
-
current_pathway_period_start_date
🔗
string
-
priority_type_code
🔗
string
-
pseudo_organisation_code_patient_pathway_identifier_issuer
🔗
string
-
pseudo_patient_pathway_identifier
🔗
string
-
pseudo_referral_identifier
🔗
string
-
referral_request_received_date
🔗
string
-
referral_to_treatment_period_end_date
🔗
string
-
referral_to_treatment_period_start_date
🔗
string
-
source_of_referral
🔗
string
-
waiting_list_type
🔗
string
-
week_ending_date
🔗
string