Decoding ER7¶
hl7types decodes ER7 wire strings into fully typed Pydantic models through a multi-stage
pipeline.
Segment splitting¶
The first step of the decoder is splitting the ER7 wire string into individual segment strings. The HL7 v2
specification mandates a carriage return (\r) as the segment terminator. In
practice the decoder is tolerant of \n and \r\n as well, but any system producing or
consuming HL7 over a real interface will use \r.
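The tolerant splitting described above can be sketched in a few lines. This is a minimal illustration, not the library's code: `split_segments` is a hypothetical name, and dropping empty segments (for example after a trailing terminator) is an assumption.

```python
import re

# Accept \r\n, \r, or \n as a segment terminator. \r\n comes first in the
# alternation so that it is treated as one terminator rather than two.
_SEGMENT_BREAK = re.compile(r"\r\n|\r|\n")


def split_segments(message: str) -> list[str]:
    """Split an ER7 wire string into non-empty segment strings."""
    # Assumption: empty strings produced by a trailing terminator are dropped.
    return [segment for segment in _SEGMENT_BREAK.split(message) if segment]
```

For a message terminated with `\r`, `split_segments("MSH|^~\\&|A\rPID|1\r")` yields the two segment strings without a trailing empty entry.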
Encoding character detection¶
Before any field can be parsed, the decoder needs to know which characters serve as delimiters. These are declared in the opening delimiter segment, always MSH, FHS, or BHS.
The field separator is read directly from the fourth character of the raw segment string
(index 3). It cannot come from a parsed field, because it defines how fields are parsed. The
remaining four encoding characters (component separator, repetition separator, escape
character, and subcomponent separator) are read from the next field, conventionally ^~\&.
The defaults are:
|: field separator
^: component separator
~: repetition separator
\: escape character
&: subcomponent separator
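As a rough sketch of the detection step, the delimiters can be read straight from the raw header text before any field parsing happens. The names `EncodingChars` and `detect_encoding_chars` are hypothetical and do not come from hl7types; the sketch reads only the four standard encoding characters and ignores anything that follows them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingChars:
    """Hypothetical container for the five standard delimiters."""
    field: str = "|"
    component: str = "^"
    repetition: str = "~"
    escape: str = "\\"
    subcomponent: str = "&"


def detect_encoding_chars(header: str) -> EncodingChars:
    """Read the delimiters from the raw text of an MSH, FHS, or BHS segment."""
    if header[:3] not in ("MSH", "FHS", "BHS"):
        raise ValueError(f"not a delimiter segment: {header[:3]!r}")
    if len(header) < 8:
        raise ValueError("header too short to declare encoding characters")
    field = header[3]  # fourth character of the raw string, not a parsed field
    component, repetition, escape, subcomponent = header[4:8]
    return EncodingChars(field, component, repetition, escape, subcomponent)
```

Because the field separator is taken from the raw string, a header such as `MSH#^~\&#...` resolves to `#` as the field separator with no special handling.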
Truncation character (v2.7+)¶
HL7 v2.7 introduced a sixth encoding character, the truncation character, appended to MSH.2
to give a five-character string such as ^~\&#. When present, it marks the point at which a
field value has been truncated by the sender. The decoder strips any trailing truncation
characters from field values after unescaping.
The truncation character is only permitted in messages that declare version 2.7 or later in
MSH.12. Passing a five-character MSH.2 with an earlier version raises a ValueError.
Four-character MSH.2 strings are accepted at any version and produce no truncation
behaviour.
When encoding back to ER7, the truncation character is included in MSH.2 automatically if
it was present in the original message. Any literal occurrence of the truncation character
in a field value is escaped to prevent ambiguity.
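The version rule and the stripping step might look like the following. This is an illustrative sketch under stated assumptions: both function names are hypothetical, the version is assumed to be a dotted string such as 2.7 or 2.5.1, and rejecting MSH.2 lengths other than four or five is an assumption rather than documented behaviour.

```python
from typing import Optional


def truncation_char(msh2: str, version: str) -> Optional[str]:
    """Return the truncation character declared in MSH.2, or None."""
    if len(msh2) == 4:
        return None  # accepted at any version, no truncation behaviour
    if len(msh2) != 5:
        # Assumption: any other length is treated as an error.
        raise ValueError(f"MSH.2 must be 4 or 5 characters, got {len(msh2)}")
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (2, 7):
        raise ValueError(f"truncation character not permitted in v{version}")
    return msh2[4]


def strip_truncation(value: str, trunc: Optional[str]) -> str:
    """Drop trailing truncation characters from an already unescaped value."""
    return value.rstrip(trunc) if trunc else value
```

Comparing `(major, minor)` as a tuple keeps versions such as 2.5.1 below 2.7 without any string comparison pitfalls.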
Non-standard delimiters are fully supported. If a message arrives with a different field separator, the decoder detects it from the raw bytes and applies it consistently throughout.
Message type resolution¶
When no message class is provided to decode_er7, the decoder resolves one automatically by
inspecting MSH:
MSH.9 carries the message type and trigger event (e.g. ADT^A01^ADT_A01). The decoder prefers the third component (the explicit structure name) if present, otherwise constructs it from the first two components.
MSH.12 carries the HL7 version (e.g. 2.5.1), which determines which version sub-package to import from.
The resolved class is then imported dynamically, so passing an ER7 wire string containing
MSH.9 = ADT^A01 and MSH.12 = 2.5.1 will automatically load
hl7types.hl7.v2_5_1.messages.ADT_A01. An unknown version or message type raises a
ValueError at this point rather than at field access time.
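The resolution rule can be sketched as a pure function that maps the two MSH values to a module path and a class name. `resolve_message_class` is a hypothetical helper, not part of hl7types, and joining the first two components with an underscore is inferred from the ADT^A01 example above.

```python
def resolve_message_class(
    msh9: str, msh12: str, component_sep: str = "^"
) -> tuple[str, str]:
    """Return (module path, class name) for the given MSH.9 and MSH.12."""
    parts = msh9.split(component_sep)
    if len(parts) >= 3 and parts[2]:
        structure = parts[2]  # the explicit structure name is preferred
    elif len(parts) >= 2 and parts[0] and parts[1]:
        # Assumption: message type and trigger event are joined with "_".
        structure = f"{parts[0]}_{parts[1]}"
    else:
        raise ValueError(f"cannot derive a structure from MSH.9 = {msh9!r}")
    module = f"hl7types.hl7.v{msh12.replace('.', '_')}.messages"
    return module, structure
```

With hl7types installed, the returned pair could be handed to `importlib.import_module` and `getattr` to obtain the class; whether the library does it this way is not confirmed here.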
Field and component parsing¶
With encoding characters known and a model class resolved, each segment is tokenised and matched
against the model’s field positions. Field positions are derived from the Pydantic
serialization_alias on each field: MSH.9 maps to position 9, PID.3 to position 3,
and so on.
The parsing hierarchy mirrors the HL7 wire hierarchy:
Fields are split on the field separator (|).
Repetitions within a field are split on the repetition separator (~). Trailing empty repetitions are dropped rather than preserved as empty strings.
Components within a repetition are split on the component separator (^).
Subcomponents within a component are split on the subcomponent separator (&).
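The four levels above can be sketched as nested splits. This is a simplified illustration with hypothetical function names: it works on raw strings only, does no model matching or unescaping, and does not handle MSH, where MSH.1 is the field separator itself and the positions are therefore shifted by one.

```python
def parse_field(raw: str, component="^", repetition="~", subcomponent="&"):
    """Split one raw field into repetitions, components, and subcomponents."""
    repetitions = raw.split(repetition)
    while repetitions and repetitions[-1] == "":
        repetitions.pop()  # trailing empty repetitions are dropped
    return [
        [comp.split(subcomponent) for comp in rep.split(component)]
        for rep in repetitions
    ]


def parse_segment(segment: str, field="|"):
    """Return the segment name and a dict of 1-based position to parsed field."""
    name, *fields = segment.split(field)
    parsed = {pos: parse_field(raw) for pos, raw in enumerate(fields, start=1)}
    return name, parsed
```

For `PID|1||123^^^HOSP&1.2.3~456~`, position 3 comes back as two repetitions, the first with four components and the last of those with two subcomponents, while the empty field at position 2 becomes an empty list.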
Whether a field is expected to be a scalar, a composite datatype (another HL7Model), or a
list of either is determined entirely by the type annotations on the model; no runtime
configuration is needed.
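One way to read the expected shape from an annotation is with the standard `typing` helpers. The sketch below is an assumption about how such dispatch could work, not the library's code: `field_shape` is a hypothetical name, and it only unwraps `Optional[...]` and `list[...]`.

```python
from typing import Optional, Union, get_args, get_origin


def field_shape(annotation) -> tuple[bool, type]:
    """Return (is_repeating, item_type) for a field annotation."""
    # Unwrap Optional[X] to X.
    if get_origin(annotation) is Union:
        args = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(args) == 1:
            annotation = args[0]
    # list[X] marks a repeating field whose items have type X.
    if get_origin(annotation) is list:
        (item,) = get_args(annotation)
        return True, item
    return False, annotation
```

A decoder built on this could then check whether the item type is a subclass of the model base class to choose between scalar and composite parsing.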
Escape sequences¶
Escape sequences in field values are resolved after tokenisation. The five standard sequences are:
\F\ to | (field separator)
\S\ to ^ (component separator)
\T\ to & (subcomponent separator)
\R\ to ~ (repetition separator)
\E\ to \ (escape character)
This behaviour is derived from HAPI (the reference Java HL7 implementation). Unknown escape
sequences, such as \H\, \N\, or \.br\, which are presentation-layer formatting
hints, are left untouched rather than raising an error.
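A minimal sketch of this unescaping rule, using a regular expression to find escape-delimited sequences and a lookup table for the five known ones. The function name `unescape` and its signature are hypothetical; the sketch does not reproduce HAPI's implementation.

```python
import re


def unescape(value: str, field="|", component="^", repetition="~",
             escape="\\", subcomponent="&") -> str:
    """Resolve the five standard escape sequences and leave all others as-is."""
    known = {"F": field, "S": component, "T": subcomponent,
             "R": repetition, "E": escape}
    esc = re.escape(escape)
    # Match escape char, any run of non-escape chars, then escape char.
    pattern = re.compile(f"{esc}([^{esc}]*){esc}")

    def replace(match: re.Match) -> str:
        # Unknown sequences such as \H\ or \.br\ are returned unchanged.
        return known.get(match.group(1), match.group(0))

    return pattern.sub(replace, value)
```

Using a replacement function rather than a replacement string means the substituted delimiter is inserted literally, which matters when the result of `\E\` is itself a backslash.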
See Validation for details on field-level validation and strict versus lenient decoding.