Binary Wire Format
PostgreSQL clients can request query results in two formats: text (human-readable strings) and binary (compact, machine-readable bytes). Text is the default and works universally, but many modern drivers — tokio-postgres, sqlx, asyncpg, node-postgres — use the extended query protocol (Parse/Bind/Execute) and request binary format for performance-sensitive types like timestamps, integers, and UUIDs.
HatiData's proxy supports both. When a client's Bind message requests binary result columns, the proxy encodes values in the exact binary representation that PostgreSQL clients expect.
How It Works
The encoding sits between query execution and result delivery. DuckDB executes queries and produces text-formatted values. When the client requests binary format, the proxy re-encodes those values before sending them over the wire.
Client HatiData Proxy DuckDB
| | |
|-- Parse (SQL) ------------->| |
|<------------ ParseComplete -| |
| | |
|-- Bind (result_formats) --->| stores format codes |
|<------------ BindComplete --| in portal |
| | |
|-- Describe (Portal) ------->| sets RowDescription.format |
|<------ RowDescription ------| per column (0=text, 1=binary)|
| | |
|-- Execute ----------------->|------------- SQL ------------>|
| |<---- text values ------------|
| | |
| | re-encode to binary |
|<-------------- DataRow -----| where requested |
|<------ CommandComplete -----| |
Format Code Semantics
The Bind message includes result format codes following the PostgreSQL v3 protocol spec:
| Format codes sent | Meaning |
|---|---|
| 0 codes | All columns text (default) |
| 1 code | Apply that format to every column |
| N codes (one per column) | Per-column format selection |
A format code of 0 means text; 1 means binary. Clients can mix formats within a single query result.
Supported Types
The proxy supports binary encoding for 13 PostgreSQL type OIDs:
| Type | OID | Binary Format | Size |
|---|---|---|---|
BOOL | 16 | 0x01 (true) or 0x00 (false) | 1 byte |
INT2 (smallint) | 21 | Signed 16-bit big-endian | 2 bytes |
INT4 (integer) | 23 | Signed 32-bit big-endian | 4 bytes |
INT8 (bigint) | 20 | Signed 64-bit big-endian | 8 bytes |
FLOAT4 (real) | 700 | IEEE 754 single-precision big-endian | 4 bytes |
FLOAT8 (double) | 701 | IEEE 754 double-precision big-endian | 8 bytes |
DATE | 1082 | Signed 32-bit days since 2000-01-01 | 4 bytes |
TIMESTAMP | 1114 | Signed 64-bit microseconds since 2000-01-01 00:00:00 | 8 bytes |
TIMESTAMPTZ | 1184 | Signed 64-bit microseconds since 2000-01-01 00:00:00 UTC | 8 bytes |
UUID | 2950 | 16 raw bytes | 16 bytes |
BYTEA | 17 | Raw bytes (hex-decoded if \x prefixed) | Variable |
TEXT | 25 | UTF-8 bytes (same as text format) | Variable |
VARCHAR | 1043 | UTF-8 bytes (same as text format) | Variable |
Unsupported Types
These types are not supported in binary format and automatically fall back to text:
| Type | OID | Why |
|---|---|---|
NUMERIC / DECIMAL | 1700 | Complex variable-length BCD encoding |
INTERVAL | 1186 | 16-byte composite (microseconds + days + months) |
ARRAY types | Various | Recursive encoding with element OIDs |
JSON / JSONB | 114 / 3802 | Version-prefixed binary format |
STRUCT / MAP | DuckDB-specific | No PostgreSQL binary equivalent |
When a client requests binary for an unsupported type, the proxy does two things:
- RowDescription advertises
format = 0(text) for that column, overriding the client's request. - DataRow sends the value as text bytes.
This means the client sees the correction in the RowDescription and knows to decode that column as text. The result set can mix binary and text columns freely — this is explicitly allowed by the PostgreSQL protocol.
Edge Cases
TIMESTAMPTZ and Timezone Normalization
PostgreSQL binary TIMESTAMPTZ is always stored as microseconds since 2000-01-01 00:00:00 UTC. When DuckDB returns a text value with a timezone offset (e.g., 2026-01-15 10:30:00+05:30), the proxy:
- Parses the timezone offset (
+05:30= +19,800 seconds) - Subtracts the offset to normalize to UTC
- Encodes the UTC microseconds in binary
This means 2026-01-15 10:30:00+05:30 becomes the same binary value as 2026-01-15 05:00:00+00:00. The client's driver handles display-time timezone conversion.
Supported offset formats: Z, +HH, +HH:MM, +HHMM, -HH, -HH:MM.
DATE Epoch
PostgreSQL's binary date epoch is 2000-01-01 (not Unix 1970-01-01). The proxy converts correctly:
2000-01-01=0(4 bytes, all zeros)1999-12-31=-12026-03-09=9564
Float Special Values
IEEE 754 special values are handled correctly in binary encoding:
NaNencodes as quiet NaN (0x7FC00000for float4,0x7FF8000000000000for float8)Infinityand-Infinityencode as IEEE 754 infinity- Rust's
f32::parse()andf64::parse()produce the correct bit patterns, which match what PostgreSQL clients expect
NULL Values
NULL values are always encoded as a -1 length prefix in the DataRow message, regardless of whether the column format is text or binary. No type-specific encoding is needed for NULLs.
Describe-less Execute
The PostgreSQL protocol allows clients to skip the Describe step (Parse, Bind, Execute without Describe). In this case, no RowDescription is sent, so the client never sees the proxy's correction of unsupported types back to text format. The proxy still sends text bytes for unsupported types — this is the only spec-compliant option since Execute must not send RowDescription.
In practice, all major drivers (tokio-postgres, sqlx, asyncpg, node-postgres) send Describe before Execute, so they receive the corrected RowDescription and handle format fallbacks correctly.
Performance
The binary encoding path is designed for zero regression on the common case:
- Simple query protocol (psql, most BI tools): Entirely untouched. All results are text, same as before.
- Extended query, text format: When format codes are all text (or absent), results pass through without any re-encoding.
- Extended query, binary format: Values are re-encoded from text to binary. This adds per-row overhead for the text-to-binary conversion, but eliminates the much larger cost of client-side text parsing for types like timestamps and integers.
For a typical query returning 10,000 rows with 5 integer columns and 2 timestamp columns, the binary path saves the client from parsing 70,000 text strings into native types — a net performance win despite the server-side re-encoding.
Driver-Specific Notes
tokio-postgres (Rust)
Requests binary format for all supported types by default. HatiData handles this correctly. No configuration needed.
let row = client.query_one("SELECT created_at FROM events LIMIT 1", &[]).await?;
let ts: chrono::NaiveDateTime = row.get(0); // binary TIMESTAMP decoded correctly
sqlx (Rust)
Uses the extended query protocol with binary format. Verifies that RowDescription format codes match its expectations before decoding. HatiData's Describe handler advertises the correct format per column, so sqlx works without issues.
let row: (i64, String, chrono::NaiveDateTime) =
sqlx::query_as("SELECT id, name, created_at FROM users LIMIT 1")
.fetch_one(&pool)
.await?;
asyncpg (Python)
Requests binary format for all result columns. Handles the format correction in RowDescription transparently — if a column is advertised as text, asyncpg decodes it as text even though binary was requested.
row = await conn.fetchrow("SELECT id, created_at FROM events LIMIT 1")
print(row['created_at']) # datetime object, decoded from binary TIMESTAMP
psycopg2 / psycopg3 (Python)
Uses the simple query protocol by default (all text). Binary format is not requested unless explicitly configured with cursor_factory=RealDictCursor and binary mode. No change in behavior.
node-postgres (Node.js)
Uses text format by default. Binary format can be enabled per-query with { rowMode: 'array', types: ... }. When binary is requested, HatiData encodes correctly.
Related Concepts
- Query Pipeline — The full execution pipeline that produces the text values before binary encoding
- Postgres Drivers & BI Tools — Connection setup for all major drivers and BI tools
- Concurrency Model — How concurrent connections are managed