Skip to main content

Binary Wire Format

PostgreSQL clients can request query results in two formats: text (human-readable strings) and binary (compact, machine-readable bytes). Text is the default and works universally, but many modern drivers — tokio-postgres, sqlx, asyncpg, node-postgres — use the extended query protocol (Parse/Bind/Execute) and request binary format for performance-sensitive types like timestamps, integers, and UUIDs.

HatiData's proxy supports both. When a client's Bind message requests binary result columns, the proxy encodes values in the exact binary representation that PostgreSQL clients expect.

How It Works

The encoding sits between query execution and result delivery. DuckDB executes queries and produces text-formatted values. When the client requests binary format, the proxy re-encodes those values before sending them over the wire.

Client                      HatiData Proxy                    DuckDB
| | |
|-- Parse (SQL) ------------->| |
|<------------ ParseComplete -| |
| | |
|-- Bind (result_formats) --->| stores format codes |
|<------------ BindComplete --| in portal |
| | |
|-- Describe (Portal) ------->| sets RowDescription.format |
|<------ RowDescription ------| per column (0=text, 1=binary)|
| | |
|-- Execute ----------------->|------------- SQL ------------>|
| |<---- text values ------------|
| | |
| | re-encode to binary |
|<-------------- DataRow -----| where requested |
|<------ CommandComplete -----| |

Format Code Semantics

The Bind message includes result format codes following the PostgreSQL v3 protocol spec:

Format codes sentMeaning
0 codesAll columns text (default)
1 codeApply that format to every column
N codes (one per column)Per-column format selection

A format code of 0 means text; 1 means binary. Clients can mix formats within a single query result.

Supported Types

The proxy supports binary encoding for 13 PostgreSQL type OIDs:

TypeOIDBinary FormatSize
BOOL160x01 (true) or 0x00 (false)1 byte
INT2 (smallint)21Signed 16-bit big-endian2 bytes
INT4 (integer)23Signed 32-bit big-endian4 bytes
INT8 (bigint)20Signed 64-bit big-endian8 bytes
FLOAT4 (real)700IEEE 754 single-precision big-endian4 bytes
FLOAT8 (double)701IEEE 754 double-precision big-endian8 bytes
DATE1082Signed 32-bit days since 2000-01-014 bytes
TIMESTAMP1114Signed 64-bit microseconds since 2000-01-01 00:00:008 bytes
TIMESTAMPTZ1184Signed 64-bit microseconds since 2000-01-01 00:00:00 UTC8 bytes
UUID295016 raw bytes16 bytes
BYTEA17Raw bytes (hex-decoded if \x prefixed)Variable
TEXT25UTF-8 bytes (same as text format)Variable
VARCHAR1043UTF-8 bytes (same as text format)Variable

Unsupported Types

These types are not supported in binary format and automatically fall back to text:

TypeOIDWhy
NUMERIC / DECIMAL1700Complex variable-length BCD encoding
INTERVAL118616-byte composite (microseconds + days + months)
ARRAY typesVariousRecursive encoding with element OIDs
JSON / JSONB114 / 3802Version-prefixed binary format
STRUCT / MAPDuckDB-specificNo PostgreSQL binary equivalent

When a client requests binary for an unsupported type, the proxy does two things:

  1. RowDescription advertises format = 0 (text) for that column, overriding the client's request.
  2. DataRow sends the value as text bytes.

This means the client sees the correction in the RowDescription and knows to decode that column as text. The result set can mix binary and text columns freely — this is explicitly allowed by the PostgreSQL protocol.

Edge Cases

TIMESTAMPTZ and Timezone Normalization

PostgreSQL binary TIMESTAMPTZ is always stored as microseconds since 2000-01-01 00:00:00 UTC. When DuckDB returns a text value with a timezone offset (e.g., 2026-01-15 10:30:00+05:30), the proxy:

  1. Parses the timezone offset (+05:30 = +19,800 seconds)
  2. Subtracts the offset to normalize to UTC
  3. Encodes the UTC microseconds in binary

This means 2026-01-15 10:30:00+05:30 becomes the same binary value as 2026-01-15 05:00:00+00:00. The client's driver handles display-time timezone conversion.

Supported offset formats: Z, +HH, +HH:MM, +HHMM, -HH, -HH:MM.

DATE Epoch

PostgreSQL's binary date epoch is 2000-01-01 (not Unix 1970-01-01). The proxy converts correctly:

  • 2000-01-01 = 0 (4 bytes, all zeros)
  • 1999-12-31 = -1
  • 2026-03-09 = 9564

Float Special Values

IEEE 754 special values are handled correctly in binary encoding:

  • NaN encodes as quiet NaN (0x7FC00000 for float4, 0x7FF8000000000000 for float8)
  • Infinity and -Infinity encode as IEEE 754 infinity
  • Rust's f32::parse() and f64::parse() produce the correct bit patterns, which match what PostgreSQL clients expect

NULL Values

NULL values are always encoded as a -1 length prefix in the DataRow message, regardless of whether the column format is text or binary. No type-specific encoding is needed for NULLs.

Describe-less Execute

The PostgreSQL protocol allows clients to skip the Describe step (Parse, Bind, Execute without Describe). In this case, no RowDescription is sent, so the client never sees the proxy's correction of unsupported types back to text format. The proxy still sends text bytes for unsupported types — this is the only spec-compliant option since Execute must not send RowDescription.

In practice, all major drivers (tokio-postgres, sqlx, asyncpg, node-postgres) send Describe before Execute, so they receive the corrected RowDescription and handle format fallbacks correctly.

Performance

The binary encoding path is designed for zero regression on the common case:

  • Simple query protocol (psql, most BI tools): Entirely untouched. All results are text, same as before.
  • Extended query, text format: When format codes are all text (or absent), results pass through without any re-encoding.
  • Extended query, binary format: Values are re-encoded from text to binary. This adds per-row overhead for the text-to-binary conversion, but eliminates the much larger cost of client-side text parsing for types like timestamps and integers.

For a typical query returning 10,000 rows with 5 integer columns and 2 timestamp columns, the binary path saves the client from parsing 70,000 text strings into native types — a net performance win despite the server-side re-encoding.

Driver-Specific Notes

tokio-postgres (Rust)

Requests binary format for all supported types by default. HatiData handles this correctly. No configuration needed.

let row = client.query_one("SELECT created_at FROM events LIMIT 1", &[]).await?;
let ts: chrono::NaiveDateTime = row.get(0); // binary TIMESTAMP decoded correctly

sqlx (Rust)

Uses the extended query protocol with binary format. Verifies that RowDescription format codes match its expectations before decoding. HatiData's Describe handler advertises the correct format per column, so sqlx works without issues.

let row: (i64, String, chrono::NaiveDateTime) =
sqlx::query_as("SELECT id, name, created_at FROM users LIMIT 1")
.fetch_one(&pool)
.await?;

asyncpg (Python)

Requests binary format for all result columns. Handles the format correction in RowDescription transparently — if a column is advertised as text, asyncpg decodes it as text even though binary was requested.

row = await conn.fetchrow("SELECT id, created_at FROM events LIMIT 1")
print(row['created_at']) # datetime object, decoded from binary TIMESTAMP

psycopg2 / psycopg3 (Python)

Uses the simple query protocol by default (all text). Binary format is not requested unless explicitly configured with cursor_factory=RealDictCursor and binary mode. No change in behavior.

node-postgres (Node.js)

Uses text format by default. Binary format can be enabled per-query with { rowMode: 'array', types: ... }. When binary is requested, HatiData encodes correctly.

Stay in the loop

Product updates, engineering deep-dives, and agent-native insights. No spam.