Binary Wire Format

PostgreSQL clients can request query results in two formats: text (human-readable strings) and binary (compact, machine-readable bytes). Text is the default and works universally, but many modern drivers — tokio-postgres, sqlx, asyncpg, node-postgres — use the extended query protocol (Parse/Bind/Execute) and request binary format for performance-sensitive types like timestamps, integers, and UUIDs.

HatiData's proxy supports both. When a client's Bind message requests binary result columns, the proxy encodes values in the exact binary representation that PostgreSQL clients expect.

How It Works

The encoding sits between query execution and result delivery. DuckDB executes queries and produces text-formatted values. When the client requests binary format, the proxy re-encodes those values before sending them over the wire.

Client                      HatiData Proxy                    DuckDB
  |                              |                               |
  |-- Parse (SQL) ------------->|                               |
  |<------------ ParseComplete -|                               |
  |                              |                               |
  |-- Bind (result_formats) --->|  stores format codes          |
  |<------------ BindComplete --|  in portal                    |
  |                              |                               |
  |-- Describe (Portal) ------->|  sets RowDescription.format   |
  |<------ RowDescription ------|  per column (0=text, 1=binary)|
  |                              |                               |
  |-- Execute ----------------->|------------- SQL ------------>|
  |                              |<---- text values ------------|
  |                              |                               |
  |                              |  re-encode to binary          |
  |<-------------- DataRow -----|  where requested               |
  |<------ CommandComplete -----|                               |

Format Code Semantics

The Bind message includes result format codes following the PostgreSQL v3 protocol spec:

Format codes sent	Meaning
0 codes	All columns text (default)
1 code	Apply that format to every column
N codes (one per column)	Per-column format selection

A format code of 0 means text; 1 means binary. Clients can mix formats within a single query result.

Supported Types

The proxy supports binary encoding for 13 PostgreSQL type OIDs:

Type	OID	Binary Format	Size
`BOOL`	16	`0x01` (true) or `0x00` (false)	1 byte
`INT2` (smallint)	21	Signed 16-bit big-endian	2 bytes
`INT4` (integer)	23	Signed 32-bit big-endian	4 bytes
`INT8` (bigint)	20	Signed 64-bit big-endian	8 bytes
`FLOAT4` (real)	700	IEEE 754 single-precision big-endian	4 bytes
`FLOAT8` (double)	701	IEEE 754 double-precision big-endian	8 bytes
`DATE`	1082	Signed 32-bit days since 2000-01-01	4 bytes
`TIMESTAMP`	1114	Signed 64-bit microseconds since 2000-01-01 00:00:00	8 bytes
`TIMESTAMPTZ`	1184	Signed 64-bit microseconds since 2000-01-01 00:00:00 UTC	8 bytes
`UUID`	2950	16 raw bytes	16 bytes
`BYTEA`	17	Raw bytes (hex-decoded if `\x` prefixed)	Variable
`TEXT`	25	UTF-8 bytes (same as text format)	Variable
`VARCHAR`	1043	UTF-8 bytes (same as text format)	Variable

Unsupported Types

These types are not supported in binary format and automatically fall back to text:

Type	OID	Why
`NUMERIC` / `DECIMAL`	1700	Complex variable-length BCD encoding
`INTERVAL`	1186	16-byte composite (microseconds + days + months)
`ARRAY` types	Various	Recursive encoding with element OIDs
`JSON` / `JSONB`	114 / 3802	Version-prefixed binary format
`STRUCT` / `MAP`	DuckDB-specific	No PostgreSQL binary equivalent

When a client requests binary for an unsupported type, the proxy does two things:

RowDescription advertises format = 0 (text) for that column, overriding the client's request.
DataRow sends the value as text bytes.

This means the client sees the correction in the RowDescription and knows to decode that column as text. The result set can mix binary and text columns freely — this is explicitly allowed by the PostgreSQL protocol.

Edge Cases

TIMESTAMPTZ and Timezone Normalization

PostgreSQL binary TIMESTAMPTZ is always stored as microseconds since 2000-01-01 00:00:00 UTC. When DuckDB returns a text value with a timezone offset (e.g., 2026-01-15 10:30:00+05:30), the proxy:

Parses the timezone offset (+05:30 = +19,800 seconds)
Subtracts the offset to normalize to UTC
Encodes the UTC microseconds in binary

This means 2026-01-15 10:30:00+05:30 becomes the same binary value as 2026-01-15 05:00:00+00:00. The client's driver handles display-time timezone conversion.

Supported offset formats: Z, +HH, +HH:MM, +HHMM, -HH, -HH:MM.

DATE Epoch

PostgreSQL's binary date epoch is 2000-01-01 (not Unix 1970-01-01). The proxy converts correctly:

2000-01-01 = 0 (4 bytes, all zeros)
1999-12-31 = -1
2026-03-09 = 9564

Float Special Values

IEEE 754 special values are handled correctly in binary encoding:

NaN encodes as quiet NaN (0x7FC00000 for float4, 0x7FF8000000000000 for float8)
Infinity and -Infinity encode as IEEE 754 infinity
Rust's f32::parse() and f64::parse() produce the correct bit patterns, which match what PostgreSQL clients expect

NULL Values

NULL values are always encoded as a -1 length prefix in the DataRow message, regardless of whether the column format is text or binary. No type-specific encoding is needed for NULLs.

Describe-less Execute

The PostgreSQL protocol allows clients to skip the Describe step (Parse, Bind, Execute without Describe). In this case, no RowDescription is sent, so the client never sees the proxy's correction of unsupported types back to text format. The proxy still sends text bytes for unsupported types — this is the only spec-compliant option since Execute must not send RowDescription.

In practice, all major drivers (tokio-postgres, sqlx, asyncpg, node-postgres) send Describe before Execute, so they receive the corrected RowDescription and handle format fallbacks correctly.

Performance

The binary encoding path is designed for zero regression on the common case:

Simple query protocol (psql, most BI tools): Entirely untouched. All results are text, same as before.
Extended query, text format: When format codes are all text (or absent), results pass through without any re-encoding.
Extended query, binary format: Values are re-encoded from text to binary. This adds per-row overhead for the text-to-binary conversion, but eliminates the much larger cost of client-side text parsing for types like timestamps and integers.

For a typical query returning 10,000 rows with 5 integer columns and 2 timestamp columns, the binary path saves the client from parsing 70,000 text strings into native types — a net performance win despite the server-side re-encoding.

Driver-Specific Notes

tokio-postgres (Rust)

Requests binary format for all supported types by default. HatiData handles this correctly. No configuration needed.

let row = client.query_one("SELECT created_at FROM events LIMIT 1", &[]).await?;
let ts: chrono::NaiveDateTime = row.get(0); // binary TIMESTAMP decoded correctly

sqlx (Rust)

Uses the extended query protocol with binary format. Verifies that RowDescription format codes match its expectations before decoding. HatiData's Describe handler advertises the correct format per column, so sqlx works without issues.

let row: (i64, String, chrono::NaiveDateTime) =
    sqlx::query_as("SELECT id, name, created_at FROM users LIMIT 1")
        .fetch_one(&pool)
        .await?;

asyncpg (Python)

Requests binary format for all result columns. Handles the format correction in RowDescription transparently — if a column is advertised as text, asyncpg decodes it as text even though binary was requested.

row = await conn.fetchrow("SELECT id, created_at FROM events LIMIT 1")
print(row['created_at'])  # datetime object, decoded from binary TIMESTAMP

psycopg2 / psycopg3 (Python)

Uses the simple query protocol by default (all text). Binary format is not requested unless explicitly configured with cursor_factory=RealDictCursor and binary mode. No change in behavior.

node-postgres (Node.js)

Uses text format by default. Binary format can be enabled per-query with { rowMode: 'array', types: ... }. When binary is requested, HatiData encodes correctly.

Query Pipeline — The full execution pipeline that produces the text values before binary encoding
Postgres Drivers & BI Tools — Connection setup for all major drivers and BI tools
Concurrency Model — How concurrent connections are managed

How It Works​

Format Code Semantics​

Supported Types​

Unsupported Types​

Edge Cases​

TIMESTAMPTZ and Timezone Normalization​

DATE Epoch​

Float Special Values​

NULL Values​

Describe-less Execute​

Performance​

Driver-Specific Notes​

tokio-postgres (Rust)​

sqlx (Rust)​

asyncpg (Python)​

psycopg2 / psycopg3 (Python)​

node-postgres (Node.js)​

Related Concepts​

Stay in the loop