Arrow Query API

The Arrow Query API provides a high-performance HTTP endpoint that returns query results in Apache Arrow stream format. This is the recommended path for analytical workloads, ML feature pipelines, and any use case where columnar data is preferred over row-based Postgres wire protocol results.

Endpoint

POST /v1/query/arrow

Base URL: http://<proxy-host>:<proxy-port> (default: http://localhost:5439)


Request

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer <api_key> |
| Content-Type | Yes | application/json |
| X-Agent-Id | No | Override the agent ID for this query (default: derived from API key) |

Body

{
  "sql": "SELECT * FROM orders WHERE total > 100 LIMIT 1000",
  "parameters": [],
  "timeout_ms": 30000
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| sql | string | Yes | The SQL query to execute |
| parameters | array | No | Positional parameters for prepared statements |
| timeout_ms | integer | No | Query timeout in milliseconds (default: 30000) |
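
The headers and body fields above can be assembled in one place. The following is a minimal sketch, not part of the API itself: the helper name `build_arrow_request` and its validation rules are assumptions made for illustration, and the default base URL is the documented `http://localhost:5439`.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_arrow_request(
    sql: str,
    api_key: str,
    parameters: Optional[List[Any]] = None,
    timeout_ms: int = 30000,
    agent_id: Optional[str] = None,
    base_url: str = "http://localhost:5439",
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for POST /v1/query/arrow.

    Hypothetical helper: the client-side checks below are assumptions,
    not rules enforced by the documented API.
    """
    if not sql.strip():
        raise ValueError("sql must be a non-empty string")
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive integer")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if agent_id is not None:
        # Optional override; by default the agent ID is derived from the API key.
        headers["X-Agent-Id"] = agent_id

    body = {"sql": sql, "parameters": parameters or [], "timeout_ms": timeout_ms}
    return f"{base_url}/v1/query/arrow", headers, json.dumps(body).encode("utf-8")
```

The returned tuple can be passed to any HTTP client, for example `requests.post(url, data=body, headers=headers)`.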

Example Request

curl -X POST http://localhost:5439/v1/query/arrow \
  -H "Authorization: Bearer hd_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY customer_id ORDER BY revenue DESC LIMIT 100"
  }' \
  --output result.arrow

Response

Success (200 OK)

The response body is an Arrow stream (binary). Response headers include metadata:

| Header | Description |
| --- | --- |
| Content-Type | application/vnd.apache.arrow.stream |
| X-Row-Count | Total number of rows in the result |
| X-Column-Count | Number of columns in the result |
| X-Query-Id | Unique query identifier for audit |
| X-Credits-Consumed | Query credits consumed |
| X-Cache-Hit | true if the transpilation cache was hit |
| X-Duration-Ms | Server-side query execution time in milliseconds |
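
Because this metadata arrives as header strings, it can help to parse it once into typed fields. The sketch below is illustrative only: `QueryMetadata` and `parse_query_metadata` are hypothetical names, and treating `X-Credits-Consumed` and `X-Duration-Ms` as possibly fractional numbers is an assumption, since the header formats are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class QueryMetadata:
    """Typed view of the documented response headers (hypothetical helper)."""
    query_id: Optional[str]
    row_count: Optional[int]
    column_count: Optional[int]
    credits_consumed: Optional[float]  # assumption: may be fractional
    cache_hit: bool
    duration_ms: Optional[float]       # assumption: may be fractional


def _to_number(value: Optional[str], cast: Callable):
    """Convert a header value, returning None if it is missing or malformed."""
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_query_metadata(headers: Mapping[str, str]) -> QueryMetadata:
    return QueryMetadata(
        query_id=headers.get("X-Query-Id"),
        row_count=_to_number(headers.get("X-Row-Count"), int),
        column_count=_to_number(headers.get("X-Column-Count"), int),
        credits_consumed=_to_number(headers.get("X-Credits-Consumed"), float),
        cache_hit=(headers.get("X-Cache-Hit") or "").strip().lower() == "true",
        duration_ms=_to_number(headers.get("X-Duration-Ms"), float),
    )
```

With the `requests` library, `response.headers` is a case-insensitive mapping and can be passed in directly. A plain `dict` must use the exact header casing shown in the table.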

Error Responses

| Status | Body | Description |
| --- | --- | --- |
| 400 | {"error": "SQL parse error: ..."} | Invalid SQL |
| 401 | {"error": "Invalid API key"} | Authentication failed |
| 403 | {"error": "Permission denied"} | Agent lacks required permissions |
| 408 | {"error": "Query timeout"} | Query exceeded timeout_ms |
| 429 | {"error": "Rate limit exceeded"} | Too many requests |
| 500 | {"error": "Internal server error"} | Server error |
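
Since every error response carries a JSON body with an `error` field, a client can turn the status and body into one exception type. The following is a sketch under assumptions: `ArrowQueryError` and `error_from_response` are hypothetical names, and treating only 429 and 500 as retryable is a client-side policy choice, not something the API prescribes.

```python
import json


class ArrowQueryError(Exception):
    """Raised for non-200 responses from /v1/query/arrow (hypothetical helper)."""

    def __init__(self, status: int, message: str, retryable: bool):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = retryable


# Assumption: rate limiting and server errors may succeed on retry, while
# 400/401/403 need a changed request and 408 needs a larger timeout_ms.
RETRYABLE_STATUSES = {429, 500}


def error_from_response(status: int, body: bytes) -> ArrowQueryError:
    """Build an exception from the status code and the raw response body."""
    message = ""
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            message = str(payload.get("error") or "")
    except ValueError:
        # Body was not valid JSON (for example, an HTML page from a gateway).
        message = ""
    if not message:
        message = body.decode("utf-8", errors="replace") or "unknown error"
    return ArrowQueryError(status, message, status in RETRYABLE_STATUSES)
```

A caller might check `response.status_code != 200` and then `raise error_from_response(response.status_code, response.content)`.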

Python Client

PyArrow

import pyarrow as pa
import pyarrow.ipc as ipc
import requests

def arrow_query(sql: str, host: str = "localhost", port: int = 5439, api_key: str = "") -> pa.Table:
    """Execute a SQL query and return an Arrow Table."""
    response = requests.post(
        f"http://{host}:{port}/v1/query/arrow",
        json={"sql": sql},
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    response.raise_for_status()

    # Decode the Arrow IPC stream from the response body
    reader = ipc.open_stream(response.content)
    table = reader.read_all()

    # Metadata from response headers
    row_count = response.headers.get("X-Row-Count")
    duration_ms = response.headers.get("X-Duration-Ms")
    print(f"Rows: {row_count}, Duration: {duration_ms}ms")

    return table

Polars

import polars as pl

def polars_query(sql: str, **kwargs) -> pl.DataFrame:
    """Execute a SQL query and return a Polars DataFrame."""
    # arrow_query is defined in the PyArrow section above
    return pl.from_arrow(arrow_query(sql, **kwargs))

df = polars_query(
    "SELECT * FROM orders WHERE order_date >= '2025-01-01'",
    api_key="hd_live_your_api_key",
)
print(df.describe())

Pandas

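import pandas as pd

def pandas_query(sql: str, **kwargs) -> pd.DataFrame:
    """Execute a SQL query and return a Pandas DataFrame."""
    # arrow_query is defined in the PyArrow section above
    table = arrow_query(sql, **kwargs)
    # Keep Arrow-backed dtypes instead of converting to NumPy dtypes
    return table.to_pandas(types_mapper=pd.ArrowDtype)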

TypeScript Client

import { Table, tableFromIPC } from 'apache-arrow';

interface ArrowQueryResult {
  table: Table;
  rowCount: number;
  durationMs: number;
}

async function arrowQuery(sql: string): Promise<ArrowQueryResult> {
  const response = await fetch('http://localhost:5439/v1/query/arrow', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HATIDATA_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ sql }),
  });

  if (!response.ok) {
    throw new Error(`Query failed: ${response.status}`);
  }

  // Decode the Arrow IPC stream from the response body
  const buffer = await response.arrayBuffer();
  const table = tableFromIPC(new Uint8Array(buffer));

  return {
    table,
    rowCount: Number(response.headers.get('X-Row-Count')),
    durationMs: Number(response.headers.get('X-Duration-Ms')),
  };
}

const { table, rowCount, durationMs } = await arrowQuery(
  'SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY customer_id'
);
console.log(`${rowCount} rows in ${durationMs}ms`);

Parameterized Queries

Use positional parameters to prevent SQL injection:

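import requests

api_key = "hd_live_your_api_key"

response = requests.post(
    "http://localhost:5439/v1/query/arrow",
    # json= also sets the Content-Type: application/json header
    json={
        "sql": "SELECT * FROM orders WHERE customer_id = $1 AND total > $2",
        "parameters": ["cust-123", 100.0],
    },
    headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()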

Parameters are bound server-side before query execution. All standard parameter types are accepted: strings, integers, floats, booleans, and timestamps.
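
A mismatch between the `$n` placeholders and the `parameters` array is easy to introduce when editing a query. The sketch below is an optional client-side check, not part of the API: `check_parameters` is a hypothetical helper, and its regex is deliberately naive, so it will also count a `$1` that appears inside a string literal or comment.

```python
import re
from typing import Any, Sequence

# Matches positional placeholders such as $1, $2, $10
_PLACEHOLDER = re.compile(r"\$(\d+)")


def check_parameters(sql: str, parameters: Sequence[Any]) -> None:
    """Raise ValueError unless the SQL uses exactly $1..$N for N parameters.

    Naive by design: placeholders inside string literals are counted too.
    """
    used = {int(index) for index in _PLACEHOLDER.findall(sql)}
    expected = set(range(1, len(parameters) + 1))
    if used != expected:
        raise ValueError(
            f"SQL references placeholders {sorted(used)} "
            f"but {len(parameters)} parameter(s) were supplied"
        )
```

Calling `check_parameters(sql, params)` before sending the request turns a server-side failure into an immediate local error.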


Semantic Search via Arrow

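The Arrow endpoint supports the semantic_match() and semantic_rank() functions. The example below filters rows with semantic_match() and orders them by the score returned from semantic_rank():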

table = arrow_query(
    """
    SELECT
        memory_id,
        content,
        semantic_rank(content, 'revenue growth patterns') AS relevance
    FROM _hatidata_agent_memory
    WHERE agent_id = 'analyst-agent'
      AND semantic_match(content, 'revenue growth patterns', 0.7)
    ORDER BY relevance DESC
    LIMIT 50
    """,
    api_key="hd_live_your_api_key",
)

df = pl.from_arrow(table)
print(df.head(10))

Performance Notes

| Aspect | Detail |
| --- | --- |
| Format overhead | Arrow is a zero-copy format: no serialization/deserialization overhead |
| Compression | Not compressed by default; send Accept-Encoding: gzip for network savings |
| Max result size | 100 MB default (configurable) |
| Concurrency | Same semaphore as Postgres wire protocol queries |
| Caching | Transpilation cache applies; query engine result cache applies |
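
The `requests` library sends `Accept-Encoding: gzip` by default and decompresses the body transparently, so the Python client above needs no changes. Clients that do not, such as the standard library's `urllib`, have to opt in and decompress themselves. The following is a minimal sketch under that assumption; `decode_body` and `fetch_arrow_bytes` are hypothetical helper names.

```python
import gzip
import urllib.request
from typing import Dict, Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip compression if the response declares it; otherwise pass through."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch_arrow_bytes(url: str, headers: Dict[str, str], body: bytes,
                      timeout_s: float = 30.0) -> bytes:
    """POST a query with urllib and return the decompressed Arrow stream bytes."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={**headers, "Accept-Encoding": "gzip"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

The bytes returned by `fetch_arrow_bytes` can be handed to `pyarrow.ipc.open_stream` in the same way as `response.content` in the PyArrow example.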

For datasets larger than the max response size, use LIMIT/OFFSET pagination or export to Parquet.
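
LIMIT/OFFSET pagination can be sketched as a generator that requests fixed-size pages until a short page arrives. This is an illustration under assumptions: `paginate` is a hypothetical helper, the `fetch` callable stands in for something like `arrow_query`, and the query passed in must have a deterministic ORDER BY and no LIMIT or OFFSET of its own, otherwise pages can overlap or skip rows.

```python
from typing import Callable, Iterator, Sized, TypeVar

Page = TypeVar("Page", bound=Sized)


def paginate(sql: str, fetch: Callable[[str], Page],
             page_size: int = 10_000) -> Iterator[Page]:
    """Yield pages of `sql` via LIMIT/OFFSET until the result is exhausted.

    `fetch` runs one SQL string and returns an object that supports len(),
    such as a pyarrow.Table.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(f"{sql} LIMIT {page_size} OFFSET {offset}")
        if len(page) == 0:
            return  # nothing left
        yield page
        if len(page) < page_size:
            return  # short page: this was the last one
        offset += page_size
```

With the PyArrow client, the pages could be combined using `pa.concat_tables(paginate(sql, lambda q: arrow_query(q, api_key=key)))`, or processed one at a time to keep memory use bounded.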

