# Arrow Query API
The Arrow Query API provides a high-performance HTTP endpoint that returns query results in Apache Arrow stream format. This is the recommended path for analytical workloads, ML feature pipelines, and any use case where columnar data is preferred over row-based Postgres wire protocol results.
## Endpoint

```
POST /v1/query/arrow
```

Base URL: `http://<proxy-host>:<proxy-port>` (default: `http://localhost:5439`)
## Request

### Headers

| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer <api_key>` |
| `Content-Type` | Yes | `application/json` |
| `X-Agent-Id` | No | Override the agent ID for this query (default: derived from the API key) |
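For instance, one service-level API key can attribute queries to different agents by setting `X-Agent-Id`. A minimal sketch of the request shape (the request is built but not sent; the agent ID value is hypothetical):

```python
import requests

# Build (but don't send) a query request that attributes the query to a
# specific agent via X-Agent-Id; "reporting-agent" is a hypothetical ID.
req = requests.Request(
    "POST",
    "http://localhost:5439/v1/query/arrow",
    json={"sql": "SELECT 1"},
    headers={
        "Authorization": "Bearer hd_live_your_api_key",
        "X-Agent-Id": "reporting-agent",
    },
).prepare()

print(req.headers["X-Agent-Id"])  # the agent this query is attributed to
```

Note that `requests` sets `Content-Type: application/json` automatically when the `json=` argument is used.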
### Body

```json
{
  "sql": "SELECT * FROM orders WHERE total > 100 LIMIT 1000",
  "parameters": [],
  "timeout_ms": 30000
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `sql` | string | Yes | The SQL query to execute |
| `parameters` | array | No | Positional parameters for prepared statements |
| `timeout_ms` | integer | No | Query timeout in milliseconds (default: 30000) |
### Example Request

```bash
curl -X POST http://localhost:5439/v1/query/arrow \
  -H "Authorization: Bearer hd_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY customer_id ORDER BY revenue DESC LIMIT 100"
  }' \
  --output result.arrow
```
## Response

### Success (200 OK)

The response body is an Arrow stream (binary). Response headers include metadata:

| Header | Description |
|---|---|
| `Content-Type` | `application/vnd.apache.arrow.stream` |
| `X-Row-Count` | Total number of rows in the result |
| `X-Column-Count` | Number of columns in the result |
| `X-Query-Id` | Unique query identifier for audit |
| `X-Credits-Consumed` | Query credits consumed |
| `X-Cache-Hit` | `true` if the transpilation cache was hit |
| `X-Duration-Ms` | Server-side query execution time in milliseconds |
### Error Responses

| Status | Body | Description |
|---|---|---|
| 400 | `{"error": "SQL parse error: ..."}` | Invalid SQL |
| 401 | `{"error": "Invalid API key"}` | Authentication failed |
| 403 | `{"error": "Permission denied"}` | Agent lacks the required permissions |
| 408 | `{"error": "Query timeout"}` | Query exceeded `timeout_ms` |
| 429 | `{"error": "Rate limit exceeded"}` | Too many requests |
| 500 | `{"error": "Internal server error"}` | Server error |
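Because error bodies are JSON while success bodies are binary Arrow, clients should branch on the status code before decoding. A minimal sketch of that pattern, with retry on 429 (the backoff policy here is illustrative, not prescribed by the API):

```python
import time
import requests

def query_with_retry(session, url: str, payload: dict,
                     headers: dict, max_retries: int = 3) -> bytes:
    """POST a query, retrying only on 429 and raising on other errors."""
    for attempt in range(max_retries):
        response = session.post(url, json=payload, headers=headers)
        if response.status_code == 200:
            return response.content              # binary Arrow stream
        if response.status_code == 429 and attempt < max_retries - 1:
            time.sleep(2 ** attempt)             # simple exponential backoff
            continue
        # Error bodies are JSON: {"error": "..."}
        raise RuntimeError(f"{response.status_code}: {response.json().get('error')}")
    raise RuntimeError("retries exhausted")
```

Passing the session as an argument keeps the helper testable and lets callers reuse a pooled `requests.Session`.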
## Python Client

### PyArrow

```python
import pyarrow as pa
import pyarrow.ipc as ipc
import requests

def arrow_query(sql: str, host: str = "localhost", port: int = 5439, api_key: str = "") -> pa.Table:
    """Execute a SQL query and return an Arrow Table."""
    response = requests.post(
        f"http://{host}:{port}/v1/query/arrow",
        json={"sql": sql},
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    response.raise_for_status()
    reader = ipc.open_stream(response.content)
    table = reader.read_all()

    # Metadata from response headers
    row_count = response.headers.get("X-Row-Count")
    duration_ms = response.headers.get("X-Duration-Ms")
    print(f"Rows: {row_count}, Duration: {duration_ms}ms")
    return table
```
### Polars

```python
import polars as pl

def polars_query(sql: str, **kwargs) -> pl.DataFrame:
    """Execute a SQL query and return a Polars DataFrame."""
    return pl.from_arrow(arrow_query(sql, **kwargs))

df = polars_query(
    "SELECT * FROM orders WHERE order_date >= '2025-01-01'",
    api_key="hd_live_your_api_key",
)
print(df.describe())
```
### Pandas

```python
import pandas as pd

def pandas_query(sql: str, **kwargs) -> pd.DataFrame:
    """Execute a SQL query and return a Pandas DataFrame."""
    table = arrow_query(sql, **kwargs)
    return table.to_pandas(types_mapper=pd.ArrowDtype)
```
## TypeScript Client

```typescript
import { tableFromIPC } from 'apache-arrow';

async function arrowQuery(sql: string): Promise<any> {
  const response = await fetch('http://localhost:5439/v1/query/arrow', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HATIDATA_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ sql }),
  });
  if (!response.ok) {
    throw new Error(`Query failed: ${response.status}`);
  }
  const buffer = await response.arrayBuffer();
  const table = tableFromIPC(new Uint8Array(buffer));
  return {
    table,
    rowCount: Number(response.headers.get('X-Row-Count')),
    durationMs: Number(response.headers.get('X-Duration-Ms')),
  };
}

const { table, rowCount, durationMs } = await arrowQuery(
  'SELECT customer_id, SUM(total) AS revenue FROM orders GROUP BY customer_id'
);
console.log(`${rowCount} rows in ${durationMs}ms`);
```
## Parameterized Queries

Use positional parameters to prevent SQL injection:

```python
response = requests.post(
    "http://localhost:5439/v1/query/arrow",
    json={
        "sql": "SELECT * FROM orders WHERE customer_id = $1 AND total > $2",
        "parameters": ["cust-123", 100.0],
    },
    headers={"Authorization": f"Bearer {api_key}"},
)
```
Parameters are bound server-side before query execution. All standard parameter types are accepted: strings, integers, floats, booleans, and timestamps.
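A sketch of assembling a parameterized body; how timestamps travel on the wire is an assumption here (ISO-8601 strings), since JSON has no native timestamp type:

```python
import json
from datetime import datetime, timezone

def build_query_body(sql: str, *params) -> str:
    """Serialize a query body, encoding datetimes as ISO-8601 strings
    (an assumed convention -- JSON itself has no timestamp type)."""
    encoded = [
        p.isoformat() if isinstance(p, datetime) else p
        for p in params
    ]
    return json.dumps({"sql": sql, "parameters": encoded})

body = build_query_body(
    "SELECT * FROM orders WHERE customer_id = $1 AND order_date >= $2",
    "cust-123",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Strings, integers, floats, and booleans map directly to their JSON equivalents and need no special handling.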
## Semantic Search via Arrow

The Arrow endpoint supports the `semantic_match()` and `semantic_rank()` functions:
```python
table = arrow_query("""
    SELECT
        memory_id,
        content,
        semantic_rank(content, 'revenue growth patterns') AS relevance
    FROM _hatidata_agent_memory
    WHERE agent_id = 'analyst-agent'
      AND semantic_match(content, 'revenue growth patterns', 0.7)
    ORDER BY relevance DESC
    LIMIT 50
""")

df = pl.from_arrow(table)
print(df.head(10))
```
## Performance Notes
| Aspect | Detail |
|---|---|
| Format overhead | Arrow is a zero-copy format -- no serialization/deserialization overhead |
| Compression | Not compressed by default; use gzip Accept-Encoding for network savings |
| Max result size | 100 MB default (configurable) |
| Concurrency | Same semaphore as Postgres wire protocol queries |
| Caching | Transpilation cache applies; query engine result cache applies |
For datasets larger than the max response size, use `LIMIT`/`OFFSET` pagination or export to Parquet.
## Related Concepts
- Arrow & Polars Integration -- Setup and usage patterns
- Arrow Query Recipes -- Advanced recipes
- SQL Functions & Types -- Full SQL reference
- Query Pipeline -- How queries flow through the proxy
- Postgres Drivers -- Row-based alternative