Consumption patterns
This guide covers how to read data from ClickHouse in a connector, BI tool, or analytics integration. It pairs with Ingestion patterns and the integration development best practices index.
For language-specific query APIs, see the client reference pages: Java/JDBC, Python, Go, JavaScript.
Schema discovery
Listing databases
Query system.databases and exclude the built-in namespaces that users never interact with:
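A sketch of the discovery query (the exact exclusion list is a judgment call; note that information_schema exists in both lower- and upper-case forms):

```sql
SELECT name
FROM system.databases
WHERE name NOT IN ('system', 'information_schema', 'INFORMATION_SCHEMA')
ORDER BY name
```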
Listing tables
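A sketch of the table-listing query, binding the database name as a query parameter (see Always use parameterized queries):

```sql
SELECT name, engine, total_rows, total_bytes
FROM system.tables
WHERE database = {database:String}
ORDER BY name
```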
total_rows is an estimate for some engines (e.g. ReplicatedMergeTree) and exact for others. It can be NULL for views — handle NULL in your display layer.
Listing columns
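A sketch of the column-listing query, again with parameter placeholders:

```sql
SELECT name, type, position, default_kind, default_expression, is_in_sorting_key
FROM system.columns
WHERE database = {database:String}
  AND table = {table:String}
ORDER BY position
```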
The type column gives the full unwrapped type string including LowCardinality and Nullable wrappers (e.g. LowCardinality(Nullable(String))). This is intentional — use it as your source of truth for type mapping. See Stripping type modifiers.
Prefer system tables over INFORMATION_SCHEMA
INFORMATION_SCHEMA exists for SQL-standard compatibility but has several gaps that break connector logic:
- Type fidelity: INFORMATION_SCHEMA.COLUMNS.DATA_TYPE strips modifiers. LowCardinality(Nullable(String)) may appear as just character varying. Use system.columns.type instead.
- Row counts: INFORMATION_SCHEMA.TABLES.TABLE_ROWS is always NULL. Use system.tables.total_rows.
- Foreign keys: KEY_COLUMN_USAGE and REFERENTIAL_CONSTRAINTS are always empty. ClickHouse has no foreign key constraints.
- Primary keys via JDBC: DatabaseMetaData.getPrimaryKeys() is approximate. ClickHouse has no primary key constraint in the relational sense — the ORDER BY columns act as the physical ordering key. Query system.tables.sorting_key for the actual sort key string and system.columns.is_in_sorting_key for per-column flags.
Sorting key vs primary key
ClickHouse MergeTree tables have two related but distinct concepts:
- Sorting key (ORDER BY): determines the physical sort order on disk. Governs index pruning and merge behavior.
- Primary key (PRIMARY KEY): must be a prefix of the sorting key. If omitted, defaults to the full sorting key.
For query generation — including push-down predicates and sort-order-aware pagination — treat the sorting key as the effective primary key. Use system.columns.is_in_sorting_key to identify which columns participate, and system.tables.sorting_key for the full expression string.
Type mapping
Stripping type modifiers
The raw type string from system.columns includes wrappers that must be stripped before mapping to target types. The two wrappers are:
- Nullable(T) — mark the column as nullable, then map T
- LowCardinality(T) — dictionary-encoded on disk; treat identically to T for all practical purposes
Both can nest: LowCardinality(Nullable(String)) → String + nullable.
Strip by repeatedly checking and removing the outermost wrapper:
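A minimal sketch in Python (the helper name is ours):

```python
def strip_type_modifiers(ch_type: str) -> tuple[str, bool]:
    """Remove LowCardinality/Nullable wrappers; return (base_type, nullable)."""
    nullable = False
    while True:
        if ch_type.startswith("LowCardinality(") and ch_type.endswith(")"):
            # Dictionary encoding is transparent to the consumer: just unwrap.
            ch_type = ch_type[len("LowCardinality("):-1]
        elif ch_type.startswith("Nullable(") and ch_type.endswith(")"):
            nullable = True
            ch_type = ch_type[len("Nullable("):-1]
        else:
            return ch_type, nullable
```

Note that the wrappers only need stripping at the outermost level; a type like Array(Nullable(String)) is an array of nullable strings, not a nullable array.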
After stripping, match against the base type name (e.g. String, Int32, DateTime64(3, 'UTC')).
Numeric type mapping
| ClickHouse type | Java | Python | Go |
|---|---|---|---|
| Int8 | Byte | int | int8 |
| Int16 | Short | int | int16 |
| Int32 | Integer | int | int32 |
| Int64 | Long | int | int64 |
| Int128 / Int256 | BigInteger | int | *big.Int |
| UInt8 | Short | int | uint8 |
| UInt16 | Integer | int | uint16 |
| UInt32 | Long | int | uint32 |
| UInt64 | BigInteger | int | uint64 |
| Float32 | Float | float | float32 |
| Float64 | Double | float | float64 |
| Decimal(P, S) | BigDecimal | Decimal | *big.Float |
UInt64 exceeds Long.MAX_VALUE (2^63 - 1). In Java, use BigInteger or unsigned long arithmetic. In JSON output, see 64-bit integers in JSON.
String types
- String: raw bytes, UTF-8 by convention. Map to String/str/string.
- FixedString(N): stored as exactly N bytes, zero-padded. Strip trailing null bytes (\x00) before displaying or comparing. Do not expose raw FixedString values to end users without stripping.
- Enum8/Enum16: returned as their string labels by default over HTTP and JDBC. Access the underlying integer with CAST(col AS Int8)/CAST(col AS Int16).
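The FixedString cleanup can be sketched as (helper name is ours):

```python
def clean_fixed_string(raw: bytes) -> str:
    """Strip the zero-padding from a FixedString(N) value before display."""
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")
```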
Date and time types
| ClickHouse type | Java | Python | Go |
|---|---|---|---|
| Date | LocalDate | datetime.date | time.Time (date only) |
| Date32 | LocalDate | datetime.date | time.Time (date only) |
| DateTime | ZonedDateTime | datetime (with tz) | time.Time |
| DateTime64(n) | ZonedDateTime | datetime (with tz) | time.Time |
| Time | LocalTime | datetime.time | time.Duration |
| Time64(n) | LocalTime | datetime.time | time.Duration |
Time and Time64 are new in ClickHouse 25.6. Check server version before relying on them.
DateTime and DateTime64 carry a timezone that can be set at the column level or at the server level. Parse the timezone from the type string when present: DateTime64(3, 'America/New_York').
Do not use session_timezone. It is experimental and can invalidate partition pruning by changing how DateTime literals are interpreted. Convert timezones in the application layer using the column's declared timezone.
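Extracting the declared timezone from the type string can be sketched as follows (the helper name is ours; it returns None when the type carries no timezone):

```python
import re

# Matches DateTime('TZ') and DateTime64(precision, 'TZ').
_TZ_RE = re.compile(r"DateTime(?:64)?\((?:\d+\s*,\s*)?'([^']+)'\)")

def column_timezone(ch_type: str):
    """Return the timezone declared in a DateTime/DateTime64 type string, if any."""
    m = _TZ_RE.search(ch_type)
    return m.group(1) if m else None
```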
Complex types
| ClickHouse type | HTTP (JSONEachRow) | JDBC |
|---|---|---|
| Array(T) | JSON array | java.sql.Array |
| Map(K, V) | JSON object | Map<K, V> via getObject() |
| Tuple(T1, T2, ...) | JSON array (positional) | List<Object> via getObject() |
| UUID | Standard UUID string | java.util.UUID |
| IPv4 | Dotted-decimal string | String |
| IPv6 | Colon-hex string | String |
Additional notes:
- JSON/Variant/Dynamic: introduced in ClickHouse 24.x. Connector support is limited. Cast to String or extract sub-paths in the SELECT clause until your client library adds native support.
- AggregateFunction: contains binary intermediate state. It is not human-readable and cannot be decoded by generic connectors. Exclude AggregateFunction columns from user-facing column lists. Users interact with them via -Merge combinators (e.g. sumMerge, avgMerge).
Identifiers are case-sensitive
Database names, table names, column names, and most function names are case-sensitive in ClickHouse. MyTable and mytable are different tables.
When generating SQL from schema discovery results, always preserve the exact casing returned by system.tables and system.columns. Wrap identifiers in double quotes if they conflict with reserved words or contain special characters:
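For example, generated SQL might quote every discovered identifier defensively (names here are illustrative):

```sql
SELECT "UserId", "Created At"
FROM "MyDatabase"."MyTable"
```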
Querying patterns
Always use parameterized queries
Never concatenate user-supplied values into query strings. ClickHouse supports parameterized queries on both the HTTP API and JDBC.
HTTP API — use {name:Type} placeholders in the query body and pass values as param_name query parameters:
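A minimal sketch of the convention (the helper name is ours): each {name:Type} placeholder in the query is bound by a matching param_<name> query parameter.

```python
def build_http_params(query: str, parameters: dict) -> dict:
    """Map each {name:Type} placeholder to its param_<name> query parameter."""
    out = {"query": query}
    for name, value in parameters.items():
        # The server parses the raw string value according to the declared type.
        out[f"param_{name}"] = str(value)
    return out

# Usage sketch:
# requests.get(url, params=build_http_params(
#     "SELECT count() FROM events WHERE user_id = {uid:UInt64}", {"uid": 42}))
```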
JDBC — use PreparedStatement:
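A sketch with the standard JDBC API (connection setup omitted; table and column names are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ParameterizedQuery {
    // Binds the user-supplied value through PreparedStatement
    // instead of concatenating it into the SQL string.
    static long countEventsForUser(Connection conn, long userId) throws SQLException {
        String sql = "SELECT count() FROM events WHERE user_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```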
HTTP 200 does not mean success
ClickHouse sends an HTTP 200 OK before it knows whether the query will complete successfully — the status line is written before query execution begins. Errors appear either in the X-ClickHouse-Exception-Code response header or mid-stream in the response body.
Always handle errors on HTTP responses as follows:
- Check X-ClickHouse-Exception-Code on every response, regardless of HTTP status code. A non-empty value means the query failed.
- Scan the body for Code: NNN. DB::Exception: when consuming a streaming response. This string signals a server-side error that terminated the stream mid-result.
- Force synchronous error reporting by adding wait_end_of_query=1 to the request. This buffers the entire result server-side before sending any bytes, so the HTTP status code is accurate. Use only for small results — buffering a large result set on the server defeats streaming and risks OOM.
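One way to centralize the checks (the function name is ours; the body scan is deliberately naive and can false-positive on data that happens to contain the marker):

```python
def raise_on_clickhouse_error(headers: dict, body: bytes) -> None:
    """Raise if a ClickHouse HTTP response carries an error despite a 200 status."""
    # Header check: present on failures regardless of HTTP status code.
    code = headers.get("X-ClickHouse-Exception-Code", "")
    if code:
        raise RuntimeError(f"ClickHouse error {code}: {body[:500]!r}")
    # Mid-stream failures surface as an exception marker inside the body itself.
    if b"DB::Exception:" in body and b"Code:" in body:
        raise RuntimeError(f"mid-stream ClickHouse error: {body[:500]!r}")
```

Use a case-insensitive header lookup if your HTTP client does not normalize header names.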
Streaming large result sets
Use a streaming format for any query that might return large results. JSON and JSONCompact buffer the entire result set before sending — they will OOM on large exports.
Recommended streaming formats:
| Format | Use case |
|---|---|
| JSONEachRow | Human-readable, self-describing, easy to parse line-by-line |
| TabSeparated / TabSeparatedWithNames | High-throughput exports, simple parsing |
| RowBinary | Maximum throughput, compact wire format |
Python — streaming with requests:
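A sketch with requests (URL and credentials are placeholders; iter_jsoneachrow is our helper and works on any iterable of JSONEachRow lines):

```python
import json

def iter_jsoneachrow(lines):
    """Decode JSONEachRow lines one at a time, surfacing mid-stream errors."""
    for line in lines:
        if not line:
            continue
        if line.startswith(b"Code:") and b"DB::Exception" in line:
            raise RuntimeError(line.decode("utf-8", "replace"))
        yield json.loads(line)

def stream_query(url, query, auth=None):
    import requests  # imported here so the parser above stays dependency-free
    resp = requests.post(
        url,
        params={"query": f"{query} FORMAT JSONEachRow"},
        stream=True,  # do not buffer the body client-side
        auth=auth,
    )
    resp.raise_for_status()
    yield from iter_jsoneachrow(resp.iter_lines())
```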
Go — streaming with database/sql:
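A sketch with database/sql and the clickhouse-go driver (DSN and query are placeholders; database/sql fetches rows incrementally rather than buffering the result):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	db, err := sql.Open("clickhouse", "clickhouse://default:password@localhost:9000/default")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.QueryContext(context.Background(),
		"SELECT number FROM system.numbers LIMIT 1000000")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() { // rows arrive incrementally, not all at once
		var n uint64
		if err := rows.Scan(&n); err != nil {
			log.Fatal(err)
		}
		_ = n // process each row here
	}
	if err := rows.Err(); err != nil { // surfaces mid-stream errors
		log.Fatal(err)
	}
	fmt.Println("done")
}
```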
JavaScript — streaming with the ClickHouse JS client:
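A sketch with @clickhouse/client (host and credentials are placeholders):

```javascript
const { createClient } = require('@clickhouse/client');

async function streamRows() {
  const client = createClient({ url: 'https://ch-host:8443', password: '...' });
  const result = await client.query({
    query: 'SELECT number FROM system.numbers LIMIT 1000000',
    format: 'JSONEachRow', // a streamable format
  });
  // stream() yields batches of rows; each row is decoded lazily via row.json()
  for await (const rowBatch of result.stream()) {
    for (const row of rowBatch) {
      row.json(); // process each row here
    }
  }
  await client.close();
}
```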
Set timeouts
Always set max_execution_time (in seconds) in query settings to prevent runaway queries from consuming server resources indefinitely.
The client-side socket timeout must be strictly greater than max_execution_time. If the socket timeout fires first, the client drops the connection, but the server continues executing the query with no way to cancel it from that connection.
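For example, over HTTP (host and query are placeholders; the point is the 35-second socket timeout strictly exceeding the 30-second max_execution_time):

```python
import requests

resp = requests.post(
    "https://ch-host:8443/",
    params={
        "query": "SELECT count() FROM big_table",
        "max_execution_time": "30",  # server-side limit, in seconds
    },
    timeout=35,  # socket timeout: strictly greater than max_execution_time
)
```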
For out-of-band cancellation, issue KILL QUERY from a separate connection using the query_id you assigned:
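For example, assuming the query was tagged with an illustrative query_id:

```sql
KILL QUERY WHERE query_id = 'my-connector:export:42'
```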
See Tagging queries for how to assign a deterministic query_id.
Pagination
LIMIT/OFFSET degrades linearly as the offset grows — ClickHouse must scan and discard the first N rows on every page. It also has a documented correctness issue at certain offset values with some table engines.
Use keyset (cursor-based) pagination instead:
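A sketch, assuming an events table whose sorting key starts with created_at (table and column names are illustrative):

```sql
SELECT id, created_at, payload
FROM events
WHERE created_at > {cursor:DateTime64(3)}
ORDER BY created_at
LIMIT 1000
```

If the cursor column is not unique, add a tie-breaking column (such as id) to both the WHERE predicate and the ORDER BY so rows at the page boundary are neither skipped nor duplicated.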
On each page, pass the created_at of the last row as the next cursor value. The cursor column should be part of the sorting key so that ClickHouse can prune data granules instead of scanning the full table.
For very large exports (millions of rows), stream the full result set rather than paginating — see Streaming large result sets.
64-bit integers in JSON (JavaScript)
Depending on client settings, Int64 and UInt64 values in ClickHouse JSON output may arrive as bare JSON numbers. JavaScript's JSON.parse() silently loses precision for integers above 2^53 because it maps all numbers to float64.
Ensure output_format_json_quote_64bit_integers=1 is in effect (it is the server default, but some clients override it) to receive 64-bit integers as quoted strings:
Output changes from:
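With an illustrative value one above 2^53:

```json
{"id": 9007199254740993}
```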
to:
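The same row with quoting enabled:

```json
{"id": "9007199254740993"}
```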
Parse these with a BigInt-aware library in your connector.
Connection management
Connection pooling
ClickHouse uses stateless HTTP — there is no server-side session to maintain between requests. Pool HTTP connections (TCP sockets), not logical sessions.
Recommended HikariCP settings for typical BI workloads:
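A sketch of a HikariCP setup under these constraints (URL, credentials, and pool sizes are illustrative; only the maxLifetime guidance comes from the discussion below):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:clickhouse://ch-host:8443/default?ssl=true");
        config.setUsername("default");
        config.setPassword("...");
        config.setMaximumPoolSize(10);      // typical BI dashboards rarely need more
        config.setMinimumIdle(2);
        config.setMaxLifetime(270_000);     // 4.5 min; keep below keep_alive_timeout
        config.setConnectionTimeout(10_000);
        return new HikariDataSource(config);
    }
}
```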
The maxLifetime setting is critical: it must be below the server's keep_alive_timeout. ClickHouse Cloud sets this to 10 seconds by default; self-hosted clusters vary. If a pooled connection outlives the server's idle timeout, the next request on that connection receives a broken-pipe error. Setting maxLifetime to 270,000 ms (4.5 minutes) is a safe default for self-hosted deployments where keep_alive_timeout is typically 300 seconds.
For non-Java connectors, configure the equivalent HTTP keep-alive and connection TTL settings on your HTTP client.
ClickHouse Cloud: auto-pause and warm-up
Development tier services on ClickHouse Cloud auto-pause after a period of inactivity. The first connection after a pause can take 10–30 seconds while the service resumes.
Design your connection setup to retry with exponential backoff before surfacing an error:
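A sketch of the backoff loop (names and signature are ours, not a ClickHouse API; connect is any zero-argument callable that raises ConnectionError while the service is waking up):

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry connect() with exponential backoff to ride out auto-pause resume."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # attempts exhausted: surface the real error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
```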
Do not surface the first connection failure to the user immediately — auto-pause recovery is expected behavior, not an error.
Observability
Tagging queries
Assign a query_id and log_comment to every query your connector issues. This makes queries attributable in system.query_log, which is essential for debugging slow queries and auditing.
HTTP — pass query_id and log_comment as request parameters on each request:
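A sketch of the tagging convention (the helper name and the <job-name>:<sequence> scheme are ours; the commented request shows where the parameters go):

```python
def query_tags(job_name: str, sequence: int, comment: str) -> dict:
    """Build the query_id / log_comment parameters for the HTTP interface."""
    return {
        "query_id": f"{job_name}:{sequence}",
        "log_comment": comment,
    }

# Usage sketch, merged alongside the query itself:
# requests.post(url, params={"query": sql,
#                            **query_tags("nightly-export", 7, "connector v1.2")})
```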
Use a deterministic query_id scheme (e.g. <job-name>:<sequence> or a UUID derived from the query content) so that retries produce the same ID and duplicates are easy to spot in system.query_log.
JDBC — ClickHouse JDBC exposes ClickHouseStatement which accepts settings:
system.query_log flush latency
system.query_log is written asynchronously. The default flush interval is approximately 7.5 seconds. Do not query system.query_log immediately after issuing a query — the row may not exist yet.
If your connector needs to read query metadata (e.g. rows read, elapsed time) after execution, either:
- Add a delay of at least 10 seconds before querying system.query_log.
- Poll with retry until the row appears:
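The polling loop can be sketched as (names are ours; fetch is a caller-supplied callable that runs the system.query_log lookup and returns a row or None):

```python
import time

def wait_for_query_log(fetch, query_id, timeout=30.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(query_id) until a row appears or the timeout expires.

    fetch would run something like:
      SELECT * FROM system.query_log
      WHERE query_id = {id:String} AND type = 'QueryFinish'
    """
    deadline = clock() + timeout
    while True:
        row = fetch(query_id)
        if row is not None:
            return row
        if clock() >= deadline:
            raise TimeoutError(f"query_id {query_id} not in system.query_log "
                               f"after {timeout}s")
        sleep(interval)
```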
The type column distinguishes query lifecycle events: QueryStart, QueryFinish, ExceptionBeforeStart, ExceptionWhileProcessing. Filter to QueryFinish for completed queries.