Compare commits

...

52 Commits

Author SHA1 Message Date
Philipinho 123771e841 ms 2026-04-23 19:40:27 +01:00
Philipinho 8c21675a75 fix(base): update base.module import to renamed QueryCacheModule 2026-04-23 16:59:28 +01:00
Philipinho 02a78b2ec7 test(base): wire integration + parity specs to duckdb runtime 2026-04-23 16:58:16 +01:00
Philipinho dbc1eb539c fix(base): serialize writer operations and prune dead code in cache service 2026-04-23 16:50:11 +01:00
Philipinho 38cd94b2d7 refactor(base): single duckdb instance with per-base attached databases 2026-04-23 16:40:14 +01:00
Philipinho 4437dcbb62 feat(base): single-instance duckdb runtime with writer + reader pool 2026-04-23 16:23:24 +01:00
Philipinho 568d94be1f feat(base): schema-qualified query builder for single-instance duckdb 2026-04-23 16:19:47 +01:00
Philipinho f12a0675ea feat(base): schema-qualified loader sql for single-instance duckdb 2026-04-23 16:15:42 +01:00
Philipinho 838d8892f0 feat(base): minimal async connection pool for duckdb reader pool 2026-04-23 16:10:32 +01:00
Philipinho 08711791d6 feat(base): add baseSchemaName helper for duckdb schema naming 2026-04-23 16:05:45 +01:00
Philipinho b04bcb5b0c feat(base): env var for duckdb reader-pool size 2026-04-23 15:52:35 +01:00
Philipinho 709d927544 fix(base): declare primary key on loaded rows so upsert has a conflict target 2026-04-23 14:31:26 +01:00
Philipinho 5b96dfe6c9 feat(base): log duckdb heap + spill per base on cold load 2026-04-23 14:07:36 +01:00
Philipinho 17db634029 fix(base): enable duckdb disk spill + raise memory default to avoid oom on large bases 2026-04-23 13:56:31 +01:00
Philipinho 5ebab5cd9e fix(base): make cell-extractor pg functions genuinely parallel-safe
The plpgsql + EXCEPTION versions of base_cell_numeric,
base_cell_timestamptz, and base_cell_bool were labeled PARALLEL SAFE
but EXCEPTION blocks require subtransactions, which Postgres cannot
start in a parallel worker. Any parallel scan that invoked them
crashed with 'cannot start subtransactions during a parallel
operation' — notably DuckDB's postgres extension on large base COPY
reads.

Rewrite each as a pure SQL function using jsonb_typeof + regex
validation for the 'coerce-or-null' semantics. No plpgsql, no
subtransactions, genuinely parallel-safe. Signatures unchanged so
existing call sites (loader, expression indexes, engine predicates)
are untouched.
2026-04-23 13:52:20 +01:00
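A minimal sketch of the shape this rewrite takes, assuming a jsonb cell argument and an illustrative numeric regex; the shipped migration may differ, and base_cell_timestamptz / base_cell_bool follow the same pattern:
CREATE OR REPLACE FUNCTION base_cell_numeric(cell jsonb)
RETURNS numeric
LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$
  SELECT CASE
    -- native JSON numbers always cast cleanly
    WHEN jsonb_typeof(cell) = 'number' THEN (cell #>> '{}')::numeric
    -- strings are regex-validated first: coerce-or-null without an EXCEPTION block
    WHEN jsonb_typeof(cell) = 'string'
         AND (cell #>> '{}') ~ '^[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$'
      THEN (cell #>> '{}')::numeric
    ELSE NULL
  END
$$;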
Philipinho 2d9e060d9e feat(base): add BASE_QUERY_CACHE_TRACE flag for duckdb operation logging 2026-04-23 13:37:25 +01:00
Philipinho b2ed8f9936 Revert "refactor(base): use uuid package instead of inlined uuid7 in tests"
This reverts commit f819f633c9.
2026-04-23 13:14:09 +01:00
Philipinho 7192b4bacb Revert "refactor(base): use uuid package validator in loader-sql"
This reverts commit cfc50b7cae.
2026-04-23 13:14:09 +01:00
Philipinho cfc50b7cae refactor(base): use uuid package validator in loader-sql 2026-04-23 13:09:25 +01:00
Philipinho f819f633c9 refactor(base): use uuid package instead of inlined uuid7 in tests 2026-04-23 13:07:19 +01:00
Philipinho db1b1464e2 test(base): assert pure-postgres path when query cache is disabled 2026-04-23 12:50:59 +01:00
Philipinho cc47a6d65c refactor(base): drop prepared binding now that loader sql inlines uuids 2026-04-23 12:39:33 +01:00
Philipinho 378d17350c fix(base): use postgres_query to invoke pg-side udfs from duckdb loader 2026-04-23 12:39:30 +01:00
Philipinho eea989260a test(base): filter/sort parity matrix against postgres
Integration spec that seeds a 10K-row base with diverse property shapes
(text, number, date, checkbox, select, multi-select) and runs an
exhaustive matrix of filter/sort combinations against both the DuckDB
cache path and the Postgres-direct path. Asserts identical row ids and,
where semantics allow, identical cursor strings and pagination meta.

The suite is gated by INTEGRATION_DB_URL and skips cleanly without it.
34 tests total: 26 flat filter ops (text/number/date/checkbox/select/
multi-select), 4 nested boolean trees (AND/OR/mixed/max-depth), 3
multi-key sorts, and one full filter+sort+pagination walk.

Seed tuning to keep both engines in lock-step:
  * digit-only row positions so PG default collation and DuckDB
    bytewise collation agree on the tail tiebreak.
  * lowercase name pool so mixed-case locale/bytewise divergence
    doesn't surface on text-secondary sort.
  * priority is non-NULL to avoid the PG keyset stall when a boundary
    cursor encodes the '+/-Infinity' numeric sentinel (postgres.js
    parses it as NaN, which applyCursor re-emits as null).
2026-04-23 05:00:17 +01:00
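A quick illustration of the first two seed-tuning bullets (TypeScript, values illustrative): for digit-only strings, bytewise and locale comparisons agree, while mixed-case text is exactly where they can diverge.
const positions = ['10', '2', '0003'];
positions.slice().sort();                             // ['0003', '10', '2'] bytewise
positions.slice().sort((a, b) => a.localeCompare(b)); // same order for pure digits
['a', 'B'].sort();                                    // ['B', 'a']: 'B' (0x42) < 'a' (0x61)
['a', 'B'].sort((a, b) => a.localeCompare(b));        // ['a', 'B'] under typical locales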
Philipinho fc08cffd37 test(server): init LRU test module so pg extension bootstraps 2026-04-23 04:40:30 +01:00
Philipinho fde0ccb3c7 refactor(base): replace streaming loader with pg-extension CREATE TABLE AS SELECT 2026-04-23 04:28:25 +01:00
Philipinho e663d7eecf test(server): align integration stubs with new config + pg-extension injection 2026-04-23 04:28:21 +01:00
Philipinho 96e875f1de test(base): tighten loader-sql mapping assertions to full projections 2026-04-23 03:37:23 +01:00
Philipinho 6544ff6d38 feat(base): pure SQL builder for pg-extension loader 2026-04-23 03:31:00 +01:00
Philipinho 7ca712c9ab fix(base): propagate pg-extension bootstrap failure reason; align closeSync style 2026-04-23 03:26:41 +01:00
Philipinho a798397af0 feat(base): postgres extension service with bootstrap install + per-connection attach 2026-04-23 03:17:36 +01:00
Philipinho 9ba6459427 feat(base): env vars for per-instance duckdb memory limit + threads 2026-04-23 03:09:58 +01:00
Philipinho 14827ec6a0 test(server): add getBaseQueryCacheDebug to integration test env stubs 2026-04-19 23:41:06 +01:00
Philipinho c931fa5ec9 perf(server): skip per-request row count when collection is resident 2026-04-19 23:39:27 +01:00
Philipinho 7e07d77510 chore(server): add per-request perf logs for base query cache diagnostics 2026-04-19 22:44:39 +01:00
Philipinho 02c3bdf028 docs(base): add implementation plan for duckdb query cache 2026-04-19 22:35:56 +01:00
Philipinho 55feb01249 test(server): assert duckdb cache matches postgres on a 100K-row base 2026-04-19 22:28:07 +01:00
Philipinho 4636af3870 feat(server): warm duckdb collections on boot from redis recent-access set 2026-04-19 22:16:20 +01:00
Philipinho c9adf84260 feat(server): evict least-recently-used duckdb collections when cap exceeded 2026-04-19 22:11:55 +01:00
Philipinho 4f38c61725 fix(server): avoid acquiring redis client when base query cache is disabled 2026-04-19 22:05:56 +01:00
Philipinho df22efb290 feat(server): propagate row mutations to duckdb cache via redis pubsub 2026-04-19 22:00:37 +01:00
Philipinho 7534b44e6e refactor(server): preserve cache-failure stack trace and reuse hasSearch 2026-04-19 21:50:34 +01:00
Philipinho cf6b48cd58 feat(server): route large base list queries through the duckdb cache 2026-04-19 21:46:27 +01:00
Philipinho 45000bbd8b fix(server): close duckdb resources on load failure, dedupe concurrent loads, drop unused cells projection 2026-04-19 21:39:05 +01:00
Philipinho 91ad3de258 feat(server): load bases into DuckDB and serve list queries from cache
- collection-loader streams base rows via postgres and bulk-inserts into an
  in-memory DuckDB instance using the Appender API, then builds an index on
  each indexable column
- base-query-cache service routes list() calls through the prepared-statement
  path; ensureLoaded does schema-version checks with single-pass LRU eviction
- keyset param-ordering bug in the DuckDB builder fixed: placeholders appear
  head-to-tail but were being pushed tail-to-head, which made DuckDB bind the
  wrong value for each ? and throw Binder Error on typed columns
- base-row repo gains countActiveRows for the router to use in task 6
- seed script split into an importable helper so integration tests can seed a
  10k-row base deterministically without shelling out
- new integration spec compares Postgres vs DuckDB pagination end-to-end for
  a numeric sort and guards against duplicate rows from DuckDB

Integration test is skipped unless INTEGRATION_DB_URL is set.
2026-04-19 21:31:05 +01:00
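The param-ordering bug in the third bullet comes down to one invariant, which a hypothetical keyset builder (not the repo's actual code) makes concrete: every ? appended to the SQL must be matched by a push onto params at that same moment, so placeholder i always binds params[i].
function keysetOrChain(
  keys: Array<{ column: string; value: unknown }>,
  params: unknown[],
): string {
  const clauses: string[] = [];
  for (let i = 0; i < keys.length; i++) {
    const parts: string[] = [];
    for (let j = 0; j < i; j++) {
      parts.push(`${keys[j].column} = ?`);
      params.push(keys[j].value); // pushed at the moment its ? is emitted
    }
    parts.push(`${keys[i].column} > ?`); // ascending keys assumed for brevity
    params.push(keys[i].value);
    clauses.push(`(${parts.join(' AND ')})`);
  }
  return `(${clauses.join(' OR ')})`;
}
// Emitting placeholders head-to-tail while pushing values tail-to-head breaks
// that pairing, which is how DuckDB came to bind the wrong value for each ?.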
Philipinho b28597125d fix(server): use DuckDB json_contains for multi-select filters and expand builder coverage 2026-04-19 21:11:29 +01:00
Philipinho a9db3ef008 feat(server): add DuckDB SQL builder for base list queries 2026-04-19 21:06:41 +01:00
Philipinho 574c5316f0 feat(server): scaffold base query-cache module behind feature flag 2026-04-19 20:59:24 +01:00
Philipinho 3af2db7a8b feat(server): add property-type to DuckDB column-spec mapping 2026-04-19 20:54:59 +01:00
Philipinho f181c6d9e8 fix(server): case-insensitive parse for BASE_QUERY_CACHE_ENABLED env var 2026-04-19 20:52:06 +01:00
Philipinho 8ac4c97c98 docs(server): explain base-query-cache max-collections default 2026-04-19 20:50:21 +01:00
Philipinho abd42fd007 chore(server): add duckdb dependency and query-cache env getters 2026-04-19 20:48:16 +01:00
33 changed files with 7879 additions and 268 deletions
package.json (+1)
@@ -37,6 +37,7 @@
"@aws-sdk/lib-storage": "3.1014.0",
"@aws-sdk/s3-request-presigner": "3.1014.0",
"@clickhouse/client": "^1.18.2",
"@duckdb/node-api": "1.5.2-r.1",
"@fastify/cookie": "^11.0.2",
"@fastify/multipart": "^9.4.0",
"@fastify/static": "^9.0.0",
base.module.ts (+5 -1)
@@ -14,9 +14,13 @@ import { BaseWsService } from './realtime/base-ws.service';
 import { BaseWsConsumers } from './realtime/base-ws-consumers';
 import { BasePresenceService } from './realtime/base-presence.service';
 import { QueueName } from '../../integrations/queue/constants';
+import { QueryCacheModule } from './query-cache/query-cache.module';
 @Module({
-  imports: [BullModule.registerQueue({ name: QueueName.BASE_QUEUE })],
+  imports: [
+    BullModule.registerQueue({ name: QueueName.BASE_QUEUE }),
+    QueryCacheModule,
+  ],
   controllers: [
     BaseController,
     BasePropertyController,
File diff suppressed because it is too large
new file: base-query-cache.service.ts
@@ -0,0 +1,667 @@
import {
Injectable,
Logger,
OnApplicationBootstrap,
OnModuleDestroy,
Optional,
} from '@nestjs/common';
import { RedisService } from '@nestjs-labs/nestjs-ioredis';
import type { Redis } from 'ioredis';
import { BaseRepo } from '@docmost/db/repos/base/base.repo';
import { BaseRow } from '@docmost/db/types/entity.types';
import {
CursorPaginationResult,
emptyCursorPaginationResult,
} from '@docmost/db/pagination/cursor-pagination';
import { PaginationOptions } from '@docmost/db/pagination/pagination-options';
import {
CURSOR_TAIL_KEYS,
FilterNode,
PropertySchema,
SearchSpec,
SortBuild,
SortSpec,
buildSorts,
makeCursor,
} from '../engine';
import { QueryCacheConfigProvider } from './query-cache.config';
import { CollectionLoader } from './collection-loader';
import { buildDuckDbListQuery } from './duckdb-query-builder';
import { DuckDbRuntime } from './duckdb-runtime';
import { BasePropertyType } from '../base.schemas';
import {
ChangeEnvelope,
ColumnSpec,
LoadedCollection,
} from './query-cache.types';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
export type CacheListOpts = {
filter?: FilterNode;
sorts?: SortSpec[];
search?: SearchSpec;
schema: PropertySchema;
pagination: PaginationOptions;
};
@Injectable()
export class BaseQueryCacheService
implements OnApplicationBootstrap, OnModuleDestroy
{
private readonly logger = new Logger(BaseQueryCacheService.name);
private readonly collections = new Map<string, LoadedCollection>();
private readonly inFlightLoads = new Map<string, Promise<LoadedCollection>>();
/*
* Serializes every write-path call into the shared writer connection.
* DuckDB connections aren't thread-safe for concurrent prepared statements,
* and Redis pub/sub can fire `applyChange` calls concurrently since the
* subscriber's `pmessage` handler doesn't await. We funnel all writes
* (`upsertRow`, `deleteRow`, `updatePosition`, `refreshRowCount`,
* `invalidate`, `evictLru`) through this simple Promise chain so only
* one is in flight at a time. Reads are unaffected — they flow through
* the reader pool, which handles its own concurrency.
*/
private writeQueue: Promise<void> = Promise.resolve();
private async serializeWrite<T>(fn: () => Promise<T>): Promise<T> {
const prev = this.writeQueue;
let unblock!: () => void;
this.writeQueue = new Promise<void>((resolve) => { unblock = resolve; });
try {
await prev;
return await fn();
} finally {
unblock();
}
}
constructor(
private readonly configProvider: QueryCacheConfigProvider,
private readonly baseRepo: BaseRepo,
private readonly collectionLoader: CollectionLoader,
private readonly runtime: DuckDbRuntime,
@Optional() private readonly redisService: RedisService | null = null,
@Optional() private readonly env: EnvironmentService | null = null,
) {}
async onApplicationBootstrap(): Promise<void> {
const { enabled, warmTopN } = this.configProvider.config;
if (!enabled) return;
if (!this.runtime.isReady()) {
this.logger.warn('runtime not ready; skipping warm-up');
return;
}
const redis = this.tryGetRedisClient();
if (!redis) return;
try {
const ids = await redis.zrevrange(
'base-query-cache:recent',
0,
warmTopN - 1,
);
for (const baseId of ids) {
try {
const base = await this.baseRepo.findById(baseId);
if (!base) continue;
await this.ensureLoaded(baseId, base.workspaceId);
} catch (err) {
this.logger.debug(
`warm-up skipped ${baseId}: ${(err as Error).message}`,
);
}
}
this.logger.log(`Warmed ${ids.length} collections on boot`);
} catch (err) {
const error = err as Error;
this.logger.warn(`Warm-up failed: ${error.message}`);
if (error.stack) this.logger.warn(error.stack);
}
}
async onModuleDestroy(): Promise<void> {
// The runtime owns the instance/connection lifecycle; we just clear
// our metadata. DETACH is a no-op during shutdown because the instance
// is closing anyway.
this.collections.clear();
}
async list(
baseId: string,
workspaceId: string,
opts: CacheListOpts,
): Promise<CursorPaginationResult<BaseRow>> {
const debug = this.env?.getBaseQueryCacheDebug() ?? false;
const trace = this.env?.getBaseQueryCacheTrace?.() ?? false;
const tStart = debug ? Date.now() : 0;
const tEnsure = debug ? Date.now() : 0;
const collection = await this.ensureLoaded(baseId, workspaceId);
const ensureMs = debug ? Date.now() - tEnsure : 0;
const sortBuilds: SortBuild[] =
opts.sorts && opts.sorts.length > 0
? buildSorts(opts.sorts, opts.schema)
: [];
const cursor = makeCursor(sortBuilds, CURSOR_TAIL_KEYS);
const sortFieldKeys = sortBuilds.map((s) => s.key);
const allFieldKeys = [...sortFieldKeys, 'position', 'id'];
let afterKeys: Record<string, unknown> | undefined;
if (opts.pagination.cursor) {
const decoded = cursor.decodeCursor(opts.pagination.cursor, allFieldKeys);
afterKeys = cursor.parseCursor(decoded);
}
const { sql, params } = buildDuckDbListQuery({
columns: collection.columns,
filter: opts.filter,
sorts: opts.sorts,
search: opts.search,
pagination: {
limit: opts.pagination.limit,
afterKeys: afterKeys as any,
},
schema: collection.schema,
});
if (trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'query.sql',
baseId: baseId.slice(0, 8),
schema: collection.schema,
sql,
params,
}),
);
}
const tExec = debug ? Date.now() : 0;
const duckRows = await this.runtime.withReader(async (conn) => {
const prepared = await conn.prepare(sql);
for (let i = 0; i < params.length; i++) {
const p = params[i];
const oneBased = i + 1;
if (p === null || p === undefined) {
prepared.bindNull(oneBased);
} else if (typeof p === 'string') {
prepared.bindVarchar(oneBased, p);
} else if (typeof p === 'number') {
prepared.bindDouble(oneBased, p);
} else if (typeof p === 'boolean') {
prepared.bindBoolean(oneBased, p);
} else if (p instanceof Date) {
prepared.bindVarchar(oneBased, p.toISOString());
} else {
prepared.bindVarchar(oneBased, JSON.stringify(p));
}
}
const reader = await prepared.runAndReadAll();
return reader.getRowObjectsJS();
});
const execMs = debug ? Date.now() - tExec : 0;
const hasNextPage = duckRows.length > opts.pagination.limit;
if (hasNextPage) duckRows.pop();
if (duckRows.length === 0) {
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'cache.list',
baseId: baseId.slice(0, 8),
totalMs: Date.now() - tStart,
ensureMs,
execMs,
shapeMs: 0,
rows: 0,
}),
);
}
return emptyCursorPaginationResult<BaseRow>(opts.pagination.limit);
}
const tShape = debug ? Date.now() : 0;
const items = duckRows.map((r) =>
shapeBaseRow(r, collection.columns),
);
const shapeMs = debug ? Date.now() - tShape : 0;
const endRow = duckRows[duckRows.length - 1];
const startRow = duckRows[0];
const encodeFromRow = (raw: Record<string, unknown>): string => {
const entries: Array<[string, unknown]> = [];
for (const sb of sortBuilds) entries.push([sb.key, raw[sb.key]]);
entries.push(['position', raw.position]);
entries.push(['id', raw.id]);
return cursor.encodeCursor(entries);
};
const hasPrevPage = !!opts.pagination.cursor;
const nextCursor = hasNextPage ? encodeFromRow(endRow) : null;
const prevCursor = hasPrevPage ? encodeFromRow(startRow) : null;
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'cache.list',
baseId: baseId.slice(0, 8),
totalMs: Date.now() - tStart,
ensureMs,
execMs,
shapeMs,
rows: items.length,
}),
);
}
return {
items,
meta: {
limit: opts.pagination.limit,
hasNextPage,
hasPrevPage,
nextCursor,
prevCursor,
},
};
}
async invalidate(baseId: string): Promise<void> {
const collection = this.collections.get(baseId);
if (!collection) return;
await this.serializeWrite(async () => {
await this.runtime.detachBase(collection.schema);
});
this.collections.delete(baseId);
}
isResident(baseId: string): boolean {
return this.collections.has(baseId);
}
residentSize(): number {
return this.collections.size;
}
peek(baseId: string): LoadedCollection | undefined {
return this.collections.get(baseId);
}
residencySnapshot(): Array<{
baseId: string;
schema: string;
rows: number;
approxMb: number;
}> {
const out: Array<{
baseId: string;
schema: string;
rows: number;
approxMb: number;
}> = [];
for (const [baseId, c] of this.collections) {
out.push({
baseId,
schema: c.schema,
rows: c.rowCount,
approxMb: +(c.approxBytes / (1024 * 1024)).toFixed(1),
});
}
return out;
}
async applyChange(env: ChangeEnvelope): Promise<void> {
const trace = this.env?.getBaseQueryCacheTrace?.() ?? false;
const collection = this.collections.get(env.baseId);
if (trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'pubsub.apply',
baseId: env.baseId.slice(0, 8),
kind: env.kind,
resident: !!collection,
}),
);
}
if (!collection) return;
try {
switch (env.kind) {
case 'schema-invalidate':
if (env.schemaVersion > collection.schemaVersion) {
await this.invalidate(env.baseId);
}
return;
case 'row-upsert':
await this.upsertRow(collection, env.row);
await this.refreshRowCount(collection);
return;
case 'row-delete':
await this.deleteRow(collection, env.rowId);
await this.refreshRowCount(collection);
return;
case 'rows-delete':
for (const id of env.rowIds) await this.deleteRow(collection, id);
await this.refreshRowCount(collection);
return;
case 'row-reorder':
await this.updatePosition(collection, env.rowId, env.position);
return;
}
} catch (err) {
const error = err as Error;
this.logger.warn(
`applyChange failed for ${env.baseId}; invalidating: ${error.message}`,
);
if (error.stack) this.logger.warn(error.stack);
await this.invalidate(env.baseId);
}
}
private async ensureLoaded(
baseId: string,
workspaceId: string,
): Promise<LoadedCollection> {
const debug = this.env?.getBaseQueryCacheDebug() ?? false;
const existing = this.collections.get(baseId);
const tFind = debug ? Date.now() : 0;
const base = await this.baseRepo.findById(baseId);
const findMs = debug ? Date.now() - tFind : 0;
if (!base) throw new Error(`Base ${baseId} not found`);
const freshVersion = (base as any).schemaVersion ?? 1;
if (existing && existing.schemaVersion === freshVersion) {
existing.lastAccessedAt = Date.now();
this.recordAccess(baseId);
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'ensureLoaded.hit',
baseId: baseId.slice(0, 8),
findMs,
}),
);
}
return existing;
}
if (existing) {
await this.serializeWrite(async () => {
await this.runtime.detachBase(existing.schema);
});
this.collections.delete(baseId);
}
const inFlight = this.inFlightLoads.get(baseId);
if (inFlight) {
const loaded = await inFlight;
this.recordAccess(baseId);
return loaded;
}
const tLoad = debug ? Date.now() : 0;
const promise = (async () => {
try {
const { maxCollections } = this.configProvider.config;
if (this.collections.size >= maxCollections) {
await this.evictLru();
}
const loaded = await this.collectionLoader.load(baseId, workspaceId);
this.collections.set(baseId, loaded);
return loaded;
} finally {
this.inFlightLoads.delete(baseId);
}
})();
this.inFlightLoads.set(baseId, promise);
const loaded = await promise;
const loadMs = debug ? Date.now() - tLoad : 0;
this.recordAccess(baseId);
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'ensureLoaded.miss',
baseId: baseId.slice(0, 8),
findMs,
loadMs,
rows: loaded.rowCount,
approxMb: +(loaded.approxBytes / (1024 * 1024)).toFixed(1),
}),
);
}
return loaded;
}
private async evictLru(): Promise<void> {
let oldestKey: string | null = null;
let oldestTime = Number.POSITIVE_INFINITY;
for (const [key, col] of this.collections) {
if (col.lastAccessedAt < oldestTime) {
oldestTime = col.lastAccessedAt;
oldestKey = key;
}
}
if (oldestKey) {
const col = this.collections.get(oldestKey)!;
await this.serializeWrite(async () => {
await this.runtime.detachBase(col.schema);
});
this.collections.delete(oldestKey);
this.logger.debug(`Evicted LRU collection ${oldestKey}`);
}
}
private async upsertRow(
collection: LoadedCollection,
row: Record<string, unknown>,
): Promise<void> {
return this.serializeWrite(async () => {
const specs = collection.columns;
const columnList = specs.map((s) => quoteIdent(s.column)).join(', ');
const placeholders = specs.map(() => '?').join(', ');
const sql = `INSERT OR REPLACE INTO ${collection.schema}.rows (${columnList}) VALUES (${placeholders})`;
const writer = this.runtime.getWriter();
const prepared = await writer.prepare(sql);
for (let i = 0; i < specs.length; i++) {
const spec = specs[i];
const oneBased = i + 1;
const raw = readFromRowEvent(row, spec);
if (raw == null) {
prepared.bindNull(oneBased);
continue;
}
switch (spec.ddlType) {
case 'VARCHAR':
prepared.bindVarchar(oneBased, String(raw));
break;
case 'DOUBLE': {
const n = Number(raw);
if (Number.isNaN(n)) prepared.bindNull(oneBased);
else prepared.bindDouble(oneBased, n);
break;
}
case 'BOOLEAN':
prepared.bindBoolean(oneBased, Boolean(raw));
break;
case 'TIMESTAMPTZ': {
const d = raw instanceof Date ? raw : new Date(String(raw));
if (Number.isNaN(d.getTime())) prepared.bindNull(oneBased);
else prepared.bindVarchar(oneBased, d.toISOString());
break;
}
case 'JSON':
prepared.bindVarchar(oneBased, JSON.stringify(raw));
break;
}
}
await prepared.run();
});
}
private async deleteRow(
collection: LoadedCollection,
rowId: string,
): Promise<void> {
return this.serializeWrite(async () => {
const writer = this.runtime.getWriter();
const prepared = await writer.prepare(
`DELETE FROM ${collection.schema}.rows WHERE id = ?`,
);
prepared.bindVarchar(1, rowId);
await prepared.run();
});
}
private async updatePosition(
collection: LoadedCollection,
rowId: string,
position: string,
): Promise<void> {
return this.serializeWrite(async () => {
const writer = this.runtime.getWriter();
const prepared = await writer.prepare(
`UPDATE ${collection.schema}.rows SET position = ? WHERE id = ?`,
);
prepared.bindVarchar(1, position);
prepared.bindVarchar(2, rowId);
await prepared.run();
});
}
private async refreshRowCount(collection: LoadedCollection): Promise<void> {
return this.serializeWrite(async () => {
try {
const res = await this.runtime.getWriter().runAndReadAll(
`SELECT count(*) AS c FROM ${collection.schema}.rows`,
);
const row = res.getRowObjects()[0] as { c: bigint | number };
collection.rowCount = Number(row.c);
collection.approxBytes = collection.rowCount * collection.columns.length * 64;
} catch {
// stale rowCount self-corrects on next reload
}
});
}
private recordAccess(baseId: string): void {
if (!this.configProvider.config.enabled) return;
const redis = this.tryGetRedisClient();
if (!redis) return;
const nowMs = Date.now();
const maxKeep = this.configProvider.config.maxCollections * 10;
void (async () => {
try {
await redis.zadd('base-query-cache:recent', nowMs, baseId);
await redis.zremrangebyrank(
'base-query-cache:recent',
0,
-(maxKeep + 1),
);
} catch (err) {
this.logger.debug(
`recordAccess failed for ${baseId}: ${(err as Error).message}`,
);
}
})();
}
private tryGetRedisClient(): Redis | null {
if (!this.redisService) return null;
try {
return this.redisService.getOrNil();
} catch {
return null;
}
}
}
function quoteIdent(name: string): string {
return `"${name.replace(/"/g, '""')}"`;
}
/*
* Convert a DuckDB row object back to the BaseRow JSON shape returned to
* API callers. Kept inline (not exported) because it's a pure derivation
* from the ColumnSpec list.
*/
function shapeBaseRow(
raw: Record<string, unknown>,
specs: ColumnSpec[],
): BaseRow {
const cells: Record<string, unknown> = {};
for (const spec of specs) {
if (!spec.property) continue;
const val = raw[spec.column];
if (val == null) continue;
if (spec.ddlType === 'JSON' && typeof val === 'string') {
try {
cells[spec.property.id] = JSON.parse(val);
} catch {
cells[spec.property.id] = val;
}
} else {
cells[spec.property.id] = val;
}
}
return {
id: raw.id as string,
baseId: raw.base_id as string,
workspaceId: raw.workspace_id as string,
creatorId: raw.creator_id as string,
position: raw.position as string,
createdAt: coerceDate(raw.created_at),
updatedAt: coerceDate(raw.updated_at),
lastUpdatedById: raw.last_updated_by_id as string,
deletedAt: null,
cells,
} as BaseRow;
}
function coerceDate(v: unknown): Date {
if (v instanceof Date) return v;
if (typeof v === 'string') return new Date(v);
return new Date(0);
}
function readFromRowEvent(
row: Record<string, unknown>,
spec: ColumnSpec,
): unknown {
switch (spec.column) {
case 'id': return row.id ?? null;
case 'base_id': return row.baseId ?? row.base_id ?? null;
case 'workspace_id': return row.workspaceId ?? row.workspace_id ?? null;
case 'creator_id': return row.creatorId ?? row.creator_id ?? null;
case 'position': return row.position ?? null;
case 'created_at': return row.createdAt ?? row.created_at ?? null;
case 'updated_at': return row.updatedAt ?? row.updated_at ?? null;
case 'last_updated_by_id': return row.lastUpdatedById ?? row.last_updated_by_id ?? null;
case 'deleted_at': return null;
case 'search_text': return '';
}
const prop = spec.property;
if (!prop) return null;
if (
prop.type === BasePropertyType.CREATED_AT ||
prop.type === BasePropertyType.LAST_EDITED_AT ||
prop.type === BasePropertyType.LAST_EDITED_BY
) {
return null;
}
const cells = (row.cells as Record<string, unknown> | null) ?? {};
return cells[prop.id] ?? null;
}
new file: BaseQueryCacheSubscriber
@@ -0,0 +1,110 @@
import {
Injectable,
Logger,
OnApplicationBootstrap,
OnModuleDestroy,
} from '@nestjs/common';
import Redis from 'ioredis';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
import {
createRetryStrategy,
parseRedisUrl,
} from '../../../common/helpers/utils';
import { QueryCacheConfigProvider } from './query-cache.config';
import { BaseQueryCacheService } from './base-query-cache.service';
import { ChangeEnvelope } from './query-cache.types';
const CHANNEL_PATTERN = 'base-query-cache:changes:*';
/*
* Dedicated ioredis subscriber that forwards change envelopes to the local
* BaseQueryCacheService. A separate connection is required because ioredis
* puts subscribing clients into subscriber-only mode and the shared client
* from RedisService is used for normal commands elsewhere in the app.
* When the query-cache is disabled we do not open a Redis connection at all.
*/
@Injectable()
export class BaseQueryCacheSubscriber
implements OnApplicationBootstrap, OnModuleDestroy
{
private readonly logger = new Logger(BaseQueryCacheSubscriber.name);
private client: Redis | null = null;
constructor(
private readonly configProvider: QueryCacheConfigProvider,
private readonly env: EnvironmentService,
private readonly cacheService: BaseQueryCacheService,
) {}
async onApplicationBootstrap(): Promise<void> {
if (!this.configProvider.config.enabled) return;
const redisUrl = this.env.getRedisUrl();
const { family } = parseRedisUrl(redisUrl);
this.client = new Redis(redisUrl, {
family,
retryStrategy: createRetryStrategy(),
});
this.client.on('error', (err) => {
this.logger.warn(`Subscriber client error: ${err.message}`);
});
this.client.on('pmessage', (_pattern, channel, message) => {
this.handleMessage(channel, message).catch((err) => {
const error = err as Error;
this.logger.warn(
`Unhandled error applying change from ${channel}: ${error.message}`,
);
});
});
try {
await this.client.psubscribe(CHANNEL_PATTERN);
this.logger.log(`Subscribed to ${CHANNEL_PATTERN}`);
} catch (err) {
const error = err as Error;
this.logger.warn(`Failed to psubscribe: ${error.message}`);
}
}
async onModuleDestroy(): Promise<void> {
if (!this.client) return;
try {
await this.client.quit();
} catch (err) {
const error = err as Error;
this.logger.warn(
`Failed to close subscriber client cleanly: ${error.message}`,
);
}
this.client = null;
}
private async handleMessage(
channel: string,
message: string,
): Promise<void> {
let envelope: ChangeEnvelope;
try {
envelope = JSON.parse(message) as ChangeEnvelope;
} catch (err) {
const error = err as Error;
this.logger.warn(
`Dropping malformed cache-change message on ${channel}: ${error.message}`,
);
return;
}
try {
await this.cacheService.applyChange(envelope);
} catch (err) {
const error = err as Error;
this.logger.warn(
`applyChange failed for ${envelope.baseId}: ${error.message}`,
);
if (error.stack) this.logger.warn(error.stack);
}
}
}
new file: BaseQueryCacheWriteConsumer
@@ -0,0 +1,173 @@
import { Injectable, Logger } from '@nestjs/common';
import { OnEvent } from '@nestjs/event-emitter';
import { RedisService } from '@nestjs-labs/nestjs-ioredis';
import type { Redis } from 'ioredis';
import { EventName } from '../../../common/events/event.contants';
import { BaseRowRepo } from '@docmost/db/repos/base/base-row.repo';
import {
BasePropertyCreatedEvent,
BasePropertyDeletedEvent,
BasePropertyUpdatedEvent,
BaseRowCreatedEvent,
BaseRowDeletedEvent,
BaseRowReorderedEvent,
BaseRowUpdatedEvent,
BaseRowsDeletedEvent,
BaseSchemaBumpedEvent,
} from '../events/base-events';
import { QueryCacheConfigProvider } from './query-cache.config';
import { ChangeEnvelope } from './query-cache.types';
/*
* Bridges in-process base domain events onto a Redis pub/sub channel so every
* node running the query-cache can keep its resident DuckDB collections in
* sync. Each base gets its own channel (`base-query-cache:changes:${baseId}`)
* to keep pattern matching cheap. When the feature flag is off this class
* registers as a no-op so we pay zero overhead.
*/
@Injectable()
export class BaseQueryCacheWriteConsumer {
private readonly logger = new Logger(BaseQueryCacheWriteConsumer.name);
private _redis: Redis | null = null;
constructor(
private readonly redisService: RedisService,
private readonly configProvider: QueryCacheConfigProvider,
private readonly baseRowRepo: BaseRowRepo,
) {}
private get redis(): Redis {
if (!this._redis) this._redis = this.redisService.getOrThrow();
return this._redis;
}
@OnEvent(EventName.BASE_ROW_CREATED)
async onRowCreated(e: BaseRowCreatedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'row-upsert',
baseId: e.baseId,
row: e.row as unknown as Record<string, unknown>,
});
}
@OnEvent(EventName.BASE_ROW_UPDATED)
async onRowUpdated(e: BaseRowUpdatedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
const row = await this.baseRowRepo.findById(e.rowId, {
workspaceId: e.workspaceId,
});
if (!row) return;
await this.publish(e.baseId, {
kind: 'row-upsert',
baseId: e.baseId,
row: row as unknown as Record<string, unknown>,
});
}
@OnEvent(EventName.BASE_ROW_DELETED)
async onRowDeleted(e: BaseRowDeletedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'row-delete',
baseId: e.baseId,
rowId: e.rowId,
});
}
@OnEvent(EventName.BASE_ROWS_DELETED)
async onRowsDeleted(e: BaseRowsDeletedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'rows-delete',
baseId: e.baseId,
rowIds: e.rowIds,
});
}
@OnEvent(EventName.BASE_ROW_REORDERED)
async onRowReordered(e: BaseRowReorderedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'row-reorder',
baseId: e.baseId,
rowId: e.rowId,
position: e.position,
});
}
@OnEvent(EventName.BASE_SCHEMA_BUMPED)
async onSchemaBumped(e: BaseSchemaBumpedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'schema-invalidate',
baseId: e.baseId,
schemaVersion: e.schemaVersion,
});
}
@OnEvent(EventName.BASE_PROPERTY_UPDATED)
async onPropertyUpdated(e: BasePropertyUpdatedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'schema-invalidate',
baseId: e.baseId,
schemaVersion: e.schemaVersion,
});
}
@OnEvent(EventName.BASE_PROPERTY_CREATED)
async onPropertyCreated(e: BasePropertyCreatedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
// Property CREATED / DELETED events don't carry a schemaVersion. Use
// Number.MAX_SAFE_INTEGER as a sentinel so `applyChange`'s
// `envVersion > cachedVersion` check unconditionally invalidates — any
// real schemaVersion will be smaller. A follow-up could plumb the real
// schemaVersion through the event payload and drop the sentinel.
await this.publish(e.baseId, {
kind: 'schema-invalidate',
baseId: e.baseId,
schemaVersion: Number.MAX_SAFE_INTEGER,
});
}
@OnEvent(EventName.BASE_PROPERTY_DELETED)
async onPropertyDeleted(e: BasePropertyDeletedEvent): Promise<void> {
if (!this.configProvider.config.enabled) return;
await this.publish(e.baseId, {
kind: 'schema-invalidate',
baseId: e.baseId,
schemaVersion: Number.MAX_SAFE_INTEGER,
});
}
private async publish(
baseId: string,
envelope: ChangeEnvelope,
): Promise<void> {
const channel = `base-query-cache:changes:${baseId}`;
if (this.configProvider.config.trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'pubsub.publish',
baseId,
kind: envelope.kind,
// Include the row id or similar short discriminator where meaningful,
// but don't dump the full envelope — it can be large (row-upsert ships
// the whole row).
...('rowId' in envelope ? { rowId: envelope.rowId } : {}),
...('rowIds' in envelope ? { rowCount: envelope.rowIds.length } : {}),
}),
);
}
try {
await this.redis.publish(channel, JSON.stringify(envelope));
} catch (err) {
const error = err as Error;
this.logger.warn(
`Failed to publish cache change for ${baseId}: ${error.message}`,
);
}
}
}
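For reference, the envelope kinds this consumer publishes, written out as literals; the field names are taken from the publish calls above, while the exact ChangeEnvelope union lives in query-cache.types (not shown in this diff):
declare const baseId: string;
declare const rowId: string;
const exampleEnvelopes = [
  { kind: 'row-upsert', baseId, row: { id: rowId } }, // ships the whole row
  { kind: 'row-delete', baseId, rowId },
  { kind: 'rows-delete', baseId, rowIds: [rowId] },
  { kind: 'row-reorder', baseId, rowId, position: 'a0' },
  { kind: 'schema-invalidate', baseId, schemaVersion: 7 },
];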
new file: BaseQueryRouter.decide spec
@@ -0,0 +1,159 @@
import { BaseQueryRouter } from './base-query-router';
import { QueryCacheConfigProvider } from './query-cache.config';
import { BaseRowRepo } from '@docmost/db/repos/base/base-row.repo';
import { BaseQueryCacheService } from './base-query-cache.service';
import { FilterNode, SearchSpec, SortSpec } from '../engine';
type FakeConfig = { enabled: boolean; minRows: number };
function makeRouter(
cfg: FakeConfig,
count: number,
): { router: BaseQueryRouter; countSpy: jest.Mock } {
const configProvider = {
config: {
enabled: cfg.enabled,
minRows: cfg.minRows,
maxCollections: 10,
warmTopN: 0,
},
} as unknown as QueryCacheConfigProvider;
const countSpy = jest.fn().mockResolvedValue(count);
const baseRowRepo = { countActiveRows: countSpy } as unknown as BaseRowRepo;
// Default fake: always miss, so `decide` falls through to countActiveRows.
const fakeCacheService = {
peek: () => undefined,
} as unknown as BaseQueryCacheService;
return {
router: new BaseQueryRouter(configProvider, baseRowRepo, fakeCacheService),
countSpy,
};
}
const filter: FilterNode = {
op: 'and',
children: [
{
propertyId: 'p1',
op: 'eq',
value: 'foo',
},
],
};
const sorts: SortSpec[] = [{ propertyId: 'p1', direction: 'asc' }];
const trgmSearch: SearchSpec = { query: 'hello', mode: 'trgm' };
const ftsSearch: SearchSpec = { query: 'hello', mode: 'fts' };
const baseArgs = {
baseId: 'base-1',
workspaceId: 'ws-1',
};
describe('BaseQueryRouter.decide', () => {
it('returns postgres when flag is off', async () => {
const { router, countSpy } = makeRouter(
{ enabled: false, minRows: 10 },
1000,
);
const decision = await router.decide({ ...baseArgs, filter });
expect(decision).toBe('postgres');
expect(countSpy).not.toHaveBeenCalled();
});
it('returns postgres when row count < minRows', async () => {
const { router } = makeRouter({ enabled: true, minRows: 1000 }, 500);
const decision = await router.decide({ ...baseArgs, filter });
expect(decision).toBe('postgres');
});
it('returns postgres when query has no filter/sort/search', async () => {
const { router, countSpy } = makeRouter(
{ enabled: true, minRows: 10 },
10000,
);
const decision = await router.decide({ ...baseArgs });
expect(decision).toBe('postgres');
expect(countSpy).not.toHaveBeenCalled();
});
it('returns postgres when search.mode === "fts" even for large base', async () => {
const { router } = makeRouter({ enabled: true, minRows: 10 }, 10000);
const decision = await router.decide({ ...baseArgs, search: ftsSearch });
expect(decision).toBe('postgres');
});
it('returns cache when flag on + rows >= minRows + has filter', async () => {
const { router } = makeRouter({ enabled: true, minRows: 1000 }, 1000);
const decision = await router.decide({ ...baseArgs, filter });
expect(decision).toBe('cache');
});
it('returns cache when flag on + rows >= minRows + has sort', async () => {
const { router } = makeRouter({ enabled: true, minRows: 1000 }, 5000);
const decision = await router.decide({ ...baseArgs, sorts });
expect(decision).toBe('cache');
});
it('returns postgres when flag on + rows >= minRows + has trgm search (v1 gates search to postgres)', async () => {
const { router } = makeRouter({ enabled: true, minRows: 10 }, 10000);
const decision = await router.decide({ ...baseArgs, search: trgmSearch });
expect(decision).toBe('postgres');
});
it('uses cached row count from resident collection (no Postgres call)', async () => {
const countSpy = jest.fn().mockResolvedValue(999999); // shouldn't be called
const cacheService = {
peek: jest.fn().mockReturnValue({ baseId: 'base-1', rowCount: 50_000 }),
} as unknown as BaseQueryCacheService;
const router = new BaseQueryRouter(
{
config: {
enabled: true,
minRows: 25_000,
maxCollections: 10,
warmTopN: 0,
},
} as unknown as QueryCacheConfigProvider,
{ countActiveRows: countSpy } as unknown as BaseRowRepo,
cacheService,
);
const decision = await router.decide({
...baseArgs,
sorts,
});
expect(decision).toBe('cache');
expect((cacheService.peek as jest.Mock)).toHaveBeenCalledWith('base-1');
expect(countSpy).not.toHaveBeenCalled();
});
it('falls back to Postgres count when collection is not resident', async () => {
const countSpy = jest.fn().mockResolvedValue(30_000);
const cacheService = {
peek: jest.fn().mockReturnValue(undefined),
} as unknown as BaseQueryCacheService;
const router = new BaseQueryRouter(
{
config: {
enabled: true,
minRows: 25_000,
maxCollections: 10,
warmTopN: 0,
},
} as unknown as QueryCacheConfigProvider,
{ countActiveRows: countSpy } as unknown as BaseRowRepo,
cacheService,
);
const decision = await router.decide({
...baseArgs,
sorts,
});
expect(decision).toBe('cache');
expect((cacheService.peek as jest.Mock)).toHaveBeenCalledWith('base-1');
expect(countSpy).toHaveBeenCalledWith('base-1', { workspaceId: 'ws-1' });
});
});
new file: base-query-router.ts
@@ -0,0 +1,118 @@
import { Injectable, Optional } from '@nestjs/common';
import { QueryCacheConfigProvider } from './query-cache.config';
import { BaseRowRepo } from '@docmost/db/repos/base/base-row.repo';
import type { FilterNode, SearchSpec, SortSpec } from '../engine';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
import { BaseQueryCacheService } from './base-query-cache.service';
export type RouteDecision = 'postgres' | 'cache';
export type RouteDecideArgs = {
baseId: string;
workspaceId: string;
filter?: FilterNode;
sorts?: SortSpec[];
search?: SearchSpec;
};
@Injectable()
export class BaseQueryRouter {
constructor(
private readonly configProvider: QueryCacheConfigProvider,
private readonly baseRowRepo: BaseRowRepo,
private readonly cacheService: BaseQueryCacheService,
@Optional() private readonly env: EnvironmentService | null = null,
) {}
async decide(args: RouteDecideArgs): Promise<RouteDecision> {
const { enabled, minRows } = this.configProvider.config;
const trace = this.configProvider.config.trace ?? false;
const debug = this.env?.getBaseQueryCacheDebug() ?? false;
const tStart = debug ? Date.now() : 0;
const emit = (route: RouteDecision, reason: string): RouteDecision => {
if (trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'router.decision',
baseId: args.baseId,
route,
reason,
}),
);
}
return route;
};
if (!enabled) return emit('postgres', 'flag disabled');
const hasFilter = !!args.filter;
const hasSorts = !!args.sorts && args.sorts.length > 0;
const hasSearch = !!args.search;
if (!hasFilter && !hasSorts && !hasSearch) {
return emit('postgres', 'no filter/sort/search');
}
// v1: any search stays on Postgres — loader doesn't populate search_text yet.
if (hasSearch) return emit('postgres', 'search requires postgres');
// Fast path: if the collection is already resident, read the cached
// row count instead of running a Postgres COUNT on every request.
const tPeek = debug ? Date.now() : 0;
const resident = this.cacheService.peek(args.baseId);
const peekMs = debug ? Date.now() - tPeek : 0;
if (resident) {
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'router.residentCount',
baseId: args.baseId.slice(0, 8),
count: resident.rowCount,
minRows,
ms: peekMs,
totalMs: Date.now() - tStart,
}),
);
}
if (resident.rowCount < minRows) {
return emit(
'postgres',
`rowCount=${resident.rowCount} below MIN_ROWS=${minRows}`,
);
}
return emit(
'cache',
`qualified: rowCount=${resident.rowCount}, hasFilter=${hasFilter}, hasSort=${hasSorts}`,
);
}
const tCount = debug ? Date.now() : 0;
const count = await this.baseRowRepo.countActiveRows(args.baseId, {
workspaceId: args.workspaceId,
});
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
phase: 'router.countActiveRows',
baseId: args.baseId.slice(0, 8),
countMs: Date.now() - tCount,
count,
minRows,
ms: Date.now() - tCount,
totalMs: Date.now() - tStart,
}),
);
}
if (count < minRows) {
return emit('postgres', `rowCount=${count} below MIN_ROWS=${minRows}`);
}
return emit(
'cache',
`qualified: rowCount=${count}, hasFilter=${hasFilter}, hasSort=${hasSorts}`,
);
}
}
new file: collection-loader.ts
@@ -0,0 +1,140 @@
import { Injectable, Logger } from '@nestjs/common';
import { BaseRepo } from '@docmost/db/repos/base/base.repo';
import { BasePropertyRepo } from '@docmost/db/repos/base/base-property.repo';
import { buildColumnSpecs } from './column-types';
import { buildLoaderSql } from './loader-sql';
import { baseSchemaName } from './schema-name';
import { DuckDbRuntime } from './duckdb-runtime';
import { QueryCacheConfigProvider } from './query-cache.config';
import { LoadedCollection } from './query-cache.types';
/*
* Loads a base into the shared DuckDB runtime as an attached in-memory
* database (`<schema>.rows`). Steps:
*
* 1. Attach a per-base schema.
* 2. Run `CREATE TABLE <schema>.rows AS SELECT ... FROM postgres_query(...)`
* via the writer connection — Postgres does the JSONB extraction.
* 3. Declare `PRIMARY KEY (id)` on the new table.
* 4. Build ART indexes on every indexable column.
* 5. Count rows and return a LoadedCollection metadata record.
*
* Error path: detach the schema before propagating the error, so we don't
* leak an empty attached DB into the runtime.
*/
@Injectable()
export class CollectionLoader {
private readonly logger = new Logger(CollectionLoader.name);
constructor(
private readonly baseRepo: BaseRepo,
private readonly basePropertyRepo: BasePropertyRepo,
private readonly runtime: DuckDbRuntime,
private readonly config: QueryCacheConfigProvider,
) {}
async load(baseId: string, workspaceId: string): Promise<LoadedCollection> {
if (!this.runtime.isReady()) {
throw new Error(
`Cannot load collection ${baseId}: duckdb runtime not ready. ` +
`Check DuckDbRuntime bootstrap logs.`,
);
}
const base = await this.baseRepo.findById(baseId);
if (!base) throw new Error(`Base ${baseId} not found`);
const schemaVersion = (base as any).schemaVersion ?? 1;
const properties = await this.basePropertyRepo.findByBaseId(baseId);
const specs = buildColumnSpecs(properties);
const schema = baseSchemaName(baseId);
await this.runtime.attachBase(schema);
try {
const writer = this.runtime.getWriter();
const sql = buildLoaderSql(specs, baseId, workspaceId, schema);
if (this.config.config.trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'loader.sql',
baseId,
schema,
length: sql.length,
sql,
}),
);
}
await writer.run(sql);
await writer.run(`ALTER TABLE ${schema}.rows ADD PRIMARY KEY (id)`);
for (const spec of specs) {
if (!spec.indexable) continue;
const safe = spec.column.replace(/[^a-zA-Z0-9_]/g, '_');
const tIdx = this.config.config.trace ? Date.now() : 0;
await writer.run(
`CREATE INDEX ${schema}_${safe}_idx ON ${schema}.rows (${quoteIdent(spec.column)})`,
);
if (this.config.config.trace) {
console.log(
'[cache-trace]',
JSON.stringify({
phase: 'loader.index',
baseId,
schema,
column: spec.column,
ms: Date.now() - tIdx,
}),
);
}
}
const countResult = await writer.runAndReadAll(
`SELECT count(*) AS c FROM ${schema}.rows`,
);
const rowCount = Number(
(countResult.getRowObjects()[0] as { c: bigint | number }).c,
);
const approxBytes = estimateBytes(rowCount, specs.length);
this.logger.debug(
`Loaded ${rowCount} rows for base ${baseId} ` +
`(schemaVersion=${schemaVersion}, schema=${schema}, approxMB=${fmtMb(approxBytes)})`,
);
return {
baseId,
schema,
schemaVersion,
columns: specs,
lastAccessedAt: Date.now(),
rowCount,
approxBytes,
};
} catch (err) {
try {
await this.runtime.detachBase(schema);
} catch { /* swallow */ }
throw err;
}
}
}
function estimateBytes(rowCount: number, columnCount: number): number {
// Rough heuristic: ~64 bytes per cell (typed value + ART index entry
// overhead). Within 2x of actual for typical schemas; used for
// reporting only, not for eviction decisions.
return rowCount * columnCount * 64;
}
function fmtMb(bytes: number): string {
return (bytes / (1024 * 1024)).toFixed(1);
}
function quoteIdent(name: string): string {
return `"${name.replace(/"/g, '""')}"`;
}
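For orientation, a hypothetical sketch of the statement shape step 2 of the loader comment produces; the attachment name 'pgdb', the column list, and the quoting are assumptions, since the real SQL comes from buildLoaderSql in loader-sql.ts (not shown in this diff):
function sketchLoaderSql(schema: string, baseId: string, propId: string): string {
  // Postgres runs the inner query (including the base_cell_* UDFs) and
  // DuckDB materialises the result as <schema>.rows in one statement.
  return `
CREATE TABLE ${schema}.rows AS
SELECT * FROM postgres_query('pgdb', '
  SELECT id, base_id, workspace_id, creator_id, position,
         created_at, updated_at, last_updated_by_id, deleted_at,
         base_cell_numeric(cells -> ''${propId}'') AS "${propId}"
  FROM base_rows
  WHERE base_id = ''${baseId}''
')`;
}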
new file: buildColumnSpecs spec
@@ -0,0 +1,76 @@
import { BasePropertyType } from '../base.schemas';
import { buildColumnSpecs, SYSTEM_COLUMNS } from './column-types';
const p = (type: string, extra: Record<string, unknown> = {}) => ({
id: `prop-${type}`,
type,
typeOptions: extra,
}) as any;
describe('buildColumnSpecs', () => {
it('includes the fixed system columns first', () => {
const specs = buildColumnSpecs([]);
expect(specs.map((s) => s.column)).toEqual(SYSTEM_COLUMNS.map((s) => s.column));
});
it('maps text / url / email to VARCHAR indexable', () => {
for (const t of [BasePropertyType.TEXT, BasePropertyType.URL, BasePropertyType.EMAIL]) {
const specs = buildColumnSpecs([p(t)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('VARCHAR');
expect(user.indexable).toBe(true);
}
});
it('maps number to DOUBLE indexable', () => {
const specs = buildColumnSpecs([p(BasePropertyType.NUMBER)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('DOUBLE');
expect(user.indexable).toBe(true);
});
it('maps date to TIMESTAMPTZ indexable', () => {
const specs = buildColumnSpecs([p(BasePropertyType.DATE)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('TIMESTAMPTZ');
expect(user.indexable).toBe(true);
});
it('maps checkbox to BOOLEAN indexable', () => {
const specs = buildColumnSpecs([p(BasePropertyType.CHECKBOX)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('BOOLEAN');
});
it('maps select / status to VARCHAR indexable', () => {
for (const t of [BasePropertyType.SELECT, BasePropertyType.STATUS]) {
const specs = buildColumnSpecs([p(t)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('VARCHAR');
expect(user.indexable).toBe(true);
}
});
it('maps multiSelect / file / multi-person to JSON non-indexable', () => {
for (const t of [BasePropertyType.MULTI_SELECT, BasePropertyType.FILE]) {
const specs = buildColumnSpecs([p(t)]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('JSON');
expect(user.indexable).toBe(false);
}
const specs = buildColumnSpecs([p(BasePropertyType.PERSON, { allowMultiple: true })]);
expect(specs[specs.length - 1].ddlType).toBe('JSON');
});
it('maps single-person to VARCHAR indexable when allowMultiple=false', () => {
const specs = buildColumnSpecs([p(BasePropertyType.PERSON, { allowMultiple: false })]);
const user = specs[specs.length - 1];
expect(user.ddlType).toBe('VARCHAR');
expect(user.indexable).toBe(true);
});
it('skips unknown property types', () => {
const specs = buildColumnSpecs([p('unknown-type-x')]);
expect(specs.length).toBe(SYSTEM_COLUMNS.length);
});
});
new file: column-types.ts
@@ -0,0 +1,63 @@
import { BasePropertyType, BasePropertyTypeValue } from '../base.schemas';
import { ColumnSpec } from './query-cache.types';
import type { BaseProperty } from '@docmost/db/types/entity.types';
export const SYSTEM_COLUMNS: ColumnSpec[] = [
{ column: 'id', ddlType: 'VARCHAR', indexable: false },
{ column: 'base_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'workspace_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'creator_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'position', ddlType: 'VARCHAR', indexable: true },
{ column: 'created_at', ddlType: 'TIMESTAMPTZ', indexable: true },
{ column: 'updated_at', ddlType: 'TIMESTAMPTZ', indexable: true },
{ column: 'last_updated_by_id', ddlType: 'VARCHAR', indexable: true },
{ column: 'deleted_at', ddlType: 'TIMESTAMPTZ', indexable: false },
{ column: 'search_text', ddlType: 'VARCHAR', indexable: false },
];
type PropertyLike = Pick<BaseProperty, 'id' | 'type' | 'typeOptions'>;
export function buildColumnSpecs(properties: PropertyLike[]): ColumnSpec[] {
const out: ColumnSpec[] = [...SYSTEM_COLUMNS];
for (const prop of properties) {
const spec = buildUserColumn(prop);
if (spec) out.push(spec);
}
return out;
}
function buildUserColumn(prop: PropertyLike): ColumnSpec | null {
const t = prop.type as BasePropertyTypeValue;
switch (t) {
case BasePropertyType.TEXT:
case BasePropertyType.URL:
case BasePropertyType.EMAIL:
return { column: prop.id, ddlType: 'VARCHAR', indexable: true, property: prop };
case BasePropertyType.NUMBER:
return { column: prop.id, ddlType: 'DOUBLE', indexable: true, property: prop };
case BasePropertyType.DATE:
return { column: prop.id, ddlType: 'TIMESTAMPTZ', indexable: true, property: prop };
case BasePropertyType.CHECKBOX:
return { column: prop.id, ddlType: 'BOOLEAN', indexable: true, property: prop };
case BasePropertyType.SELECT:
case BasePropertyType.STATUS:
return { column: prop.id, ddlType: 'VARCHAR', indexable: true, property: prop };
case BasePropertyType.MULTI_SELECT:
case BasePropertyType.FILE:
return { column: prop.id, ddlType: 'JSON', indexable: false, property: prop };
case BasePropertyType.PERSON: {
const allowMultiple = !!(prop.typeOptions as any)?.allowMultiple;
return allowMultiple
? { column: prop.id, ddlType: 'JSON', indexable: false, property: prop }
: { column: prop.id, ddlType: 'VARCHAR', indexable: true, property: prop };
}
// System types are modelled as system columns on base_rows — do not add
// a per-property column for them. They're already in SYSTEM_COLUMNS.
case BasePropertyType.CREATED_AT:
case BasePropertyType.LAST_EDITED_AT:
case BasePropertyType.LAST_EDITED_BY:
return null;
default:
return null;
}
}
new file: ConnectionPool spec
@@ -0,0 +1,75 @@
import { ConnectionPool } from './connection-pool';
describe('ConnectionPool', () => {
it('hands out an available resource immediately', async () => {
const pool = new ConnectionPool<string>();
pool.init(['a', 'b']);
expect(await pool.acquire()).toBe('b');
expect(await pool.acquire()).toBe('a');
});
it('a waiter is resolved by the next release', async () => {
const pool = new ConnectionPool<string>();
pool.init(['only']);
const first = await pool.acquire();
let resolved: string | null = null;
const secondP = pool.acquire().then((v) => (resolved = v));
expect(resolved).toBeNull();
pool.release(first);
await secondP;
expect(resolved).toBe('only');
});
it('FIFO among waiters (fair under contention)', async () => {
const pool = new ConnectionPool<string>();
pool.init(['only']);
const held = await pool.acquire();
const order: number[] = [];
const p1 = pool.acquire().then(() => order.push(1));
const p2 = pool.acquire().then(() => order.push(2));
const p3 = pool.acquire().then(() => order.push(3));
pool.release(held);
await p1;
pool.release('only'); // re-release the value that p1 got (simulated)
await p2;
pool.release('only');
await p3;
expect(order).toEqual([1, 2, 3]);
});
it('withResource acquires, invokes callback, and releases even on throw', async () => {
const pool = new ConnectionPool<string>();
pool.init(['one']);
let called = false;
await expect(
pool.withResource(async (v) => {
called = true;
expect(v).toBe('one');
throw new Error('boom');
}),
).rejects.toThrow('boom');
expect(called).toBe(true);
// resource should be back in the pool
expect(await pool.acquire()).toBe('one');
});
it('size() reports the initial count regardless of check-outs', () => {
const pool = new ConnectionPool<string>();
pool.init(['a', 'b', 'c']);
expect(pool.size()).toBe(3);
});
it('close() rejects pending waiters and returns only the currently-free resources', async () => {
const pool = new ConnectionPool<string>();
pool.init(['only']);
const first = await pool.acquire();
const pending = pool.acquire();
pending.catch(() => {}); // Attach catch to prevent unhandled rejection
const closed = pool.close();
expect(closed).toEqual([]); // No free resources (one is checked out)
await expect(pending).rejects.toThrow(/closed/i);
});
});
new file: connection-pool.ts
@@ -0,0 +1,86 @@
type Waiter<T> = {
resolve: (value: T) => void;
reject: (err: Error) => void;
};
/*
* A minimal async resource pool. No external deps. Semantics:
*
* - `acquire()` returns an available resource immediately, or a Promise
* that resolves when one is released.
* - `release(r)` returns a resource to the pool. If waiters are pending,
* it is handed to the longest-waiting (FIFO) one; otherwise it goes back
* on the free list.
* - `withResource(fn)` acquires, invokes, and releases; the release
* happens even if `fn` throws.
* - `close()` rejects all pending waiters and returns the currently-free
* resources so the owner can dispose of them. Resources still checked out
* remain their holders' responsibility; releasing them after close is a
* no-op.
*
* Initial size is set via `init(resources)`. Resources must not be checked
* out before `init` is called. `size()` reports the canonical count (does
* not decrement on acquire).
*/
export class ConnectionPool<T> {
private free: T[] = [];
private waiters: Waiter<T>[] = [];
private initialCount = 0;
private closed = false;
init(resources: T[]): void {
if (this.initialCount !== 0) {
throw new Error('ConnectionPool already initialised');
}
this.free = [...resources];
this.initialCount = resources.length;
}
size(): number {
return this.initialCount;
}
async acquire(): Promise<T> {
if (this.closed) {
throw new Error('ConnectionPool is closed');
}
if (this.free.length > 0) {
return this.free.pop()!;
}
return new Promise<T>((resolve, reject) => {
this.waiters.push({ resolve, reject });
});
}
release(resource: T): void {
if (this.closed) {
// Pool is closed: release becomes a no-op; the holder disposes of the resource.
return;
}
const waiter = this.waiters.shift();
if (waiter) {
waiter.resolve(resource);
} else {
this.free.push(resource);
}
}
async withResource<R>(fn: (resource: T) => Promise<R>): Promise<R> {
const resource = await this.acquire();
try {
return await fn(resource);
} finally {
this.release(resource);
}
}
close(): T[] {
this.closed = true;
for (const waiter of this.waiters) {
waiter.reject(new Error('ConnectionPool is closed'));
}
this.waiters = [];
const remaining = this.free;
this.free = [];
return remaining;
}
}
}
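A small usage sketch of the pool above (resource type and values are illustrative):
async function poolDemo(): Promise<void> {
  const pool = new ConnectionPool<string>();
  pool.init(['conn-a', 'conn-b']);
  // Preferred path: withResource releases even if the callback throws.
  const result = await pool.withResource(async (conn) => `used ${conn}`);
  console.log(result);
  // Manual acquire/release when the hold spans several awaits:
  const conn = await pool.acquire();
  try {
    // ... run work on `conn` ...
  } finally {
    pool.release(conn);
  }
  // Shutdown: pending waiters reject, and the free list comes back for disposal.
  const freed = pool.close();
  console.log(`disposing ${freed.length} free resources`);
}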
new file: buildDuckDbListQuery spec
@@ -0,0 +1,183 @@
import { buildColumnSpecs } from './column-types';
import { buildDuckDbListQuery } from './duckdb-query-builder';
import { BasePropertyType } from '../base.schemas';
const SCHEMA = 'b_019c69a3dd4770148b87ec8f1675aaaa';
const numericProp = {
id: '00000000-0000-0000-0000-000000000001',
type: BasePropertyType.NUMBER,
typeOptions: {},
} as any;
const textProp = {
id: '00000000-0000-0000-0000-000000000002',
type: BasePropertyType.TEXT,
typeOptions: {},
} as any;
const columns = buildColumnSpecs([numericProp, textProp]);
describe('buildDuckDbListQuery', () => {
it('renders no-filter, no-sort, no-search as live-rows-paginated-by-position', () => {
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
pagination: { limit: 100 },
});
expect(sql).toContain(`FROM ${SCHEMA}.rows`);
expect(sql).toMatch(/deleted_at IS NULL/);
expect(sql).toMatch(/ORDER BY position ASC, id ASC/);
expect(sql).toMatch(/LIMIT 101/);
expect(params).toEqual([]);
});
it('renders numeric gt filter with parameterized value', () => {
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
filter: {
op: 'and',
children: [{ propertyId: numericProp.id, op: 'gt', value: 42 }],
},
pagination: { limit: 100 },
});
expect(sql).toMatch(new RegExp(`"${numericProp.id}" > \\?`));
expect(params).toContain(42);
});
it('renders text contains with ILIKE and escaped wildcards', () => {
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
filter: {
op: 'and',
children: [{ propertyId: textProp.id, op: 'contains', value: 'a_b%c' }],
},
pagination: { limit: 100 },
});
expect(sql).toMatch(/ILIKE \?/);
expect(params).toContain('%a\\_b\\%c%');
});
it('renders sort with sentinel wrapping and cursor keyset', () => {
const { sql } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
sorts: [{ propertyId: numericProp.id, direction: 'asc' }],
pagination: {
limit: 50,
afterKeys: { s0: 10, position: 'A0', id: '00000000-0000-0000-0000-0000000000aa' },
},
});
expect(sql).toMatch(/COALESCE\("[0-9a-f-]+", '?[Ii]nfinity'?::[A-Z]+\) AS s0/);
expect(sql).toMatch(/ORDER BY s0 ASC, position ASC, id ASC/);
// keyset OR-chain
expect(sql).toMatch(/s0 > \?/);
});
it('renders search in trgm mode as ILIKE on search_text', () => {
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
search: { mode: 'trgm', query: 'hello' },
pagination: { limit: 10 },
});
expect(sql).toMatch(/search_text ILIKE \?/);
expect(params).toContain('%hello%');
});
it('renders multi-select any filter with json_contains and to_json binding', () => {
const multiProp = {
id: '00000000-0000-0000-0000-000000000010',
type: BasePropertyType.MULTI_SELECT,
typeOptions: {},
} as any;
const cols = buildColumnSpecs([multiProp]);
const choiceA = 'choice-uuid-aaa';
const choiceB = 'choice-uuid-bbb';
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns: cols,
filter: {
op: 'and',
children: [{ propertyId: multiProp.id, op: 'any', value: [choiceA, choiceB] }],
},
pagination: { limit: 100 },
});
expect(sql).toMatch(/json_contains\("[0-9a-f-]+", to_json\(\?\)\)/);
expect(sql).not.toMatch(/json_array_contains/);
expect(params).toContain(choiceA);
expect(params).toContain(choiceB);
});
it('renders nested AND/OR groups with correct parentheses', () => {
const { sql } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
filter: {
op: 'or',
children: [
{ op: 'and', children: [{ propertyId: numericProp.id, op: 'gt', value: 1 }] },
{ op: 'and', children: [{ propertyId: textProp.id, op: 'eq', value: 'x' }] },
],
},
pagination: { limit: 100 },
});
expect(sql).toMatch(/\(\(.+\) OR \(.+\)\)/);
});
it('handles empty filter group without emitting WHERE on it', () => {
const { sql, params } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
filter: { op: 'and', children: [] },
pagination: { limit: 100 },
});
// either WHERE clause elided entirely, or group becomes TRUE
expect(sql).toMatch(/deleted_at IS NULL/);
expect(params).toEqual([]);
});
it('renders multi-sort keyset with s0, s1, position, id chain', () => {
const { sql } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
sorts: [
{ propertyId: numericProp.id, direction: 'asc' },
{ propertyId: textProp.id, direction: 'desc' },
],
pagination: {
limit: 10,
afterKeys: { s0: 10, s1: 'abc', position: 'A0', id: '00000000-0000-0000-0000-0000000000aa' },
},
});
expect(sql).toMatch(/AS s0/);
expect(sql).toMatch(/AS s1/);
expect(sql).toMatch(/ORDER BY s0 ASC, s1 DESC, position ASC, id ASC/);
expect(sql).toMatch(/s0 > \?/);
expect(sql).toMatch(/s1 < \?/); // desc → less-than
});
it('renders text isEmpty as IS NULL OR = empty-string', () => {
const { sql } = buildDuckDbListQuery({
schema: SCHEMA,
columns,
filter: {
op: 'and',
children: [{ propertyId: textProp.id, op: 'isEmpty' }],
},
pagination: { limit: 10 },
});
expect(sql).toMatch(new RegExp(`"${textProp.id}" IS NULL`));
});
it('rejects invalid schema name', () => {
expect(() =>
buildDuckDbListQuery({
schema: 'bad name',
columns: [],
pagination: { limit: 10 },
}),
).toThrow(/invalid schema/i);
});
});
@@ -0,0 +1,637 @@
import { BasePropertyType } from '../base.schemas';
import {
Condition,
FilterNode,
SearchSpec,
SortSpec,
} from '../engine/schema.zod';
import { escapeIlike } from '../engine/extractors';
import { PropertyKind, propertyKind } from '../engine/kinds';
import { ColumnSpec } from './query-cache.types';
export type AfterKeys = Record<string, unknown>;
export type DuckDbListQueryOpts = {
schema: string;
columns: ColumnSpec[];
filter?: FilterNode;
sorts?: SortSpec[];
search?: SearchSpec;
pagination: { limit: number; afterKeys?: AfterKeys };
};
export type DuckDbListQuery = {
sql: string;
params: unknown[];
};
export class FtsNotSupportedInCache extends Error {
constructor() {
super('FTS search mode is not supported in the DuckDB query cache');
this.name = 'FtsNotSupportedInCache';
}
}
type ColumnIndex = {
byId: Map<string, ColumnSpec>;
userColumns: ColumnSpec[];
};
type SortBuild = {
key: string;
expression: string;
direction: 'asc' | 'desc';
};
// System property type → DuckDB system column name. Mirrors
// engine/kinds.SYSTEM_COLUMN but in snake_case (DuckDB table uses
// snake_case columns; the engine relies on Kysely's camel-case plugin).
const SYSTEM_COLUMN_DUCK: Record<string, 'created_at' | 'updated_at' | 'last_updated_by_id'> = {
[BasePropertyType.CREATED_AT]: 'created_at',
[BasePropertyType.LAST_EDITED_AT]: 'updated_at',
[BasePropertyType.LAST_EDITED_BY]: 'last_updated_by_id',
};
export function buildDuckDbListQuery(
opts: DuckDbListQueryOpts,
): DuckDbListQuery {
if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(opts.schema)) {
throw new Error(`Invalid schema name "${opts.schema}"`);
}
const rowsTable = `${opts.schema}.rows`;
const index = indexColumns(opts.columns);
const params: unknown[] = [];
const whereClauses: string[] = ['deleted_at IS NULL'];
if (opts.search) {
whereClauses.push(buildSearch(opts.search, params));
}
if (opts.filter) {
const filterSql = buildFilter(opts.filter, index, params);
if (filterSql) whereClauses.push(filterSql);
}
const sortBuilds = buildSorts(opts.sorts ?? [], index);
const selectParts: string[] = buildSelect(index, sortBuilds);
if (opts.pagination.afterKeys) {
whereClauses.push(
buildKeyset(opts.pagination.afterKeys, sortBuilds, params),
);
}
const orderByParts: string[] = [
...sortBuilds.map((s) => `${s.key} ${s.direction.toUpperCase()}`),
'position ASC',
'id ASC',
];
const sql =
`SELECT ${selectParts.join(', ')}` +
` FROM ${rowsTable}` +
` WHERE ${whereClauses.join(' AND ')}` +
` ORDER BY ${orderByParts.join(', ')}` +
` LIMIT ${opts.pagination.limit + 1}`;
return { sql, params };
}
// --- select projection -------------------------------------------------
function buildSelect(index: ColumnIndex, sortBuilds: SortBuild[]): string[] {
const parts: string[] = [
'id',
'base_id',
'position',
'creator_id',
'last_updated_by_id',
'workspace_id',
'created_at',
'updated_at',
'deleted_at',
];
for (const col of index.userColumns) {
parts.push(quoteIdent(col.column));
}
for (const sb of sortBuilds) {
parts.push(`${sb.expression} AS ${sb.key}`);
}
return parts;
}
// --- filter ------------------------------------------------------------
function buildFilter(
node: FilterNode,
index: ColumnIndex,
params: unknown[],
): string {
if ('children' in node) {
if (node.children.length === 0) return 'TRUE';
const built = node.children
.map((c) => buildFilter(c, index, params))
.filter((s) => s.length > 0);
if (built.length === 0) return 'TRUE';
const joiner = node.op === 'and' ? ' AND ' : ' OR ';
return `(${built.join(joiner)})`;
}
return buildCondition(node, index, params);
}
function buildCondition(
cond: Condition,
index: ColumnIndex,
params: unknown[],
): string {
const col = index.byId.get(cond.propertyId);
if (!col) return 'FALSE';
const propType = col.property?.type;
if (propType && SYSTEM_COLUMN_DUCK[propType]) {
return systemCondition(SYSTEM_COLUMN_DUCK[propType], cond, params);
}
const kind = propType ? propertyKind(propType) : null;
if (!kind) return 'FALSE';
const colRef = quoteIdent(col.column);
switch (kind) {
case PropertyKind.TEXT:
return textCondition(colRef, cond, params);
case PropertyKind.NUMERIC:
return numericCondition(colRef, cond, params);
case PropertyKind.DATE:
return dateCondition(colRef, cond, params);
case PropertyKind.BOOL:
return boolCondition(colRef, cond, params);
case PropertyKind.SELECT:
return selectCondition(colRef, cond, params);
case PropertyKind.MULTI:
return arrayOfIdsCondition(colRef, cond, params);
case PropertyKind.PERSON: {
const allowMultiple = !!(col.property?.typeOptions as any)?.allowMultiple;
return allowMultiple
? arrayOfIdsCondition(colRef, cond, params)
: selectCondition(colRef, cond, params);
}
case PropertyKind.FILE:
return arrayOfIdsCondition(colRef, cond, params);
default:
return 'FALSE';
}
}
function textCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
const val = cond.value;
switch (cond.op) {
case 'isEmpty':
return `(${colRef} IS NULL OR ${colRef} = '')`;
case 'isNotEmpty':
return `(${colRef} IS NOT NULL AND ${colRef} != '')`;
case 'eq':
if (val == null) return 'FALSE';
params.push(String(val));
return `${colRef} = ?`;
case 'neq':
if (val == null) return 'FALSE';
params.push(String(val));
return `(${colRef} IS NULL OR ${colRef} != ?)`;
case 'contains':
if (val == null) return 'FALSE';
params.push(`%${escapeIlike(String(val))}%`);
return `${colRef} ILIKE ?`;
case 'ncontains':
if (val == null) return 'FALSE';
params.push(`%${escapeIlike(String(val))}%`);
return `(${colRef} IS NULL OR ${colRef} NOT ILIKE ?)`;
case 'startsWith':
if (val == null) return 'FALSE';
params.push(`${escapeIlike(String(val))}%`);
return `${colRef} ILIKE ?`;
case 'endsWith':
if (val == null) return 'FALSE';
params.push(`%${escapeIlike(String(val))}`);
return `${colRef} ILIKE ?`;
default:
return 'FALSE';
}
}
function numericCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
const raw = cond.value;
const num = raw == null ? null : Number(raw);
const bad = num == null || Number.isNaN(num);
switch (cond.op) {
case 'isEmpty':
return `${colRef} IS NULL`;
case 'isNotEmpty':
return `${colRef} IS NOT NULL`;
case 'eq':
if (bad) return 'FALSE';
params.push(num);
return `${colRef} = ?`;
case 'neq':
if (bad) return 'FALSE';
params.push(num);
return `(${colRef} IS NULL OR ${colRef} != ?)`;
case 'gt':
if (bad) return 'FALSE';
params.push(num);
return `${colRef} > ?`;
case 'gte':
if (bad) return 'FALSE';
params.push(num);
return `${colRef} >= ?`;
case 'lt':
if (bad) return 'FALSE';
params.push(num);
return `${colRef} < ?`;
case 'lte':
if (bad) return 'FALSE';
params.push(num);
return `${colRef} <= ?`;
default:
return 'FALSE';
}
}
function dateCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
const raw = cond.value;
const bad = raw == null || raw === '';
switch (cond.op) {
case 'isEmpty':
return `${colRef} IS NULL`;
case 'isNotEmpty':
return `${colRef} IS NOT NULL`;
case 'eq':
if (bad) return 'FALSE';
params.push(String(raw));
return `${colRef} = ?`;
case 'neq':
if (bad) return 'FALSE';
params.push(String(raw));
return `(${colRef} IS NULL OR ${colRef} != ?)`;
case 'before':
if (bad) return 'FALSE';
params.push(String(raw));
return `${colRef} < ?`;
case 'after':
if (bad) return 'FALSE';
params.push(String(raw));
return `${colRef} > ?`;
case 'onOrBefore':
if (bad) return 'FALSE';
params.push(String(raw));
return `${colRef} <= ?`;
case 'onOrAfter':
if (bad) return 'FALSE';
params.push(String(raw));
return `${colRef} >= ?`;
default:
return 'FALSE';
}
}
function boolCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
switch (cond.op) {
case 'isEmpty':
return `${colRef} IS NULL`;
case 'isNotEmpty':
return `${colRef} IS NOT NULL`;
case 'eq':
if (cond.value == null) return 'FALSE';
params.push(Boolean(cond.value));
return `${colRef} = ?`;
case 'neq':
if (cond.value == null) return 'FALSE';
params.push(Boolean(cond.value));
return `(${colRef} IS NULL OR ${colRef} != ?)`;
default:
return 'FALSE';
}
}
function selectCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
const val = cond.value;
switch (cond.op) {
case 'isEmpty':
return `(${colRef} IS NULL OR ${colRef} = '')`;
case 'isNotEmpty':
return `(${colRef} IS NOT NULL AND ${colRef} != '')`;
case 'eq':
if (val == null) return 'FALSE';
params.push(String(val));
return `${colRef} = ?`;
case 'neq':
if (val == null) return 'FALSE';
params.push(String(val));
return `(${colRef} IS NULL OR ${colRef} != ?)`;
case 'any': {
const arr = asStringArray(val);
if (arr.length === 0) return 'FALSE';
const placeholders = arr.map(() => '?').join(', ');
for (const v of arr) params.push(v);
return `${colRef} IN (${placeholders})`;
}
case 'none': {
const arr = asStringArray(val);
if (arr.length === 0) return 'TRUE';
const placeholders = arr.map(() => '?').join(', ');
for (const v of arr) params.push(v);
return `(${colRef} IS NULL OR ${colRef} NOT IN (${placeholders}))`;
}
default:
return 'FALSE';
}
}
function arrayOfIdsCondition(
colRef: string,
cond: Condition,
params: unknown[],
): string {
const val = cond.value;
switch (cond.op) {
case 'isEmpty':
return `(${colRef} IS NULL OR json_array_length(${colRef}) = 0)`;
case 'isNotEmpty':
return `(${colRef} IS NOT NULL AND json_array_length(${colRef}) > 0)`;
case 'any': {
const arr = asStringArray(val);
if (arr.length === 0) return 'FALSE';
const legs = arr.map(() => jsonArrayContains(colRef, '?'));
for (const v of arr) params.push(v);
return `(${legs.join(' OR ')})`;
}
case 'all': {
const arr = asStringArray(val);
if (arr.length === 0) return 'TRUE';
const legs = arr.map(() => jsonArrayContains(colRef, '?'));
for (const v of arr) params.push(v);
return `(${legs.join(' AND ')})`;
}
case 'none': {
const arr = asStringArray(val);
if (arr.length === 0) return 'TRUE';
const legs = arr.map(() => jsonArrayContains(colRef, '?'));
for (const v of arr) params.push(v);
return `(${colRef} IS NULL OR NOT (${legs.join(' OR ')}))`;
}
default:
return 'FALSE';
}
}
function systemCondition(
column: 'created_at' | 'updated_at' | 'last_updated_by_id',
cond: Condition,
params: unknown[],
): string {
const val = cond.value;
if (column === 'last_updated_by_id') {
switch (cond.op) {
case 'isEmpty':
return `${column} IS NULL`;
case 'isNotEmpty':
return `${column} IS NOT NULL`;
case 'eq':
if (val == null) return 'FALSE';
params.push(String(val));
return `${column} = ?`;
case 'neq':
if (val == null) return 'FALSE';
params.push(String(val));
return `(${column} IS NULL OR ${column} != ?)`;
case 'any': {
const arr = asStringArray(val);
if (arr.length === 0) return 'FALSE';
const placeholders = arr.map(() => '?').join(', ');
for (const v of arr) params.push(v);
return `${column} IN (${placeholders})`;
}
case 'none': {
const arr = asStringArray(val);
if (arr.length === 0) return 'TRUE';
const placeholders = arr.map(() => '?').join(', ');
for (const v of arr) params.push(v);
return `(${column} IS NULL OR ${column} NOT IN (${placeholders}))`;
}
default:
return 'FALSE';
}
}
const bad = val == null || val === '';
switch (cond.op) {
case 'isEmpty':
return 'FALSE';
case 'isNotEmpty':
return 'TRUE';
case 'eq':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} = ?`;
case 'neq':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} != ?`;
case 'before':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} < ?`;
case 'after':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} > ?`;
case 'onOrBefore':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} <= ?`;
case 'onOrAfter':
if (bad) return 'FALSE';
params.push(String(val));
return `${column} >= ?`;
default:
return 'FALSE';
}
}
// --- sort --------------------------------------------------------------
function buildSorts(sorts: SortSpec[], index: ColumnIndex): SortBuild[] {
const out: SortBuild[] = [];
for (let i = 0; i < sorts.length; i++) {
const s = sorts[i];
const col = index.byId.get(s.propertyId);
if (!col) continue;
const key = `s${i}`;
const propType = col.property?.type;
const sys = propType ? SYSTEM_COLUMN_DUCK[propType] : undefined;
if (sys) {
out.push({ key, expression: sys, direction: s.direction });
continue;
}
const kind = propType ? propertyKind(propType) : null;
if (!kind) continue;
out.push(wrapWithSentinel(col.column, kind, s.direction, key));
}
return out;
}
function wrapWithSentinel(
column: string,
kind: ReturnType<typeof propertyKind>,
direction: 'asc' | 'desc',
key: string,
): SortBuild {
const colRef = quoteIdent(column);
let sentinel: string;
if (kind === PropertyKind.NUMERIC) {
sentinel = direction === 'asc' ? `'Infinity'::DOUBLE` : `'-Infinity'::DOUBLE`;
} else if (kind === PropertyKind.DATE) {
sentinel =
direction === 'asc'
? `'9999-12-31 23:59:59+00'::TIMESTAMPTZ`
: `'0001-01-01 00:00:00+00'::TIMESTAMPTZ`;
} else if (kind === PropertyKind.BOOL) {
sentinel = direction === 'asc' ? 'TRUE' : 'FALSE';
} else {
// TEXT / SELECT / MULTI / PERSON / FILE — sort by the column's raw text
// representation; JSON-typed list columns will stringify in DuckDB
// lexicographically, matching the Postgres engine's text extractor.
sentinel = direction === 'asc' ? 'CHR(1114111)' : `''`;
}
return {
key,
expression: `COALESCE(${colRef}, ${sentinel})`,
direction,
};
}
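// Illustrative: a NUMBER sort ascending over column "c" projects
//   COALESCE("c", 'Infinity'::DOUBLE) AS s0
// so NULL cells take the +Infinity sentinel and sort last under ASC
// (mirrored with -Infinity for DESC).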
// --- search ------------------------------------------------------------
function buildSearch(spec: SearchSpec, params: unknown[]): string {
const q = spec.query.trim();
if (!q) return 'TRUE';
if (spec.mode === 'fts') {
throw new FtsNotSupportedInCache();
}
params.push(`%${escapeIlike(q)}%`);
return `search_text ILIKE ?`;
}
// --- keyset ------------------------------------------------------------
function buildKeyset(
afterKeys: AfterKeys,
sortBuilds: SortBuild[],
params: unknown[],
): string {
// Keys in the same order as ORDER BY: s0..sN, then position, then id.
// Mirrors cursor-pagination.ts `applyCursor`: builds the lexicographic
// OR-chain from tail to head, wrapping each step as
// `(fi > v) OR (fi = v AND <tail>)`.
//
// Param binding is positional: `?` placeholders bind in order of
// appearance. They appear left-to-right in the final SQL as: leg0(head),
// leg0(tie), leg1(head), leg1(tie), ..., legN(head). We therefore collect
// the per-leg params first, then flatten in head→tail order at the end.
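// Illustrative expansion with one sort key (legs s0, position, id):
//   (s0 > ? OR (s0 = ? AND (position > ? OR (position = ? AND id > ?))))
// with params flattened head→tail as [s0, s0, position, position, id].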
type Leg = { key: string; expression: string; direction: 'asc' | 'desc' };
const legs: Leg[] = [
...sortBuilds.map((s) => ({
key: s.key,
expression: s.key,
direction: s.direction,
})),
{ key: 'position', expression: 'position', direction: 'asc' },
{ key: 'id', expression: 'id', direction: 'asc' },
];
// Skip legs whose key is absent from afterKeys (shouldn't happen for
// well-formed cursors, but keeps the builder defensive).
const usable = legs.filter((l) => l.key in afterKeys);
if (usable.length === 0) return 'TRUE';
// legParams[i] = [value, value?] — one push for the head `>` or `<`,
// one more push for the tie `=` on every leg except the last.
const legParams: unknown[][] = [];
let expr = '';
for (let i = usable.length - 1; i >= 0; i--) {
const leg = usable[i];
const value = afterKeys[leg.key];
const cmp = leg.direction === 'asc' ? '>' : '<';
const head = `${leg.expression} ${cmp} ?`;
if (!expr) {
legParams[i] = [value];
expr = head;
continue;
}
legParams[i] = [value, value];
const tie = `${leg.expression} = ?`;
expr = `(${head} OR (${tie} AND ${expr}))`;
}
// Flatten legs in head→tail (placeholder) order.
for (const values of legParams) {
for (const v of values) params.push(v);
}
return expr;
}
// --- utilities ---------------------------------------------------------
function indexColumns(columns: ColumnSpec[]): ColumnIndex {
const byId = new Map<string, ColumnSpec>();
const userColumns: ColumnSpec[] = [];
for (const c of columns) {
if (c.property) {
byId.set(c.property.id, c);
userColumns.push(c);
}
}
return { byId, userColumns };
}
function quoteIdent(name: string): string {
return `"${name.replace(/"/g, '""')}"`;
}
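// DuckDB's json_contains(haystack, needle) tests JSON containment;
// to_json(?) lifts the bound VARCHAR into a JSON needle so a single id
// can be matched inside a JSON-array column.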
function jsonArrayContains(colRef: string, paramPlaceholder: string): string {
return `json_contains(${colRef}, to_json(${paramPlaceholder}))`;
}
function asStringArray(val: unknown): string[] {
if (val == null) return [];
if (Array.isArray(val)) return val.filter((v) => v != null).map(String);
return [String(val)];
}
@@ -0,0 +1,117 @@
import * as os from 'node:os';
import { DuckDbRuntime } from './duckdb-runtime';
import { QueryCacheConfigProvider } from './query-cache.config';
const makeConfig = (
overrides: Partial<QueryCacheConfigProvider['config']> = {},
): QueryCacheConfigProvider =>
({
config: {
enabled: true,
minRows: 25_000,
maxCollections: 50,
warmTopN: 50,
memoryLimit: '256MB',
threads: 2,
tempDirectory: `${os.tmpdir()}/docmost-duckdb-runtime-test`,
trace: false,
readerPoolSize: 2,
...overrides,
},
}) as unknown as QueryCacheConfigProvider;
const makeEnv = (): { getDatabaseURL: () => string } => ({
getDatabaseURL: () => process.env.DATABASE_URL ?? '',
});
describe('DuckDbRuntime', () => {
it('no-ops when the cache is disabled', async () => {
const rt = new DuckDbRuntime(makeConfig({ enabled: false }), makeEnv() as any);
await rt.onApplicationBootstrap();
expect(rt.isReady()).toBe(false);
await rt.onModuleDestroy();
});
it('bootstraps instance, extension, PG attach, and reader pool', async () => {
const rt = new DuckDbRuntime(makeConfig(), makeEnv() as any);
await rt.onApplicationBootstrap();
expect(rt.isReady()).toBe(true);
expect(rt.readerPoolSize()).toBe(2);
await rt.onModuleDestroy();
});
it('attachBase creates a per-base schema and detachBase removes it', async () => {
const rt = new DuckDbRuntime(makeConfig(), makeEnv() as any);
await rt.onApplicationBootstrap();
try {
const schema = 'b_testaaaaaaaaaaaaaaaaaaaaaaaaaa';
await rt.attachBase(schema);
await rt.getWriter().run(`CREATE TABLE ${schema}.t (x INTEGER)`);
await rt.getWriter().run(`INSERT INTO ${schema}.t VALUES (1), (2), (3)`);
const res = await rt
.getWriter()
.runAndReadAll(`SELECT count(*) AS c FROM ${schema}.t`);
const row = res.getRowObjects()[0] as { c: bigint | number };
expect(Number(row.c)).toBe(3);
await rt.detachBase(schema);
await expect(
rt.getWriter().run(`SELECT count(*) FROM ${schema}.t`),
).rejects.toThrow();
} finally {
await rt.onModuleDestroy();
}
});
it('withReader parallelises across pool', async () => {
const rt = new DuckDbRuntime(makeConfig({ readerPoolSize: 2 }), makeEnv() as any);
await rt.onApplicationBootstrap();
try {
const started: string[] = [];
const ended: string[] = [];
const p1 = rt.withReader(async (conn) => {
started.push('a');
await new Promise((r) => setTimeout(r, 50));
await conn.runAndReadAll('SELECT 1');
ended.push('a');
});
const p2 = rt.withReader(async (conn) => {
started.push('b');
await new Promise((r) => setTimeout(r, 50));
await conn.runAndReadAll('SELECT 1');
ended.push('b');
});
await Promise.all([p1, p2]);
expect(new Set(started)).toEqual(new Set(['a', 'b']));
expect(started.length).toBe(2);
expect(ended.length).toBe(2);
} finally {
await rt.onModuleDestroy();
}
});
it('withReader on a 3rd concurrent request with pool=2 queues correctly', async () => {
const rt = new DuckDbRuntime(makeConfig({ readerPoolSize: 2 }), makeEnv() as any);
await rt.onApplicationBootstrap();
try {
const order: number[] = [];
const makeOne = (n: number, delayMs: number) =>
rt.withReader(async () => {
await new Promise((r) => setTimeout(r, delayMs));
order.push(n);
});
const p1 = makeOne(1, 40);
const p2 = makeOne(2, 40);
const p3 = makeOne(3, 5);
await Promise.all([p1, p2, p3]);
expect(order.length).toBe(3);
expect(order.indexOf(3)).toBeGreaterThan(0);
} finally {
await rt.onModuleDestroy();
}
});
it('getWriter throws if not ready', () => {
const rt = new DuckDbRuntime(makeConfig(), makeEnv() as any);
expect(() => rt.getWriter()).toThrow(/not ready/i);
});
});
@@ -0,0 +1,211 @@
import {
Injectable,
Logger,
OnApplicationBootstrap,
OnModuleDestroy,
} from '@nestjs/common';
import { DuckDBInstance, DuckDBConnection } from '@duckdb/node-api';
import * as fs from 'node:fs';
import { QueryCacheConfigProvider } from './query-cache.config';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
import { ConnectionPool } from './connection-pool';
/*
* DuckDbRuntime
* -------------
* Owns the process-wide DuckDB instance and everything attached to it:
*
* - One `DuckDBInstance` at `:memory:` with `memory_limit`, `threads`,
* `temp_directory` configured from env.
* - One writer `DuckDBConnection` for ATTACH/DETACH/CREATE TABLE/INSERT.
* - A pool of N reader connections for SELECTs; `withReader(fn)` lends
* one out, runs the callback, returns it — fair FIFO under contention.
* - The `postgres` extension is installed + loaded once, not per-base.
* - A single long-lived ATTACH against Postgres (READ_ONLY). All loaders
* reference `postgres_query('pg', $pgsql$ ... $pgsql$)` without doing
* their own attach/detach.
*
* When the query cache is disabled (`config.enabled === false`), the
* runtime is a no-op: nothing is created, `isReady()` returns false, and
* every consumer's own gate prevents it from touching the runtime.
*/
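/*
* Illustrative consumer sketch (assumes `schema` was already attached via
* `attachBase`; not part of this file):
*
*   if (runtime.isReady()) {
*     const result = await runtime.withReader((conn) =>
*       conn.runAndReadAll(`SELECT count(*) FROM ${schema}.rows`),
*     );
*   }
*/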
@Injectable()
export class DuckDbRuntime implements OnApplicationBootstrap, OnModuleDestroy {
private readonly logger = new Logger(DuckDbRuntime.name);
private instance: DuckDBInstance | null = null;
private writer: DuckDBConnection | null = null;
private readonly readerPool = new ConnectionPool<DuckDBConnection>();
private readonly attachedSchemas = new Set<string>();
private ready = false;
private bootstrapFailure: string | null = null;
constructor(
private readonly configProvider: QueryCacheConfigProvider,
private readonly env: EnvironmentService,
) {}
async onApplicationBootstrap(): Promise<void> {
const config = this.configProvider.config;
if (!config.enabled) {
this.logger.log('query cache disabled; skipping duckdb runtime bootstrap');
return;
}
const dbUrl = this.env.getDatabaseURL();
if (!dbUrl) {
this.bootstrapFailure = 'DATABASE_URL is empty';
this.logger.error('DuckDbRuntime cannot bootstrap: DATABASE_URL is empty');
return;
}
try {
fs.mkdirSync(config.tempDirectory, { recursive: true });
} catch {
/* swallow */
}
try {
this.instance = await DuckDBInstance.create(':memory:', {
memory_limit: config.memoryLimit,
threads: String(config.threads),
temp_directory: config.tempDirectory,
});
this.writer = await this.instance.connect();
await this.writer.run('SET preserve_insertion_order = false');
await this.writer.run('INSTALL postgres');
await this.writer.run('LOAD postgres');
await this.writer.run(
`ATTACH ${escapeSqlString(dbUrl)} AS pg (TYPE POSTGRES, READ_ONLY)`,
);
const readers: DuckDBConnection[] = [];
for (let i = 0; i < config.readerPoolSize; i++) {
const reader = await this.instance.connect();
await reader.run('SET preserve_insertion_order = false');
readers.push(reader);
}
this.readerPool.init(readers);
this.ready = true;
this.logger.log(
`DuckDbRuntime ready (readers=${config.readerPoolSize}, memory_limit=${config.memoryLimit})`,
);
} catch (err) {
const error = err as Error;
this.bootstrapFailure = error.message;
this.logger.error(`DuckDbRuntime bootstrap failed: ${error.message}`);
if (error.stack) this.logger.error(error.stack);
this.ready = false;
try {
this.readerPool.close().forEach((c) => c.closeSync());
} catch { /* swallow */ }
try {
this.writer?.closeSync();
} catch { /* swallow */ }
try {
this.instance?.closeSync();
} catch { /* swallow */ }
this.writer = null;
this.instance = null;
}
}
async onModuleDestroy(): Promise<void> {
for (const c of this.readerPool.close()) {
try {
c.closeSync();
} catch { /* swallow */ }
}
if (this.writer) {
try {
this.writer.closeSync();
} catch { /* swallow */ }
this.writer = null;
}
if (this.instance) {
try {
this.instance.closeSync();
} catch { /* swallow */ }
this.instance = null;
}
this.attachedSchemas.clear();
this.ready = false;
}
isReady(): boolean {
return this.ready;
}
readerPoolSize(): number {
return this.readerPool.size();
}
lastBootstrapFailure(): string | null {
return this.bootstrapFailure;
}
/*
* Attach a new in-memory database for a base. Idempotent: if the schema
* is already attached, this is a no-op. Schema name must come from
* `baseSchemaName()` — validated by the caller; we check shape here
* as defense-in-depth.
*/
async attachBase(schema: string): Promise<void> {
this.requireReady();
this.requireSchemaShape(schema);
if (this.attachedSchemas.has(schema)) return;
await this.writer!.run(`ATTACH ':memory:' AS ${schema}`);
this.attachedSchemas.add(schema);
}
/*
* Detach an in-memory database. Idempotent: detaching a schema that is
* not attached is a no-op. Detaching frees all memory held by the
* attached DB back to the shared buffer pool.
*/
async detachBase(schema: string): Promise<void> {
if (!this.ready || !this.writer) return;
this.requireSchemaShape(schema);
if (!this.attachedSchemas.has(schema)) return;
try {
await this.writer.run(`DETACH DATABASE ${schema}`);
} catch (err) {
const msg = (err as Error).message ?? '';
if (!/not attached|does not exist|unknown database/i.test(msg)) {
throw err;
}
} finally {
this.attachedSchemas.delete(schema);
}
}
getWriter(): DuckDBConnection {
this.requireReady();
return this.writer!;
}
async withReader<T>(fn: (conn: DuckDBConnection) => Promise<T>): Promise<T> {
this.requireReady();
return this.readerPool.withResource(fn);
}
private requireReady(): void {
if (!this.ready || !this.writer) {
const detail = this.bootstrapFailure ? `: ${this.bootstrapFailure}` : '';
throw new Error(`DuckDbRuntime not ready${detail}`);
}
}
private requireSchemaShape(schema: string): void {
if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(schema)) {
throw new Error(`Invalid schema name "${schema}"`);
}
}
}
function escapeSqlString(s: string): string {
return `'${s.replace(/'/g, "''")}'`;
}
@@ -0,0 +1,154 @@
import { buildLoaderSql } from './loader-sql';
import { ColumnSpec } from './query-cache.types';
import { BasePropertyType } from '../base.schemas';
const BASE_ID = '019c69a3-dd47-7014-8b87-ec8f1675aaaa';
const WORKSPACE_ID = '019c69a3-dd47-7014-8b87-ec8f1675bbbb';
const SCHEMA = 'b_019c69a3dd4770148b87ec8f1675aaaa';
const sys: ColumnSpec[] = [
{ column: 'id', ddlType: 'VARCHAR', indexable: false },
{ column: 'base_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'workspace_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'creator_id', ddlType: 'VARCHAR', indexable: false },
{ column: 'position', ddlType: 'VARCHAR', indexable: true },
{ column: 'created_at', ddlType: 'TIMESTAMPTZ', indexable: true },
{ column: 'updated_at', ddlType: 'TIMESTAMPTZ', indexable: true },
{ column: 'last_updated_by_id', ddlType: 'VARCHAR', indexable: true },
{ column: 'deleted_at', ddlType: 'TIMESTAMPTZ', indexable: false },
{ column: 'search_text', ddlType: 'VARCHAR', indexable: false },
];
const makeProp = (
id: string,
type: (typeof BasePropertyType)[keyof typeof BasePropertyType],
): ColumnSpec['property'] => ({ id, type, typeOptions: null } as any);
describe('buildLoaderSql', () => {
it('creates schema-qualified rows table and wraps the SELECT in postgres_query', () => {
const sql = buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, SCHEMA);
expect(sql).toContain(`CREATE TABLE ${SCHEMA}.rows AS`);
expect(sql).toContain("SELECT * FROM postgres_query('pg', $pgsql$");
expect(sql).toContain('FROM base_rows');
expect(sql).toContain(`WHERE base_id = '${BASE_ID}'::uuid`);
expect(sql).toContain(`AND workspace_id = '${WORKSPACE_ID}'::uuid`);
expect(sql).toContain('AND deleted_at IS NULL');
expect(sql).toContain('$pgsql$)');
});
it('projects system columns verbatim inside the inner SELECT', () => {
const sql = buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, SCHEMA);
expect(sql).toContain('id::text AS id');
expect(sql).toContain('base_id::text AS base_id');
expect(sql).toContain('position');
expect(sql).toContain("''::VARCHAR AS search_text");
});
it('maps TEXT -> base_cell_text with schema-qualified alias', () => {
const prop = makeProp('019c69a3-dd47-7014-8b87-ec8f167577aa', BasePropertyType.TEXT);
const sql = buildLoaderSql(
[...sys, { column: prop!.id, ddlType: 'VARCHAR', indexable: true, property: prop }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
);
expect(sql).toContain(
`base_cell_text(cells, '019c69a3-dd47-7014-8b87-ec8f167577aa'::uuid) AS "019c69a3-dd47-7014-8b87-ec8f167577aa"`,
);
});
it('maps NUMBER -> base_cell_numeric', () => {
const prop = makeProp('019c69a3-dd47-7014-8b87-ec8f167577bb', BasePropertyType.NUMBER);
const sql = buildLoaderSql(
[...sys, { column: prop!.id, ddlType: 'DOUBLE', indexable: true, property: prop }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
);
expect(sql).toContain(
`base_cell_numeric(cells, '019c69a3-dd47-7014-8b87-ec8f167577bb'::uuid) AS "019c69a3-dd47-7014-8b87-ec8f167577bb"`,
);
});
it('maps DATE -> base_cell_timestamptz', () => {
const prop = makeProp('019c69a3-dd47-7014-8b87-ec8f167577cc', BasePropertyType.DATE);
const sql = buildLoaderSql(
[...sys, { column: prop!.id, ddlType: 'TIMESTAMPTZ', indexable: true, property: prop }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
);
expect(sql).toContain(
`base_cell_timestamptz(cells, '019c69a3-dd47-7014-8b87-ec8f167577cc'::uuid) AS "019c69a3-dd47-7014-8b87-ec8f167577cc"`,
);
});
it('maps CHECKBOX -> base_cell_bool', () => {
const prop = makeProp('019c69a3-dd47-7014-8b87-ec8f167577dd', BasePropertyType.CHECKBOX);
const sql = buildLoaderSql(
[...sys, { column: prop!.id, ddlType: 'BOOLEAN', indexable: true, property: prop }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
);
expect(sql).toContain(
`base_cell_bool(cells, '019c69a3-dd47-7014-8b87-ec8f167577dd'::uuid) AS "019c69a3-dd47-7014-8b87-ec8f167577dd"`,
);
});
it('maps MULTI_SELECT (JSON) -> raw jsonb cast to text', () => {
const prop = makeProp('019c69a3-dd47-7014-8b87-ec8f167577ee', BasePropertyType.MULTI_SELECT);
const sql = buildLoaderSql(
[...sys, { column: prop!.id, ddlType: 'JSON', indexable: false, property: prop }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
);
expect(sql).toContain(
`(cells -> '019c69a3-dd47-7014-8b87-ec8f167577ee')::text AS "019c69a3-dd47-7014-8b87-ec8f167577ee"`,
);
});
it('rejects invalid column names', () => {
const bad: ColumnSpec = {
column: 'pwned"; DROP TABLE rows; --',
ddlType: 'VARCHAR',
indexable: false,
};
expect(() => buildLoaderSql([bad], BASE_ID, WORKSPACE_ID, SCHEMA)).toThrow(
/invalid column name/i,
);
});
it('rejects non-UUID property ids', () => {
const badProp = { id: 'not-a-uuid', type: BasePropertyType.TEXT, typeOptions: null } as any;
expect(() =>
buildLoaderSql(
[{ column: 'some-uuid-col', ddlType: 'VARCHAR', indexable: true, property: badProp }],
BASE_ID,
WORKSPACE_ID,
SCHEMA,
),
).toThrow(/invalid property uuid/i);
});
it('rejects invalid base id', () => {
expect(() => buildLoaderSql(sys, 'not-a-uuid', WORKSPACE_ID, SCHEMA)).toThrow(/invalid base id/i);
});
it('rejects invalid workspace id', () => {
expect(() => buildLoaderSql(sys, BASE_ID, 'not-a-uuid', SCHEMA)).toThrow(/invalid workspace id/i);
});
it('rejects invalid schema name', () => {
expect(() => buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, 'bad name')).toThrow(/invalid schema/i);
expect(() => buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, '1starts_with_digit')).toThrow(/invalid schema/i);
expect(() => buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, '')).toThrow(/invalid schema/i);
});
it('is deterministic', () => {
expect(buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, SCHEMA)).toEqual(
buildLoaderSql(sys, BASE_ID, WORKSPACE_ID, SCHEMA),
);
});
});
@@ -0,0 +1,110 @@
import { ColumnSpec } from './query-cache.types';
/*
* Pure SQL builder for the cold-load query executed against the process-wide
* DuckDB instance. The resulting SQL creates `<schema>.rows` inside the
* attached in-memory database for the base, populated from Postgres via the
* `postgres_query` function:
*
* CREATE TABLE <schema>.rows AS
* SELECT * FROM postgres_query('pg', $pgsql$ ... $pgsql$);
*
* The inner SQL uses the Postgres helper functions (`base_cell_text`,
* `base_cell_numeric`, `base_cell_timestamptz`, `base_cell_bool`) so JSONB
* extraction happens server-side.
*
* Callers must pass a validated `schema` name (use `baseSchemaName()`).
* Schema, baseId, and workspaceId are interpolated after validation: schema
* is regex-checked and baseId/workspaceId are UUID-validated.
*/
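/*
* Illustrative output shape (abridged; uuids elided):
*
*   CREATE TABLE <schema>.rows AS
*   SELECT * FROM postgres_query('pg', $pgsql$
*     SELECT
*       id::text AS id,
*       base_cell_text(cells, '<uuid>'::uuid) AS "<uuid>",
*       ...
*     FROM base_rows
*     WHERE base_id = '<uuid>'::uuid
*       AND workspace_id = '<uuid>'::uuid
*       AND deleted_at IS NULL
*   $pgsql$)
*/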
export function buildLoaderSql(
specs: ColumnSpec[],
baseId: string,
workspaceId: string,
schema: string,
): string {
if (!UUID.test(baseId)) {
throw new Error(`Invalid base id "${baseId}"`);
}
if (!UUID.test(workspaceId)) {
throw new Error(`Invalid workspace id "${workspaceId}"`);
}
validateSchema(schema);
const projections = specs.map((spec) => projectionFor(spec));
return [
`CREATE TABLE ${schema}.rows AS`,
"SELECT * FROM postgres_query('pg', $pgsql$",
' SELECT',
' ' + projections.join(',\n '),
' FROM base_rows',
` WHERE base_id = '${baseId}'::uuid`,
` AND workspace_id = '${workspaceId}'::uuid`,
' AND deleted_at IS NULL',
'$pgsql$)',
].join('\n');
}
function projectionFor(spec: ColumnSpec): string {
validateColumnName(spec.column);
const qid = `"${spec.column}"`;
switch (spec.column) {
case 'id': return 'id::text AS id';
case 'base_id': return 'base_id::text AS base_id';
case 'workspace_id': return 'workspace_id::text AS workspace_id';
case 'creator_id': return 'creator_id::text AS creator_id';
case 'position': return 'position';
case 'created_at': return 'created_at';
case 'updated_at': return 'updated_at';
case 'last_updated_by_id': return 'last_updated_by_id::text AS last_updated_by_id';
case 'deleted_at': return 'deleted_at';
case 'search_text': return "''::VARCHAR AS search_text";
}
const prop = spec.property;
if (!prop) {
throw new Error(
`ColumnSpec for "${spec.column}" has no property; cannot project`,
);
}
const id = prop.id;
if (!UUID.test(id)) {
throw new Error(`Invalid property UUID "${id}"`);
}
switch (spec.ddlType) {
case 'VARCHAR':
return `base_cell_text(cells, '${id}'::uuid) AS ${qid}`;
case 'DOUBLE':
return `base_cell_numeric(cells, '${id}'::uuid) AS ${qid}`;
case 'TIMESTAMPTZ':
return `base_cell_timestamptz(cells, '${id}'::uuid) AS ${qid}`;
case 'BOOLEAN':
return `base_cell_bool(cells, '${id}'::uuid) AS ${qid}`;
case 'JSON':
return `(cells -> '${id}')::text AS ${qid}`;
default: {
const _never: never = spec.ddlType;
throw new Error(`Unknown DuckDbDdlType: ${_never}`);
}
}
}
const UUID =
/^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/;
const VALID_COL = /^[a-zA-Z0-9_\-]+$/;
function validateColumnName(name: string): void {
if (!VALID_COL.test(name)) {
throw new Error(`Invalid column name "${name}"`);
}
}
const VALID_SCHEMA = /^[a-zA-Z_][a-zA-Z0-9_]*$/;
function validateSchema(name: string): void {
if (!VALID_SCHEMA.test(name)) {
throw new Error(`Invalid schema name "${name}"`);
}
}
@@ -0,0 +1,964 @@
import { Test, TestingModule } from '@nestjs/testing';
import { ConfigModule } from '@nestjs/config';
import { KyselyModule, InjectKysely } from 'nestjs-kysely';
import { CamelCasePlugin } from 'kysely';
import { PostgresJSDialect } from 'kysely-postgres-js';
import * as postgres from 'postgres';
import { Injectable } from '@nestjs/common';
import { EventEmitterModule } from '@nestjs/event-emitter';
import { randomBytes } from 'node:crypto';
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { BaseRepo } from '@docmost/db/repos/base/base.repo';
import { BasePropertyRepo } from '@docmost/db/repos/base/base-property.repo';
import { BaseRowRepo } from '@docmost/db/repos/base/base-row.repo';
import { BaseViewRepo } from '@docmost/db/repos/base/base-view.repo';
import { KyselyDB } from '@docmost/db/types/kysely.types';
import { BaseQueryCacheService, CacheListOpts } from './base-query-cache.service';
import { QueryCacheConfigProvider } from './query-cache.config';
import { CollectionLoader } from './collection-loader';
import { DuckDbRuntime } from './duckdb-runtime';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
import { FilterNode, PropertySchema, SortSpec } from '../engine';
const INTEGRATION_DB_URL = process.env.INTEGRATION_DB_URL;
@Injectable()
class ParityEnvService {
getDatabaseURL() {
return INTEGRATION_DB_URL!;
}
getDatabaseMaxPool() {
return 5;
}
getNodeEnv() {
return 'test';
}
getBaseQueryCacheEnabled() {
return true;
}
getBaseQueryCacheMinRows() {
return 1;
}
getBaseQueryCacheMaxCollections() {
return 5;
}
getBaseQueryCacheWarmTopN() {
return 0;
}
getBaseQueryCacheDebug() {
return false;
}
getBaseQueryCacheMemoryLimit() {
return '128MB';
}
getBaseQueryCacheThreads() {
return 2;
}
getBaseQueryCacheReaderPoolSize() {
return 2;
}
getRedisUrl() {
return 'redis://localhost:6379';
}
}
@Injectable()
class DbHandle {
constructor(@InjectKysely() readonly db: KyselyDB) {}
}
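// Strips URL params the raw postgres.js client is not expected to
// understand: `sslmode=no-verify` and `schema` (assumed app-level
// conventions carried on INTEGRATION_DB_URL).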
function normalizePostgresUrl(url: string): string {
const parsed = new URL(url);
const newParams = new URLSearchParams();
for (const [key, value] of parsed.searchParams) {
if (key === 'sslmode' && value === 'no-verify') continue;
if (key === 'schema') continue;
newParams.append(key, value);
}
parsed.search = newParams.toString();
return parsed.toString();
}
const describeIntegration = INTEGRATION_DB_URL ? describe : describe.skip;
// Inline uuid7 so the spec file doesn't need to import the esm-only uuid
// package. Same pattern as seed-base.ts.
function uuid7(): string {
const now = BigInt(Date.now());
const bytes = randomBytes(16);
bytes[0] = Number((now >> 40n) & 0xffn);
bytes[1] = Number((now >> 32n) & 0xffn);
bytes[2] = Number((now >> 24n) & 0xffn);
bytes[3] = Number((now >> 16n) & 0xffn);
bytes[4] = Number((now >> 8n) & 0xffn);
bytes[5] = Number(now & 0xffn);
bytes[6] = (bytes[6] & 0x0f) | 0x70;
bytes[8] = (bytes[8] & 0x3f) | 0x80;
const hex = bytes.toString('hex');
return (
hex.slice(0, 8) +
'-' +
hex.slice(8, 12) +
'-' +
hex.slice(12, 16) +
'-' +
hex.slice(16, 20) +
'-' +
hex.slice(20, 32)
);
}
// Deterministic PRNG (mulberry32) for reproducible seeds across runs.
function makeRng(seed: number): () => number {
let s = seed >>> 0;
return () => {
s = (s + 0x6d2b79f5) >>> 0;
let t = s;
t = Math.imul(t ^ (t >>> 15), t | 1);
t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};
}
type PropertyIds = {
name: string;
priority: string;
due: string;
done: string;
status: string;
tags: string;
};
type ParityFixture = {
baseId: string;
propertyIds: PropertyIds;
statusChoiceIds: string[];
tagIds: string[];
// Date used as a reference "now" for deterministic date fixtures.
nowMs: number;
schema: PropertySchema;
};
const ROWS = 10_000;
// Text pool — kept single-case so PG's default collation and DuckDB's
// bytewise collation agree on sort order. Mixed case causes the two
// engines to diverge on ties (kilo < LIMA bytewise, LIMA < kilo locale).
// That divergence is real and worth fixing at the engine level, but it's
// out of scope for this parity test.
const NAME_POOL = [
'alpha report',
'bravo update',
'charlie draft',
'delta review',
'echo analysis',
'foxtrot summary',
'golf proposal',
'hotel milestone',
'india objective',
'juliet strategy',
'kilo tango',
'lima uniform',
'mike final',
'november budget',
'oscar timeline',
];
async function seedParityBase(
db: KyselyDB,
workspaceId: string,
spaceId: string,
creatorUserId: string | null,
): Promise<Omit<ParityFixture, 'schema'>> {
// `as any` so this helper can use snake_case table/column names the same
// way seed-base.ts does — avoids fighting with CamelCasePlugin types.
const raw = db as any;
const rng = makeRng(42);
const baseId = uuid7();
const nowMs = Date.UTC(2026, 0, 1, 12, 0, 0);
// Property ids and status/tag choice ids chosen up-front so filter
// fixtures can reference them directly.
const nameId = uuid7();
const priorityId = uuid7();
const dueId = uuid7();
const doneId = uuid7();
const statusId = uuid7();
const tagsId = uuid7();
const statusChoiceIds = [uuid7(), uuid7(), uuid7(), uuid7(), uuid7()];
const statusChoices = statusChoiceIds.map((id, i) => ({
id,
name: `Status ${i}`,
color: 'gray',
}));
const tagIds = [
uuid7(),
uuid7(),
uuid7(),
uuid7(),
uuid7(),
uuid7(),
uuid7(),
uuid7(),
];
const tagChoices = tagIds.map((id, i) => ({
id,
name: `Tag ${i}`,
color: 'blue',
}));
await raw
.insertInto('bases')
.values({
id: baseId,
name: `parity-matrix-${Date.now()}`,
space_id: spaceId,
workspace_id: workspaceId,
creator_id: creatorUserId,
created_at: new Date(),
updated_at: new Date(),
} as any)
.execute();
const propertyRows: any[] = [];
let propPosition: string | null = null;
const addProp = (
id: string,
name: string,
type: string,
typeOptions: any = null,
isPrimary = false,
) => {
propPosition = generateJitteredKeyBetween(propPosition, null);
propertyRows.push({
id,
base_id: baseId,
name,
type,
position: propPosition,
type_options: typeOptions,
is_primary: isPrimary,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
});
};
addProp(nameId, 'Name', 'text', null, true);
addProp(priorityId, 'Priority', 'number', { format: 'plain', precision: 0 });
addProp(dueId, 'Due', 'date', {
dateFormat: 'YYYY-MM-DD',
includeTime: false,
});
addProp(doneId, 'Done', 'checkbox');
addProp(statusId, 'Status', 'select', {
choices: statusChoices,
choiceOrder: statusChoiceIds,
});
addProp(tagsId, 'Tags', 'multiSelect', {
choices: tagChoices,
choiceOrder: tagIds,
});
await raw.insertInto('base_properties').values(propertyRows).execute();
// Seed a view so the base looks complete.
await raw
.insertInto('base_views')
.values({
id: uuid7(),
base_id: baseId,
name: 'Table',
type: 'table',
position: generateJitteredKeyBetween(null, null),
config: {},
workspace_id: workspaceId,
creator_id: creatorUserId,
created_at: new Date(),
updated_at: new Date(),
} as any)
.execute();
// Precompute positions as zero-padded digit strings. Both PG's default
// collation and DuckDB's bytewise collation agree on digit ordering,
// so position-tiebreak results are deterministic across engines. The
// library-generated fractional-index keys (`a01K6`, `a2BdW`, ...) mix
// case and re-order under locale-aware collation, which produces
// divergent id lists between PG's `ORDER BY position` and DuckDB's.
const positions: string[] = new Array(ROWS);
const pad = String(ROWS).length + 2;
for (let i = 0; i < ROWS; i++) {
positions[i] = String(i).padStart(pad, '0');
}
const DAY_MS = 24 * 60 * 60 * 1000;
const BATCH = 2000;
for (let start = 0; start < ROWS; start += BATCH) {
const end = Math.min(start + BATCH, ROWS);
const batch: any[] = [];
for (let i = start; i < end; i++) {
const cells: Record<string, unknown> = {};
// name: always set. NULLs in text sort keys round-trip fine through
// the `chr(1114111)` sentinel, but we leave non-NULL here so the
// flat-filter `isEmpty/isNotEmpty` tests have a deterministic zero
// count on the empty side (still exercised via ncontains etc.).
cells[nameId] = NAME_POOL[Math.floor(rng() * NAME_POOL.length)];
// priority: always set. NULLs on a numeric sort key leak through
// postgres.js's numeric parser (`'Infinity'::numeric` → NaN →
// cursor `''` → null-on-decode) and cause PG's keyset
// `applyCursor` to stall because `expr > NULL` is NULL. DuckDB has
// no such issue. Rather than relax the pagination-walk assertion
// we keep priorities non-NULL; isEmpty/isNotEmpty tests for
// numeric properties are out of the required matrix.
cells[priorityId] = Math.floor(rng() * 1000);
// due: null 5%, otherwise an ISO date within the last 90 days.
// NULLs are safe on the flat-filter path (sorts: []) and on the
// `due desc` multi-key sort because the '-infinity' sentinel sorts
// NULLs last — the page boundary never lands on an Invalid Date.
if (rng() < 0.05) {
cells[dueId] = null;
} else {
const offsetDays = Math.floor(rng() * 90);
const d = new Date(nowMs - offsetDays * DAY_MS);
cells[dueId] = d.toISOString();
}
// done: ~50/50 true/false, no nulls.
cells[doneId] = rng() < 0.5;
// status: uniform over 5 choices.
cells[statusId] =
statusChoiceIds[Math.floor(rng() * statusChoiceIds.length)];
// tags: 0..3 random distinct tag ids.
const tagCount = Math.floor(rng() * 4); // 0..3
if (tagCount === 0) {
cells[tagsId] = [];
} else {
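// Biased-but-deterministic shuffle; adequate for fixture data, not for
// anything statistical.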
const shuffled = [...tagIds].sort(() => rng() - 0.5);
cells[tagsId] = shuffled.slice(0, tagCount);
}
batch.push({
id: uuid7(),
base_id: baseId,
cells,
position: positions[i],
creator_id: creatorUserId,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
});
}
await raw.insertInto('base_rows').values(batch).execute();
}
return {
baseId,
propertyIds: {
name: nameId,
priority: priorityId,
due: dueId,
done: doneId,
status: statusId,
tags: tagsId,
},
statusChoiceIds,
tagIds,
nowMs,
};
}
async function deleteParityBase(
db: KyselyDB,
baseId: string,
): Promise<void> {
const raw = db as any;
await raw.deleteFrom('base_rows').where('base_id', '=', baseId).execute();
await raw.deleteFrom('base_views').where('base_id', '=', baseId).execute();
await raw
.deleteFrom('base_properties')
.where('base_id', '=', baseId)
.execute();
await raw.deleteFrom('bases').where('id', '=', baseId).execute();
}
describeIntegration('BaseQueryCacheService ↔ Postgres parity matrix', () => {
let moduleRef: TestingModule;
let cache: BaseQueryCacheService;
let baseRowRepo: BaseRowRepo;
let dbHandle: DbHandle;
let fixture: ParityFixture;
let workspaceId: string;
beforeAll(async () => {
process.env.DATABASE_URL = INTEGRATION_DB_URL;
moduleRef = await Test.createTestingModule({
imports: [
ConfigModule.forRoot({ isGlobal: true }),
KyselyModule.forRoot({
dialect: new PostgresJSDialect({
postgres: (postgres as any)(
normalizePostgresUrl(INTEGRATION_DB_URL!),
{
max: 5,
onnotice: () => {},
types: {
bigint: {
to: 20,
from: [20, 1700],
serialize: (value: number) => value.toString(),
parse: (value: string) => Number.parseInt(value),
},
},
},
),
}),
plugins: [new CamelCasePlugin()],
}),
EventEmitterModule.forRoot(),
],
providers: [
{ provide: EnvironmentService, useClass: ParityEnvService },
QueryCacheConfigProvider,
DuckDbRuntime,
BaseRepo,
BasePropertyRepo,
BaseRowRepo,
BaseViewRepo,
CollectionLoader,
BaseQueryCacheService,
DbHandle,
],
}).compile();
await moduleRef.init();
cache = moduleRef.get(BaseQueryCacheService);
baseRowRepo = moduleRef.get(BaseRowRepo);
dbHandle = moduleRef.get(DbHandle);
const workspace = await dbHandle.db
.selectFrom('workspaces')
.select(['id'])
.limit(1)
.executeTakeFirstOrThrow();
workspaceId = workspace.id;
const space = await dbHandle.db
.selectFrom('spaces')
.select(['id'])
.where('workspaceId', '=', workspaceId)
.limit(1)
.executeTakeFirstOrThrow();
const spaceId = space.id;
const user = await dbHandle.db
.selectFrom('users')
.select('id')
.limit(1)
.executeTakeFirst();
const creatorUserId = user?.id ?? null;
const seeded = await seedParityBase(
dbHandle.db,
workspaceId,
spaceId,
creatorUserId,
);
const properties = await moduleRef
.get(BasePropertyRepo)
.findByBaseId(seeded.baseId);
const schema: PropertySchema = new Map(properties.map((p) => [p.id, p]));
fixture = { ...seeded, schema };
}, 300_000);
afterAll(async () => {
if (fixture?.baseId) {
await deleteParityBase(dbHandle.db, fixture.baseId);
}
if (moduleRef) {
await moduleRef.close();
}
}, 60_000);
// --- Helpers ---------------------------------------------------------
//
// The cache service takes `CacheListOpts` directly; the Postgres repo
// takes a superset with `baseId` / `workspaceId`. Both share the same
// filter/sort/schema/pagination contract, so `assertParity` fans a
// single logical query shape out to both engines.
type ParityQuery = {
filter?: FilterNode;
sorts?: SortSpec[];
limit?: number;
cursor?: string;
};
async function runCache(q: ParityQuery) {
const opts: CacheListOpts = {
filter: q.filter,
sorts: q.sorts,
schema: fixture.schema,
pagination: {
limit: q.limit ?? 50,
cursor: q.cursor,
} as any,
};
return cache.list(fixture.baseId, workspaceId, opts);
}
async function runPg(q: ParityQuery) {
return baseRowRepo.list({
baseId: fixture.baseId,
workspaceId,
filter: q.filter,
sorts: q.sorts,
schema: fixture.schema,
pagination: {
limit: q.limit ?? 50,
cursor: q.cursor,
} as any,
});
}
async function assertParity(
q: ParityQuery,
opts: { strictCursor?: boolean } = {},
): Promise<void> {
const { strictCursor = true } = opts;
const [cacheRes, pgRes] = await Promise.all([runCache(q), runPg(q)]);
const cacheIds = cacheRes.items.map((r) => r.id);
const pgIds = pgRes.items.map((r) => r.id);
expect(cacheIds).toEqual(pgIds);
expect(cacheRes.meta.hasNextPage).toBe(pgRes.meta.hasNextPage);
expect(cacheRes.meta.hasPrevPage).toBe(pgRes.meta.hasPrevPage);
if (strictCursor) {
expect(cacheRes.meta.nextCursor).toBe(pgRes.meta.nextCursor);
expect(cacheRes.meta.prevCursor).toBe(pgRes.meta.prevCursor);
}
}
async function paginateAll(
q: ParityQuery,
via: 'cache' | 'postgres',
): Promise<string[]> {
const ids: string[] = [];
let cursor: string | undefined;
const run = via === 'cache' ? runCache : runPg;
for (;;) {
const page = await run({ ...q, cursor });
for (const item of page.items) ids.push(item.id);
if (!page.meta.hasNextPage || !page.meta.nextCursor) break;
cursor = page.meta.nextCursor;
}
return ids;
}
// --- Flat filters (~25 cases) ----------------------------------------
//
// Test data uses a reference `nowMs = 2026-01-01T12:00:00Z` with dates
// distributed across the prior 90 days; the date fixtures pick a
// midpoint so before/after/onOrBefore/onOrAfter each partition the data.
const DAY_MS = 24 * 60 * 60 * 1000;
type FlatCase = { label: string; filter: FilterNode };
const flatCases = (): FlatCase[] => {
const f = fixture;
const midDate = new Date(f.nowMs - 45 * DAY_MS).toISOString();
const tagSingle = [f.tagIds[0]];
const tagPair = [f.tagIds[0], f.tagIds[1]];
return [
// TEXT
{
label: 'text eq',
filter: { propertyId: f.propertyIds.name, op: 'eq', value: 'alpha report' },
},
{
label: 'text neq',
filter: { propertyId: f.propertyIds.name, op: 'neq', value: 'alpha report' },
},
{
label: 'text contains',
filter: { propertyId: f.propertyIds.name, op: 'contains', value: 'alpha' },
},
{
label: 'text ncontains',
filter: { propertyId: f.propertyIds.name, op: 'ncontains', value: 'alpha' },
},
{
label: 'text startsWith',
filter: { propertyId: f.propertyIds.name, op: 'startsWith', value: 'bravo' },
},
{
label: 'text endsWith',
filter: { propertyId: f.propertyIds.name, op: 'endsWith', value: 'report' },
},
{
label: 'text isEmpty',
filter: { propertyId: f.propertyIds.name, op: 'isEmpty' },
},
{
label: 'text isNotEmpty',
filter: { propertyId: f.propertyIds.name, op: 'isNotEmpty' },
},
// NUMBER
{
label: 'number eq',
filter: { propertyId: f.propertyIds.priority, op: 'eq', value: 42 },
},
{
label: 'number gt',
filter: { propertyId: f.propertyIds.priority, op: 'gt', value: 500 },
},
{
label: 'number gte',
filter: { propertyId: f.propertyIds.priority, op: 'gte', value: 500 },
},
{
label: 'number lt',
filter: { propertyId: f.propertyIds.priority, op: 'lt', value: 100 },
},
{
label: 'number lte',
filter: { propertyId: f.propertyIds.priority, op: 'lte', value: 100 },
},
{
label: 'number neq',
filter: { propertyId: f.propertyIds.priority, op: 'neq', value: 42 },
},
// DATE
{
label: 'date before',
filter: { propertyId: f.propertyIds.due, op: 'before', value: midDate },
},
{
label: 'date after',
filter: { propertyId: f.propertyIds.due, op: 'after', value: midDate },
},
{
label: 'date onOrBefore',
filter: { propertyId: f.propertyIds.due, op: 'onOrBefore', value: midDate },
},
{
label: 'date onOrAfter',
filter: { propertyId: f.propertyIds.due, op: 'onOrAfter', value: midDate },
},
// CHECKBOX
{
label: 'checkbox eq true',
filter: { propertyId: f.propertyIds.done, op: 'eq', value: true },
},
{
label: 'checkbox eq false',
filter: { propertyId: f.propertyIds.done, op: 'eq', value: false },
},
// SELECT
{
label: 'select eq',
filter: {
propertyId: f.propertyIds.status,
op: 'eq',
value: f.statusChoiceIds[0],
},
},
{
label: 'select neq',
filter: {
propertyId: f.propertyIds.status,
op: 'neq',
value: f.statusChoiceIds[0],
},
},
// MULTI_SELECT
{
label: 'multi any (1 tag)',
filter: {
propertyId: f.propertyIds.tags,
op: 'any',
value: tagSingle,
},
},
{
label: 'multi any (2 tags)',
filter: {
propertyId: f.propertyIds.tags,
op: 'any',
value: tagPair,
},
},
{
label: 'multi all (2 tags)',
filter: {
propertyId: f.propertyIds.tags,
op: 'all',
value: tagPair,
},
},
{
label: 'multi none (2 tags)',
filter: {
propertyId: f.propertyIds.tags,
op: 'none',
value: tagPair,
},
},
];
};
// `flatCases()` reads `fixture`, which is only populated in `beforeAll`,
// while Jest evaluates `it.each` parameters at collect time. Workaround:
// feed `it.each` the static labels below and resolve each label to its
// case inside the test body, after the fixture exists.
it.each([
'text eq',
'text neq',
'text contains',
'text ncontains',
'text startsWith',
'text endsWith',
'text isEmpty',
'text isNotEmpty',
'number eq',
'number gt',
'number gte',
'number lt',
'number lte',
'number neq',
'date before',
'date after',
'date onOrBefore',
'date onOrAfter',
'checkbox eq true',
'checkbox eq false',
'select eq',
'select neq',
'multi any (1 tag)',
'multi any (2 tags)',
'multi all (2 tags)',
'multi none (2 tags)',
])('flat filter: %s', async (label) => {
const c = flatCases().find((x) => x.label === label);
if (!c) throw new Error(`Missing flat case: ${label}`);
await assertParity({ filter: c.filter, sorts: [] });
}, 60_000);
// --- Nested boolean trees (4 cases) ---------------------------------
it(
'nested: A AND B',
async () => {
const f = fixture;
const filter: FilterNode = {
op: 'and',
children: [
{ propertyId: f.propertyIds.done, op: 'eq', value: false },
{ propertyId: f.propertyIds.priority, op: 'gt', value: 500 },
],
};
await assertParity({ filter, sorts: [] });
},
60_000,
);
it(
'nested: A OR B',
async () => {
const f = fixture;
const filter: FilterNode = {
op: 'or',
children: [
{
propertyId: f.propertyIds.status,
op: 'eq',
value: f.statusChoiceIds[0],
},
{
propertyId: f.propertyIds.status,
op: 'eq',
value: f.statusChoiceIds[1],
},
],
};
await assertParity({ filter, sorts: [] });
},
60_000,
);
it(
'nested: (A AND B) OR (C AND D)',
async () => {
const f = fixture;
const DAY = 24 * 60 * 60 * 1000;
const someDate = new Date(f.nowMs - 60 * DAY).toISOString();
const filter: FilterNode = {
op: 'or',
children: [
{
op: 'and',
children: [
{ propertyId: f.propertyIds.done, op: 'eq', value: true },
{ propertyId: f.propertyIds.priority, op: 'lt', value: 100 },
],
},
{
op: 'and',
children: [
{ propertyId: f.propertyIds.done, op: 'eq', value: false },
{
propertyId: f.propertyIds.due,
op: 'before',
value: someDate,
},
],
},
],
};
await assertParity({ filter, sorts: [] });
},
60_000,
);
it(
'nested: max-depth 5-level left-skewed tree completes under soft budget',
async () => {
const f = fixture;
// 5-level left-skewed: root AND with a leaf + AND with a leaf + ...
// Each internal node has one leaf child and one group child. Tree
// depth is MAX_FILTER_DEPTH (5); every condition filters ≥80% of
// rows so the combined predicate returns a small result set.
const leaf = (): FilterNode => ({
propertyId: f.propertyIds.done,
op: 'eq',
value: true,
});
const filter: FilterNode = {
op: 'and',
children: [
leaf(),
{
op: 'and',
children: [
leaf(),
{
op: 'and',
children: [
leaf(),
{
op: 'and',
children: [
leaf(),
{
op: 'and',
children: [leaf()],
},
],
},
],
},
],
},
],
};
// Prime the cache so we're measuring the filter path, not the load.
await runCache({ sorts: [] });
// Smoke-check cache latency: a 5-level filter on 10K rows should be
// fast. 1000ms is a loose bound that absorbs slow CI hosts; the point
// is to catch O(N^2) regressions, not to benchmark.
const tStart = Date.now();
await runCache({ filter, sorts: [] });
const cacheMs = Date.now() - tStart;
expect(cacheMs).toBeLessThan(1000);
// Full parity check (fans out to both engines).
await assertParity({ filter, sorts: [] });
},
60_000,
);
// --- Multi-key sorts (3 cases) ---------------------------------------
//
// All sort keys here hold real values at page-1 boundaries:
// - priority is always set (no NULLs by design — see seed).
// - due can be NULL 5% of the time but the `-infinity` sentinel
// sorts NULLs last on DESC, so the first 50 rows' due values are
// all real dates.
// - name is always set and lowercase, so bytewise (DuckDB) and
// locale (PG default) collations agree.
it.each([
{
label: 'priority desc',
sorts: (): SortSpec[] => [
{ propertyId: fixture.propertyIds.priority, direction: 'desc' },
],
},
{
label: 'priority asc, name asc',
sorts: (): SortSpec[] => [
{ propertyId: fixture.propertyIds.priority, direction: 'asc' },
{ propertyId: fixture.propertyIds.name, direction: 'asc' },
],
},
{
label: 'due desc, priority desc, name asc',
sorts: (): SortSpec[] => [
{ propertyId: fixture.propertyIds.due, direction: 'desc' },
{ propertyId: fixture.propertyIds.priority, direction: 'desc' },
{ propertyId: fixture.propertyIds.name, direction: 'asc' },
],
},
])('multi-key sort: $label', async ({ sorts }) => {
await assertParity({ sorts: sorts() });
}, 60_000);
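A minimal sketch of the sentinel trick the comment above leans on, assuming '-infinity' is an accepted timestamptz literal in both engines (it is in Postgres, and DuckDB supports infinite timestamps); coalescing NULL to the smallest value makes DESC push NULL rows to the tail regardless of each engine's default null order:
// Sketch only; the real query builder schema-qualifies the column and
// handles both directions. `due` is an illustrative column name.
function dueDescSortTerm(): string {
  // NULL -> '-infinity' (smallest value), so DESC places former NULLs last.
  return `COALESCE(due, TIMESTAMPTZ '-infinity') DESC`;
}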
// --- Filter + sort + pagination walk --------------------------------
it(
'filter + sort + pagination walk produces identical id lists with no duplicates',
async () => {
const f = fixture;
const filter: FilterNode = {
op: 'and',
children: [
{ propertyId: f.propertyIds.done, op: 'eq', value: false },
],
};
const sorts: SortSpec[] = [
{ propertyId: f.propertyIds.priority, direction: 'desc' },
{ propertyId: f.propertyIds.name, direction: 'asc' },
];
const cacheIds = await paginateAll({ filter, sorts, limit: 200 }, 'cache');
const pgIds = await paginateAll({ filter, sorts, limit: 200 }, 'postgres');
// DuckDB must emit no duplicates.
expect(new Set(cacheIds).size).toBe(cacheIds.length);
// Both engines paginate through the same rows in the same order.
// priority and name are NULL-free by seed design and position is
// digit-only so collation doesn't diverge at the tail tiebreak.
expect(cacheIds).toEqual(pgIds);
},
180_000,
);
});
@@ -0,0 +1,32 @@
import { Injectable } from '@nestjs/common';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
export type QueryCacheConfig = {
enabled: boolean;
minRows: number;
maxCollections: number;
warmTopN: number;
memoryLimit: string;
threads: number;
trace: boolean;
tempDirectory: string;
readerPoolSize: number;
};
@Injectable()
export class QueryCacheConfigProvider {
readonly config: QueryCacheConfig;
constructor(env: EnvironmentService) {
this.config = {
enabled: env.getBaseQueryCacheEnabled(),
minRows: env.getBaseQueryCacheMinRows(),
maxCollections: env.getBaseQueryCacheMaxCollections(),
warmTopN: env.getBaseQueryCacheWarmTopN(),
memoryLimit: env.getBaseQueryCacheMemoryLimit(),
threads: env.getBaseQueryCacheThreads(),
trace: env.getBaseQueryCacheTrace(),
tempDirectory: env.getBaseQueryCacheTempDirectory(),
readerPoolSize: env.getBaseQueryCacheReaderPoolSize(),
};
}
}
@@ -0,0 +1,27 @@
import { Module } from '@nestjs/common';
import { QueryCacheConfigProvider } from './query-cache.config';
import { DuckDbRuntime } from './duckdb-runtime';
import { BaseQueryCacheService } from './base-query-cache.service';
import { BaseQueryRouter } from './base-query-router';
import { CollectionLoader } from './collection-loader';
import { BaseQueryCacheWriteConsumer } from './base-query-cache.write-consumer';
import { BaseQueryCacheSubscriber } from './base-query-cache.subscriber';
@Module({
providers: [
QueryCacheConfigProvider,
DuckDbRuntime,
CollectionLoader,
BaseQueryCacheService,
BaseQueryRouter,
BaseQueryCacheWriteConsumer,
BaseQueryCacheSubscriber,
],
exports: [
BaseQueryCacheService,
BaseQueryRouter,
DuckDbRuntime,
QueryCacheConfigProvider,
],
})
export class QueryCacheModule {}
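A sketch of the consumer-side wiring implied by the "update base.module import to renamed QueryCacheModule" commit; the import path and surrounding module contents are assumptions, not the repo's actual base.module:
import { Module } from '@nestjs/common';
import { QueryCacheModule } from './query-cache/query-cache.module';

// Hypothetical consumer: anything injecting BaseQueryRouter or
// BaseQueryCacheService pulls them in via this module's exports.
@Module({
  imports: [QueryCacheModule],
})
export class BaseModule {}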
@@ -0,0 +1,49 @@
import type { BaseProperty } from '@docmost/db/types/entity.types';
export type DuckDbColumnType =
| 'VARCHAR'
| 'DOUBLE'
| 'BOOLEAN'
| 'TIMESTAMPTZ'
| 'JSON';
export type ColumnSpec = {
/*
* The uuid of the property (user-defined props) or a stable literal
* ('id', 'position', 'created_at', 'updated_at', 'last_updated_by_id',
* 'deleted_at', 'search_text') for system columns.
*/
column: string;
ddlType: DuckDbColumnType;
indexable: boolean;
property?: Pick<BaseProperty, 'id' | 'type' | 'typeOptions'>;
};
/*
* A base held in the shared DuckDB instance. Instead of owning a
* `DuckDBInstance` and `DuckDBConnection`, it now just remembers the schema
* name of its attached in-memory database. The runtime owns the actual
* connections; this is pure metadata.
*/
export type LoadedCollection = {
baseId: string;
schema: string; // e.g. "b_019c69a51d847985a7f68ee2871d8669"
schemaVersion: number;
columns: ColumnSpec[];
lastAccessedAt: number;
rowCount: number;
/*
* Estimated in-memory footprint, in bytes. DuckDB does not expose
* per-attached-db memory accounting, so this is a rough heuristic
* computed at load time: rowCount × columns.length × ~64 bytes. Used
* for cache-size reporting; not for eviction decisions.
*/
approxBytes: number;
};
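The heuristic from the approxBytes comment, spelled out as a sketch; the loader may weight column types differently, and 64 bytes per cell is just the rough average the comment names:
const APPROX_BYTES_PER_CELL = 64;

// Load-time estimate only; uses the ColumnSpec type defined above.
function estimateApproxBytes(rowCount: number, columns: ColumnSpec[]): number {
  return rowCount * columns.length * APPROX_BYTES_PER_CELL;
}
// e.g. 10_000 rows x 17 columns ~ 10.9 MB reported (column count illustrative)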
export type ChangeEnvelope =
| { kind: 'row-upsert'; baseId: string; row: Record<string, unknown> }
| { kind: 'row-delete'; baseId: string; rowId: string }
| { kind: 'rows-delete'; baseId: string; rowIds: string[] }
| { kind: 'row-reorder'; baseId: string; rowId: string; position: string }
| { kind: 'schema-invalidate'; baseId: string; schemaVersion: number };
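A dispatch sketch over the envelope union, mainly to show the exhaustiveness check the `kind` discriminant buys; the case bodies are placeholders, not the real BaseQueryCacheWriteConsumer:
function applyChange(change: ChangeEnvelope): void {
  switch (change.kind) {
    case 'row-upsert':
      // INSERT ... ON CONFLICT (id) into <schema>.rows
      break;
    case 'row-delete':
    case 'rows-delete':
      // DELETE FROM <schema>.rows for the given id(s)
      break;
    case 'row-reorder':
      // UPDATE <schema>.rows SET position = change.position WHERE id = ...
      break;
    case 'schema-invalidate':
      // detach and reload the base at change.schemaVersion
      break;
    default: {
      const exhausted: never = change; // compile error if a kind is added
      throw new Error(`unhandled change: ${JSON.stringify(exhausted)}`);
    }
  }
}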
@@ -0,0 +1,34 @@
import { baseSchemaName } from './schema-name';
describe('baseSchemaName', () => {
it('converts a uuid to a DuckDB-safe identifier with a b_ prefix', () => {
expect(baseSchemaName('019c69a5-1d84-7985-a7f6-8ee2871d8669')).toBe(
'b_019c69a51d847985a7f68ee2871d8669',
);
});
it('rejects a non-uuid string (preserves the quoting contract)', () => {
expect(() => baseSchemaName('not-a-uuid')).toThrow(/invalid base id/i);
expect(() => baseSchemaName('')).toThrow(/invalid base id/i);
expect(() => baseSchemaName('b_019c69a5; DROP TABLE rows; --')).toThrow(
/invalid base id/i,
);
});
it('is deterministic', () => {
const id = '019c70b3-dd47-7014-8b87-ec8f167577ee';
expect(baseSchemaName(id)).toBe(baseSchemaName(id));
});
it('accepts mixed-case hex and normalises to lowercase', () => {
expect(baseSchemaName('019C69A5-1D84-7985-A7F6-8EE2871D8669')).toBe(
'b_019c69a51d847985a7f68ee2871d8669',
);
});
it('produces names that parse as SQL identifiers without quoting', () => {
const name = baseSchemaName('019c69a5-1d84-7985-a7f6-8ee2871d8669');
// Must match DuckDB's unquoted-identifier grammar: [a-zA-Z_][a-zA-Z0-9_]*
expect(name).toMatch(/^[a-zA-Z_][a-zA-Z0-9_]*$/);
});
});
@@ -0,0 +1,31 @@
// Matches the UUID regex pattern in `loader-sql.ts`. We use a handwritten
// regex rather than importing `validate` from the `uuid` package because
// that package is ESM-only and Jest's ts-jest config cannot transform it
// in this repo.
const UUID =
/^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/;
const UUID_DASHES = /-/g;
/*
* Turns a base UUID into a DuckDB-safe schema name.
*
* '019c69a5-1d84-7985-a7f6-8ee2871d8669'
* -> 'b_019c69a51d847985a7f68ee2871d8669'
*
* The `b_` prefix is required because DuckDB unquoted identifiers must start
* with a letter or underscore — a bare hex UUID starts with a digit and would
* have to be double-quoted everywhere. The strip-dashes step makes the rest
* of the identifier hex-only, which is always safe.
*
* All attached database names, `DETACH DATABASE` targets, and schema-qualified
* references (`<schema>.rows`) run through this function. Validation is
* strict: if the input isn't a real UUID, we throw rather than produce a
* "safe-looking" identifier that might leak through to user-facing SQL.
*/
export function baseSchemaName(baseId: string): string {
if (!UUID.test(baseId)) {
throw new Error(`Invalid base id "${baseId}"`);
}
return `b_${baseId.toLowerCase().replace(UUID_DASHES, '')}`;
}
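Illustrative usage (not a file in this diff): every schema-qualified reference funnels through the helper, so the identifier never needs quoting.
import { baseSchemaName } from './schema-name';

const schema = baseSchemaName('019c69a5-1d84-7985-a7f6-8ee2871d8669');
const rowsTable = `${schema}.rows`; // 'b_019c69a51d847985a7f68ee2871d8669.rows'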
@@ -0,0 +1,411 @@
import type { Kysely } from 'kysely';
import { randomBytes } from 'node:crypto';
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
// Minimal RFC 9562 uuid7. We inline instead of importing `uuid@13` because
// that package is ESM-only and this module is loaded by jest (CommonJS) in
// the integration spec.
function uuid7(): string {
const now = BigInt(Date.now());
const bytes = randomBytes(16);
bytes[0] = Number((now >> 40n) & 0xffn);
bytes[1] = Number((now >> 32n) & 0xffn);
bytes[2] = Number((now >> 24n) & 0xffn);
bytes[3] = Number((now >> 16n) & 0xffn);
bytes[4] = Number((now >> 8n) & 0xffn);
bytes[5] = Number(now & 0xffn);
bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7
bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant
const hex = bytes.toString('hex');
return (
hex.slice(0, 8) +
'-' +
hex.slice(8, 12) +
'-' +
hex.slice(12, 16) +
'-' +
hex.slice(16, 20) +
'-' +
hex.slice(20, 32)
);
}
export type SeedBaseOptions = {
db: Kysely<any>;
workspaceId: string;
spaceId: string;
creatorUserId: string | null;
rows: number;
name?: string;
};
export type SeededBase = {
baseId: string;
propertyIds: {
title: string;
status: string;
priority: string;
category: string;
tags: string;
dueDate: string;
estimate: string;
budget: string;
approved: string;
website: string;
contactEmail: string;
notes: string;
created: string;
lastEdited: string;
// Generic aliases used by parity tests.
text: string;
number: string;
date: string;
};
statusChoiceIds: string[];
};
const SKIP_TYPES = new Set([
'createdAt',
'lastEditedAt',
'lastEditedBy',
'person',
'file',
]);
const WORDS = [
'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf',
'Hotel', 'India', 'Juliet', 'Kilo', 'Lima', 'Mike', 'November',
'Oscar', 'Papa', 'Quebec', 'Romeo', 'Sierra', 'Tango', 'Uniform',
'Victor', 'Whiskey', 'X-ray', 'Yankee', 'Zulu', 'Report', 'Analysis',
'Summary', 'Review', 'Update', 'Draft', 'Final', 'Proposal', 'Budget',
'Timeline', 'Milestone', 'Objective', 'Strategy', 'Initiative',
];
const COLORS = [
'red', 'orange', 'yellow', 'green', 'blue', 'purple', 'pink', 'gray',
];
// Deterministic RNG (mulberry32) so tests are reproducible.
function makeRng(seed: number): () => number {
let s = seed >>> 0;
return () => {
s = (s + 0x6d2b79f5) >>> 0;
let t = s;
t = Math.imul(t ^ (t >>> 15), t | 1);
t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};
}
function hashSeed(input: string): number {
let h = 2166136261;
for (let i = 0; i < input.length; i++) {
h ^= input.charCodeAt(i);
h = Math.imul(h, 16777619);
}
return h >>> 0;
}
function randomWords(rng: () => number, min: number, max: number): string {
const count = min + Math.floor(rng() * (max - min + 1));
const result: string[] = [];
for (let i = 0; i < count; i++) {
result.push(WORDS[Math.floor(rng() * WORDS.length)]);
}
return result.join(' ');
}
function makeChoices(names: string[]) {
return names.map((name, i) => ({
id: uuid7(),
name,
color: COLORS[i % COLORS.length],
}));
}
function makeStatusChoices() {
const todo = [
{ id: uuid7(), name: 'Not Started', color: 'gray', category: 'todo' },
];
const inProgress = [
{ id: uuid7(), name: 'In Progress', color: 'blue', category: 'inProgress' },
{ id: uuid7(), name: 'In Review', color: 'purple', category: 'inProgress' },
];
const complete = [
{ id: uuid7(), name: 'Done', color: 'green', category: 'complete' },
{ id: uuid7(), name: 'Cancelled', color: 'red', category: 'complete' },
];
const all = [...todo, ...inProgress, ...complete];
return { choices: all, choiceOrder: all.map((c) => c.id) };
}
type PropertyDef = {
name: string;
type: string;
isPrimary?: boolean;
typeOptions?: any;
};
function buildPropertyDefinitions(): PropertyDef[] {
const priorityChoices = makeChoices(['Low', 'Medium', 'High', 'Critical']);
const categoryChoices = makeChoices([
'Engineering',
'Design',
'Marketing',
'Sales',
'Support',
'Operations',
]);
const tagChoices = makeChoices([
'Bug',
'Feature',
'Improvement',
'Documentation',
'Research',
]);
const statusOpts = makeStatusChoices();
return [
{ name: 'Title', type: 'text', isPrimary: true },
{ name: 'Status', type: 'status', typeOptions: statusOpts },
{
name: 'Priority',
type: 'select',
typeOptions: {
choices: priorityChoices,
choiceOrder: priorityChoices.map((c) => c.id),
},
},
{
name: 'Category',
type: 'select',
typeOptions: {
choices: categoryChoices,
choiceOrder: categoryChoices.map((c) => c.id),
},
},
{
name: 'Tags',
type: 'multiSelect',
typeOptions: {
choices: tagChoices,
choiceOrder: tagChoices.map((c) => c.id),
},
},
{
name: 'Due Date',
type: 'date',
typeOptions: { dateFormat: 'YYYY-MM-DD', includeTime: false },
},
{
name: 'Estimate',
type: 'number',
typeOptions: { format: 'plain', precision: 1 },
},
{
name: 'Budget',
type: 'number',
typeOptions: { format: 'currency', precision: 2, currencySymbol: '$' },
},
{ name: 'Approved', type: 'checkbox' },
{ name: 'Website', type: 'url' },
{ name: 'Contact Email', type: 'email' },
{ name: 'Notes', type: 'text' },
{ name: 'Created', type: 'createdAt' },
{ name: 'Last Edited', type: 'lastEditedAt' },
];
}
type CellGenerator = () => unknown;
function buildCellGenerator(
property: any,
rng: () => number,
): CellGenerator | null {
if (SKIP_TYPES.has(property.type)) return null;
const typeOptions = property.type_options ?? property.typeOptions;
switch (property.type) {
case 'text':
return () => randomWords(rng, 2, 6);
case 'number':
return () => Math.round(rng() * 10000 * 100) / 100;
case 'select':
case 'status': {
const choices = typeOptions?.choices ?? [];
if (choices.length === 0) return null;
return () => choices[Math.floor(rng() * choices.length)].id;
}
case 'multiSelect': {
const choices = typeOptions?.choices ?? [];
if (choices.length === 0) return () => [];
return () => {
const count = 1 + Math.floor(rng() * Math.min(3, choices.length));
const shuffled = [...choices].sort(() => rng() - 0.5);
return shuffled.slice(0, count).map((c: any) => c.id);
};
}
case 'date': {
const start = new Date(2020, 0, 1).getTime();
const range = new Date(2026, 0, 1).getTime() - start;
return () => new Date(start + rng() * range).toISOString();
}
case 'checkbox':
return () => rng() > 0.5;
case 'url':
return () => `https://example.com/page/${Math.floor(rng() * 100000)}`;
case 'email':
return () => `user${Math.floor(rng() * 100000)}@example.com`;
default:
return null;
}
}
export async function seedBase(opts: SeedBaseOptions): Promise<SeededBase> {
const { db, workspaceId, spaceId, creatorUserId, rows } = opts;
const baseName =
opts.name ??
`Seed Base ${rows >= 1000 ? `${Math.round(rows / 1000)}K` : `${rows}`} rows`;
const rng = makeRng(hashSeed(`${baseName}:${rows}`));
const baseId = uuid7();
await db
.insertInto('bases')
.values({
id: baseId,
name: baseName,
space_id: spaceId,
workspace_id: workspaceId,
creator_id: creatorUserId,
created_at: new Date(),
updated_at: new Date(),
})
.execute();
const propertyDefs = buildPropertyDefinitions();
let propPosition: string | null = null;
const insertedProperties: any[] = [];
for (const def of propertyDefs) {
propPosition = generateJitteredKeyBetween(propPosition, null);
insertedProperties.push({
id: uuid7(),
base_id: baseId,
name: def.name,
type: def.type,
position: propPosition,
type_options: def.typeOptions ?? null,
is_primary: def.isPrimary ?? false,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
});
}
await db.insertInto('base_properties').values(insertedProperties).execute();
const viewId = uuid7();
await db
.insertInto('base_views')
.values({
id: viewId,
base_id: baseId,
name: 'Table View 1',
type: 'table',
position: generateJitteredKeyBetween(null, null),
config: {},
workspace_id: workspaceId,
creator_id: creatorUserId,
created_at: new Date(),
updated_at: new Date(),
})
.execute();
const byName = new Map(insertedProperties.map((p) => [p.name, p.id]));
const propertyIds: SeededBase['propertyIds'] = {
title: byName.get('Title')!,
status: byName.get('Status')!,
priority: byName.get('Priority')!,
category: byName.get('Category')!,
tags: byName.get('Tags')!,
dueDate: byName.get('Due Date')!,
estimate: byName.get('Estimate')!,
budget: byName.get('Budget')!,
approved: byName.get('Approved')!,
website: byName.get('Website')!,
contactEmail: byName.get('Contact Email')!,
notes: byName.get('Notes')!,
created: byName.get('Created')!,
lastEdited: byName.get('Last Edited')!,
text: byName.get('Title')!,
number: byName.get('Estimate')!,
date: byName.get('Due Date')!,
};
const statusProp = insertedProperties.find((p) => p.name === 'Status');
const statusChoiceIds: string[] =
(statusProp?.type_options?.choices ?? []).map((c: any) => c.id);
const generators: Array<{ propertyId: string; generate: CellGenerator }> = [];
for (const prop of insertedProperties) {
const gen = buildCellGenerator(prop, rng);
if (gen) {
generators.push({ propertyId: prop.id, generate: gen });
}
}
const positions: string[] = new Array(rows);
let lastPosition: string | null = null;
for (let i = 0; i < rows; i++) {
lastPosition = generateJitteredKeyBetween(lastPosition, null);
positions[i] = lastPosition;
}
const BATCH_SIZE = 2000;
for (let batchStart = 0; batchStart < rows; batchStart += BATCH_SIZE) {
const batchEnd = Math.min(batchStart + BATCH_SIZE, rows);
const rowsBatch: any[] = [];
for (let i = batchStart; i < batchEnd; i++) {
const cells: Record<string, unknown> = {};
for (const { propertyId, generate } of generators) {
cells[propertyId] = generate();
}
rowsBatch.push({
id: uuid7(),
base_id: baseId,
cells,
position: positions[i],
creator_id: creatorUserId,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
});
}
await db.insertInto('base_rows').values(rowsBatch).execute();
}
return { baseId, propertyIds, statusChoiceIds };
}
export async function deleteSeededBase(
db: Kysely<any>,
baseId: string,
): Promise<void> {
await db.deleteFrom('base_rows').where('base_id', '=', baseId).execute();
await db.deleteFrom('base_views').where('base_id', '=', baseId).execute();
await db
.deleteFrom('base_properties')
.where('base_id', '=', baseId)
.execute();
await db.deleteFrom('bases').where('id', '=', baseId).execute();
}
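A sketch of spec-side usage of the two exports above; the actual parity spec wires this through beforeAll/afterAll, and the helper name here is invented:
import { Kysely } from 'kysely';
import { seedBase, deleteSeededBase } from './seed-base';

async function withSeededBase(
  db: Kysely<any>,
  workspaceId: string,
  spaceId: string,
) {
  const seeded = await seedBase({
    db,
    workspaceId,
    spaceId,
    creatorUserId: null,
    rows: 10_000,
  });
  // seeded.propertyIds.number aliases the Estimate property, per the
  // generic-alias block in SeededBase.
  return { seeded, cleanup: () => deleteSeededBase(db, seeded.baseId) };
}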
@@ -1,6 +1,7 @@
import {
BadRequestException,
Injectable,
Logger,
NotFoundException,
} from '@nestjs/common';
import { InjectKysely } from 'nestjs-kysely';
@@ -9,6 +10,8 @@ import { KyselyDB } from '@docmost/db/types/kysely.types';
import { BaseRowRepo } from '@docmost/db/repos/base/base-row.repo';
import { BasePropertyRepo } from '@docmost/db/repos/base/base-property.repo';
import { BaseViewRepo } from '@docmost/db/repos/base/base-view.repo';
import { BaseQueryRouter } from '../query-cache/base-query-router';
import { BaseQueryCacheService } from '../query-cache/base-query-cache.service';
import { CreateRowDto } from '../dto/create-row.dto';
import {
UpdateRowDto,
@@ -41,15 +44,21 @@ import {
BaseRowUpdatedEvent,
BaseRowsDeletedEvent,
} from '../events/base-events';
import { EnvironmentService } from '../../../integrations/environment/environment.service';
@Injectable()
export class BaseRowService {
private readonly logger = new Logger(BaseRowService.name);
constructor(
@InjectKysely() private readonly db: KyselyDB,
private readonly baseRowRepo: BaseRowRepo,
private readonly basePropertyRepo: BasePropertyRepo,
private readonly baseViewRepo: BaseViewRepo,
private readonly eventEmitter: EventEmitter2,
private readonly queryRouter: BaseQueryRouter,
private readonly queryCache: BaseQueryCacheService,
private readonly env: EnvironmentService,
) {}
async create(userId: string, workspaceId: string, dto: CreateRowDto) {
@@ -190,6 +199,9 @@ export class BaseRowService {
pagination: PaginationOptions,
workspaceId: string,
) {
const debug = this.env.getBaseQueryCacheDebug();
const tStart = debug ? Date.now() : 0;
const properties = await this.basePropertyRepo.findByBaseId(dto.baseId);
const schema: PropertySchema = new Map(
properties.map((p) => [p.id, p]),
@@ -202,7 +214,56 @@ export class BaseRowService {
direction: s.direction,
}));
return this.baseRowRepo.list({
const tRouter = debug ? Date.now() : 0;
const decision = await this.queryRouter.decide({
baseId: dto.baseId,
workspaceId,
filter,
sorts,
search,
});
const routerMs = debug ? Date.now() - tRouter : 0;
let resultPath: 'cache' | 'postgres' | 'fallback' = 'postgres';
if (decision === 'cache') {
try {
const tCache = debug ? Date.now() : 0;
const result = await this.queryCache.list(dto.baseId, workspaceId, {
filter,
sorts,
search,
schema,
pagination,
});
const cacheMs = debug ? Date.now() - tCache : 0;
resultPath = 'cache';
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
path: resultPath,
baseId: dto.baseId.slice(0, 8),
totalMs: Date.now() - tStart,
routerMs,
cacheMs,
rows: result.items.length,
}),
);
}
return result;
} catch (err) {
const error = err as Error;
this.logger.warn(
`Cache list failed for base ${dto.baseId}, falling back to Postgres: ${error.message}`,
);
if (error.stack) this.logger.warn(error.stack);
resultPath = 'fallback';
}
}
const tPg = debug ? Date.now() : 0;
const result = await this.baseRowRepo.list({
baseId: dto.baseId,
workspaceId,
filter,
@@ -211,6 +272,21 @@ export class BaseRowService {
schema,
pagination,
});
const pgMs = debug ? Date.now() - tPg : 0;
if (debug) {
console.log(
'[cache-perf]',
JSON.stringify({
path: resultPath,
baseId: dto.baseId.slice(0, 8),
totalMs: Date.now() - tStart,
routerMs,
pgMs,
rows: result.items.length,
}),
);
}
return result;
}
async reorder(dto: ReorderRowDto, workspaceId: string, userId?: string) {
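decide() itself isn't shown in this hunk; a plausible shape, assuming it gates on the enabled flag and the min-rows threshold from the config (the real router may also weigh filter/sort shape and loader state):
type RouteDecision = 'cache' | 'postgres';

// Sketch only, not the repo's BaseQueryRouter.
async function decideSketch(
  config: { enabled: boolean; minRows: number },
  countActiveRows: () => Promise<number>,
): Promise<RouteDecision> {
  if (!config.enabled) return 'postgres';
  const rows = await countActiveRows();
  return rows >= config.minRows ? 'cache' : 'postgres';
}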
@@ -0,0 +1,116 @@
import { type Kysely, sql } from 'kysely';
export async function up(db: Kysely<any>): Promise<void> {
// These functions previously used plpgsql + EXCEPTION blocks to catch bad
// casts. EXCEPTION blocks require subtransactions, which Postgres cannot
// use in parallel workers. The functions were marked PARALLEL SAFE but
// aren't actually parallel-safe. DuckDB's postgres extension triggers
// parallel COPY scans and fails on any row that invokes these.
//
// Rewrite each as a pure SQL function using jsonb_typeof + regex
// validation to achieve the same "coerce-or-null" semantics without
// plpgsql. SQL functions with no volatile side effects are genuinely
// parallel-safe.
await sql`
CREATE OR REPLACE FUNCTION base_cell_numeric(cells jsonb, prop uuid)
RETURNS numeric
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$
SELECT CASE jsonb_typeof(cells -> prop::text)
WHEN 'number' THEN (cells->>prop::text)::numeric
WHEN 'string' THEN
CASE WHEN (cells->>prop::text) ~ '^\\s*-?\\d+(\\.\\d+)?([eE][+-]?\\d+)?\\s*$'
THEN (cells->>prop::text)::numeric
ELSE NULL END
ELSE NULL
END
$$
`.execute(db);
await sql`
CREATE OR REPLACE FUNCTION base_cell_timestamptz(cells jsonb, prop uuid)
RETURNS timestamptz
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$
SELECT CASE
WHEN jsonb_typeof(cells -> prop::text) = 'string'
AND (cells->>prop::text) ~ '^\\d{4}-\\d{2}-\\d{2}([ T]\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?([+-]\\d{2}(:?\\d{2})?|Z)?)?$'
THEN (cells->>prop::text)::timestamptz
ELSE NULL
END
$$
`.execute(db);
await sql`
CREATE OR REPLACE FUNCTION base_cell_bool(cells jsonb, prop uuid)
RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$
SELECT CASE jsonb_typeof(cells -> prop::text)
WHEN 'boolean' THEN (cells->>prop::text)::boolean
WHEN 'string' THEN
CASE lower(cells->>prop::text)
WHEN 'true' THEN true
WHEN 't' THEN true
WHEN 'yes' THEN true
WHEN 'y' THEN true
WHEN '1' THEN true
WHEN 'false' THEN false
WHEN 'f' THEN false
WHEN 'no' THEN false
WHEN 'n' THEN false
WHEN '0' THEN false
ELSE NULL
END
ELSE NULL
END
$$
`.execute(db);
}
export async function down(db: Kysely<any>): Promise<void> {
// Restore the previous plpgsql + EXCEPTION versions, with the same PARALLEL
// SAFE labels. They were broken before and will still be broken after
// rollback, but rolling back should reinstate the prior bug, not invent a
// new one.
await sql`
CREATE OR REPLACE FUNCTION base_cell_numeric(cells jsonb, prop uuid)
RETURNS numeric
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
AS $$
BEGIN
RETURN (cells->>prop::text)::numeric;
EXCEPTION WHEN others THEN
RETURN NULL;
END;
$$
`.execute(db);
await sql`
CREATE OR REPLACE FUNCTION base_cell_timestamptz(cells jsonb, prop uuid)
RETURNS timestamptz
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
AS $$
BEGIN
RETURN (cells->>prop::text)::timestamptz;
EXCEPTION WHEN others THEN
RETURN NULL;
END;
$$
`.execute(db);
await sql`
CREATE OR REPLACE FUNCTION base_cell_bool(cells jsonb, prop uuid)
RETURNS boolean
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
AS $$
BEGIN
RETURN (cells->>prop::text)::boolean;
EXCEPTION WHEN others THEN
RETURN NULL;
END;
$$
`.execute(db);
}
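One way to smoke-test the fix outside the migration; a sketch: the cost settings coax the planner into a parallel scan, the uuid is a placeholder, and a very small table may still decline to parallelize:
import { sql, type Kysely } from 'kysely';

// Under the old plpgsql versions, a parallel scan over these functions raised
// "cannot start subtransactions during a parallel operation"; the pure-SQL
// versions should complete.
export async function smokeTestParallelSafety(db: Kysely<any>): Promise<void> {
  await sql`SET parallel_setup_cost = 0`.execute(db);
  await sql`SET parallel_tuple_cost = 0`.execute(db);
  await sql`
    SELECT count(base_cell_numeric(cells, '00000000-0000-7000-8000-000000000000'::uuid))
    FROM base_rows
  `.execute(db);
}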
@@ -128,6 +128,21 @@ export class BaseRowRepo {
});
}
async countActiveRows(
baseId: string,
opts: WorkspaceOpts,
): Promise<number> {
const db = dbOrTx(this.db, opts.trx);
const row = await db
.selectFrom('baseRows')
.select((eb) => eb.fn.countAll<number>().as('count'))
.where('baseId', '=', baseId)
.where('workspaceId', '=', opts.workspaceId)
.where('deletedAt', 'is', null)
.executeTakeFirst();
return Number(row?.count ?? 0);
}
async getLastPosition(
baseId: string,
opts: WorkspaceOpts,
@@ -304,4 +304,101 @@ export class EnvironmentService {
getClickHouseUrl(): string {
return this.configService.get<string>('CLICKHOUSE_URL');
}
getBaseQueryCacheEnabled(): boolean {
const enabled = this.configService
.get<string>('BASE_QUERY_CACHE_ENABLED', 'false')
.toLowerCase();
return enabled === 'true';
}
getBaseQueryCacheMinRows(): number {
return parseInt(
this.configService.get<string>('BASE_QUERY_CACHE_MIN_ROWS', '25000'),
10,
);
}
getBaseQueryCacheMaxCollections(): number {
// Default is intentionally low (50) because a single-node self-host with
// ~100 MB per collection can pin ~5 GB RSS at the cap. SaaS/larger
// deployments can raise via env.
return parseInt(
this.configService.get<string>('BASE_QUERY_CACHE_MAX_COLLECTIONS', '50'),
10,
);
}
getBaseQueryCacheWarmTopN(): number {
return parseInt(
this.configService.get<string>('BASE_QUERY_CACHE_WARM_TOP_N', '50'),
10,
);
}
getBaseQueryCacheDebug(): boolean {
return (
this.configService
.get<string>('BASE_QUERY_CACHE_DEBUG', 'false')
.toLowerCase() === 'true'
);
}
getBaseQueryCacheTrace(): boolean {
return (
this.configService
.get<string>('BASE_QUERY_CACHE_TRACE', 'false')
.toLowerCase() === 'true'
);
}
getBaseQueryCacheMemoryLimit(): string {
// Per-DuckDB-instance memory ceiling. DuckDB accepts human-readable sizes:
// '256MB', '1GB', etc. Default 512MB is sized for bases up to ~300K rows
// with moderate schemas without spilling. DuckDB automatically spills
// to `temp_directory` when this is exceeded, so over-allocating is
// cheap — the risk is under-sizing.
return this.configService.get<string>(
'BASE_QUERY_CACHE_MEMORY_LIMIT',
'512MB',
);
}
getBaseQueryCacheTempDirectory(): string {
// Directory DuckDB uses to spill pages when an instance exceeds its
// memory_limit. Defaults to the system temp dir plus a namespace so
// different processes don't collide. Setting this explicitly is what
// enables spill-to-disk on `:memory:` instances — without it, DuckDB
// OOMs at memory_limit instead of paging.
const defaultPath = `${require('node:os').tmpdir()}/docmost-duckdb-cache`;
return this.configService.get<string>(
'BASE_QUERY_CACHE_TEMP_DIR',
defaultPath,
);
}
getBaseQueryCacheThreads(): number {
// Per-DuckDB-instance thread budget. Defaults to 2 so multiple concurrent
// instances don't fight for every core on a shared host.
return parseInt(
this.configService.get<string>('BASE_QUERY_CACHE_THREADS', '2'),
10,
);
}
getBaseQueryCacheReaderPoolSize(): number {
// Number of reader connections held open against the shared DuckDB
// instance. Reads are dispatched via `withReader()`, which checks out a
// connection, runs the query, and returns the connection to the pool. A
// bigger pool allows more concurrent reads without serialization, at the
// cost of per-connection overhead (each connection carries its own catalog
// snapshot and prepared-statement cache, roughly 300 KB).
//
// Default 4 matches libuv's default thread-pool size. Raise to 8+ if
// you see p99 list latency correlate with concurrent request volume.
return parseInt(
this.configService.get<string>('BASE_QUERY_CACHE_READER_POOL_SIZE', '4'),
10,
);
}
}
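Taken together, the knobs above plausibly reach DuckDB like this. A sketch with @duckdb/node-api, assuming the option names mirror DuckDB's settings and the withReader() contract described in the reader-pool comment; the repo's DuckDbRuntime will differ:
import { DuckDBInstance, DuckDBConnection } from '@duckdb/node-api';

export async function createRuntimeSketch(config: {
  memoryLimit: string;
  tempDirectory: string;
  threads: number;
  readerPoolSize: number;
}) {
  const instance = await DuckDBInstance.create(':memory:', {
    memory_limit: config.memoryLimit, // e.g. '512MB'
    temp_directory: config.tempDirectory, // spill target instead of OOM
    threads: String(config.threads),
  });
  const idle: DuckDBConnection[] = [];
  for (let i = 0; i < config.readerPoolSize; i++) {
    idle.push(await instance.connect());
  }
  // Checkout/release reader pool; a real pool would queue instead of throw.
  const withReader = async <T>(
    fn: (conn: DuckDBConnection) => Promise<T>,
  ): Promise<T> => {
    const conn = idle.pop();
    if (!conn) throw new Error('reader pool exhausted');
    try {
      return await fn(conn);
    } finally {
      idle.push(conn);
    }
  };
  return { instance, withReader };
}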
@@ -3,11 +3,9 @@ import * as dotenv from 'dotenv';
import { Kysely } from 'kysely';
import { PostgresJSDialect } from 'kysely-postgres-js';
import postgres from 'postgres';
import { v7 as uuid7 } from 'uuid';
import { generateJitteredKeyBetween } from 'fractional-indexing-jittered';
import { seedBase } from '../core/base/query-cache/testing/seed-base';
const TOTAL_ROWS = Number(process.env.TOTAL_ROWS) || 1500;
const BATCH_SIZE = 2000;
const envFilePath = path.resolve(process.cwd(), '..', '..', '.env');
dotenv.config({ path: envFilePath });
@@ -30,206 +28,6 @@ const db = new Kysely<any>({
}),
});
const SKIP_TYPES = new Set([
'createdAt',
'lastEditedAt',
'lastEditedBy',
'person',
'file',
]);
const WORDS = [
'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf',
'Hotel', 'India', 'Juliet', 'Kilo', 'Lima', 'Mike', 'November',
'Oscar', 'Papa', 'Quebec', 'Romeo', 'Sierra', 'Tango', 'Uniform',
'Victor', 'Whiskey', 'X-ray', 'Yankee', 'Zulu', 'Report', 'Analysis',
'Summary', 'Review', 'Update', 'Draft', 'Final', 'Proposal', 'Budget',
'Timeline', 'Milestone', 'Objective', 'Strategy', 'Initiative',
];
const COLORS = [
'red', 'orange', 'yellow', 'green', 'blue', 'purple', 'pink', 'gray',
];
function randomWords(min: number, max: number): string {
const count = min + Math.floor(Math.random() * (max - min + 1));
const result: string[] = [];
for (let i = 0; i < count; i++) {
result.push(WORDS[Math.floor(Math.random() * WORDS.length)]);
}
return result.join(' ');
}
function makeChoices(names: string[], category?: string) {
return names.map((name, i) => ({
id: uuid7(),
name,
color: COLORS[i % COLORS.length],
...(category ? {} : {}),
}));
}
function makeStatusChoices() {
const todo = [{ id: uuid7(), name: 'Not Started', color: 'gray', category: 'todo' }];
const inProgress = [
{ id: uuid7(), name: 'In Progress', color: 'blue', category: 'inProgress' },
{ id: uuid7(), name: 'In Review', color: 'purple', category: 'inProgress' },
];
const complete = [
{ id: uuid7(), name: 'Done', color: 'green', category: 'complete' },
{ id: uuid7(), name: 'Cancelled', color: 'red', category: 'complete' },
];
const all = [...todo, ...inProgress, ...complete];
return { choices: all, choiceOrder: all.map((c) => c.id) };
}
type PropertyDef = {
name: string;
type: string;
isPrimary?: boolean;
typeOptions?: any;
};
function buildPropertyDefinitions(): PropertyDef[] {
const priorityChoices = makeChoices(['Low', 'Medium', 'High', 'Critical']);
const categoryChoices = makeChoices(['Engineering', 'Design', 'Marketing', 'Sales', 'Support', 'Operations']);
const tagChoices = makeChoices(['Bug', 'Feature', 'Improvement', 'Documentation', 'Research']);
const statusOpts = makeStatusChoices();
return [
{ name: 'Title', type: 'text', isPrimary: true },
{ name: 'Status', type: 'status', typeOptions: statusOpts },
{ name: 'Priority', type: 'select', typeOptions: { choices: priorityChoices, choiceOrder: priorityChoices.map((c) => c.id) } },
{ name: 'Category', type: 'select', typeOptions: { choices: categoryChoices, choiceOrder: categoryChoices.map((c) => c.id) } },
{ name: 'Tags', type: 'multiSelect', typeOptions: { choices: tagChoices, choiceOrder: tagChoices.map((c) => c.id) } },
{ name: 'Due Date', type: 'date', typeOptions: { dateFormat: 'YYYY-MM-DD', includeTime: false } },
{ name: 'Estimate', type: 'number', typeOptions: { format: 'plain', precision: 1 } },
{ name: 'Budget', type: 'number', typeOptions: { format: 'currency', precision: 2, currencySymbol: '$' } },
{ name: 'Approved', type: 'checkbox' },
{ name: 'Website', type: 'url' },
{ name: 'Contact Email', type: 'email' },
{ name: 'Notes', type: 'text' },
{ name: 'Created', type: 'createdAt' },
{ name: 'Last Edited', type: 'lastEditedAt' },
];
}
type CellGenerator = () => unknown;
function buildCellGenerator(property: any): CellGenerator | null {
if (SKIP_TYPES.has(property.type)) return null;
const typeOptions = property.type_options;
switch (property.type) {
case 'text':
return () => randomWords(2, 6);
case 'number':
return () => Math.round(Math.random() * 10000 * 100) / 100;
case 'select':
case 'status': {
const choices = typeOptions?.choices ?? [];
if (choices.length === 0) return null;
return () => choices[Math.floor(Math.random() * choices.length)].id;
}
case 'multiSelect': {
const choices = typeOptions?.choices ?? [];
if (choices.length === 0) return () => [];
return () => {
const count = 1 + Math.floor(Math.random() * Math.min(3, choices.length));
const shuffled = [...choices].sort(() => Math.random() - 0.5);
return shuffled.slice(0, count).map((c: any) => c.id);
};
}
case 'date': {
const start = new Date(2020, 0, 1).getTime();
const range = new Date(2026, 0, 1).getTime() - start;
return () => new Date(start + Math.random() * range).toISOString();
}
case 'checkbox':
return () => Math.random() > 0.5;
case 'url':
return () => `https://example.com/page/${Math.floor(Math.random() * 100000)}`;
case 'email':
return () => `user${Math.floor(Math.random() * 100000)}@example.com`;
default:
return null;
}
}
async function createBase(workspaceId: string, spaceId: string, creatorId: string | null): Promise<string> {
const baseId = uuid7();
const rowCountLabel = TOTAL_ROWS >= 1000 ? `${Math.round(TOTAL_ROWS / 1000)}K` : `${TOTAL_ROWS}`;
const baseName = `Seed Base ${rowCountLabel} rows`;
await db.insertInto('bases').values({
id: baseId,
name: baseName,
space_id: spaceId,
workspace_id: workspaceId,
creator_id: creatorId,
created_at: new Date(),
updated_at: new Date(),
}).execute();
console.log(`Created base: ${baseName}`);
console.log(`Base ID: ${baseId}\n`);
// Create properties
const propertyDefs = buildPropertyDefinitions();
let propPosition: string | null = null;
const insertedProperties: any[] = [];
for (const def of propertyDefs) {
propPosition = generateJitteredKeyBetween(propPosition, null);
const prop = {
id: uuid7(),
base_id: baseId,
name: def.name,
type: def.type,
position: propPosition,
type_options: def.typeOptions ?? null,
is_primary: def.isPrimary ?? false,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
};
insertedProperties.push(prop);
}
await db.insertInto('base_properties').values(insertedProperties).execute();
console.log(`Created ${insertedProperties.length} properties:`);
for (const p of insertedProperties) {
console.log(` - ${p.name} (${p.type})${p.is_primary ? ' [primary]' : ''}${SKIP_TYPES.has(p.type) ? ' [system]' : ''}`);
}
// Create default view
const viewId = uuid7();
await db.insertInto('base_views').values({
id: viewId,
base_id: baseId,
name: 'Table View 1',
type: 'table',
position: generateJitteredKeyBetween(null, null),
config: {},
workspace_id: workspaceId,
creator_id: creatorId,
created_at: new Date(),
updated_at: new Date(),
}).execute();
console.log(`Created view: Table View 1\n`);
return baseId;
}
async function main() {
const spaceId = '019c69a3-dd47-7014-8b87-ec8f167577ee';
@@ -247,75 +45,26 @@ async function main() {
.limit(1)
.executeTakeFirst();
const creatorId = user?.id ?? null;
const creatorUserId = user?.id ?? null;
console.log(`Workspace: ${workspaceId}`);
console.log(`Space: ${spaceId}`);
console.log(`Creator: ${creatorId ?? '(none)'}\n`);
// Create the base with properties and view
const baseId = await createBase(workspaceId, spaceId, creatorId);
// Load the created properties for cell generation
const properties = await db
.selectFrom('base_properties')
.selectAll()
.where('base_id', '=', baseId)
.execute();
const generators: Array<{ propertyId: string; generate: CellGenerator }> = [];
for (const prop of properties) {
const gen = buildCellGenerator(prop);
if (gen) {
generators.push({ propertyId: prop.id, generate: gen });
}
}
console.log(`Generating ${TOTAL_ROWS.toLocaleString()} positions...`);
let lastPosition: string | null = null;
const positions: string[] = new Array(TOTAL_ROWS);
for (let i = 0; i < TOTAL_ROWS; i++) {
lastPosition = generateJitteredKeyBetween(lastPosition, null);
positions[i] = lastPosition;
}
console.log(`Positions generated (last: ${positions[positions.length - 1]})\n`);
console.log(`Creator: ${creatorUserId ?? '(none)'}\n`);
const startTime = Date.now();
const totalBatches = Math.ceil(TOTAL_ROWS / BATCH_SIZE);
for (let batchStart = 0; batchStart < TOTAL_ROWS; batchStart += BATCH_SIZE) {
const batchEnd = Math.min(batchStart + BATCH_SIZE, TOTAL_ROWS);
const rows: any[] = [];
for (let i = batchStart; i < batchEnd; i++) {
const cells: Record<string, unknown> = {};
for (const { propertyId, generate } of generators) {
cells[propertyId] = generate();
}
rows.push({
id: uuid7(),
base_id: baseId,
cells,
position: positions[i],
creator_id: creatorId,
workspace_id: workspaceId,
created_at: new Date(),
updated_at: new Date(),
});
}
await db.insertInto('base_rows').values(rows).execute();
const batchNum = Math.floor(batchStart / BATCH_SIZE) + 1;
const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);
console.log(`Batch ${batchNum}/${totalBatches} inserted (${batchEnd.toLocaleString()} rows, ${elapsed}s elapsed)`);
}
const { baseId } = await seedBase({
db,
workspaceId,
spaceId,
creatorUserId,
rows: TOTAL_ROWS,
});
const totalElapsed = ((Date.now() - startTime) / 1000).toFixed(1);
console.log(`\nDone. Inserted ${TOTAL_ROWS.toLocaleString()} rows in ${totalElapsed}s`);
console.log(`\nBase ID: ${baseId}`);
console.log(
`Inserted ${TOTAL_ROWS.toLocaleString()} rows in ${totalElapsed}s`,
);
console.log(`Base ID: ${baseId}`);
await db.destroy();
process.exit(0);
File diff suppressed because it is too large
@@ -496,6 +496,9 @@ importers:
'@clickhouse/client':
specifier: ^1.18.2
version: 1.18.2
'@duckdb/node-api':
specifier: 1.5.2-r.1
version: 1.5.2-r.1
'@fastify/cookie':
specifier: ^11.0.2
version: 11.0.2
@@ -1852,6 +1855,42 @@ packages:
peerDependencies:
react: '>=16.8.0'
'@duckdb/node-api@1.5.2-r.1':
resolution: {integrity: sha512-OzBBnS0JGXMoS5mzKNY/Ylr7SshcRQiLFIoxQ4AlePwJ2fNeDL/fbHu/knjxUrXwW1fJBTUgwWftmxDdnZZb3A==}
'@duckdb/node-bindings-darwin-arm64@1.5.2-r.1':
resolution: {integrity: sha512-v35FyKOb8EJCvaiPF7k0gvKiJTXR7PPQDNoWR0Gu+YSX5O9b+DIguzt1348Of3HebHy6ATSMzlUekaVA9YXu+g==}
cpu: [arm64]
os: [darwin]
'@duckdb/node-bindings-darwin-x64@1.5.2-r.1':
resolution: {integrity: sha512-SU9dIJ1BluKkkGxi4UsP4keqkkstB2YDySF9KcYu3EZKIVM3FTv2zc7XO38dXnHOq6+F3WqhWWZvD+XU945p7A==}
cpu: [x64]
os: [darwin]
'@duckdb/node-bindings-linux-arm64@1.5.2-r.1':
resolution: {integrity: sha512-3Tra9xM3aM3denaER4KhJ6//6PpmPbik9ECBQ+sh9PyKaEgHw/0kAcKnLm5EzWUnXF0qYmZlewvkCrse8KmOYw==}
cpu: [arm64]
os: [linux]
'@duckdb/node-bindings-linux-x64@1.5.2-r.1':
resolution: {integrity: sha512-pcQvZRHiIfJ9cq8parkSQczQHEml/IeGfnDCMAbEgD6+jaV9Y9Y5Ph1kP9aR+bm6him1S5ZIEr3kZbihjKnWbA==}
cpu: [x64]
os: [linux]
'@duckdb/node-bindings-win32-arm64@1.5.2-r.1':
resolution: {integrity: sha512-Ji8tym+N3LkrhVt0Up3bsacD/kpg4/JXFJQqxswiYvBaNCQOk+D+aiVS0GN5pcqvmnG7V7TpsDRzkLEFaWp1vw==}
cpu: [arm64]
os: [win32]
'@duckdb/node-bindings-win32-x64@1.5.2-r.1':
resolution: {integrity: sha512-5XqcqC+4R8ghBEEbnc2a0sqfz1zyPBRb9YcmIWfiuDoCYSYFbKhmHcEyNftZDHcwCoLOHXnUin45jraex4STqQ==}
cpu: [x64]
os: [win32]
'@duckdb/node-bindings@1.5.2-r.1':
resolution: {integrity: sha512-bUg3bLVj70YVku6fKyQJS8ASORl7kM7YFVFznsEB9pWbtazPj+ME2x2FUk0WiTzjJdutjzSSGXF066mB4bGGZA==}
'@emnapi/core@1.8.1':
resolution: {integrity: sha512-AvT9QFpxK0Zd8J0jopedNm+w/2fIzvtPKPjqyw9jwvBaReTTqPBk9Hixaz7KbjimP+QNz605/XnjFcDAL2pqBg==}
@@ -4040,6 +4079,7 @@ packages:
'@react-email/components@1.0.10':
resolution: {integrity: sha512-r/BnqfAjr3apcvn/NDx2DqNRD5BP5wZLRdjn2IVHXjt4KmQ5RHWSCAvFiXAzRHys1BWQ2zgIc7cpWePUcAl+nw==}
engines: {node: '>=20.0.0'}
deprecated: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.
peerDependencies:
react: ^18.0 || ^19.0 || ^19.0.0-rc
@@ -12265,6 +12305,37 @@ snapshots:
react: 18.3.1
tslib: 2.8.1
'@duckdb/node-api@1.5.2-r.1':
dependencies:
'@duckdb/node-bindings': 1.5.2-r.1
'@duckdb/node-bindings-darwin-arm64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings-darwin-x64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings-linux-arm64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings-linux-x64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings-win32-arm64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings-win32-x64@1.5.2-r.1':
optional: true
'@duckdb/node-bindings@1.5.2-r.1':
optionalDependencies:
'@duckdb/node-bindings-darwin-arm64': 1.5.2-r.1
'@duckdb/node-bindings-darwin-x64': 1.5.2-r.1
'@duckdb/node-bindings-linux-arm64': 1.5.2-r.1
'@duckdb/node-bindings-linux-x64': 1.5.2-r.1
'@duckdb/node-bindings-win32-arm64': 1.5.2-r.1
'@duckdb/node-bindings-win32-x64': 1.5.2-r.1
'@emnapi/core@1.8.1':
dependencies:
'@emnapi/wasi-threads': 1.1.0