---
name: gentic-data
description: "Give your AI agent a cloud database. Import CSVs, query with SQL, insert and sync records, and manage tables — all through the Model Context Protocol. Powered by DuckDB via MotherDuck."
license: MIT
metadata:
  author: gentic
  version: "1.0.0"
---

# Gentic Data

Give your AI agent a full cloud database. Import CSVs from any URL, query with SQL, insert and sync records, and manage tables — all through the Model Context Protocol. Powered by DuckDB via MotherDuck.

## When to apply

- User wants to create a table or import data from a CSV, Google Sheet, or S3 URL.
- User wants to query, analyze, or explore their data — tables, columns, sample rows, aggregates.
- User wants to preview a table's structure, schema, or row count.
- User wants to update, sync, upsert, or append data from a CSV.
- User wants to insert one or more records into a table with duplicate prevention.
- User wants to search text data semantically or find similar records.
- User asks about their "database", "tables", or "data" in general.

## Tools

| Tool | Description | Cost |
|------|-------------|------|
| `batch_insert_records` | Insert multiple records into a table in one batch operation with duplicate prevention. All records must have the same columns. Maximum 1000 records per call. Use sample_table first to see the table structure. | Free |
| `batch_update_table_from_csv` | Batch update ONLY existing records in a table from a CSV URL (ignores new records). If your CSV has both updates AND new records, use sync_table_from_csv instead. | Free |
| `create_table_from_csv` | Create a new table by importing a CSV file. Supports S3, HTTPS, Google Sheets, and Google Drive URLs.  If the user wants to SEARCH, EXPLORE, or FIND PATTERNS in text-heavy data (reviews, feedback, support tickets, survey responses, comments), set embed_columns to the text columns that should be searchable. This creates a '_vectors' table with per-row embeddings that can be searched semantically using the search_structured tool. Processing is async and billed at 1¢/row.  If the user just wants to IMPORT data for SQL analytics (counts, averages, aggregations), omit embed_columns for a plain synchronous import. Use query_data for SQL analysis afterward. | Free |
| `get_table_schema` | Get detailed schema information for a table including column names, data types, and nullability. | Free |
| `insert_record` | Insert a single record into a table with duplicate prevention. Use sample_table first to see the table structure and required columns. | Free |
| `list_database_tables` | List all tables in your Gentic Data database with row counts. Use this to see what data you have available. | Free |
| `query_data` | Execute a SQL SELECT query for data analysis — aggregations, counts, averages, GROUP BY, JOINs, filtering, and reporting. Works on all tables including '_vectors' tables. Supports SELECT and WITH (CTEs). Write operations and file-reading functions are blocked. Use sample_table first to understand the table structure.  Do NOT use this for semantic/natural language search — use search_structured instead. | Free |
| `recall_memory` | Search long-term cross-conversation memory captured from Slack and Telegram threads. Answers questions like 'what did we decide last week?' or 'what has the team said about X?'. Vector-searches the org-scoped `org_memory` table and returns rows ranked by semantic relevance. Each row includes text, role, surface, thread_key, agent_name, created_at, and similarity_score. Use surface/role/since/until/thread_key filters to narrow scope. Free. | Free |
| `sample_table` | Get a preview of a table with sample records and column information. Use this to understand the table structure before running analysis queries. | Free |
| `search_structured` | Search through text data using natural language. Use this when the user wants to FIND, SEARCH, or EXPLORE specific topics, themes, or patterns in a '_vectors' table (created via create_table_from_csv with embed_columns). Returns full rows ranked by semantic relevance. Supports optional SQL filters to narrow results by date, rating, category, etc.  Do NOT use this for SQL analytics (counts, averages, GROUP BY) — use query_data instead.  Example: search_structured(table_name='reviews_vectors', query='shipping delays and damaged packaging', filters="rating <= 2 AND date > '2024-06-01'") | Free |
| `sync_table_from_csv` | Sync a table with CSV data: updates existing records AND adds new ones in a single operation. This is the recommended tool when a user asks to 'sync', 'update', or 'refresh' a table from a data source. | Free |
| `update_table_from_csv` | Update an existing table from a CSV URL. Supports three modes: 'replace' (drop and recreate), 'append' (add all rows — may create duplicates), or 'upsert' (add only new rows based on unique_column). Use sample_table first to check columns. | Free |

## Workflow

### 1. Start with `list_database_tables`

For any 'what data do I have' question, start with `list_database_tables`. It's free, returns all tables with row counts, and anchors the rest of the conversation. If the user names a table directly, skip to `sample_table` or `get_table_schema`.

### 2. Import with `create_table_from_csv`

Accepts HTTPS, S3, Google Sheets, and Google Drive URLs — the last two are auto-converted to direct CSV downloads. Files must be publicly accessible. Table names must be letters/numbers/underscores only. For semantic search, pass `embed_columns` to vectorize text columns — this is the only paid tool at 1¢/row, so mention the cost when importing large datasets. CSV files are capped at 100 MB.

### 3. Always `sample_table` before writing SQL

Before running `query_data` or inserting records, call `sample_table` or `get_table_schema` to see the real column names and types. Don't guess. `sample_table` gives you a feel for the data; `get_table_schema` gives you precise types. `query_data` is strictly read-only — only SELECT and WITH (CTEs), no INSERT/UPDATE/DELETE/DROP, no `read_csv_auto()` / `read_parquet()` / `glob()`.

### 4. Pick the right update tool

`update_table_from_csv` is the general case with three modes: `replace` (drop + recreate), `append` (add all rows, allows duplicates), `upsert` (add only new rows keyed on `unique_column`). `sync_table_from_csv` is the best answer when the user says "sync", "update", or "refresh" — updates existing rows and inserts new ones in one call. `batch_update_table_from_csv` only touches existing rows (use for corrections). **Always ask which column contains unique identifiers** — don't default to append.

### 5. Insert records safely

`insert_record` for a single row, `batch_insert_records` (up to 1000 rows) for many. Both require `unique_column` for duplicate prevention — `insert_record` rejects duplicates, `batch_insert_records` silently skips them. Always `sample_table` first so you know the required columns.

### 6. Semantic search via `_vectors` tables

`search_structured` runs cosine-similarity search over a `_vectors` table created with `embed_columns`. Key params: `table_name` must end in `_vectors`, `query` is natural language, `filters` is an optional SQL WHERE clause for structured columns (validated for safety), `limit` defaults to 20 (max 100). Use this when the user asks to find 'similar' items or to search by meaning rather than exact match.

### 7. Present results clearly

Don't dump raw JSON. For query results, render a markdown table with column headers; for large result sets, show a head/tail and summarize. For imports and updates, confirm the row counts that changed. For semantic search, lead with the top match and why it's relevant. Always end with a concrete next step (another query, a chained MCP call, an export).

## Notes

- All tools are organization-scoped — users only see their own database and tables.
- All tools are free **except** vectorized imports via `create_table_from_csv` with `embed_columns` at 1¢/row. A 10k-row vectorized import is $100 — always surface the cost before running.
- Table and column names can only contain letters, numbers, and underscores.
- `query_data` is read-only. Use the insert/update tools for writes.
- CSV files are capped at 100 MB per import.
- Users don't need to 'create a database' — their database is provisioned automatically on first use.

## Tool details

- `batch_insert_records` — Insert multiple records into a table in one batch operation with duplicate prevention. All records must have the same columns. Maximum 1000 records per call. Use sample_table first to see the table structure.
  - `table_name` (string, required) — Name of the table to insert into
  - `records` (array of object, required) — Array of records to insert (all must have the same columns)
  - `unique_column` (string, required) — Column to check for duplicates (e.g. 'id', 'email'). Records with existing values are skipped.
- `batch_update_table_from_csv` — Batch update ONLY existing records in a table from a CSV URL (ignores new records). If your CSV has both updates AND new records, use sync_table_from_csv instead.
  - `table_name` (string, required) — Name of the existing table to update
  - `csv_url` (string, required) — CSV URL with updated data (supports Google Sheets, Drive, S3, HTTPS)
  - `unique_column` (string, required) — Column to match records (e.g. 'id', 'email'). Only matching records are updated.
- `create_table_from_csv` — Create a new table by importing a CSV file. Supports S3, HTTPS, Google Sheets, and Google Drive URLs.  If the user wants to SEARCH, EXPLORE, or FIND PATTERNS in text-heavy data (reviews, feedback, support tickets, survey responses, comments), set embed_columns to the text columns that should be searchable. This creates a '_vectors' table with per-row embeddings that can be searched semantically using the search_structured tool. Processing is async and billed at 1¢/row.  If the user just wants to IMPORT data for SQL analytics (counts, averages, aggregations), omit embed_columns for a plain synchronous import. Use query_data for SQL analysis afterward.
  - `table_name` (string, required) — Base name for the table (e.g. 'reviews', 'support_tickets'). If embed_columns is provided, '_vectors' is appended automatically (e.g. 'reviews' → 'reviews_vectors').
  - `csv_url` (string, required) — URL to a publicly accessible CSV file. Google Sheets and Drive URLs are auto-converted.
  - `embed_columns` (array of string) — Text columns to embed for semantic search (e.g. ['review_title', 'review_text']). REQUIRED when the user wants to search/explore/find patterns in text data. Only include natural language columns — skip IDs, dates, and numbers. Omit entirely for a plain CSV import.
  - `column_types` (object) — Explicit DuckDB type overrides for columns (e.g. {"submission_date": "TIMESTAMP"}). Only used with embed_columns.
- `get_table_schema` — Get detailed schema information for a table including column names, data types, and nullability.
  - `table_name` (string, required) — Name of the table
- `insert_record` — Insert a single record into a table with duplicate prevention. Use sample_table first to see the table structure and required columns.
  - `table_name` (string, required) — Name of the table to insert into
  - `record_data` (object, required) — Record to insert as key-value pairs (e.g. {"id": 123, "name": "John", "email": "john@example.com"})
  - `unique_column` (string, required) — Column to check for duplicates (e.g. 'id', 'email'). Insert is rejected if value already exists.
- `list_database_tables` — List all tables in your Gentic Data database with row counts. Use this to see what data you have available.
- `query_data` — Execute a SQL SELECT query for data analysis — aggregations, counts, averages, GROUP BY, JOINs, filtering, and reporting. Works on all tables including '_vectors' tables. Supports SELECT and WITH (CTEs). Write operations and file-reading functions are blocked. Use sample_table first to understand the table structure.  Do NOT use this for semantic/natural language search — use search_structured instead.
  - `sql_query` (string, required) — SQL SELECT query to execute (e.g. "SELECT * FROM sales WHERE date > '2024-01-01' LIMIT 10")
- `recall_memory` — Search long-term cross-conversation memory captured from Slack and Telegram threads. Answers questions like 'what did we decide last week?' or 'what has the team said about X?'. Vector-searches the org-scoped `org_memory` table and returns rows ranked by semantic relevance. Each row includes text, role, surface, thread_key, agent_name, created_at, and similarity_score. Use surface/role/since/until/thread_key filters to narrow scope. Free.
  - `query` (string, required) — Natural-language question — what do you want to recall from prior conversations? e.g. 'what did we decide about pricing last week?' or 'what has the team said about onboarding flow?'
  - `limit` (integer, default: `10`, required) — Max memory rows to return (default 10, max 50).
  - `surface` (string, enum: `slack` | `telegram`) — Restrict results to a single surface.
  - `since` (string) — Only include memories created on/after this ISO date (e.g. '2026-04-01').
  - `until` (string) — Only include memories created on/before this ISO date.
  - `role` (string, enum: `user` | `assistant`) — Filter by message author — 'user' for human turns, 'assistant' for agent replies.
  - `thread_key` (string) — Pin retrieval to a single conversation thread (matches the thread_key written by the capture side).
- `sample_table` — Get a preview of a table with sample records and column information. Use this to understand the table structure before running analysis queries.
  - `table_name` (string, required) — Name of the table to sample
  - `sample_size` (integer, default: `5`, required) — Number of sample records to return (default: 5, max: 20)
- `search_structured` — Search through text data using natural language. Use this when the user wants to FIND, SEARCH, or EXPLORE specific topics, themes, or patterns in a '_vectors' table (created via create_table_from_csv with embed_columns). Returns full rows ranked by semantic relevance. Supports optional SQL filters to narrow results by date, rating, category, etc.  Do NOT use this for SQL analytics (counts, averages, GROUP BY) — use query_data instead.  Example: search_structured(table_name='reviews_vectors', query='shipping delays and damaged packaging', filters="rating <= 2 AND date > '2024-06-01'")
  - `table_name` (string, required) — The vector table to search (e.g. 'reviews_vectors'). Must be a table created with embed_columns.
  - `query` (string, required) — Natural language search query (e.g. 'shipping delays and poor packaging')
  - `filters` (string) — Optional SQL WHERE conditions for structured columns (e.g. "submission_date > '2024-06-01' AND rating <= 2")
  - `limit` (integer, default: `20`, required) — Max results to return (default: 20, max: 100)
  - `columns` (array of string) — Which columns to return (defaults to all non-embedding columns)
- `sync_table_from_csv` — Sync a table with CSV data: updates existing records AND adds new ones in a single operation. This is the recommended tool when a user asks to 'sync', 'update', or 'refresh' a table from a data source.
  - `table_name` (string, required) — Name of the existing table to sync
  - `csv_url` (string, required) — CSV URL with the data to sync (supports Google Sheets, Drive, S3, HTTPS)
  - `unique_column` (string, required) — Column to match records between CSV and table (e.g. 'id', 'email', 'video_url')
- `update_table_from_csv` — Update an existing table from a CSV URL. Supports three modes: 'replace' (drop and recreate), 'append' (add all rows — may create duplicates), or 'upsert' (add only new rows based on unique_column). Use sample_table first to check columns.
  - `table_name` (string, required) — Name of the existing table to update
  - `csv_url` (string, required) — CSV URL with the new data (supports Google Sheets, Drive, S3, HTTPS)
  - `mode` (string, enum: `replace` | `append` | `upsert`, default: `"replace"`, required) — Update mode: 'replace' (overwrite all data), 'append' (add all rows), 'upsert' (add only new rows)
  - `unique_column` (string) — Column to check for uniqueness (required when mode='upsert'). Example: 'id', 'email'

---

_This SKILL.md is generated from the live Gentic MCP manifest. Tool names, descriptions, and pricing are always current. Connect Gentic Data at https://gentic.co/data._
