---
name: gentic-knowledge
description: "Give your AI agent a semantic knowledge base. Ingest documents, web pages, emails, Slack messages, and raw text — then search everything with natural language. Powered by vector embeddings."
license: MIT
metadata:
  author: gentic
  version: "1.0.0"
---

# Gentic Knowledge

Give your AI agent long-term memory. Ingest documents, web pages, emails, Slack messages, and raw text — then search everything with natural language. Powered by vector embeddings via Gemini.

## When to apply

- User wants to add documents, web pages, or text content to a knowledge base.
- User wants to search a knowledge base semantically rather than by keyword.
- User wants to see what content has been indexed, or delete content from their knowledge base.
- User mentions "knowledge base", "vectorize", "index", "semantic search", or "RAG".
- User wants to store and search emails, Slack messages, reviews, notes, or other unstructured text.

## Tools

| Tool | Description | Cost |
|------|-------------|------|
| `delete_content` | Delete content from your knowledge base. Removes all chunks for a `document_id` or specific chunk IDs. Content is soft-deleted (excluded from search but retained). | Free |
| `list_kb_sources` | List all indexed sources in your knowledge base. Shows each document/content with its source type, title, chunk count, and category. | Free |
| `search_knowledge` | Search your knowledge base using natural language. Returns semantically relevant results across all indexed content (documents, web pages, emails, notes, etc.). Use `list_kb_sources` first to see what content has been indexed. | Free |
| `vectorize_content` | Vectorize raw text content (emails, Slack messages, reviews, social posts, etc.) and store it in your knowledge base for semantic search. Processing is async; use `list_kb_sources` to confirm indexing completed. Billed per chunk (1¢/chunk, ~1,000 chars each). | 1¢ / chunk (min 1¢) |
| `vectorize_document` | Vectorize a document file and store it in your knowledge base for semantic search. Supports PDF, TXT, RTF, and DOCX files. Accepts HTTP(S) URLs and Google Drive links. Processing is async; use `list_kb_sources` to confirm the document was indexed successfully. Billed per chunk (1¢/chunk, ~1,000 chars each). | 1¢ / chunk (min 1¢) |
| `vectorize_web_content` | Scrape and vectorize a web page, storing it in your knowledge base for semantic search. The page is scraped, converted to clean text, chunked, and embedded. Processing is async; use `list_kb_sources` to confirm indexing completed. Billed per chunk (1¢/chunk) plus a 5¢ scraping fee. | 1¢ / chunk + 5¢ scrape (min 6¢) |

## Workflow

### 1. Pick the right ingestion tool

Choose the tool by input type:

- **`vectorize_document`** for PDF, TXT, RTF, and DOCX files. Accepts HTTP(S) URLs and Google Drive links.
- **`vectorize_web_content`** for live web pages. The page is scraped, cleaned, and chunked automatically; thin pages are retried with JS rendering.
- **`vectorize_content`** for raw text such as emails, Slack messages, reviews, and notes. Requires a `source_type` of `email`, `slack`, `review`, `social`, `creator_feedback`, `note`, or `other`. Pass a `source_id` (e.g. an email thread ID) so that re-ingesting the same item updates it instead of creating a duplicate.

All three accept an optional `category`; `vectorize_document` and `vectorize_content` also accept an optional `title`.
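The routing rule above can be sketched as a small helper. This is a hypothetical heuristic, not part of the Gentic API: it treats Google Drive links and URLs ending in a supported file extension as documents, any other HTTP(S) URL as a web page, and everything else as raw text. The `choose_ingestion_tool` name and the extension check are assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types vectorize_document supports, per the tool description.
DOCUMENT_EXTENSIONS = {".pdf", ".txt", ".rtf", ".docx"}


def choose_ingestion_tool(item: str) -> str:
    """Pick an ingestion tool for a URL or a block of raw text (hypothetical heuristic)."""
    parsed = urlparse(item.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Google Drive links go to vectorize_document.
        if parsed.netloc.endswith("drive.google.com"):
            return "vectorize_document"
        # Direct links to a supported file type also go to vectorize_document.
        if PurePosixPath(parsed.path).suffix.lower() in DOCUMENT_EXTENSIONS:
            return "vectorize_document"
        # Any other web URL is treated as a page to scrape.
        return "vectorize_web_content"
    # Not a URL: ingest as raw text (the caller still has to pick a source_type).
    return "vectorize_content"
```

A URL such as `https://example.com/blog/post` routes to `vectorize_web_content`, while pasted email text routes to `vectorize_content`. When in doubt, ask the user rather than relying on the extension check.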

### 2. Ingestion is async — tell the user

All vectorize tools run in the background. Content becomes searchable in ~10–60 seconds depending on size. **Don't block waiting.** Tell the user it's processing and call `list_kb_sources` afterwards to confirm indexing finished. Don't run `search_knowledge` immediately after ingestion and claim "no results" — the embeddings aren't ready yet.
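A minimal sketch of the follow-up check, meant to run on a later turn rather than in a polling loop. It assumes (hypothetically) that `list_kb_sources` returns a list of entries with `title` and `chunk_count` keys; those field names are assumptions, so adapt them to the actual response.

```python
from typing import Any, Iterable, Mapping


def is_indexed(sources: Iterable[Mapping[str, Any]], title: str) -> bool:
    """True once a source with this title reports at least one chunk.

    Assumes hypothetical "title" and "chunk_count" keys on each entry.
    """
    for source in sources:
        if source.get("title") == title:
            return int(source.get("chunk_count") or 0) > 0
    # Not listed yet: ingestion is probably still running.
    return False
```

If it returns `False`, tell the user the content is still processing instead of searching and reporting "no results".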

### 3. Surface ingestion cost before large uploads

A chunk is ~1,000 characters at **1¢/chunk**. A short email is 1 chunk. A 10-page document is ~20–40 chunks. A 50-page PDF is ~150 chunks ≈ **$1.50**. `vectorize_web_content` additionally charges a **5¢ scraping fee** on top of the per-chunk cost. **Surface an estimate before the user indexes large documents or bulk content** — "this PDF will likely produce ~X chunks at 1¢ each ≈ $Y".
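The estimate above is simple arithmetic. The sketch below assumes chunks are cut at roughly 1,000 characters with no overlap, which is only an approximation of the real chunker; the constant and function names are illustrative.

```python
import math

CHARS_PER_CHUNK = 1_000  # approximate chunk size from the pricing notes
CENTS_PER_CHUNK = 1      # 1¢ per chunk
SCRAPE_FEE_CENTS = 5     # extra fee charged only by vectorize_web_content


def estimate_cost_cents(num_chars: int, web_page: bool = False) -> int:
    """Rough ingestion cost in cents; actual chunk counts may differ."""
    # Every ingestion produces at least one chunk (min 1¢).
    chunks = max(1, math.ceil(num_chars / CHARS_PER_CHUNK))
    cost = chunks * CENTS_PER_CHUNK
    if web_page:
        cost += SCRAPE_FEE_CENTS
    return cost
```

For a 50-page PDF of about 150,000 characters, `estimate_cost_cents(150_000)` returns `150`, i.e. about $1.50, which matches the figure above.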

### 4. Search is free — use it aggressively

`search_knowledge` costs nothing, so encourage the user to refine queries. Key params: `query` (natural language, be descriptive), `source_types` filter (e.g. `["document", "web"]` or `["email", "slack"]`), `category` filter, `document_id` to scope to one source, `limit` (default 10, max 50). Results come ranked by similarity score.
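A sketch of assembling `search_knowledge` arguments. The parameter names come from the tool details below; the helper itself, the decision to omit unset filters, and the clamping of `limit` to 1..50 are assumptions. How the resulting dict is sent depends on your MCP client.

```python
from typing import Any, Optional, Sequence

MAX_LIMIT = 50  # documented maximum for limit


def build_search_args(
    query: str,
    source_types: Optional[Sequence[str]] = None,
    category: Optional[str] = None,
    document_id: Optional[str] = None,
    limit: int = 10,
) -> dict[str, Any]:
    """Assemble search_knowledge arguments, leaving out filters that are unset."""
    if not query.strip():
        raise ValueError("query must be a non-empty natural-language string")
    # Clamp limit into the documented range instead of letting the call fail.
    args: dict[str, Any] = {"query": query, "limit": max(1, min(limit, MAX_LIMIT))}
    if source_types:
        args["source_types"] = list(source_types)
    if category:
        args["category"] = category
    if document_id:
        args["document_id"] = document_id
    return args
```

Because search is free, it is cheap to call this repeatedly with narrower `source_types` or a different `category` as the user refines the question.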

### 5. Present results with source context

Show the most relevant snippets with their source info (title, type, category) and similarity scores. When searching across many sources, **group by document or source type**. Mention the total result count. If scores are all low, offer to broaden the search or check `list_kb_sources` to confirm the relevant content is actually indexed.
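One way to present results grouped by source, sketched under the assumption that each search hit is a dict with `title`, `source_type`, `category`, `score`, and `text` keys. Those field names and the 0.5 "low score" threshold are hypothetical; check them against real responses.

```python
from collections import defaultdict
from typing import Any, Mapping, Sequence

LOW_SCORE = 0.5  # hypothetical threshold; tune against real similarity scores


def summarize_results(results: Sequence[Mapping[str, Any]]) -> str:
    """Group hits by source title, show scores, and flag weak result sets."""
    if not results:
        return "No results. Check list_kb_sources to confirm the content is indexed."
    groups: dict[str, list[Mapping[str, Any]]] = defaultdict(list)
    for hit in results:
        groups[hit.get("title") or "Untitled"].append(hit)
    lines = [f"{len(results)} results from {len(groups)} sources:"]
    for title, hits in groups.items():
        first = hits[0]
        lines.append(f"- {title} ({first.get('source_type', '?')}, {first.get('category') or 'no category'})")
        # Within a source, show the strongest snippets first.
        for hit in sorted(hits, key=lambda h: h.get("score", 0.0), reverse=True):
            lines.append(f"    [{hit.get('score', 0.0):.2f}] {hit.get('text', '')[:120]}")
    if max(hit.get("score", 0.0) for hit in results) < LOW_SCORE:
        lines.append("All scores are low; consider broadening the query.")
    return "\n".join(lines)
```

Grouping by title keeps the output readable; grouping by `document_id` would be more robust if two sources share a title.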

### 6. Manage sources cleanly

`list_kb_sources` shows everything indexed — title, source type, chunk count, category, date. Use it to verify ingestion completed or find `document_id` values for deletion. `delete_content` is a **soft delete** — removes chunks from search but retains them in storage. Delete by `document_id` to remove all chunks for a source, or by `content_ids` to remove specific chunks.
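A sketch of turning `list_kb_sources` output into `delete_content` arguments. It assumes (hypothetically) that each source entry has `title` and `document_id` keys, and it deliberately refuses to guess when a title matches zero or several sources, since deleting the wrong document is worse than asking the user.

```python
from typing import Any, Mapping, Sequence


def delete_args_for_title(sources: Sequence[Mapping[str, Any]], title: str) -> dict[str, Any]:
    """Find the document_id for a title; assumes "title"/"document_id" keys."""
    matches = [s["document_id"] for s in sources if s.get("title") == title]
    if len(matches) != 1:
        raise LookupError(f"expected one source titled {title!r}, found {len(matches)}")
    # Deleting by document_id removes every chunk of that source.
    return {"document_id": matches[0]}


def delete_args_for_chunks(content_ids: Sequence[str]) -> dict[str, Any]:
    """Target specific chunks instead of a whole document."""
    if not content_ids:
        raise ValueError("content_ids must not be empty")
    return {"content_ids": list(content_ids)}
```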

### 7. Encourage categories on ingest

The `category` field is the single best thing the user can set for future search ergonomics — it lets them filter search to a subset later (e.g. `category="support"` vs `"legal"`). Always offer a category suggestion on ingest instead of leaving it blank.

## Notes

- All tools are organization-scoped — users only see their own knowledge base.
- Ingestion is **async**. Content takes ~10–60s to become searchable. Don't run `search_knowledge` immediately and claim it's missing.
- Ingestion costs are **1¢/chunk** (~1,000 chars each). `vectorize_web_content` adds a **5¢ scraping fee**. Search is free.
- Content is deduplicated by document ID / `source_id`. Re-vectorizing the same URL or source_id updates rather than duplicates.
- `delete_content` is soft-delete — chunks are excluded from search but retained in storage.

## Tool details

- `delete_content` — Delete content from your knowledge base. Removes all chunks for a document_id or specific chunk IDs. Content is soft-deleted (excluded from search but retained).
  - `document_id` (string) — Delete all chunks belonging to this document ID. Use list_kb_sources to find document IDs.
  - `content_ids` (array of string) — Delete specific chunk IDs
- `list_kb_sources` — List all indexed sources in your knowledge base. Shows each document/content with its source type, title, chunk count, and category.
- `search_knowledge` — Search your knowledge base using natural language. Returns semantically relevant results across all indexed content (documents, web pages, emails, notes, etc.). Use list_kb_sources first to see what content has been indexed.
  - `query` (string, required) — Natural language search query
  - `source_types` (array of string) — Filter by source type(s) — e.g. ["document", "web"]
  - `category` (string) — Filter by category
  - `document_id` (string) — Search within a specific document
  - `limit` (integer, required; default `10`, max `50`) — Number of results to return. Pass it explicitly; use `10` unless you need more.
- `vectorize_content` — Vectorize raw text content (emails, Slack messages, reviews, social posts, etc.) and store in your knowledge base for semantic search. Processing is async — use list_kb_sources to confirm indexing completed. Billed per chunk (1¢/chunk, ~1000 chars each).
  - `content` (string, required) — The text content to vectorize
  - `source_type` (string, enum: `email` | `slack` | `review` | `social` | `creator_feedback` | `note` | `other`, required) — Type of content
  - `title` (string) — Title or subject line (optional)
  - `source_id` (string) — External reference ID like message ID or email thread ID (optional)
  - `category` (string) — Category for filtering (optional)
  - `created_at` (string) — When the content was originally created, ISO format (optional)
  - `slug` (string) — Brain v1 (spec 005-brain FR-027/028): caller-provided stable slug derived from the source's stable external identity. Used as the knowledge `slug` column. Enforces the same regex as wiki_stage_source so the (source_type, slug) UNIQUE constraint stays meaningful regardless of ingest path. Optional for legacy callers.
  - `source_uri` (string) — Brain v1 (FR-027): canonical external identity, e.g. `slack://C123/1700000000.000000`. Optional.
  - `source_updated_at` (string) — Brain v1 (FR-027): ISO-8601 timestamp of when the original source was last modified. Optional.
- `vectorize_document` — Vectorize a document file and store it in your knowledge base for semantic search. Supports PDF, TXT, RTF, and DOCX files. Accepts HTTP URLs and Google Drive links. Processing is async — use list_kb_sources to confirm the document was indexed successfully. Billed per chunk (1¢/chunk, ~1000 chars each).
  - `url` (string, required) — URL of the document to vectorize. Supports direct HTTP URLs and Google Drive links.
  - `title` (string) — Title of the document (optional)
  - `category` (string) — Category for filtering — e.g. "research", "product_docs", "industry_news"
  - `slug` (string) — Brain v1 (spec 005-brain FR-027/028): caller-provided stable slug derived from the source's stable external identity. Used as the knowledge `slug` column so future Proactive Mother references survive title edits. Enforces the same regex as wiki_stage_source so the (source_type, slug) UNIQUE constraint stays meaningful regardless of ingest path. Optional — legacy callers continue to work without it.
  - `source_uri` (string) — Brain v1 (FR-027): canonical external identity of the source, e.g. `slack://...`, `notion://page/...`. Optional — legacy callers omit this.
  - `source_updated_at` (string) — Brain v1 (FR-027): ISO-8601 timestamp of when the original source was last modified. Used by future connector delta-sync paths.
- `vectorize_web_content` — Scrape and vectorize a web page, storing it in your knowledge base for semantic search. The page is scraped, converted to clean text, chunked, and embedded. Processing is async — use list_kb_sources to confirm indexing completed. Billed per chunk (1¢/chunk) plus a 5¢ scraping fee.
  - `url` (string, required) — URL of the web page to scrape and vectorize
  - `category` (string) — Category for filtering — e.g. "blog", "competitor", "research"
  - `slug` (string) — Brain v1 (spec 005-brain FR-027/028): caller-provided stable slug. Enforces the same regex as wiki_stage_source. Optional for legacy callers.
  - `source_uri` (string) — Brain v1 (FR-027): canonical external identity, typically the original URL. Optional.
  - `source_updated_at` (string) — Brain v1 (FR-027): ISO-8601 timestamp of when the page was last modified. Optional.
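A sketch of building `vectorize_content` arguments from the parameter list above: it validates `source_type` against the documented enum and serializes `created_at` as ISO-8601. The helper is hypothetical, and treating naive datetimes as UTC is an assumption. The Brain v1 fields (`slug`, `source_uri`, `source_updated_at`) are left out because the slug regex is not documented here.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Allowed values for source_type, per the tool details.
SOURCE_TYPES = {"email", "slack", "review", "social", "creator_feedback", "note", "other"}


def build_vectorize_content_args(
    content: str,
    source_type: str,
    *,
    title: Optional[str] = None,
    source_id: Optional[str] = None,
    category: Optional[str] = None,
    created_at: Optional[datetime] = None,
) -> dict[str, Any]:
    """Assemble vectorize_content arguments, dropping unset optional fields."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unsupported source_type: {source_type!r}")
    if not content.strip():
        raise ValueError("content must not be empty")
    args: dict[str, Any] = {"content": content, "source_type": source_type}
    optional = {"title": title, "source_id": source_id, "category": category}
    args.update({key: value for key, value in optional.items() if value})
    if created_at is not None:
        if created_at.tzinfo is None:
            # Assumption: naive timestamps are UTC.
            created_at = created_at.replace(tzinfo=timezone.utc)
        args["created_at"] = created_at.isoformat()
    return args
```

Setting `source_id` (for example an email thread ID) is what lets a later re-ingest update the same entry instead of duplicating it.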

---

_This SKILL.md is generated from the live Gentic MCP manifest. Tool names, descriptions, and pricing are always current. Connect Gentic Knowledge at https://gentic.co/knowledge._
