# Data Readiness requirements

### TL;DR

Toucan AI relies on semantic metadata to power accurate chart generation and Natural Language Queries. Before connecting, your data must be structured to eliminate ambiguity, as the platform performs read-only operations and cannot "clean" your source data.

### When to read this

Review these requirements before or during the Build phase to confirm your database is prepared for AI analysis.

### Data Preparation Standards

#### **1. Structural Logic: The "Single Grain" Rule**

The AI engine requires a clearly defined granularity for every table.

* Requirement: Each table should represent a single level of detail (e.g., one row per transaction).
* Constraint: Do not include "Total" or "Sub-total" rows within the same table as individual records.
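* Why: AI agents apply simple aggregations over every row in a table. When grains are mixed, a total row is added on top of the records it already summarizes, so values are double-counted and dashboards show inflated figures.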
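
The double-counting effect can be illustrated with a minimal sketch. The rows, the `order_id` and `amount` columns, and the `split_grain` helper below are hypothetical and are not part of Toucan AI; the check is meant to run against an export of your own data before you connect it.

```python
# Hypothetical export that mixes transaction rows with a summary row.
rows = [
    {"order_id": "A-1", "amount": 100.0},
    {"order_id": "A-2", "amount": 50.0},
    {"order_id": "TOTAL", "amount": 150.0},  # summary row at a different grain
]


def split_grain(rows, key="order_id", markers=("total", "sub-total", "subtotal")):
    """Separate detail rows from summary rows using marker labels in the key column."""
    detail, summary = [], []
    for row in rows:
        label = str(row[key]).strip().lower()
        (summary if label in markers else detail).append(row)
    return detail, summary


detail, summary = split_grain(rows)

naive_sum = sum(r["amount"] for r in rows)    # 300.0: the total row is counted again
clean_sum = sum(r["amount"] for r in detail)  # 150.0: one row per transaction

print(f"summary rows found: {len(summary)}")
print(f"naive sum: {naive_sum}, single-grain sum: {clean_sum}")
```

If the check finds summary rows, remove them at the source (or expose a filtered view), since the platform only reads your data.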

#### **2. Naming Conventions: Natural Language Mapping**

Toucan AI uses column names and display names to interpret user prompts and map chart axes.

* Requirement: Use declarative nouns for table names and human-readable column names, and review the display names for columns.
* Constraint: Avoid cryptic abbreviations or "computer-speak."
  * ❌ `rev_ext_tax` -> ✅ `revenue_excluding_tax`
  * ❌ `ts_crtd` -> ✅ `created_at_timestamp`
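
One possible way to apply such renames at the source without altering the underlying table is to expose a view with readable aliases. The sketch below only builds the SQL text; the `raw_orders` table, the `orders` view, and the `build_view_sql` helper are hypothetical names, and the statement assumes a database that accepts standard `CREATE VIEW ... AS SELECT` syntax with unquoted identifiers.

```python
# Mapping from cryptic source columns to human-readable names (from the examples above).
RENAMES = {
    "rev_ext_tax": "revenue_excluding_tax",
    "ts_crtd": "created_at_timestamp",
}


def build_view_sql(view, table, renames):
    """Build a CREATE VIEW statement that aliases each source column to a readable name."""
    columns = ",\n    ".join(f"{old} AS {new}" for old, new in renames.items())
    return f"CREATE VIEW {view} AS\nSELECT\n    {columns}\nFROM {table};"


# Hypothetical table and view names, for illustration only.
sql = build_view_sql("orders", "raw_orders", RENAMES)
print(sql)
```

Review the generated statement before running it, and still check the display names once the view is connected.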

#### **3. Type Integrity**

Field type inference is based on sampled values.

* Metrics: Must be strictly numeric (float/int). Remove currency symbols or "N/A" strings from cells.
* Dates: Use [ISO 8601 / RFC 3339](https://www.rfc-editor.org/rfc/rfc3339.html) format (e.g., `YYYY-MM-DD`) to avoid regional formatting errors.
* Categories: Use standardized strings or enums to prevent the AI from seeing "USA" and "United States" as two different entities.
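
The three rules above can be sketched as small cleaning helpers to run in your own pipeline before connecting. Everything here is an illustrative assumption: the helper names, the list of placeholder strings, the `%m/%d/%Y` source date format, the alias table, and the treatment of commas as thousands separators should all be adapted to your data.

```python
from datetime import datetime

# Hypothetical alias table mapping variants to one canonical label.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
}


def clean_metric(value):
    """Return a float, or None for placeholders such as 'N/A'."""
    # Assumes ',' is a thousands separator and '$' / '€' are the only currency symbols.
    text = str(value).strip().replace("$", "").replace("€", "").replace(",", "")
    if text == "" or text.upper() in {"N/A", "NA", "NULL"}:
        return None
    return float(text)


def to_iso_date(value, source_format="%m/%d/%Y"):
    """Convert a regional date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, source_format).date().isoformat()


def normalize_category(value, aliases=COUNTRY_ALIASES):
    """Map known variants to a canonical label; leave unknown values trimmed but unchanged."""
    trimmed = value.strip()
    return aliases.get(trimmed.lower(), trimmed)


print(clean_metric("$1,200.50"))   # 1200.5
print(clean_metric("N/A"))         # None
print(to_iso_date("03/04/2024"))   # 2024-03-04
print(normalize_category("USA"))   # United States
```

Storing missing metrics as nulls rather than strings keeps the column numeric, which matters because type inference works from sampled values.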

### Summary

| **Feature**      | **Requirement**              | **Purpose**                                  |
| ---------------- | ---------------------------- | -------------------------------------------- |
| **Table Names**  | Use nouns (e.g., `orders`)   | Identifies entities for AI                   |
| **Column Names** | Human-readable, no codes     | Enables Natural Language Queries             |
| **Data Types**   | Cast and Clean (Strict Type) | Correct field-type inference                 |
| **Logic**        | Atomic grain (no sub-totals) | Prevents calculation errors                  |

#### Constraints

* **Read-Only**: Toucan AI never writes to your database; all cleaning must be done at the source.
* **Sampling**: Inference is based on schema structure and sampled values, not a full data audit.
* **Manual review**: All AI-generated metadata is editable and must be validated before production.
