Data Readiness Requirements
TL;DR
Toucan AI relies on semantic metadata to power accurate chart generation and Natural Language Queries. Before connecting, your data must be structured to eliminate ambiguity, as the platform performs read-only operations and cannot "clean" your source data.
When to read this
Review these requirements before or during the Build phase to confirm your database is prepared for AI analysis.
Data Preparation Standards
1. Structural Logic: The "Single Grain" Rule
The AI engine requires a clearly defined granularity for every table.
Requirement: Each table should represent a single level of detail (e.g., one row per transaction).
Constraint: Do not include "Total" or "Sub-total" rows within the same table as individual records.
Why: AI agents apply simple aggregations (SUM, AVG); mixed grains cause double-counting and badly inflated figures in dashboards.
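The double-counting risk above can be seen in a minimal pandas sketch (the `sales` table and its values are hypothetical):

```python
import pandas as pd

# Hypothetical sales table that mixes grains: individual
# region rows plus a pre-computed "Total" row.
sales = pd.DataFrame({
    "region": ["North", "South", "Total"],
    "revenue": [100, 200, 300],
})

# A naive aggregation counts the subtotal row again:
# 100 + 200 + 300 = 600 instead of the true 300.
print(sales["revenue"].sum())  # 600

# Single grain: one row per region, no subtotal rows.
clean = sales[sales["region"] != "Total"]
print(clean["revenue"].sum())  # 300
```

Keeping totals out of the table lets any aggregation the AI generates remain correct by construction.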
2. Naming Conventions: Natural Language Mapping
Toucan AI uses column names and display names to interpret user prompts and map chart axes.
Requirement: Use declarative nouns for table names and human-readable column names, and review the display names for columns.
Constraint: Avoid cryptic abbreviations or "computer-speak."
❌ `rev_ext_tax` → ✅ `revenue_excluding_tax`
❌ `ts_crtd` → ✅ `created_at_timestamp`
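If renaming at the source is impractical, the mapping can be applied in a transformation step. A minimal pandas sketch (column names taken from the examples above; the data is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"rev_ext_tax": [100.0], "ts_crtd": ["2024-01-15"]})

# Map cryptic abbreviations to human-readable names so that
# natural-language prompts can match them.
readable_names = {
    "rev_ext_tax": "revenue_excluding_tax",
    "ts_crtd": "created_at_timestamp",
}
df = df.rename(columns=readable_names)
print(list(df.columns))  # ['revenue_excluding_tax', 'created_at_timestamp']
```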
3. Type Integrity
Field type inference is based on sampled values.
Metrics: Must be strictly numeric (float/int). Remove currency symbols or "N/A" strings from cells.
Dates: Use ISO 8601 / RFC 3339 format (YYYY-MM-DD) to avoid regional formatting errors.
Categories: Use standardized strings or enums to prevent the AI from seeing "USA" and "United States" as two different entities.
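All three type-integrity rules can be enforced in one cleaning pass. A minimal pandas sketch, with hypothetical column names and values:

```python
import pandas as pd

raw = pd.DataFrame({
    "amount": ["$1,200", "N/A", "$350"],
    "order_date": ["2024-01-15", "2024-02-03", "2024-02-20"],
    "country": ["USA", "United States", "usa"],
})

# Metrics: strip currency symbols and thousands separators,
# coercing placeholders like "N/A" to NaN instead of strings.
raw["amount"] = pd.to_numeric(
    raw["amount"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Dates: parse ISO 8601 strings into real datetimes
# (unambiguous, unlike DD/MM vs MM/DD regional formats).
raw["order_date"] = pd.to_datetime(raw["order_date"], format="%Y-%m-%d")

# Categories: normalize synonyms to a single canonical value.
canonical = {"usa": "United States", "united states": "United States"}
raw["country"] = raw["country"].str.lower().map(canonical)
```

After this pass, every metric is numeric, every date is a true datetime, and "USA" and "United States" collapse into one entity.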
Summary
| Feature | Requirement | Purpose |
| --- | --- | --- |
| Table Names | Use nouns (e.g., `orders`) | Identifies entities for AI |
| Column Names | Human-readable, no codes | Enables Natural Language Query |
| Data Types | Cast and clean (strict types) | Correct field-type inference |
| Logic | Atomic grain (no sub-totals) | Prevents calculation errors |
Constraints
Read-Only: Toucan AI never writes to your database; all cleaning must be done at the source.
Sampling: Inference is based on schema structure and sampled values, not a full data audit.
Manual review: All AI-generated metadata is editable and must be validated before production.