Data Readiness Requirements

TL;DR

Toucan AI relies on semantic metadata to power accurate chart generation and Natural Language Queries. Before you connect a data source, structure your data to eliminate ambiguity: the platform performs read-only operations and cannot "clean" your source data.

When to read this

Review these requirements before or during the Build phase to confirm your database is prepared for AI analysis.

Data Preparation Standards

1. Structural Logic: The "Single Grain" Rule

The AI engine requires a clearly defined granularity for every table.

  • Requirement: Each table should represent a single level of detail (e.g., one row per transaction).

  • Constraint: Do not include "Total" or "Sub-total" rows within the same table as individual records.

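  • Why: AI agents apply simple aggregations (such as SUM or COUNT) across all rows of a table; when grains are mixed, summary rows are counted on top of the records they summarize, which produces inflated figures in dashboards.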
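
To illustrate the double-counting problem, here is a minimal Python sketch of an upstream cleaning step. It is not part of Toucan AI: the sample rows, the `order_id` key column, and the `SUMMARY_LABELS` set are all hypothetical, and in practice this filtering would live in your own pipeline or source database.

```python
# Hypothetical sample: transaction rows with a stray "Total" row mixed in.
rows = [
    {"order_id": "A-1", "region": "North", "amount": 100.0},
    {"order_id": "A-2", "region": "North", "amount": 50.0},
    {"order_id": "Total", "region": "North", "amount": 150.0},  # summary row
    {"order_id": "A-3", "region": "South", "amount": 70.0},
]

# Assumed labels that mark a summary row; adapt to your own data.
SUMMARY_LABELS = {"total", "sub-total", "subtotal", "grand total"}


def is_summary_row(row, key_column="order_id"):
    """Return True when the key column holds a summary label, not a real ID."""
    return str(row[key_column]).strip().lower() in SUMMARY_LABELS


def keep_single_grain(rows, key_column="order_id"):
    """Keep only rows at the transaction grain."""
    return [row for row in rows if not is_summary_row(row, key_column)]


# A plain sum over the mixed table counts the North region twice.
naive_sum = sum(row["amount"] for row in rows)

# After removing summary rows, the sum reflects individual transactions only.
clean_rows = keep_single_grain(rows)
clean_sum = sum(row["amount"] for row in clean_rows)

print(naive_sum, clean_sum)
```

With this sample, the mixed table sums to 370.0 while the single-grain table sums to 220.0, the difference being the 150.0 "Total" row.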

2. Naming Conventions: Natural Language Mapping

Toucan AI uses column names and display names to interpret user prompts and map chart axes.

  • Requirement: Use descriptive nouns for table names and human-readable column names, and review the display names for columns.

  • Constraint: Avoid cryptic abbreviations or "computer-speak."

    • rev_ext_tax -> ✅ revenue_excluding_tax

    • ts_crtd -> ✅ created_at_timestamp
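
A renaming step like this can be sketched in a few lines of Python. The `RENAMES` mapping reuses the two examples above; the helper functions `rename_columns` and `display_name` are hypothetical names for an upstream preparation script, not Toucan AI functions.

```python
# Mapping from cryptic source names to human-readable names.
RENAMES = {
    "rev_ext_tax": "revenue_excluding_tax",
    "ts_crtd": "created_at_timestamp",
}


def rename_columns(row, renames=RENAMES):
    """Return a copy of the row with cryptic keys replaced; other keys are kept."""
    return {renames.get(key, key): value for key, value in row.items()}


def display_name(column):
    """Derive a candidate display name from a snake_case column name."""
    return column.replace("_", " ").capitalize()


source_row = {"rev_ext_tax": 1200.5, "ts_crtd": "2024-04-03", "region": "North"}
renamed_row = rename_columns(source_row)

for column in renamed_row:
    print(column, "->", display_name(column))
```

The derived display names are only a starting point and should still be reviewed by hand.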

3. Type Integrity

Field type inference is based on sampled values.

  • Metrics: Must be strictly numeric (float/int). Remove currency symbols or "N/A" strings from cells.

  • Dates: Use ISO 8601 / RFC 3339 format (e.g., YYYY-MM-DD) to avoid regional formatting errors.

  • Categories: Use standardized strings or enums to prevent the AI from seeing "USA" and "United States" as two different entities.
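
The three rules above can be sketched as small cleaning helpers to run before the data reaches Toucan AI. Everything here is an assumption for illustration: the function names, the list of "missing" markers, the currency symbols, the day-first source date format, and the alias table would all need to match your actual data.

```python
from datetime import datetime

# Assumed markers for missing values; extend to match your source.
MISSING_MARKERS = {"", "N/A", "NA", "NULL", "-"}

# Assumed alias table mapping lowercase variants to one canonical label.
COUNTRY_ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
}


def clean_metric(value):
    """Convert a value such as '$1,200.50' to a float; return None if missing."""
    if value is None:
        return None
    text = str(value).strip()
    if text.upper() in MISSING_MARKERS:
        return None
    # Assumes "," is a thousands separator, not a decimal separator.
    for symbol in ("$", "€", "£", ","):
        text = text.replace(symbol, "")
    return float(text)


def to_iso_date(value, source_format="%d/%m/%Y"):
    """Parse a date in a known regional format and return YYYY-MM-DD."""
    return datetime.strptime(value, source_format).date().isoformat()


def normalize_category(value, aliases=COUNTRY_ALIASES):
    """Map known variants to one canonical label; leave unknown values trimmed."""
    trimmed = value.strip()
    return aliases.get(trimmed.lower(), trimmed)


print(clean_metric("$1,200.50"))
print(clean_metric("N/A"))
print(to_iso_date("03/04/2024"))
print(normalize_category(" USA "))
```

Making the source date format an explicit parameter is deliberate: a string like 03/04/2024 is ambiguous between regions, so the script should state which convention it assumes instead of guessing.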

Summary

| Feature      | Requirement                  | Purpose                        |
|--------------|------------------------------|--------------------------------|
| Table Names  | Use nouns (e.g., orders)     | Identifies entities for AI     |
| Column Names | Human-readable, no codes     | Enables Natural Language Query |
| Data Types   | Cast and Clean (Strict Type) | Correct field-type inference   |
| Logic        | Atomic grain (no sub-totals) | Prevents calculation errors    |

Constraints

  • Read-Only: Toucan AI never writes to your database; all cleaning must be done at the source.

  • Sampling: Inference is based on schema structure and sampled values, not a full data audit.

  • Manual review: All AI-generated metadata is editable and must be validated before production.
