FineType Docs
Explore the full taxonomy, CLI commands, DuckDB extension, and performance benchmarks. Read more →
This guide walks you through installing the core Noon tools, running your first DuckDB query, and profiling data with FineType.
Noon tools run on macOS, Linux, and Windows (via WSL).
| Tool | Purpose | Required |
|---|---|---|
| Nushell | Shell for data workflows | Recommended |
| DuckDB | In-memory analytical SQL engine | Yes |
| FineType | Semantic type classification | Yes |
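Before installing anything, you can check which of these tools are already on your `PATH`. Here is a minimal Python sketch that uses the standard library's `shutil.which`. It assumes the executables are named `nu`, `duckdb`, and `finetype`, which are the names used by the `--version` checks in this guide. The required/recommended flags copy the table. `check_tools` and `missing_required` are illustrative helpers, not part of any Noon tool.

```python
import shutil

# Executable name -> required? (copied from the prerequisites table).
# Executable names are assumptions based on the --version checks in this guide.
TOOLS = {
    "nu": False,       # Nushell: recommended
    "duckdb": True,    # DuckDB: required
    "finetype": True,  # FineType: required
}


def check_tools(tools=TOOLS):
    """Map each executable to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_required(found, tools=TOOLS):
    """Return the required tools whose lookup came back empty."""
    return [name for name, required in tools.items() if required and found.get(name) is None]


if __name__ == "__main__":
    found = check_tools()
    for name, path in found.items():
        print(f"{name:10} {path or 'not found'}")
    missing = missing_required(found)
    print("missing required tools:", ", ".join(missing) if missing else "none")
```

Any tool reported as `not found` still needs to be installed using one of the commands below.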
You’ll need a terminal and a package manager. The examples below use Homebrew (macOS/Linux), but alternatives are listed for each platform.
Nushell is a modern shell that treats data as structured tables — ideal for piping between tools.
```shell
# macOS / Linux
brew install nushell

# Windows
winget install nushell

# Or via Cargo (any platform with Rust)
cargo install nu
```

Verify:

```shell
nu --version
```

DuckDB is a fast, embeddable SQL engine for analytics. It reads CSV, JSON, and Parquet natively.
```shell
# macOS
brew install duckdb

# Linux (apt)
sudo apt install duckdb

# Windows
winget install DuckDB.cli
```

Verify:

```shell
duckdb --version
```

FineType classifies text into 152 semantic types with a character-level CNN model.
```shell
# macOS (Homebrew)
brew install noon-org/tap/finetype

# Any platform with Rust
cargo install finetype-cli
```

Verify:

```shell
finetype --version
```

Let’s walk through a complete analytics workflow: create a dataset, query it with DuckDB, then profile the column types with FineType.
Save this as contacts.csv:
```csv
id,name,email,created_at,ip_address,amount
```

This dataset has a mix of types: names, emails, timestamps, IP addresses, and numeric amounts.
Open a DuckDB shell and explore the data:
```shell
# Start DuckDB
duckdb
```

```sql
-- Load and inspect
SELECT * FROM 'contacts.csv';

-- Aggregate query
SELECT
    count(*) AS total_contacts,
    avg(amount) AS avg_amount,
    min(created_at) AS earliest,
    max(created_at) AS latest
FROM 'contacts.csv';
```

Expected output:

```text
┌────────────────┬────────────┬──────────────────────┬──────────────────────┐
│ total_contacts │ avg_amount │       earliest       │        latest        │
│     int64      │   double   │       varchar        │       varchar        │
├────────────────┼────────────┼──────────────────────┼──────────────────────┤
│              5 │     856.05 │ 2024-01-15T09:30:00Z │ 2024-05-01T08:00:00Z │
└────────────────┴────────────┴──────────────────────┴──────────────────────┘
```

DuckDB automatically reads the CSV and lets you query it immediately, with no schema definition needed.
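The expected output reports `earliest` and `latest` as `varchar`, so `min` and `max` compare text rather than timestamps. The results are still chronological because ISO 8601 timestamps written in a single fixed-width UTC form (like `2024-01-15T09:30:00Z`) sort the same way as text and as time. The Python sketch below mirrors the query using the standard `csv` module and checks that property. `summarize`, `summarize_file`, and `text_order_matches_time_order` are illustrative helpers, not part of DuckDB or FineType.

```python
import csv
from datetime import datetime
from statistics import mean

# Fixed-width UTC form shown in the expected output, e.g. 2024-01-15T09:30:00Z.
ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def summarize(rows):
    """Mirror the aggregate query over rows produced by csv.DictReader."""
    rows = list(rows)
    stamps = [row["created_at"] for row in rows]
    return {
        "total_contacts": len(rows),
        "avg_amount": mean(float(row["amount"]) for row in rows),
        # Plain string comparison, like min/max on a varchar column.
        "earliest": min(stamps),
        "latest": max(stamps),
    }


def summarize_file(path):
    """Read a CSV with a header row and summarize it."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f))


def text_order_matches_time_order(stamps):
    """True if sorting the strings gives the same order as sorting parsed datetimes."""
    return sorted(stamps) == sorted(stamps, key=lambda s: datetime.strptime(s, ISO_UTC))
```

Call `summarize_file("contacts.csv")` to compare against DuckDB's result. If a column mixes UTC offsets (for example `+02:00` alongside `Z`) or fractional-second precisions, text order and time order can diverge. In that case, casting to a timestamp type before aggregating is the safer choice.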
Now let’s see what FineType detects in each column:
```shell
finetype profile -f contacts.csv
```

Expected output:

```text
Column          Type                              Confidence
──────────────  ────────────────────────────────  ──────────
id              representation.numeric.increment  0.95
name            identity.person.full_name         0.92
email           identity.person.email             0.99
created_at      datetime.timestamp.iso_8601       0.98
ip_address      technology.internet.ip_v4         0.97
amount          representation.numeric.decimal    0.94
```

FineType identifies semantic types beyond what SQL type inference gives you: it distinguishes emails from generic strings, IP addresses from free text, and ISO 8601 timestamps from generic dates.
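To make that distinction concrete, here is a toy, rule-based Python sketch that separates three of the labels above using standard-library parsers and a simple email regex. This is not how FineType works; the guide describes a character-level CNN. The rules and the `toy_semantic_type` name are illustrative assumptions. To a SQL engine, all three example values would simply be `VARCHAR`.

```python
import ipaddress
import re
from datetime import datetime

# Toy rules only: FineType uses a trained model, not these checks.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def toy_semantic_type(value: str) -> str:
    """Return a FineType-style label for three easy cases, else 'unknown'."""
    try:
        ipaddress.IPv4Address(value)  # raises AddressValueError (a ValueError) if invalid
        return "technology.internet.ip_v4"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return "datetime.timestamp.iso_8601"
    except ValueError:
        pass
    if EMAIL_RE.match(value):
        return "identity.person.email"
    return "unknown"  # SQL type inference would only see VARCHAR
```

Hand-written rules like these break down quickly on messier data, such as names, locale-specific dates, or partial addresses. That is where a learned classifier with confidence scores earns its place.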
You can also classify single values:
```shell
# → identity.person.email

finetype infer -i "192.168.1.10"
# → technology.internet.ip_v4

finetype infer -i "2024-01-15T09:30:00Z"
# → datetime.timestamp.iso_8601
```

Each prediction is a transformation contract: it maps to a DuckDB SQL expression guaranteed to parse the value correctly.
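One way to picture a transformation contract is as a lookup from each label to a SQL expression template. The Python sketch below is hypothetical. The `CONTRACTS` table, its templates, and the helper names are assumptions for illustration, not FineType's actual contracts. It turns a column-to-label mapping (typed by hand from the `finetype profile` output) into a DuckDB `SELECT`.

```python
# Hypothetical contract table: FineType label -> DuckDB expression template.
# These templates are illustrative assumptions, not FineType's published contracts.
CONTRACTS = {
    "datetime.timestamp.iso_8601": "CAST({col} AS TIMESTAMPTZ)",
    "technology.internet.ip_v4": "CAST({col} AS INET)",  # requires DuckDB's inet extension
    "representation.numeric.decimal": "CAST({col} AS DOUBLE)",
}


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def contract_sql(column: str, label: str) -> str:
    """Apply the label's template to a column; unknown labels pass the column through."""
    col = quote_ident(column)
    template = CONTRACTS.get(label)
    if template is None:
        return col
    return f"{template.format(col=col)} AS {col}"


def build_select(profile: dict[str, str], path: str) -> str:
    """Build a DuckDB SELECT from a {column: label} mapping and a CSV path."""
    exprs = ",\n    ".join(contract_sql(col, label) for col, label in profile.items())
    quoted_path = path.replace("'", "''")  # escape single quotes in the string literal
    return f"SELECT\n    {exprs}\nFROM '{quoted_path}';"
```

For example, `build_select({"created_at": "datetime.timestamp.iso_8601", "email": "identity.person.email"}, "contacts.csv")` casts `created_at` to `TIMESTAMPTZ` and passes `email` through unchanged, because this sketch has no contract for that label.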
With the FineType DuckDB extension, you can also classify values directly in SQL:
```sql
INSTALL finetype FROM community;
LOAD finetype;

-- Classify each value in the email column
SELECT email, finetype(email) AS semantic_type
FROM 'contacts.csv';
```

Dive deeper into the Noon analytics ecosystem.
DuckDB
Learn more about DuckDB’s SQL dialect, file format support, and extensions. Documentation →
Nushell
Discover Nushell’s structured data pipelines and how they complement SQL workflows. The Nushell Book →