Getting Started
Install FineType and DuckDB, then profile your first dataset.
By the end of this guide, you will have both tools installed and will have profiled a small CSV file with semantic type detection.
Prerequisites
FineType and DuckDB run on macOS, Linux, and Windows (via WSL).
You'll need a terminal and a package manager. The examples below use Homebrew (macOS/Linux), but alternatives are listed for each tool.
Install the Tools
1. DuckDB
DuckDB is a fast, embeddable SQL engine for analytics. It reads CSV, JSON, and Parquet natively.
# macOS
brew install duckdb
# Linux (apt)
sudo apt install duckdb
# Windows
winget install DuckDB.cli
Verify:
duckdb --version
2. FineType
FineType classifies text into 209 semantic types with a character-level neural network.
# macOS (Homebrew)
brew install noon-org/tap/finetype
# Any platform with Rust
cargo install finetype-cli
Verify:
finetype --version
Your First Workflow
Let's profile a dataset: create a CSV, query it with DuckDB, then detect the column types with FineType.
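Before starting, it may help to confirm that both tools are reachable from your shell. The sketch below is one way to do that; `have_tool` is a hypothetical helper name, not part of either CLI.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Check the two tools this guide relies on.
for tool in duckdb finetype; do
  if have_tool "$tool"; then
    echo "ok: $tool"
  else
    echo "missing: $tool (see Install the Tools)"
  fi
done
```

If either line reports `missing`, revisit the install step for that tool before continuing.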
Step 1: Create a Sample CSV
Save this as contacts.csv:
id,name,email,created_at,ip_address,amount
1,Alice Chen,[email protected],2024-01-15T09:30:00Z,192.168.1.10,149.99
2,Bob Smith,[email protected],2024-02-20T14:15:00Z,10.0.0.42,2500.00
3,Carol Wu,[email protected],2024-03-08T11:00:00Z,172.16.0.1,89.50
4,Dan Reeves,[email protected],2024-04-12T16:45:00Z,192.168.0.5,1200.00
5,Eve Nakamura,[email protected],2024-05-01T08:00:00Z,10.10.10.1,340.75
This dataset has a mix of types: names, emails, timestamps, IP addresses, and numeric amounts.
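If you prefer to create the file from the terminal, a heredoc works. This is a sketch rather than the guide's canonical data: the `example.com` email addresses are hypothetical placeholders chosen for illustration, while the other columns match the rows listed above.

```shell
# Write the sample dataset to contacts.csv in the current directory.
# NOTE: the example.com addresses are hypothetical placeholders.
cat > contacts.csv <<'EOF'
id,name,email,created_at,ip_address,amount
1,Alice Chen,alice@example.com,2024-01-15T09:30:00Z,192.168.1.10,149.99
2,Bob Smith,bob@example.com,2024-02-20T14:15:00Z,10.0.0.42,2500.00
3,Carol Wu,carol@example.com,2024-03-08T11:00:00Z,172.16.0.1,89.50
4,Dan Reeves,dan@example.com,2024-04-12T16:45:00Z,192.168.0.5,1200.00
5,Eve Nakamura,eve@example.com,2024-05-01T08:00:00Z,10.10.10.1,340.75
EOF

# Quick check: one header line plus five data rows.
wc -l < contacts.csv
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the heredoc, so the rows are written exactly as typed.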
Step 2: Query with DuckDB
Open a DuckDB shell and explore the data:
# Start the DuckDB shell from your terminal
duckdb
-- At the DuckDB prompt: load and inspect
SELECT * FROM 'contacts.csv';
-- Aggregate query
SELECT
count(*) AS total_contacts,
avg(amount) AS avg_amount,
min(created_at) AS earliest,
max(created_at) AS latest
FROM 'contacts.csv';Expected output:
┌─────────────────┬────────────┬──────────────────────┬──────────────────────┐
│ total_contacts  │ avg_amount │       earliest       │        latest        │
│      int64      │   double   │       varchar        │       varchar        │
├─────────────────┼────────────┼──────────────────────┼──────────────────────┤
│               5 │     856.05 │ 2024-01-15T09:30:00Z │ 2024-05-01T08:00:00Z │
└─────────────────┴────────────┴──────────────────────┴──────────────────────┘
DuckDB reads the CSV and lets you query it immediately, with no schema definition needed.
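As a sanity check on the aggregate, the average can be recomputed outside DuckDB. This sketch feeds the five `amount` values from the sample CSV to awk; it is an independent cross-check, not something DuckDB or FineType runs, and `avg_amount` is a hypothetical function name.

```shell
# Mean of the five `amount` values from contacts.csv, computed with awk.
avg_amount() {
  printf '%s\n' 149.99 2500.00 89.50 1200.00 340.75 |
    awk '{ sum += $1; n++ } END { printf "%.3f\n", sum / n }'
}

avg_amount   # prints 856.048
```

The amounts sum to 4280.24, and dividing by 5 gives 856.048, which rounds to the 856.05 shown in the expected output.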
Step 3: Profile with FineType
Now let's see what FineType detects in each column:
finetype profile -f contacts.csv
Expected output:
Column          Type                              Confidence
──────────────  ────────────────────────────────  ──────────
id              representation.numeric.increment  0.95
name            identity.person.full_name         0.92
email           identity.person.email             0.99
created_at      datetime.timestamp.iso_8601       0.98
ip_address      technology.internet.ip_v4         0.97
amount          representation.numeric.decimal    0.94
FineType identifies semantic types beyond what SQL type inference gives you: it distinguishes emails from strings, IP addresses from text, and ISO timestamps from generic dates.
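Confidence scores are useful for deciding which columns deserve a manual look. As a rough sketch, assuming the profile output keeps the three-column layout shown above, awk can flag columns that fall below a threshold. The 0.95 cutoff and the `low_confidence` function name are arbitrary choices for illustration, and the sample rows are copied from the expected output rather than produced by running FineType.

```shell
# Print "column confidence" for rows whose third field is below the threshold.
# The first two lines (header and rule) are skipped.
low_confidence() {
  awk -v min="$1" 'NR > 2 && $3 + 0 < min + 0 { print $1, $3 }'
}

# Sample input: the expected profile output from this guide.
low_confidence 0.95 <<'EOF'
Column          Type                              Confidence
──────────────  ────────────────────────────────  ──────────
id              representation.numeric.increment  0.95
name            identity.person.full_name         0.92
email           identity.person.email             0.99
created_at      datetime.timestamp.iso_8601       0.98
ip_address      technology.internet.ip_v4         0.97
amount          representation.numeric.decimal    0.94
EOF
```

On the sample rows this prints `name 0.92` and `amount 0.94`. If the real output follows the same layout, piping `finetype profile -f contacts.csv` into `low_confidence 0.95` should behave the same way.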
Step 4: Classify Individual Values
You can also classify single values:
finetype infer -i "[email protected]"
# → identity.person.email
finetype infer -i "192.168.1.10"
# → technology.internet.ip_v4
finetype infer -i "2024-01-15T09:30:00Z"
# → datetime.timestamp.iso_8601
Each type maps to a DuckDB SQL expression that will parse the value correctly, so you can profile first and then cast with confidence.
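To classify several values in one go, the single-value command can be wrapped in a loop. This sketch assumes `finetype infer -i` prints one label per call on standard output, as the examples above suggest; `classify_all` is a hypothetical helper name, not a FineType command.

```shell
# Hypothetical helper: print each value and its inferred type, tab-separated.
# Assumes `finetype infer -i VALUE` writes a single label line to stdout.
classify_all() {
  for value in "$@"; do
    printf '%s\t' "$value"
    finetype infer -i "$value"
  done
}

# Only run the demo when finetype is actually installed.
if command -v finetype >/dev/null 2>&1; then
  classify_all "192.168.1.10" "2024-01-15T09:30:00Z"
else
  echo "finetype not found on PATH; skipping demo" >&2
fi
```

For whole files, `finetype profile` remains the better fit; a loop like this is mainly handy for spot-checking a few values.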
Step 5: Generate a Schema
FineType can turn its profile results into a DuckDB CREATE TABLE statement — every column typed, every cast guaranteed to succeed:
finetype schema-for -f contacts.csv
Expected output:
CREATE TABLE contacts (
    id BIGINT, -- increment
    name VARCHAR, -- full_name
    email VARCHAR, -- email
    created_at TIMESTAMP, -- strptime(created_at, '%Y-%m-%dT%H:%M:%SZ')
    ip_address VARCHAR, -- ip_v4 (INET)
    amount DECIMAL(10,2) -- decimal
);
You can use this DDL directly in DuckDB to create a typed table:
duckdb
-- Paste the CREATE TABLE statement, then load data
COPY contacts FROM 'contacts.csv' (HEADER true);
-- Now query with proper types
SELECT name, amount, created_at
FROM contacts
WHERE created_at > TIMESTAMP '2024-03-01';
This completes the workflow: discover your data with profile, type it with schema-for, then query with confidence.
Bonus: DuckDB + FineType Extension
If you have the FineType DuckDB extension installed, you can classify directly in SQL:
INSTALL finetype FROM community;
LOAD finetype;
-- Classify each value in one column (email here; any column in the file works)
SELECT
    email,
    finetype(email) AS semantic_type
FROM 'contacts.csv';
What's Next?
Keep exploring
Dive deeper into the tools.