Getting Started

This guide walks you through installing FineType and DuckDB, then profiling your first dataset with semantic type detection.

Prerequisites

FineType and DuckDB run on macOS, Linux, and Windows (via WSL).

Tool      Purpose                          Required
────────  ───────────────────────────────  ────────
DuckDB    In-memory analytical SQL engine  Yes
FineType  Semantic type detection          Yes

You'll need a terminal and a package manager. The examples below use Homebrew on macOS; alternatives (apt, winget, Cargo) are listed under each tool.

Install the Tools

1. DuckDB

DuckDB is a fast, embeddable SQL engine for analytics. It reads CSV, JSON, and Parquet natively.

# macOS
brew install duckdb

# Linux (apt)
sudo apt install duckdb

# Windows
winget install DuckDB.cli

Verify:

duckdb --version

2. FineType

FineType classifies text into 209 semantic types with a character-level neural network.

# macOS (Homebrew)
brew install noon-org/tap/finetype

# Any platform with Rust
cargo install finetype-cli

Verify:

finetype --version
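
Before moving on, it can help to confirm that both tools are reachable from your shell in one pass. The snippet below is a minimal sketch: `check_tool` is a hypothetical helper name introduced here for illustration, not part of DuckDB or FineType.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command is on PATH
# and returns non-zero when the command is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Check both tools; keep going even if one of them is missing.
for tool in duckdb finetype; do
    check_tool "$tool" || true
done
```

If either line reports "missing", revisit the matching install step above before continuing.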

Your First Workflow

Let's profile a dataset: create a CSV, query it with DuckDB, then detect the column types with FineType.

Step 1: Create a Sample CSV

Save this as contacts.csv:

id,name,email,created_at,ip_address,amount
1,Alice Chen,[email protected],2024-01-15T09:30:00Z,192.168.1.10,149.99
2,Bob Smith,[email protected],2024-02-20T14:15:00Z,10.0.0.42,2500.00
3,Carol Wu,[email protected],2024-03-08T11:00:00Z,172.16.0.1,89.50
4,Dan Reeves,[email protected],2024-04-12T16:45:00Z,192.168.0.5,1200.00
5,Eve Nakamura,[email protected],2024-05-01T08:00:00Z,10.10.10.1,340.75

This dataset has a mix of types: names, emails, timestamps, IP addresses, and numeric amounts.
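
If you prefer to create the file from the terminal, a heredoc works well. The sketch below writes the file and then checks that every row has the six fields named in the header. The `example.com` email addresses are placeholder values assumed for illustration; substitute your own.

```shell
#!/bin/sh
# Write the sample file. The example.com addresses are placeholders (an assumption).
cat > contacts.csv <<'EOF'
id,name,email,created_at,ip_address,amount
1,Alice Chen,alice@example.com,2024-01-15T09:30:00Z,192.168.1.10,149.99
2,Bob Smith,bob@example.com,2024-02-20T14:15:00Z,10.0.0.42,2500.00
3,Carol Wu,carol@example.com,2024-03-08T11:00:00Z,172.16.0.1,89.50
4,Dan Reeves,dan@example.com,2024-04-12T16:45:00Z,192.168.0.5,1200.00
5,Eve Nakamura,eve@example.com,2024-05-01T08:00:00Z,10.10.10.1,340.75
EOF

# Count rows whose field count differs from the six header columns.
bad_rows=$(awk -F',' 'NF != 6 { n++ } END { print n + 0 }' contacts.csv)
line_count=$(wc -l < contacts.csv | tr -d ' ')
echo "lines: $line_count, malformed rows: $bad_rows"
```

A clean file reports 6 lines (one header plus five records) and 0 malformed rows. The comma split is only safe here because no field contains a quoted comma.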

Step 2: Query with DuckDB

Open a DuckDB shell and explore the data:

# Start the DuckDB shell from your terminal
duckdb

-- Inside the DuckDB shell: load and inspect
SELECT * FROM 'contacts.csv';

-- Aggregate query
SELECT
    count(*) AS total_contacts,
    avg(amount) AS avg_amount,
    min(created_at) AS earliest,
    max(created_at) AS latest
FROM 'contacts.csv';

Expected output (avg_amount is shown rounded; the exact mean of the five amounts is 856.048):

┌─────────────────┬────────────┬──────────────────────┬──────────────────────┐
│ total_contacts  │ avg_amount │      earliest        │       latest         │
│      int64      │   double   │      varchar         │      varchar         │
├─────────────────┼────────────┼──────────────────────┼──────────────────────┤
│               5 │     856.05 │ 2024-01-15T09:30:00Z │ 2024-05-01T08:00:00Z │
└─────────────────┴────────────┴──────────────────────┴──────────────────────┘

DuckDB reads the CSV and lets you query it immediately — no schema definition needed.
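
As a quick cross-check of the aggregate, independent of DuckDB, the count and mean can be recomputed with awk. This is a sketch that uses only the five amount values from the sample file; the variable name `summary` is introduced here for illustration.

```shell
#!/bin/sh
# The five amounts from contacts.csv, one per line, piped into awk.
summary=$(printf '%s\n' 149.99 2500.00 89.50 1200.00 340.75 |
    awk '{ sum += $1; n++ } END { printf "count=%d avg=%.3f", n, sum / n }')
echo "$summary"
# → count=5 avg=856.048
```

The amounts sum to 4280.24, so the mean is 856.048, which rounds to the 856.05 shown in the expected output.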

Step 3: Profile with FineType

Now let's see what FineType detects in each column:

finetype profile -f contacts.csv

Expected output:

Column          Type                              Confidence
──────────────  ────────────────────────────────  ──────────
id              representation.numeric.increment  0.95
name            identity.person.full_name         0.92
email           identity.person.email             0.99
created_at      datetime.timestamp.iso_8601       0.98
ip_address      technology.internet.ip_v4         0.97
amount          representation.numeric.decimal    0.94

FineType identifies semantic types beyond what SQL type inference gives you — it distinguishes emails from strings, IP addresses from text, and ISO timestamps from generic dates.

Step 4: Classify Individual Values

You can also classify single values:

finetype infer -i "[email protected]"
# → identity.person.email

finetype infer -i "192.168.1.10"
# → technology.internet.ip_v4

finetype infer -i "2024-01-15T09:30:00Z"
# → datetime.timestamp.iso_8601

Each type maps to a DuckDB SQL expression that will parse the value correctly — profile first, then cast with confidence.

Step 5: Generate a Schema

FineType can turn its profile results into a DuckDB CREATE TABLE statement — every column typed, every cast guaranteed to succeed:

finetype schema-for -f contacts.csv

Expected output:

CREATE TABLE contacts (
    id          BIGINT,        -- increment
    name        VARCHAR,       -- full_name
    email       VARCHAR,       -- email
    created_at  TIMESTAMP,     -- strptime(created_at, '%Y-%m-%dT%H:%M:%SZ')
    ip_address  VARCHAR,       -- ip_v4 (INET)
    amount      DECIMAL(10,2)  -- decimal
);

You can use this DDL directly in DuckDB to create a typed table:

# Start the DuckDB shell from your terminal
duckdb

-- Inside the DuckDB shell: paste the CREATE TABLE statement, then load the data
COPY contacts FROM 'contacts.csv' (HEADER true);

-- Now query with proper types
SELECT name, amount, created_at
FROM contacts
WHERE created_at > TIMESTAMP '2024-03-01';

This completes the workflow: discover your data with profile, type it with schema-for, then query with confidence.
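
The same steps can be scripted rather than pasted by hand. The sketch below rests on assumptions this guide does not confirm: that `finetype schema-for` writes the DDL to standard output, and that a file-backed database named `contacts.duckdb` is acceptable in place of the in-memory shell. `load_contacts` and `schema.sql` are hypothetical names introduced here.

```shell
#!/bin/sh
# load_contacts is a hypothetical helper. It assumes `finetype schema-for`
# prints the CREATE TABLE statement to standard output (not confirmed here).
load_contacts() {
    csv=${1:-contacts.csv}
    db=${2:-contacts.duckdb}
    if ! command -v finetype >/dev/null 2>&1 || ! command -v duckdb >/dev/null 2>&1; then
        echo "skipped: finetype and duckdb must both be installed"
        return 0
    fi
    if [ ! -f "$csv" ]; then
        echo "skipped: $csv not found"
        return 0
    fi
    # Generate the DDL, create the table, load the CSV, then count the rows.
    finetype schema-for -f "$csv" > schema.sql &&
        duckdb "$db" < schema.sql &&
        duckdb "$db" -c "COPY contacts FROM '$csv' (HEADER true);" &&
        duckdb "$db" -c "SELECT count(*) AS loaded_rows FROM contacts;"
}

load_contacts contacts.csv contacts.duckdb
```

Running it a second time against the same database file would fail at the CREATE TABLE step because the table already exists; delete `contacts.duckdb` first if you want to start over.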

Bonus: DuckDB + FineType Extension

With the FineType DuckDB extension, installed here from the community repository, you can classify values directly in SQL:

INSTALL finetype FROM community;
LOAD finetype;

-- Classify each value in the email column of the sample file
SELECT
    email,
    finetype(email) AS semantic_type
FROM 'contacts.csv';
