151 Semantic Types
Classify across 6 domains: datetime, technology, identity, representation, geography, and container types.
FineType classifies strings into a rich taxonomy of 151 semantic types — each type is a transformation contract that guarantees a DuckDB cast expression will succeed.
$ finetype infer "192.168.1.1"
technology.internet.ip_v4
$ finetype infer "2024-01-15T10:30:00Z"
datetime.timestamp.iso_8601
$ finetype infer "[email protected]"
identity.person.email

Go beyond primitive types. Detect the real meaning of your data.
Transformation Contracts
Every prediction maps to a DuckDB SQL expression guaranteed to parse successfully. Types you can act on.
Pure Rust, No Python
Built with Candle ML. 600+ classifications/sec, 8.5 MB memory, 66ms cold start. Ship without a runtime.
FineType recognizes 151 types across 6 domains:
| Domain | Types | Examples |
|---|---|---|
| datetime | 46 | ISO 8601, RFC 2822, Unix timestamps, timezones |
| technology | 34 | IPv4, IPv6, MAC addresses, URLs, UUIDs, hashes |
| identity | 25 | Names, emails, phone numbers, passwords |
| representation | 19 | Integers, floats, booleans, hex colors, base64, JSON |
| geography | 16 | Latitude, longitude, countries, cities, postal codes |
| container | 11 | JSON objects, CSV rows, query strings, key-value pairs |
Label format: {domain}.{category}.{type} — e.g., technology.internet.ip_v4. Locale-specific types append a suffix: identity.person.phone_number.EN_AU.
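
To make the label format concrete, here is a minimal Python sketch that splits a label into its parts. The `Label` tuple and `parse_label` helper are hypothetical names introduced for illustration and are not part of FineType; the sketch assumes a label always has three dot-separated parts plus an optional locale suffix, as described above.

```python
from typing import NamedTuple, Optional


class Label(NamedTuple):
    """Hypothetical container for the parts of a FineType label."""
    domain: str
    category: str
    type_name: str
    locale: Optional[str] = None  # e.g. "EN_AU"; None for locale-neutral types


def parse_label(label: str) -> Label:
    """Split '{domain}.{category}.{type}' with an optional locale suffix."""
    parts = label.split(".")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 dot-separated parts, got {label!r}")
    return Label(*parts)


print(parse_label("technology.internet.ip_v4"))
print(parse_label("identity.person.phone_number.EN_AU"))
```

Grouping results by `domain` or `category` then becomes a matter of reading one field instead of re-splitting strings throughout downstream code.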
FineType provides 9 commands covering the full ML pipeline:
# Classify a single value
finetype infer -i "bc89:60a9:23b8:c1e9:3924:56de:3eb1:3b90"

# Classify from file (one value per line), JSON output
finetype infer -f data.txt --output json

# Column-mode inference (distribution-based disambiguation)
finetype infer -f column_values.txt --mode column

# Profile a CSV file: detect column types
finetype profile -f data.csv

# Generate synthetic training data
finetype generate --samples 1000 --output training.ndjson

# Train a CharCNN model
finetype train --data data/train.ndjson --epochs 10 --batch-size 64

# Evaluate model accuracy
finetype eval --data data/test.ndjson --model models/char-cnn-v2

# Evaluate on GitTables benchmark
finetype eval-gittables --dir eval/gittables

# Validate data quality against taxonomy schemas
finetype validate -f data.ndjson --strategy quarantine

Single-value classification can be ambiguous: is 01/02/2024 a US date (Jan 2) or EU date (Feb 1)? Is 1995 a year, postal code, or plain number?
Column-mode analyzes the distribution of values in a column and applies disambiguation rules:
# CLI column-mode
finetype infer -f column_values.txt --mode column

# CSV profiling (uses column-mode automatically)
finetype profile -f data.csv

-- Install and load
INSTALL finetype FROM community;
LOAD finetype;

-- Classify a single value
SELECT finetype('192.168.1.1');
-- → 'technology.internet.ip_v4'

-- Detailed output (type, confidence, DuckDB broad type)
SELECT finetype_detail(value) FROM my_table;
-- → '{"type":"datetime.date.us_slash","confidence":0.98,"broad_type":"DATE"}'

-- Normalize values for safe TRY_CAST
SELECT finetype_cast(value) FROM my_table;

-- Recursively classify JSON fields
SELECT finetype_unpack(json_col) FROM my_table;

-- Check extension version
SELECT finetype_version();

The extension embeds model weights at compile time, so no external files are needed.
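
One way a client might act on the `finetype_detail` output is sketched below in Python. It is only a sketch: the JSON shape is taken from the sample output shown above, while the `cast_expression` helper, the 0.9 confidence threshold, and the idea of wrapping `finetype_cast` in `TRY_CAST` with the reported `broad_type` are assumptions for illustration, not documented FineType behavior.

```python
import json


def cast_expression(detail_json: str, column: str, min_confidence: float = 0.9) -> str:
    """Build a DuckDB expression for `column` from one finetype_detail result.

    Hypothetical helper: returns the column unchanged when the prediction is
    below the (arbitrary) confidence threshold. `column` is interpolated
    as-is, so it must be a trusted identifier.
    """
    detail = json.loads(detail_json)
    if detail["confidence"] < min_confidence:
        return column
    return f"TRY_CAST(finetype_cast({column}) AS {detail['broad_type']})"


sample = '{"type":"datetime.date.us_slash","confidence":0.98,"broad_type":"DATE"}'
print(cast_expression(sample, "value"))
# prints: TRY_CAST(finetype_cast(value) AS DATE)
```

Keeping the threshold as a parameter lets low-confidence columns fall through untouched rather than being cast on a weak prediction.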
| Model | Accuracy | Test Samples |
|---|---|---|
| Flat CharCNN v2 | 91.97% | 15,100 |
Post-processing rules improve Macro F1 from 87.9% to 90.8% (+2.9 points without retraining).
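
For readers unfamiliar with the metric, Macro F1 is the unweighted mean of per-class F1 scores, so rare types count as much as common ones. The Python sketch below shows the standard computation; it is not FineType's evaluation code, and it assumes the class set is the union of true and predicted labels.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["technology.internet.ip_v4", "technology.internet.ip_v4",
          "technology.internet.url", "geography.location.country"]
y_pred = ["technology.internet.ip_v4", "technology.internet.url",
          "technology.internet.url", "geography.location.country"]
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy example one `ip_v4` value is mistaken for a URL, which lowers the F1 of both classes to 2/3 while `country` stays at 1.0.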
Evaluated against 2,363 annotated columns from 883 real-world CSV tables:
| Type Category | Accuracy | Example Types |
|---|---|---|
| Timestamps | 100% | datetime.timestamp.* |
| Country names | 100% | geography.location.country |
| URLs | 89.7% | technology.internet.url |
| Dates | 88.2% | datetime.date.* |
| Person names | 80-85% | identity.person.* |
Column-mode inference improves accuracy for ambiguous types: geography +9.7%, datetime +4.8%.
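
The intuition behind column-mode can be illustrated with the slash-date ambiguity mentioned earlier. The Python sketch below is a simplified stand-in, not FineType's actual disambiguation rules: the `disambiguate_slash_dates` helper and its return values are hypothetical. It votes across a whole column, using the fact that a field greater than 12 cannot be a month.

```python
def disambiguate_slash_dates(values):
    """Decide whether a column of a/b/yyyy dates is day-first or month-first.

    Hypothetical rule: each value whose first or second field exceeds 12
    casts a vote; values like 01/02/2024 are uninformative and cast none.
    """
    day_first = month_first = 0
    for value in values:
        parts = value.strip().split("/")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # skip values that are not slash-separated numbers
        first, second = int(parts[0]), int(parts[1])
        if first > 12 and 1 <= second <= 12:
            day_first += 1      # e.g. 25/12/2023 can only be day/month
        elif second > 12 and 1 <= first <= 12:
            month_first += 1    # e.g. 02/13/2024 can only be month/day
    if day_first > month_first:
        return "day_first"
    if month_first > day_first:
        return "month_first"
    return "ambiguous"


print(disambiguate_slash_dates(["01/02/2024", "13/02/2024", "25/12/2023"]))
# prints: day_first
```

A single value such as 01/02/2024 stays ambiguous on its own; it is the other values in the column that settle the question.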
| Metric | Value |
|---|---|
| Model load | 66 ms cold, 25-30 ms warm |
| Single inference | p50 = 26 ms, p95 = 41 ms |
| Batch throughput | 600-750 values/sec |
| Memory footprint | 8.5 MB peak RSS |
brew install noon-org/tap/finetype

cargo install finetype-cli

git clone https://github.com/noon-org/finetype
cd finetype
cargo build --release