DNA Sequence

representation.scientific.dna_sequence

Deoxyribonucleic acid sequence in IUPAC notation (letters: A, T, G, C, and ambiguity codes).

Domain representation.scientific
Casts to VARCHAR
Scope Universal

Try it

CLI
$ finetype infer -i "ATGCAGC"
→ representation.scientific.dna_sequence

DuckDB

Detect
SELECT finetype('ATGCAGC');
-- → 'representation.scientific.dna_sequence'
Cast expression
UPPER(CAST({col} AS VARCHAR))
Safe cast pipeline
-- Normalise and cast in one step
SELECT TRY_CAST(finetype_cast(my_column) AS VARCHAR) AS clean_value
FROM my_table
WHERE finetype(my_column) = 'representation.scientific.dna_sequence';

Struct Expansion

gc_content: CAST(REGEXP_COUNT({col}, '[GC]') AS DOUBLE) / LENGTH({col})
length: LENGTH({col})
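
The two derived fields can also be reproduced outside DuckDB. The following is a minimal Python sketch, not part of finetype: the function name `expand_dna` is hypothetical, and it assumes the value has already been upper-cased, as the cast expression does.

```python
import re


def expand_dna(seq: str) -> dict:
    """Mirror the gc_content and length struct fields for one sequence.

    Hypothetical helper for illustration; assumes `seq` is already upper-case.
    """
    if not seq:
        # The SQL expression would divide by zero here; fail loudly instead.
        raise ValueError("empty sequence")
    gc_count = len(re.findall(r"[GC]", seq))  # same character class as the SQL
    return {"gc_content": gc_count / len(seq), "length": len(seq)}


if __name__ == "__main__":
    print(expand_dna("ATGCAGC"))
```

As in the SQL expression, only the literal letters G and C are counted. The ambiguity code S (G or C) does not contribute to gc_content, so values for sequences with many ambiguity codes are a lower bound.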

JSON Schema

$ finetype schema representation.scientific.dna_sequence
{
  "$id": "https://noon.sh/schemas/representation.scientific.dna_sequence",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Deoxyribonucleic acid sequence in IUPAC notation (letters: A, T, G, C, and ambiguity codes).",
  "examples": [
    "ATGCAGC",
    "GCTAGCTAGCTAG",
    "ATGATGATG"
  ],
  "pattern": "^[ATGCRYSWKMBDHVN]+$",
  "title": "DNA Sequence",
  "type": "string"
}
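
The schema's pattern can be checked directly with a regular-expression library. The sketch below uses Python's standard `re` module. The helper names `normalise` and `matches_schema` are illustrative assumptions, not finetype APIs, and the upper-casing step simply imitates the cast expression shown earlier.

```python
import re

# Pattern copied verbatim from the JSON Schema "pattern" field.
DNA_PATTERN = re.compile(r"^[ATGCRYSWKMBDHVN]+$")


def normalise(value: str) -> str:
    """Upper-case the value, imitating the UPPER(...) cast expression."""
    return value.upper()


def matches_schema(value: str) -> bool:
    """Return True if the whole string consists of IUPAC DNA codes."""
    # fullmatch avoids accepting a trailing newline, which `$` alone would allow.
    return DNA_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["ATGCAGC", "atgcagc", "AUGC", ""]:
        print(repr(candidate), matches_schema(candidate))
```

Lower-case input fails the pattern until it is normalised, and RNA-style input containing U is rejected because U is not in the character class.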

Examples

ATGCAGC
GCTAGCTAGCTAG
ATGATGATG

Also known as

dna