< blog
9 min read

Snowflake Data Types: Practical Cheat Sheet, Aliases, and VARIANT Examples

One query cost $5K. Yours could be next - or you could see it coming.

Four-phase diagram illustrating the Snowflake data type optimization journey, from alias unification to cost reduction.

Most Snowflake data type guides are dictionaries. They list the types and describe the syntax, but stop there. That is fine for orientation. It does not help an engineer who is staring at a slow query and wondering whether the VARCHAR/STRING mix in the DDL is causing it, or whether a VARIANT column that was supposed to be a “temporary landing zone” six months ago is now consuming warehouse credits.

This guide is for that engineer. It covers the aliases that actually matter, how to inspect live tables, where TYPEOF fits into debugging workflows, and, critically, how data type decisions connect to Snowflake cost.

Table of Contents

  1. TL;DR — Key Takeaways
  2. Snowflake Data Types Cheat Sheet
  3. Understanding Snowflake String Aliases
  4. Core Data Type Families That Matter in Real Projects
  5. Semi-Structured Types: VARIANT, OBJECT, and ARRAY
  6. How to Check Data Types in Snowflake
  7. Using TYPEOF on VARIANT Values
  8. How to Choose the Right Snowflake Data Type
  9. The Gap Between Schema Design and Snowflake Cost
  10. FAQ

TL;DR — Key Takeaways

  • STRING, TEXT, NVARCHAR, NVARCHAR2, CHAR VARYING, and CHARACTER VARYING are all aliases for VARCHAR. There is no performance difference between them.
  • Pick one naming convention for DDL and enforce it across the team. Mixed aliases create confusion where no technical difference exists.
  • Use INFORMATION_SCHEMA.COLUMNS to inspect live column types. Use TYPEOF() to inspect the actual value stored inside a VARIANT column.
  • VARIANT is the right call at landing time. It is a poor long-term choice for columns that appear repeatedly in joins, filters, and dashboards.
  • Unmanaged VARIANT bloat is one of the most common causes of unexplained Snowflake spend growth. Standard query monitors will not catch it; cost attribution tooling will.
Want to get the most out of your data?

Contact us for a free assessment and no commitment.

Start saving today

Snowflake Data Types Cheat Sheet

The table below maps each data type family to its aliases, a practical DDL example, and the use case it is built for.

Family Native Type Common Aliases DDL Example Best For
Text VARCHAR STRING, TEXT, NVARCHAR, NVARCHAR2, CHAR VARYING, CHARACTER VARYING name VARCHAR Names, IDs, free-form values
Exact Numeric NUMBER INT, INTEGER, BIGINT, SMALLINT, DECIMAL, NUMERIC count NUMBER(10,0) Counts, primary keys, and financial figures
Approximate Numeric FLOAT DOUBLE, REAL, FLOAT4, FLOAT8 temperature FLOAT Scientific values, approximate math
Boolean BOOLEAN BOOL is_active BOOLEAN Binary flags, true/false states
Timestamp (no TZ) TIMESTAMP_NTZ DATETIME event_time TIMESTAMP_NTZ Warehouse-native event timestamps
Timestamp (with TZ) TIMESTAMP_TZ created_at TIMESTAMP_TZ Cross-system timestamps requiring an offset
Timestamp (local TZ) TIMESTAMP_LTZ updated_at TIMESTAMP_LTZ Session-timezone-aware values
Date DATE order_date DATE Calendar dates without time
Semi-Structured VARIANT payload VARIANT JSON payloads, dynamic schemas
Semi-Structured OBJECT metadata OBJECT Named key-value pairs
Semi-Structured ARRAY tags ARRAY Ordered lists
Binary BINARY VARBINARY hash BINARY Raw byte values

For the full official reference, see Snowflake’s data type documentation.


Understanding Snowflake String Aliases

Side-by-side comparison showing the misconception of multiple string types versus the reality of a single unified type.

Teams land on this question when they inherit a codebase that uses VARCHAR in some tables, STRING in others, and TEXT in a third. The natural assumption is that someone made a deliberate modeling decision. Usually, they did not.

In Snowflake, STRING, TEXT, NVARCHAR, NVARCHAR2, CHAR VARYING, and CHARACTER VARYING all map to the same underlying type. Choosing STRING over VARCHAR does not change storage behavior, query execution speed, or column semantics. The only thing a mixed naming style does is create doubt in code reviews.

The practical solution is a team house rule, not a technical one. Most teams standardize on VARCHAR in DDL files and allow STRING in ad hoc SQL and discussions. What matters is picking one and sticking to it.

— These four columns are identical to Snowflake
CREATE TABLE users (
id VARCHAR,
full_name STRING,
email TEXT,
username NVARCHAR
);

Snowflake stores all four as the same text family. The column names are the only difference.

Core Data Type Families That Matter in Real Projects

Production design work concentrates on five families. Here is how to think about each.

Text

Use VARCHAR. Standardize the alias. Max length defaults to 16,777,216 characters — Snowflake does not charge extra for unused length in a VARCHAR column, so padding is not a concern.

Exact Numeric

Use NUMBER(precision, scale) for anything financial. Floating-point types introduce rounding behavior that causes reconciliation headaches. If a column ever appears in a billing total or a regulatory report, it deserves exact numeric treatment.

Approximate Numeric

FLOAT and its aliases (DOUBLE, REAL) are appropriate for scientific measurements, analytics averages, and values where a fractional rounding error in the fifteenth decimal place does not matter.

Temporal

Timestamps become a source of cross-team confusion when teams mix TIMESTAMP_NTZ, TIMESTAMP_LTZ, and TIMESTAMP_TZ without a documented policy. The simplest rule: use TIMESTAMP_NTZ for warehouse-native event time (no timezone stored), and TIMESTAMP_TZ only when offset preservation across external systems is an explicit requirement.

Boolean

Use BOOLEAN for binary flags. Storing ‘yes’ and ‘no’ as VARCHAR is a pattern that resurfaces as a cast in every downstream query.

Teams that care about warehouse performance should also connect type choices to spending. Wide text columns, repeated casts on joins, and bloated semi-structured payloads drive heavier scan work, which translates directly into more Snowflake credits consumed per query run.

Semi-Structured Types: VARIANT, OBJECT, and ARRAY

VARIANT

VARIANT is Snowflake’s flexible container for semi-structured data. A single VARIANT column can hold strings, numbers, objects, arrays, or null — row by row.

— A simple VARIANT landing table
CREATE TABLE raw_events (
event_id VARCHAR,
received_at TIMESTAMP_NTZ,
payload VARIANT
);
— Insert a JSON payload directly
INSERT INTO raw_events
SELECT
UUID_STRING(),
CURRENT_TIMESTAMP(),
PARSE_JSON(‘{“user_id”: 1001, “action”: “checkout”, “items”: 3}’);

VARIANT is the right call when source payload shapes still change, when you need to move fast, or when you are building a landing zone before modeling decisions stabilize. It is not the right final answer for columns that show up repeatedly in joins, filters, and dashboards.

For more on Snowflake’s semi-structured types, see the official semi-structured data documentation.

OBJECT

OBJECT stores named key-value pairs. It often appears nested inside VARIANT columns from API payloads and event streams.

ARRAY

ARRAY stores ordered, indexed lists. Like OBJECT, it is common inside VARIANT data from application logs and streaming sources.

A practical rule: if the payload comes from a source you do not fully control yet, land it as VARIANT. Once your business logic stabilizes and the same five fields appear in every join and filter, cast those fields into typed columns in your serving model.

How to Check Data Types in Snowflake

The cleanest way to inspect live column types is INFORMATION_SCHEMA.COLUMNS.

— Check all column types in a specific table
SELECT
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
FROM information_schema.columns
WHERE table_schema = ‘ANALYTICS’
AND table_name = ‘ORDERS’
ORDER BY ordinal_position;

The data_type field returns the canonical Snowflake type — TEXT for any VARCHAR alias, FIXED for exact numerics, REAL for floats. It settles modeling arguments with evidence rather than memory.

For account-wide audits — useful during cost reviews or migration work — pair the query with schema and database filters to avoid pulling noise from every environment in the account.

— Account-wide type audit filtered to a specific database
SELECT
table_catalog,
table_schema,
table_name,
column_name,
data_type
FROM snowflake.account_usage.columns
WHERE table_catalog = ‘PROD_DB’
AND data_type = ‘VARIANT’
ORDER BY table_schema, table_name;

That second query is particularly useful for tracking VARIANT spread across a warehouse — a precursor to the cost problems described in the next section.

Using TYPEOF on VARIANT Values

INFORMATION_SCHEMA.COLUMNS tells you the declared column type. It does not tell you what is actually stored inside a VARIANT column at the row level. That is where TYPEOF() comes in.

A VARIANT column can contain a string in one row, an array in the next, and an object in a third. TYPEOF() inspects the value rather than the schema.

— Inspect the actual type of values stored in a VARIANT column
SELECT
event_id,
TYPEOF(payload) AS payload_type,
TYPEOF(payload:user_id) AS user_id_type,
TYPEOF(payload:items) AS items_type
FROM raw_events
LIMIT 20;

Typical return values include VARCHAR, ARRAY, OBJECT, BOOLEAN, and DECIMAL. Snowflake’s TYPEOF documentation notes that an integer-looking value can still return DECIMAL depending on how the value was stored and presented — worth knowing during contract debugging and ingestion audits.

TYPEOF() is most valuable when a source system stops behaving as promised. If a column that was always VARCHAR starts returning ARRAY, that pattern surfaces early in ingestion monitoring before it breaks downstream models.

How to Choose the Right Snowflake Data Type: A Step-by-Step Guide

Five-step icon flowchart guiding the selection of Snowflake data types based on query behavior and cost efficiency.

Step 1: Start with query behavior, not source shape

A source API can send everything as JSON strings. That does not mean the warehouse model should store everything as semi-structured forever. Pick types based on the filters, joins, and aggregations the downstream team will actually run.

Step 2: Lock exact math early

Money, inventory counts, billing units, and quota tracking deserve exact numeric types. Fixing approximate math after dashboards have shipped requires rebuilding trust as well as the model.

Step 3: Make a timestamp decision and write it down

Pick a default timestamp type for warehouse models and document it. Teams that mix timezone-aware and timezone-free types without a policy create pain for every analyst and BI tool that consumes the data.

Step 4: Use VARIANT as a landing zone, not a permanent model

VARIANT handles intake. Serving models benefit from typed columns for any field that appears in joins, filters, charts, and metrics. Cast early, not under pressure.

Step 5: Review type choices during cost reviews

Type sprawl shows up in warehouse bills before it shows up in incident reports. Repeated casts, heavy scan patterns on wide text columns, and VARIANT-heavy serving models are all cost signals worth catching in a regular optimization cycle — not during a quarterly budget conversation.


The Gap Between Schema Design and Snowflake Cost

The most expensive pattern in Snowflake data type management is not a wrong alias. It is VARIANT used as a permanent serving layer. Teams land raw payloads, plan to clean them later, and then build the downstream warehouse around repeated path extractions and implicit casts. Queries grow heavier month by month. Storage grows. Ownership gets unclear.

Standard query performance tools can confirm that a query is slow. They do not connect that slowness to a VARIANT-heavy design pattern, attribute the cost to a specific team or pipeline, or show the cumulative spend trend over time. That visibility requires cost attribution across the full data stack.

Seemore surfaces exactly that: how much a given data design decision costs in Snowflake compute and storage, which pipelines are driving it, and what the fastest path to resolution looks like. It is not a replacement for the type decisions above. It is what makes those decisions visible at scale.

Is data design driving up your Snowflake bill?

We can find the issues and drive those costs down

Let's talk

FAQ

Is STRING an alias for VARCHAR in Snowflake?

Yes. Snowflake treats STRING, TEXT, NVARCHAR, NVARCHAR2, CHAR VARYING, and CHARACTER VARYING as aliases for VARCHAR. They share the same storage, speed, and column behavior. The only meaningful difference is naming consistency within a team’s DDL standards.

Is there any performance difference between VARCHAR and TEXT in Snowflake?

No. Choosing TEXT over VARCHAR does not affect storage usage, scan speed, or query execution. The benefit of picking one over the other is purely about team consistency and reducing unnecessary decision-making during code review.

How do I check a column’s data type in Snowflake?

Query INFORMATION_SCHEMA.COLUMNS and filter by table_schema and table_name. The data_type field returns the canonical stored type for each column. For account-wide audits, use SNOWFLAKE.ACCOUNT_USAGE.COLUMNS with database and schema filters.

What does TYPEOF return for a VARIANT value?

TYPEOF() returns the type of the value stored inside a VARIANT column at the row level — common return values include VARCHAR, ARRAY, OBJECT, BOOLEAN, and DECIMAL. It is the right tool when a single column stores mixed payload shapes or when source system behavior needs to be audited during ingestion debugging.

When should I keep data in VARIANT instead of casting it to typed columns?

Use VARIANT during landing and when payload shapes are still changing often. Once the same fields appear reliably in joins, filters, and dashboards, cast them into typed columns in serving models. Typed serving models are cheaper to scan, easier to govern, and cleaner to hand off between teams.

How do data type choices affect Snowflake cost?

Bloated VARIANT usage, repeated casts in queries, and oversized text columns all drive heavier scan work — which means more Snowflake credits consumed per query. Type cleanup is one of the most predictable levers for reducing compute spend, particularly in warehouses that have grown organically over time without regular modeling audits.

Should you migrate to Gen2?
A data engineer, sitting in a modern high-rise office with a city skyline visible through large windows, is deep in thought. In front of him, a holographic interface displays a comparison between Gen1 and Gen2 Snowflake Warehouses. On the left, Gen1 Snowflake Warehouse is represented by stacked layers, implying its older architecture. On the right, Gen2 Snowflake Warehouse is shown with a cloud icon labeled with key features like
7 min read

Gen1 vs Gen2 Snowflake Warehouses : When Does the 25% Premium Pay Off?

22 min read

Snowflake Cost Optimization: Top 17 Techniques in 2025

Seemore Data $5,000 challenge blog thumbnail. A coral-colored browser frame displays the text
9 min read

Prove Your Snowflake Spend is Optimized with The Seemore Data $5,000 Challenge

Cool, now
what can you DO with this?

data ROI