DuckLake: First Impressions & Getting Started with DuckDB's Database-Managed Lakehouse
Sagara

Sagara @sagara

About: Data stack enthusiasm/Snowflake Data Superhero 2022-2025/Classmethod,inc (https://classmethod.jp/english/)

Location:
Japan
Joined:
Mar 6, 2025

DuckLake: First Impressions & Getting Started with DuckDB's Database-Managed Lakehouse

Publish Date: May 28
2 1

※This is an English translation of the original Japanese article:https://dev.classmethod.jp/articles/ducklake-database-based-approach-lakehouse/

Hi, this is Sagara.

A new Lakehouse format, "DuckLake," where metadata management is handled by the database, was announced on DuckDB's official blog.

https://duckdb.org/2025/05/27/ducklake

In this article, I'll briefly introduce what DuckLake is and summarize my experience trying it out locally.

What is DuckLake?

First, DuckLake is an open-source Lakehouse format.

Recognizing the complexity of file-based metadata management in recent formats like Iceberg and Delta Lake, DuckLake was implemented with the approach of handling the entire metadata management layer, including what is known as the catalog layer in Iceberg, using an SQL database.

https://duckdb.org/2025/05/27/ducklake

The blog post at the link above mentions the following four benefits of DuckLake:

  • Simplicity
    • To run DuckLake on a laptop, you just need to install DuckDB and use the DuckLake extension (in this case, DuckDB's local file handles catalog and metadata management).
    • No Avro or JSON files; everything can be controlled with SQL.
  • Scalability
    • An architecture that separates storage, compute, and metadata management.
  • Speed
    • Unlike traditional Open Table Formats, file I/O is not required.
    • Reduces the number of files written for small changes and can handle concurrent modifications.
  • Features
    • Operations are SQL-based, enabling ACID-compliant transactions, adding/deleting columns, and changing data types.
    • Data and delete files written to storage by DuckLake are compatible with Iceberg, allowing for metadata-only migration.
    • DuckLake's compute nodes have been simultaneously released as a DuckDB extension (available from DuckDB v1.3.0).

The official website and GitHub repository can be accessed from the following links:

https://ducklake.select/

https://github.com/duckdb/ducklake

Trying it Out

I'll briefly try out DuckLake in a local environment.

I'll follow the official documentation provided below.

https://ducklake.select/docs/stable/duckdb/introduction

Test Environment

  • Windows 11
  • DuckDB v1.3.0
    • Launched via an .exe file, following this blog post (link to Japanese article).

Installation

First, install DuckLake.

INSTALL ducklake;
Enter fullscreen mode Exit fullscreen mode

Create a New DuckLake Database

Execute the following command to create a new DuckLake database. If executed without specifying a path, the DuckDB database file will be created in the location where the .exe file was launched.

ATTACH 'ducklake:my_ducklake.ducklake' AS my_ducklake;
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h27_35

Create a New Table

Execute the following command to create a table based on a publicly available CSV file.

USE my_ducklake;

CREATE TABLE nl_train_stations AS
    FROM 'https://blobs.duckdb.org/nl_stations.csv';
Enter fullscreen mode Exit fullscreen mode

At this stage, a folder is added in the same directory where the DuckDB database file was created. Looking inside the folder, you can see that Parquet files have been created.

2025-05-28_14h30_53

2025-05-28_14h31_53

You can also view the contents of the Parquet files with the following query.

FROM glob('my_ducklake.ducklake.files/*');
FROM 'my_ducklake.ducklake.files/*.parquet' LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h33_06

Executing an UPDATE Operation on the Table

Next, let's examine the behavior when an UPDATE operation is performed on the table.

Executing the following query will run an UPDATE statement that changes a station name.

UPDATE nl_train_stations SET name_long='Johan Cruijff ArenA' WHERE code = 'ASB';
SELECT name_long FROM nl_train_stations WHERE code = 'ASB';
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h44_20

Then, new Parquet files are generated in the folder holding data for this DuckLake. You can check the list of files with the following query.

Two new Parquet files have been added. Since an UPDATE was performed, we can see that a Parquet file holding the new records and a Parquet file representing the old (deleted) records (marked with a -delete suffix) have been added.

FROM glob('my_ducklake.ducklake.files/*');
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h46_25

Like other Open Table Formats, DuckLake records Snapshots to retain historical versions. In DuckLake, these Snapshots are recorded in a table, and you can check their contents with the following query.

FROM my_ducklake.snapshots();
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h49_57

Furthermore, you can use Snapshots to query the table at a specific point in time. In the following query, VERSION => 1 allows you to see the data before the UPDATE, and VERSION => 2 shows the data after the UPDATE.

SELECT name_long FROM nl_train_stations AT (VERSION => 1) WHERE code = 'ASB';
SELECT name_long FROM nl_train_stations AT (VERSION => 2) WHERE code = 'ASB';
Enter fullscreen mode Exit fullscreen mode

2025-05-28_14h51_47

Finally

Although this was a brief overview, I've introduced DuckLake and tried it out following the official documentation.

The simple architecture of "Parquet files on storage, with all metadata including Snapshots managed by the database" is excellent, as it allows everything to be inspected via SQL, just as advertised!

I'm excited to see how DuckLake will be integrated not only into MotherDuck but also into various other products in the future!

Comments 1 total

  • Fritz Larco
    Fritz LarcoJun 22, 2025

    Absolutely! For easily ingesting data into Ducklake, check out sling. There is a CLI as well as a Python interface.

Add comment