版本

  • v1.3-ossivalis

背景

DuckDB 作为开源的嵌入式分析型数据库, 整体代码框架比较清晰,适合作为学习 AP 型数据库的项目。截止 (2025-05) 最新版是 v1.3-ossivalis, 我们编译一个 Debug 版本结合 GDB 调试工具来进行代码的阅读和分析。

1
2
3
4
5
6
7
8
9
10
git clone -b v1.3-ossivalis https://github.com/duckdb/duckdb

GEN=ninja make debug -j 8

./duckdb
v1.3.0-dev2068 022f826ddf
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D

SQL Parser

DuckDB 使用 C++ 编写, 在 SQL 的解析优化执行等阶段都使用了模块化, SQL Parse 阶段调用了 Postgres 的 Parser 来进行 SQL 的解析, 完成后使用 Transformer 将 PostgreSQL 的解析树转换为 DuckDB 的语法树:

1
2
3
4
5
6
7
8
9
Parser parser(db->con->context->GetParserOptions());

/* parser.ParseQuery() 会解析 SQL 语句, 将 PG 格式的 parse tree 转为 DuckDB 的 tree. */
parser.ParseQuery(query);

/* ... */

/* SQL 的 binder, 逻辑计划生成, 调用优化器进行查询计划的优化, 物理计划生成. */
auto pending = db->con->PendingQuery(std::move(statements.back()), false);

Binder

DuckDB 选择将 Binder 操作和逻辑计划的生成进行耦合, 在 class Planner(逻辑计划生成模块)中完成 binder 操作.

Binder 操作会针对 SQL 语句进行检查, 例如 SELECT 中查询的字段是否存在, WHERE 条件中是否合法等. Binder 操作对 QueryNode 进行绑定后会生成 BoundQueryNode, 再进行逻辑计划的生成:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
BoundStatement Binder::Bind(QueryNode &node) {
BoundStatement result;
if (node.type != QueryNodeType::CTE_NODE && // Issue #13850 - Don't auto-materialize if users materialize (for now)
!Optimizer::OptimizerDisabled(context, OptimizerType::MATERIALIZED_CTE) && context.config.enable_optimizer &&
OptimizeCTEs(node)) {
switch (node.type) {
case QueryNodeType::SELECT_NODE:
result = BindWithCTE(node.Cast<SelectNode>());
break;
case QueryNodeType::RECURSIVE_CTE_NODE:
result = BindWithCTE(node.Cast<RecursiveCTENode>());
break;
case QueryNodeType::CTE_NODE:
result = BindWithCTE(node.Cast<CTENode>());
break;
default:
D_ASSERT(node.type == QueryNodeType::SET_OPERATION_NODE);
result = BindWithCTE(node.Cast<SetOperationNode>());
break;
}
} else {
/* 进行 binder 操作. */
auto bound_node = BindNode(node);

result.names = bound_node->names;
result.types = bound_node->types;

/* 逻辑计划生成. */
result.plan = CreatePlan(*bound_node);
}

return result;
}

Logical Plan 逻辑计划

我们创建一个测试表,并写入数据, 根据不同的 SQL 语句来动态调试来梳理 Logical Plan 的生成过程.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
D CREATE TABLE test_table (
id INTEGER PRIMARY KEY,
name VARCHAR,
age INTEGER,
is_active BOOLEAN,
created_at TIMESTAMP,
salary DECIMAL(10, 2),
tags TEXT[],
category VARCHAR,
UNIQUE(name, category)
);

D INSERT INTO test_table (id, name, age, is_active, created_at, salary, tags, category)
VALUES
(1, 'Alice', 30, TRUE, '2024-01-01 10:00:00', 50000.00, ARRAY['dev', 'manager'], 'A'),
(2, 'Bob', 25, FALSE, '2024-01-02 11:00:00', 45000.00, ARRAY['designer'], 'B'),
(3, 'Charlie', NULL, TRUE, '2024-01-03 12:00:00', 60000.00, NULL, 'A'),
(4, 'David', 40, TRUE, '2024-01-04 13:00:00', 70000.00, ARRAY['dev', 'lead'], 'C'),
(5, 'Eve', 35, FALSE, '2024-01-05 14:00:00', 55000.00, ARRAY['marketing'], 'B'),
(6, 'Frank', 28, TRUE, '2024-01-06 15:00:00', 48000.00, ARRAY['dev'], 'A'),
(7, 'Grace', 22, TRUE, '2024-01-07 16:00:00', 40000.00, NULL, 'C'),
(8, 'Hank', 33, FALSE, '2024-01-08 17:00:00', 49000.00, ARRAY['designer'], 'B'),
(9, 'Ivy', 27, TRUE, '2024-01-09 18:00:00', 52000.00, ARRAY['dev'], 'A'),
(10, 'Jack', 31, TRUE, '2024-01-10 19:00:00', 58000.00, ARRAY['lead'], 'C');

逻辑计划是由一系列 LogicalOperator 组成的算子树, LogicalOperator 的结构如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class LogicalOperator {
public:
explicit LogicalOperator(LogicalOperatorType type);

LogicalOperator(LogicalOperatorType type, vector<unique_ptr<Expression>> expressions);

virtual ~LogicalOperator();

/* 当前 LogicalOperator 的类型. */
LogicalOperatorType type;

/* 子逻辑运算符. */
vector<unique_ptr<LogicalOperator>> children;

/* 当前逻辑运算符的表达式. */
vector<unique_ptr<Expression>> expressions;

/* ... */
};

LogicalOperator 的 children 存放的是其子逻辑运算符, expressions是当前逻辑运算符的表达式, 以上述表结构为例, 执行以下 SQL:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
D PRAGMA explain_output = 'all';
D EXPLAIN SELECT * FROM test_table WHERE age > 30;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Unoptimized Logical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ Expressions: │
│ id │
│ name │
│ age │
│ is_active │
│ created_at │
│ salary │
│ tags │
│ category │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
FILTER
│ ──────────────────── │
│ Expressions: │
│(age > CAST(30 AS INTEGER))│
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ──────────────────── │
Table: test_table │
│ Type: Sequential Scan │
└───────────────────────────┘

上述 SQL 语句有 3 个 LogicalOperator, 分别是 duckdb::LogicalOperatorType::LOGICAL_PROJECTIONduckdb::LogicalOperatorType::LOGICAL_FILTER, duckdb::LogicalOperatorType::LOGICAL_GET.

  • LOGICAL_PROJECTION: 投影算子.

  • LOGICAL_FILTER: 过滤算子.

  • LOGICAL_GET: 属于 Data sources 算子, 从数据表中扫描数据.

其他的 LogicalOperator 定义在 LogicalOperator 定义

Logical Plan 生成

查询语句的逻辑计划生成是在Binder::CreatePlan(BoundSelectNode &statement) 中实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
unique_ptr<LogicalOperator> Binder::CreatePlan(BoundSelectNode &statement) {
unique_ptr<LogicalOperator> root;
D_ASSERT(statement.from_table);

/* 在通用查询中,物理存储表的 LogicalOperator 是 LogicalOperatorType::LOGICAL_GET,
* LOGICAL_GET 会在 SELECT 语句 binder 阶段就被生成. */
root = CreatePlan(*statement.from_table);
D_ASSERT(root);

/* 处理 samples 语句: DuckDB 支持采样语法
* SELECT * FROM tbl USING SAMPLE 10% (选择表大约 10% 的样本). */
if (statement.sample_options) {
root = make_uniq<LogicalSample>(std::move(statement.sample_options), std::move(root));
}

/* 处理 where 条件. */
if (statement.where_clause) {
root = PlanFilter(std::move(statement.where_clause), std::move(root));
}

/* 聚合算子. */
if (!statement.aggregates.empty() || !statement.groups.group_expressions.empty()) {
if (!statement.groups.group_expressions.empty()) {
// visit the groups
for (auto &group : statement.groups.group_expressions) {
PlanSubqueries(group, root);
}
}
// now visit all aggregate expressions
for (auto &expr : statement.aggregates) {
PlanSubqueries(expr, root);
}
// finally create the aggregate node with the group_index and aggregate_index as obtained from the binder
auto aggregate = make_uniq<LogicalAggregate>(statement.group_index, statement.aggregate_index,
std::move(statement.aggregates));
aggregate->groups = std::move(statement.groups.group_expressions);
aggregate->groupings_index = statement.groupings_index;
aggregate->grouping_sets = std::move(statement.groups.grouping_sets);
aggregate->grouping_functions = std::move(statement.grouping_functions);

aggregate->AddChild(std::move(root));
root = std::move(aggregate);
} else if (!statement.groups.grouping_sets.empty()) {
// edge case: we have grouping sets but no groups or aggregates
// this can only happen if we have e.g. select 1 from tbl group by ();
// just output a dummy scan
root = make_uniq_base<LogicalOperator, LogicalDummyScan>(statement.group_index);
}

/* 处理 having 语句, 使用 LogicalFilter 过滤算子. */
if (statement.having) {
PlanSubqueries(statement.having, root);
auto having = make_uniq<LogicalFilter>(std::move(statement.having));

having->AddChild(std::move(root));
root = std::move(having);
}

/* 处理窗口函数语句. */
if (!statement.windows.empty()) {
auto win = make_uniq<LogicalWindow>(statement.window_index);
win->expressions = std::move(statement.windows);
// visit the window expressions
for (auto &expr : win->expressions) {
PlanSubqueries(expr, root);
}
D_ASSERT(!win->expressions.empty());
win->AddChild(std::move(root));
root = std::move(win);
}

/* 处理 QUALIFY 语句, QUALIFY 是针对窗口函数结果的过滤. */
if (statement.qualify) {
PlanSubqueries(statement.qualify, root);
auto qualify = make_uniq<LogicalFilter>(std::move(statement.qualify));

qualify->AddChild(std::move(root));
root = std::move(qualify);
}

/* Unnesting 算子. */
for (idx_t i = statement.unnests.size(); i > 0; i--) {
auto unnest_level = i - 1;
auto entry = statement.unnests.find(unnest_level);
if (entry == statement.unnests.end()) {
throw InternalException("unnests specified at level %d but none were found", unnest_level);
}
auto &unnest_node = entry->second;
auto unnest = make_uniq<LogicalUnnest>(unnest_node.index);
unnest->expressions = std::move(unnest_node.expressions);
// visit the unnest expressions
for (auto &expr : unnest->expressions) {
PlanSubqueries(expr, root);
}
D_ASSERT(!unnest->expressions.empty());
unnest->AddChild(std::move(root));
root = std::move(unnest);
}

/* 处理投影列. */
for (auto &expr : statement.select_list) {
/* 如果有子查询. */
PlanSubqueries(expr, root);
}

/* 逻辑计划最上层为 LogicalProjection 投影算子. */
auto proj = make_uniq<LogicalProjection>(statement.projection_index, std::move(statement.select_list)); auto &projection = *proj;
proj->AddChild(std::move(root));
root = std::move(proj);

/* 处理 LIMIT, ORDER BY, DISTINCT 算子. */
root = VisitQueryNode(statement, std::move(root));

/* 处理需要剪枝的情况. */
if (statement.need_prune) {
D_ASSERT(root);
vector<unique_ptr<Expression>> prune_expressions;
for (idx_t i = 0; i < statement.column_count; i++) {
prune_expressions.push_back(make_uniq<BoundColumnRefExpression>(
projection.expressions[i]->return_type, ColumnBinding(statement.projection_index, i)));
}
auto prune = make_uniq<LogicalProjection>(statement.prune_index, std::move(prune_expressions));
prune->AddChild(std::move(root));
root = std::move(prune);
}

return root;
}

Logical Plan Optimizer 逻辑计划优化

逻辑计划完成生成后, 我们需要对逻辑计划展开优化:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

/* src/main/client_context.cpp */

shared_ptr<PreparedStatementData>
ClientContext::CreatePreparedStatementInternal(ClientContextLock &lock, const string &query,
unique_ptr<SQLStatement> statement,
optional_ptr<case_insensitive_map_t<BoundParameterData>> values) {
/* ... */

if (config.enable_optimizer && logical_plan->RequireOptimizer()) {
profiler.StartPhase(MetricsType::ALL_OPTIMIZERS);

/* 进行逻辑计划的优化. */
Optimizer optimizer(*logical_planner.binder, *this);
logical_plan = optimizer.Optimize(std::move(logical_plan));

D_ASSERT(logical_plan);
profiler.EndPhase();

#ifdef DEBUG
logical_plan->Verify(*this);
#endif
}

/* ... */
}

DuckDB 支持用户自定义优化规则, 自定义的优化规则会被执行两次.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
unique_ptr<LogicalOperator> Optimizer::Optimize(unique_ptr<LogicalOperator> plan_p) {
Verify(*plan_p);

this->plan = std::move(plan_p);

/* 用户自定义的优化规则. */
for (auto &pre_optimizer_extension : DBConfig::GetConfig(context).optimizer_extensions) {
RunOptimizer(OptimizerType::EXTENSION, [&]() {
OptimizerExtensionInput input {GetContext(), *this, pre_optimizer_extension.optimizer_info.get()};
if (pre_optimizer_extension.pre_optimize_function) {
pre_optimizer_extension.pre_optimize_function(input, plan);
}
});
}

/* Built-in 优化规则. */
RunBuiltInOptimizers();

/* 用户自定义的优化规则. */
for (auto &optimizer_extension : DBConfig::GetConfig(context).optimizer_extensions) {
RunOptimizer(OptimizerType::EXTENSION, [&]() {
OptimizerExtensionInput input {GetContext(), *this, optimizer_extension.optimizer_info.get()};
if (optimizer_extension.optimize_function) {
optimizer_extension.optimize_function(input, plan);
}
});
}

Planner::VerifyPlan(context, plan);

return std::move(plan);
}

DuckDB 内置了很多启发式规则, 其中内置的优化规则包括:

  • EXPRESSION_REWRITER: 表达式重写.

表达式重写是 Optimizer 初始化时构造函数里就填好的规则, 包括常量折叠, 分配律优化, CASE 语句简化等等.

  • FILTER_PULLUP: 谓词上拉.

我们以下面这个 SQL 为例, FILTER_PULLUP 优化会将 vals1.i=5 提至FILTER算子下方, 方便后续的谓词下推:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
D explain SELECT * FROM (SELECT * FROM vals1, vals2 WHERE vals1.i=5) as tbl1,
(SELECT * FROM vals1, vals2) as tbl2
WHERE tbl1.i=tbl2.i;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Unoptimized Logical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ Expressions: │
│ i │
│ i_1 │
│ i │
│ i_1 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
FILTER
│ ──────────────────── │
│ Expressions: (i = i) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ CROSS_PRODUCT │
│ ──────────────────── ├───────────────────────────────────────────┐
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
│ PROJECTION │ │ PROJECTION │
│ ──────────────────── │ │ ──────────────────── │
│ Expressions: │ │ Expressions: │
│ i │ │ i │
│ i │ │ i │
└─────────────┬─────────────┘ └─────────────┬─────────────┘
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
FILTER │ │ CROSS_PRODUCT │
│ ──────────────────── │ │ ──────────────────── │
│ Expressions: │ │ ├──────────────┐
│ (i = CAST(5 AS INTEGER)) │ │ │ │
└─────────────┬─────────────┘ └─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ CROSS_PRODUCT │ │ SEQ_SCAN ││ SEQ_SCAN │
│ ──────────────────── │ │ ──────────────────── ││ ──────────────────── │
│ ├──────────────┐ │ Table: vals1 ││ Table: vals2 │
│ │ │ │ Type: Sequential Scan ││ Type: Sequential Scan │
└─────────────┬─────────────┘ │ └───────────────────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ SEQ_SCAN │
│ ──────────────────── ││ ──────────────────── │
Table: vals1 ││ Table: vals2 │
│ Type: Sequential Scan ││ Type: Sequential Scan │
└───────────────────────────┘└───────────────────────────┘

FILTER_PULLUP 优化后的逻辑计划:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Unoptimized Logical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ PROJECTION │
│ ──────────────────── │
│ Expressions: │
│ i │
│ i_1 │
│ i │
│ i_1 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ──────────────────── │
│ Expressions: (i = i) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ──────────────────── │
│ Expressions: (i = 5) │ /* 将谓词条件 vals1.i=5 提升至这里. */
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ CROSS_PRODUCT │
│ ──────────────────── ├───────────────────────────────────────────┐
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐
│ PROJECTION │ │ PROJECTION │
│ ──────────────────── │ │ ──────────────────── │
│ Expressions: │ │ Expressions: │
│ i │ │ i │
│ i │ │ i │
└─────────────┬─────────────┘ └─────────────┬─────────────┘
│ ┌─────────────┴─────────────┐
│ │ CROSS_PRODUCT │
│ │ ──────────────────── │
│ │ ├──────────────┐
│ │ │ │
│ └─────────────┬─────────────┘ │
┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ CROSS_PRODUCT │ │ SEQ_SCAN ││ SEQ_SCAN │
│ ──────────────────── │ │ ──────────────────── ││ ──────────────────── │
│ ├──────────────┐ │ Table: vals1 ││ Table: vals2 │
│ │ │ │ Type: Sequential Scan ││ Type: Sequential Scan │
└─────────────┬─────────────┘ │ └───────────────────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN ││ SEQ_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ Table: vals1 ││ Table: vals2 │
│ Type: Sequential Scan ││ Type: Sequential Scan │
└───────────────────────────┘└───────────────────────────┘
  • FILTER_PUSHDOWN,

  • EMPTY_RESULT_PULLUP,

  • CTE_FILTER_PUSHER,

  • REGEX_RANGE,

  • IN_CLAUSE,

  • JOIN_ORDER,

  • DELIMINATOR,

  • UNNEST_REWRITER,

  • UNUSED_COLUMNS,

  • STATISTICS_PROPAGATION,

  • COMMON_SUBEXPRESSIONS,

  • COMMON_AGGREGATE,

  • COLUMN_LIFETIME,

  • BUILD_SIDE_PROBE_SIDE,

  • LIMIT_PUSHDOWN,

  • TOP_N,

  • COMPRESSED_MATERIALIZATION,

  • DUPLICATE_GROUPS,

  • REORDER_FILTER,

  • SAMPLING_PUSHDOWN,

  • JOIN_FILTER_PUSHDOWN,

  • EXTENSION,

  • MATERIALIZED_CTE,

  • SUM_REWRITER: SUM() 聚合表达式的重写.

官方注释里标明了具体的优化规则即: SUM(x + C) -> SUM(x) + C * COUNT(x), x 为某列字段, C 为常量:
SUM(x + C) 需要对每列的数据先多一次加法, 每列数据额外引入了一条加法指令.
SUM(x) 对一大列连续内存做加法, 可以使用 SIMD,比如 AVX 指令一轮处理8或16个元素, 然后使用一条乘法指令 C * count(X) 即可.

  • LATE_MATERIALIZATION