Mr Dk.'s BlogMr Dk.'s Blog
  • 🦆 About Me
  • ⛏️ Technology Stack
  • 🔗 Links
  • 🗒️ About Blog
  • Algorithm
  • C++
  • Compiler
  • Cryptography
  • DevOps
  • Docker
  • Git
  • Java
  • Linux
  • MS Office
  • MySQL
  • Network
  • Operating System
  • Performance
  • PostgreSQL
  • Productivity
  • Solidity
  • Vue.js
  • Web
  • Wireless
  • 🐧 How Linux Works (notes)
  • 🐧 Linux Kernel Comments (notes)
  • 🐧 Linux Kernel Development (notes)
  • 🐤 μc/OS-II Source Code (notes)
  • ☕ Understanding the JVM (notes)
  • ⛸️ Redis Implementation (notes)
  • 🗜️ Understanding Nginx (notes)
  • ⚙️ Netty in Action (notes)
  • ☁️ Spring Microservices (notes)
  • ⚒️ The Annotated STL Sources (notes)
  • ☕ Java Development Kit 8
GitHub
  • 🦆 About Me
  • ⛏️ Technology Stack
  • 🔗 Links
  • 🗒️ About Blog
  • Algorithm
  • C++
  • Compiler
  • Cryptography
  • DevOps
  • Docker
  • Git
  • Java
  • Linux
  • MS Office
  • MySQL
  • Network
  • Operating System
  • Performance
  • PostgreSQL
  • Productivity
  • Solidity
  • Vue.js
  • Web
  • Wireless
  • 🐧 How Linux Works (notes)
  • 🐧 Linux Kernel Comments (notes)
  • 🐧 Linux Kernel Development (notes)
  • 🐤 μc/OS-II Source Code (notes)
  • ☕ Understanding the JVM (notes)
  • ⛸️ Redis Implementation (notes)
  • 🗜️ Understanding Nginx (notes)
  • ⚙️ Netty in Action (notes)
  • ☁️ Spring Microservices (notes)
  • ⚒️ The Annotated STL Sources (notes)
  • ☕ Java Development Kit 8
GitHub
  • 📝 Notes
    • Algorithm
      • Algorithm - Bloom Filter
      • Algorithm - Disjoint Set
      • Algorithm - Fast Power
      • Algorithm - KMP
      • Algorithm - Monotonic Stack
      • Algorithm - RB-Tree
      • Algorithm - Regular Expression
      • Algorithm - Sliding Window
      • Online Judge - I/O
    • C++
      • C++ - Const
      • C++ File I/O
      • C++ - Object Layout
      • C++ - Operator Overload
      • C++ - Polymorphism
      • C++ STL algorithm
      • C++ STL map
      • C++ STL multimap
      • C++ STL priority_queue
      • C++ STL set
      • C++ STL string
      • C++ STL unordered_map
      • C++ STL vector
      • C++ - Smart Pointer
      • C++ - Template & Genericity
    • Compiler
      • ANTLR - Basic
      • Compiler - LLVM Architecture
      • Compiler - Multi-version GCC
    • Cryptography
      • Cryptography - Certbot
      • Cryptography - Digital Signature & PKCS #7
      • Cryptography - GPG
      • Cryptography - JWT
      • Cryptography - Keystore & Certificates
      • Cryptography - OAuth 2.0
      • Cryptography - Java 实现对称与非对称加密算法
      • Cryptography - TLS
    • DevOps
      • DevOps - Travis CI
    • Docker
      • Docker - Image & Storage Management
      • Docker - Image
      • Docker - Libcontainer
      • Docker - Multi-Arch Image
      • Docker - Multi-Stage Build
      • Docker - Network
      • Docker - Orchestration & Deployment
      • Docker - Overview
      • Docker - Service Building
      • Docker - Volume & Network Usage
      • Docker - Volume
      • Linux - Control Group
      • Linux - Namespace
    • Git
      • Git - Branch & Merge
      • Git - Cached
      • Git - Cherry Pick
      • Git - Commit
      • Git - Patch
      • Git - Proxy
      • Git - Rebase
      • Git - Reset
      • Git - Stash
      • Git - Theme for Git-Bash
    • Java
      • JVM - Synchronized
      • JVM - Volatile
      • Java - Annotation 注解
      • Java - BIO & NIO
      • Java - Class Path
      • Java - Condition and LockSupport
      • Java - Current Timestamp
      • Java - Deep Copy
      • Java - 运行环境配置
      • Java - Equals
      • Java - Exporting JAR
      • Java - Javadoc
      • Java - Lock
      • Java - Maven 项目构建工具
      • Java - References
      • Java - Reflection Mechanism
      • Java - String Split
      • Java - Thread Pool
      • Java - Thread
      • Tomcat - Class Loader
      • Tomcat - Container
    • Linux
      • addr2line
      • cut
      • df
      • du
      • fallocate
      • find
      • fio
      • grep
      • groupadd
      • gzip
      • head / tail
      • hexdump
      • iostat
      • iotop
      • kill
      • ldd
      • lsof
      • ltrace / strace
      • mpstat
      • netstat
      • nm
      • pidstat
      • pmap
      • readlink
      • readlink
      • rpm2cpio / rpm2archive
      • sort
      • tee
      • uniq
      • useradd
      • usermod
      • watch
      • wc
      • which
      • xargs
    • MS Office
      • MS Office - Add-in Dev
      • MS Office - Application
    • MySQL
      • InnoDB - Architecture
      • InnoDB - Backup
      • InnoDB - Checkpoint
      • InnoDB - Critical Features
      • InnoDB - Files
      • InnoDB - Index
      • InnoDB - Insert Buffer
      • InnoDB - Lock
      • InnoDB - Partition Table
      • InnoDB - Table Storage
      • MySQL - Server Configuration
      • MySQL - Storage Engine
    • Network
      • Network - ARP
      • Network - FTP
      • Network - GitHub Accelerating
      • HTTP - Message Format
      • HTTP - POST 提交表单的两种方式
      • Network - Proxy Server
      • Network - SCP
      • Network - SSH
      • Network - TCP Congestion Control
      • Network - TCP Connection Management
      • Network - TCP Flow Control
      • Network - TCP Retransmission
      • Network - Traceroute
      • Network - V2Ray
      • Network - WebSocket
      • Network - Windows 10 Mail APP
      • Network - frp
    • Operating System
      • Linux - Kernel Compilation
      • Linux - Multi-OS
      • Linux - Mutex & Condition
      • Linux - Operations
      • Linux: Package Manager
      • Linux - Process Manipulation
      • Linux - User ID
      • Linux - Execve
      • OS - Compile and Link
      • OS - Dynamic Linking
      • OS - ELF
      • Linux - Image
      • OS - Loading
      • OS - Shared Library Organization
      • OS - Static Linking
      • Syzkaller - Architecture
      • Syzkaller - Description Syntax
      • Syzkaller - Usage
      • Ubuntu - Desktop Recover (Python)
      • WSL: CentOS 8
    • Performance
      • Linux Performance - Perf Event
      • Linux Performance - Perf Record
      • Linux Performance - Perf Report
      • Linux Performance - Flame Graphs
      • Linux Performance - Off CPU Analyze
    • PostgreSQL
      • PostgreSQL - ANALYZE
      • PostgreSQL - Atomics
      • PostgreSQL - CREATE INDEX CONCURRENTLY
      • PostgreSQL - COPY FROM
      • PostgreSQL - COPY TO
      • PostgreSQL - Executor: Append
      • PostgreSQL - Executor: Group
      • PostgreSQL - Executor: Limit
      • PostgreSQL - Executor: Material
      • PostgreSQL - Executor: Nest Loop Join
      • PostgreSQL - Executor: Result
      • PostgreSQL - Executor: Sequential Scan
      • PostgreSQL - Executor: Sort
      • PostgreSQL - Executor: Unique
      • PostgreSQL - FDW Asynchronous Execution
      • PostgreSQL - GUC
      • PostgreSQL - Locking
      • PostgreSQL - LWLock
      • PostgreSQL - Multi Insert
      • PostgreSQL - Plan Hint GUC
      • PostgreSQL - Process Activity
      • PostgreSQL - Query Execution
      • PostgreSQL - Spinlock
      • PostgreSQL - Storage Management
      • PostgreSQL - VFD
      • PostgreSQL - WAL Insert
      • PostgreSQL - WAL Prefetch
    • Productivity
      • LaTeX
      • Venn Diagram
      • VuePress
    • Solidity
      • Solidity - ABI Specification
      • Solidity - Contracts
      • Solidity - Expressions and Control Structures
      • Solidity - Layout and Structure
      • Solidity - Remix IDE
      • Solidity - Slither
      • Solidity - Types
      • Solidity - Units and Globally Available Variables
    • Vue.js
      • Vue.js - Environment Variable
    • Web
      • Web - CORS
      • Web - OpenAPI Specification
    • Wireless
      • Wireless - WEP Cracking by Aircrack-ng
      • Wireless - WPS Cracking by Reaver
      • Wireless - wifiphisher

PostgreSQL - Executor: Material

Created by : Mr Dk.

2021 / 07 / 11 20:15

Hangzhou, Zhejiang, China


该节点被翻译为 物化 节点,含义为将子计划中的元组缓存在当前节点中。如果子计划节点需要被反复访问,并且每次返回的元组都相同,那么对子计划返回的元组进行缓存可以提高性能。

Tuple Store

元组缓存使用 Tuplestorestate 结构,维护了用于缓存的内存空间和临时文件。之后准备专门分析一下和这个结构相关的代码。先明确几个相关 API 的功能:

  • tuplestore_begin_heap 构造并初始化存储结构
  • tuplestore_puttupleslot 将元组 append 到 tuple store 的最后
  • tuplestore_gettupleslot 从 tuple store 中读取元组
  • tuplestore_end 释放 tuple store

Plan Node

物化节点较为简单,直接继承自 Plan 结构:

/* ----------------
 *      materialization node
 * ----------------
 */
typedef struct Material
{
    Plan        plan;
} Material;

Plan State

物化节点的 state 节点继承自 Scan State:

/* ----------------
 *   MaterialState information
 *
 *      materialize nodes are used to materialize the results
 *      of a subplan into a temporary file.
 *
 *      ss.ss_ScanTupleSlot refers to output of underlying plan.
 * ----------------
 */
typedef struct MaterialState
{
    ScanState   ss;             /* its first field is NodeTag */
    int         eflags;         /* capability flags to pass to tuplestore */
    bool        eof_underlying; /* reached end of underlying plan? */
    Tuplestorestate *tuplestorestate;
} MaterialState;

/* ----------------
 *   ScanState information
 *
 *      ScanState extends PlanState for node types that represent
 *      scans of an underlying relation.  It can also be used for nodes
 *      that scan the output of an underlying plan node --- in that case,
 *      only ScanTupleSlot is actually useful, and it refers to the tuple
 *      retrieved from the subplan.
 *
 *      currentRelation    relation being scanned (NULL if none)
 *      currentScanDesc    current scan descriptor for scan (NULL if none)
 *      ScanTupleSlot      pointer to slot in tuple table holding scan tuple
 * ----------------
 */
typedef struct ScanState
{
    PlanState   ps;             /* its first field is NodeTag */
    Relation    ss_currentRelation;
    struct TableScanDescData *ss_currentScanDesc;
    TupleTableSlot *ss_ScanTupleSlot;
} ScanState;

其中 ScanState 中维护了与扫描相关的信息,而 MaterialState 中额外扩展了用于缓存元组的 Tuplestorestate,以及指示该节点的子计划是否已经扫描完毕的 eof_underlying。

Initialization

初始化操作在 ExecInitNode() 生命周期中被调用,主要工作是分配和 Material 节点对应的 MaterialState 节点,还要对子节点递归调用 ExecInitNode()。在这里,首先把 eof_underlying 设置为 false,表示底层子节点还没有扫描完成 (EOF)。

与其它节点初始化过程不同的是,物化节点不需要初始化选择和投影信息。

/* ----------------------------------------------------------------
 *      ExecInitMaterial
 * ----------------------------------------------------------------
 */
MaterialState *
ExecInitMaterial(Material *node, EState *estate, int eflags)
{
    MaterialState *matstate;
    Plan       *outerPlan;

    /*
     * create state structure
     */
    matstate = makeNode(MaterialState);
    matstate->ss.ps.plan = (Plan *) node;
    matstate->ss.ps.state = estate;
    matstate->ss.ps.ExecProcNode = ExecMaterial;

    /*
     * We must have a tuplestore buffering the subplan output to do backward
     * scan or mark/restore.  We also prefer to materialize the subplan output
     * if we might be called on to rewind and replay it many times. However,
     * if none of these cases apply, we can skip storing the data.
     */
    matstate->eflags = (eflags & (EXEC_FLAG_REWIND |
                                  EXEC_FLAG_BACKWARD |
                                  EXEC_FLAG_MARK));

    /*
     * Tuplestore's interpretation of the flag bits is subtly different from
     * the general executor meaning: it doesn't think BACKWARD necessarily
     * means "backwards all the way to start".  If told to support BACKWARD we
     * must include REWIND in the tuplestore eflags, else tuplestore_trim
     * might throw away too much.
     */
    if (eflags & EXEC_FLAG_BACKWARD)
        matstate->eflags |= EXEC_FLAG_REWIND;

    matstate->eof_underlying = false;
    matstate->tuplestorestate = NULL;

    /*
     * Miscellaneous initialization
     *
     * Materialization nodes don't need ExprContexts because they never call
     * ExecQual or ExecProject.
     */

    /*
     * initialize child nodes
     *
     * We shield the child node from the need to support REWIND, BACKWARD, or
     * MARK/RESTORE.
     */
    eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);

    outerPlan = outerPlan(node);
    outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);

    /*
     * Initialize result type and slot. No need to initialize projection info
     * because this node doesn't do projections.
     *
     * material nodes only return tuples from their materialized relation.
     */
    ExecInitResultTupleSlotTL(&matstate->ss.ps, &TTSOpsMinimalTuple);
    matstate->ss.ps.ps_ProjInfo = NULL;

    /*
     * initialize tuple type.
     */
    ExecCreateScanSlotFromOuterPlan(estate, &matstate->ss, &TTSOpsMinimalTuple);

    return matstate;
}

Execution

在 ExecProcNode() 生命周期中被执行。只要目前访问的是 tuple store 的尾部,就要递归调用子节点的 ExecProcNode() 获取一个元组缓存到 tuple store 中并返回。Tuple store 本身只有在以下情况中才会被读取:

  • 反向扫描
  • 重新扫描
  • Mark / restore

首先,如果是第一次进入物化节点,那么 tuple store 暂时还为空,需要调用 tuplestore_begin_heap() 初始化。接着判断当前是否已经到达 tuple store 的最后 (EOF):

  • 如果不是,那么可以直接调用 tuplestore_gettupleslot() 从 tuple store 取得缓存元组
  • 如果是,那么对子节点调用 ExecProcNode() 获取元组
    • 如果从子节点返回空元组,说明扫描结束,将 eof_underlying 设置为 true
    • 如果从子节点返回元组,则调用 tuplestore_puttupleslot() 将元组放入 tuple store 后返回元组

下次可以重点研究一下 tuple store 是怎么放置元组的。

/* ----------------------------------------------------------------
 *      ExecMaterial
 *
 *      As long as we are at the end of the data collected in the tuplestore,
 *      we collect one new row from the subplan on each call, and stash it
 *      aside in the tuplestore before returning it.  The tuplestore is
 *      only read if we are asked to scan backwards, rescan, or mark/restore.
 *
 * ----------------------------------------------------------------
 */
static TupleTableSlot *         /* result tuple from subplan */
ExecMaterial(PlanState *pstate)
{
    MaterialState *node = castNode(MaterialState, pstate);
    EState     *estate;
    ScanDirection dir;
    bool        forward;
    Tuplestorestate *tuplestorestate;
    bool        eof_tuplestore;
    TupleTableSlot *slot;

    CHECK_FOR_INTERRUPTS();

    /*
     * get state info from node
     */
    estate = node->ss.ps.state;
    dir = estate->es_direction;
    forward = ScanDirectionIsForward(dir);
    tuplestorestate = node->tuplestorestate;

    /*
     * If first time through, and we need a tuplestore, initialize it.
     */
    if (tuplestorestate == NULL && node->eflags != 0)
    {
        tuplestorestate = tuplestore_begin_heap(true, false, work_mem);
        tuplestore_set_eflags(tuplestorestate, node->eflags);
        if (node->eflags & EXEC_FLAG_MARK)
        {
            /*
             * Allocate a second read pointer to serve as the mark. We know it
             * must have index 1, so needn't store that.
             */
            int         ptrno PG_USED_FOR_ASSERTS_ONLY;

            ptrno = tuplestore_alloc_read_pointer(tuplestorestate,
                                                  node->eflags);
            Assert(ptrno == 1);
        }
        node->tuplestorestate = tuplestorestate;
    }

    /*
     * If we are not at the end of the tuplestore, or are going backwards, try
     * to fetch a tuple from tuplestore.
     */
    eof_tuplestore = (tuplestorestate == NULL) ||
        tuplestore_ateof(tuplestorestate);

    if (!forward && eof_tuplestore)
    {
        if (!node->eof_underlying)
        {
            /*
             * When reversing direction at tuplestore EOF, the first
             * gettupleslot call will fetch the last-added tuple; but we want
             * to return the one before that, if possible. So do an extra
             * fetch.
             */
            if (!tuplestore_advance(tuplestorestate, forward))
                return NULL;    /* the tuplestore must be empty */
        }
        eof_tuplestore = false;
    }

    /*
     * If we can fetch another tuple from the tuplestore, return it.
     */
    slot = node->ss.ps.ps_ResultTupleSlot;
    if (!eof_tuplestore)
    {
        if (tuplestore_gettupleslot(tuplestorestate, forward, false, slot))
            return slot;
        if (forward)
            eof_tuplestore = true;
    }

    /*
     * If necessary, try to fetch another row from the subplan.
     *
     * Note: the eof_underlying state variable exists to short-circuit further
     * subplan calls.  It's not optional, unfortunately, because some plan
     * node types are not robust about being called again when they've already
     * returned NULL.
     */
    if (eof_tuplestore && !node->eof_underlying)
    {
        PlanState  *outerNode;
        TupleTableSlot *outerslot;

        /*
         * We can only get here with forward==true, so no need to worry about
         * which direction the subplan will go.
         */
        outerNode = outerPlanState(node);
        outerslot = ExecProcNode(outerNode);
        if (TupIsNull(outerslot))
        {
            node->eof_underlying = true;
            return NULL;
        }

        /*
         * Append a copy of the returned tuple to tuplestore.  NOTE: because
         * the tuplestore is certainly in EOF state, its read position will
         * move forward over the added tuple.  This is what we want.
         */
        if (tuplestorestate)
            tuplestore_puttupleslot(tuplestorestate, outerslot);

        ExecCopySlot(slot, outerslot);
        return slot;
    }

    /*
     * Nothing left ...
     */
    return ExecClearTuple(slot);
}

Clean Up

在 ExecEndNode() 生命周期中被调用。调用 tuplestore_end() 释放 tuple store,然后对子节点递归调用 ExecEndNode()。

/* ----------------------------------------------------------------
 *      ExecEndMaterial
 * ----------------------------------------------------------------
 */
void
ExecEndMaterial(MaterialState *node)
{
    /*
     * clean out the tuple table
     */
    ExecClearTuple(node->ss.ss_ScanTupleSlot);

    /*
     * Release tuplestore resources
     */
    if (node->tuplestorestate != NULL)
        tuplestore_end(node->tuplestorestate);
    node->tuplestorestate = NULL;

    /*
     * shut down the subplan
     */
    ExecEndNode(outerPlanState(node));
}
Edit this page on GitHub
Prev
PostgreSQL - Executor: Limit
Next
PostgreSQL - Executor: Nest Loop Join