Mr Dk.'s BlogMr Dk.'s Blog
  • 🦆 About Me
  • ⛏️ Technology Stack
  • 🔗 Links
  • 🗒️ About Blog
  • Algorithm
  • C++
  • Compiler
  • Cryptography
  • DevOps
  • Docker
  • Git
  • Java
  • Linux
  • MS Office
  • MySQL
  • Network
  • Operating System
  • Performance
  • PostgreSQL
  • Productivity
  • Solidity
  • Vue.js
  • Web
  • Wireless
  • 🐧 How Linux Works (notes)
  • 🐧 Linux Kernel Comments (notes)
  • 🐧 Linux Kernel Development (notes)
  • 🐤 μc/OS-II Source Code (notes)
  • ☕ Understanding the JVM (notes)
  • ⛸️ Redis Implementation (notes)
  • 🗜️ Understanding Nginx (notes)
  • ⚙️ Netty in Action (notes)
  • ☁️ Spring Microservices (notes)
  • ⚒️ The Annotated STL Sources (notes)
  • ☕ Java Development Kit 8
GitHub
  • 🦆 About Me
  • ⛏️ Technology Stack
  • 🔗 Links
  • 🗒️ About Blog
  • Algorithm
  • C++
  • Compiler
  • Cryptography
  • DevOps
  • Docker
  • Git
  • Java
  • Linux
  • MS Office
  • MySQL
  • Network
  • Operating System
  • Performance
  • PostgreSQL
  • Productivity
  • Solidity
  • Vue.js
  • Web
  • Wireless
  • 🐧 How Linux Works (notes)
  • 🐧 Linux Kernel Comments (notes)
  • 🐧 Linux Kernel Development (notes)
  • 🐤 μc/OS-II Source Code (notes)
  • ☕ Understanding the JVM (notes)
  • ⛸️ Redis Implementation (notes)
  • 🗜️ Understanding Nginx (notes)
  • ⚙️ Netty in Action (notes)
  • ☁️ Spring Microservices (notes)
  • ⚒️ The Annotated STL Sources (notes)
  • ☕ Java Development Kit 8
GitHub
  • 📝 Notes
    • Algorithm
      • Algorithm - Bloom Filter
      • Algorithm - Disjoint Set
      • Algorithm - Fast Power
      • Algorithm - KMP
      • Algorithm - Monotonic Stack
      • Algorithm - RB-Tree
      • Algorithm - Regular Expression
      • Algorithm - Sliding Window
      • Online Judge - I/O
    • C++
      • C++ - Const
      • C++ File I/O
      • C++ - Object Layout
      • C++ - Operator Overload
      • C++ - Polymorphism
      • C++ STL algorithm
      • C++ STL map
      • C++ STL multimap
      • C++ STL priority_queue
      • C++ STL set
      • C++ STL string
      • C++ STL unordered_map
      • C++ STL vector
      • C++ - Smart Pointer
      • C++ - Template & Genericity
    • Compiler
      • ANTLR - Basic
      • Compiler - LLVM Architecture
      • Compiler - Multi-version GCC
    • Cryptography
      • Cryptography - Certbot
      • Cryptography - Digital Signature & PKCS #7
      • Cryptography - GPG
      • Cryptography - JWT
      • Cryptography - Keystore & Certificates
      • Cryptography - OAuth 2.0
      • Cryptography - Java 实现对称与非对称加密算法
      • Cryptography - TLS
    • DevOps
      • DevOps - Travis CI
    • Docker
      • Docker - Image & Storage Management
      • Docker - Image
      • Docker - Libcontainer
      • Docker - Multi-Arch Image
      • Docker - Multi-Stage Build
      • Docker - Network
      • Docker - Orchestration & Deployment
      • Docker - Overview
      • Docker - Service Building
      • Docker - Volume & Network Usage
      • Docker - Volume
      • Linux - Control Group
      • Linux - Namespace
    • Git
      • Git - Branch & Merge
      • Git - Cached
      • Git - Cherry Pick
      • Git - Commit
      • Git - Patch
      • Git - Proxy
      • Git - Rebase
      • Git - Reset
      • Git - Stash
      • Git - Theme for Git-Bash
    • Java
      • JVM - Synchronized
      • JVM - Volatile
      • Java - Annotation 注解
      • Java - BIO & NIO
      • Java - Class Path
      • Java - Condition and LockSupport
      • Java - Current Timestamp
      • Java - Deep Copy
      • Java - 运行环境配置
      • Java - Equals
      • Java - Exporting JAR
      • Java - Javadoc
      • Java - Lock
      • Java - Maven 项目构建工具
      • Java - References
      • Java - Reflection Mechanism
      • Java - String Split
      • Java - Thread Pool
      • Java - Thread
      • Tomcat - Class Loader
      • Tomcat - Container
    • Linux
      • addr2line
      • cut
      • df
      • du
      • fallocate
      • find
      • fio
      • grep
      • groupadd
      • gzip
      • head / tail
      • hexdump
      • iostat
      • iotop
      • kill
      • ldd
      • lsof
      • ltrace / strace
      • mpstat
      • netstat
      • nm
      • pidstat
      • pmap
      • readlink
      • readlink
      • rpm2cpio / rpm2archive
      • sort
      • tee
      • uniq
      • useradd
      • usermod
      • watch
      • wc
      • which
      • xargs
    • MS Office
      • MS Office - Add-in Dev
      • MS Office - Application
    • MySQL
      • InnoDB - Architecture
      • InnoDB - Backup
      • InnoDB - Checkpoint
      • InnoDB - Critical Features
      • InnoDB - Files
      • InnoDB - Index
      • InnoDB - Insert Buffer
      • InnoDB - Lock
      • InnoDB - Partition Table
      • InnoDB - Table Storage
      • MySQL - Server Configuration
      • MySQL - Storage Engine
    • Network
      • Network - ARP
      • Network - FTP
      • Network - GitHub Accelerating
      • HTTP - Message Format
      • HTTP - POST 提交表单的两种方式
      • Network - Proxy Server
      • Network - SCP
      • Network - SSH
      • Network - TCP Congestion Control
      • Network - TCP Connection Management
      • Network - TCP Flow Control
      • Network - TCP Retransmission
      • Network - Traceroute
      • Network - V2Ray
      • Network - WebSocket
      • Network - Windows 10 Mail APP
      • Network - frp
    • Operating System
      • Linux - Kernel Compilation
      • Linux - Multi-OS
      • Linux - Mutex & Condition
      • Linux - Operations
      • Linux: Package Manager
      • Linux - Process Manipulation
      • Linux - User ID
      • Linux - Execve
      • OS - Compile and Link
      • OS - Dynamic Linking
      • OS - ELF
      • Linux - Image
      • OS - Loading
      • OS - Shared Library Organization
      • OS - Static Linking
      • Syzkaller - Architecture
      • Syzkaller - Description Syntax
      • Syzkaller - Usage
      • Ubuntu - Desktop Recover (Python)
      • WSL: CentOS 8
    • Performance
      • Linux Performance - Perf Event
      • Linux Performance - Perf Record
      • Linux Performance - Perf Report
      • Linux Performance - Flame Graphs
      • Linux Performance - Off CPU Analyze
    • PostgreSQL
      • PostgreSQL - ANALYZE
      • PostgreSQL - Atomics
      • PostgreSQL - CREATE INDEX CONCURRENTLY
      • PostgreSQL - COPY FROM
      • PostgreSQL - COPY TO
      • PostgreSQL - Executor: Append
      • PostgreSQL - Executor: Group
      • PostgreSQL - Executor: Limit
      • PostgreSQL - Executor: Material
      • PostgreSQL - Executor: Nest Loop Join
      • PostgreSQL - Executor: Result
      • PostgreSQL - Executor: Sequential Scan
      • PostgreSQL - Executor: Sort
      • PostgreSQL - Executor: Unique
      • PostgreSQL - FDW Asynchronous Execution
      • PostgreSQL - GUC
      • PostgreSQL - Locking
      • PostgreSQL - LWLock
      • PostgreSQL - Multi Insert
      • PostgreSQL - Plan Hint GUC
      • PostgreSQL - Process Activity
      • PostgreSQL - Query Execution
      • PostgreSQL - Spinlock
      • PostgreSQL - Storage Management
      • PostgreSQL - VFD
      • PostgreSQL - WAL Insert
      • PostgreSQL - WAL Prefetch
    • Productivity
      • LaTeX
      • Venn Diagram
      • VuePress
    • Solidity
      • Solidity - ABI Specification
      • Solidity - Contracts
      • Solidity - Expressions and Control Structures
      • Solidity - Layout and Structure
      • Solidity - Remix IDE
      • Solidity - Slither
      • Solidity - Types
      • Solidity - Units and Globally Available Variables
    • Vue.js
      • Vue.js - Environment Variable
    • Web
      • Web - CORS
      • Web - OpenAPI Specification
    • Wireless
      • Wireless - WEP Cracking by Aircrack-ng
      • Wireless - WPS Cracking by Reaver
      • Wireless - wifiphisher

PostgreSQL - Executor: Sort

Created by : Mr Dk.

2021 / 07 / 11 21:07

Hangzhou, Zhejiang, China


Tuple Sort

排序节点也是一种物化节点,因为需要把下层节点所有的元组读取出来以后再进行排序。对于元组的保存,使用了 Tuplesortstate 结构。该结构在 Tuplestorestate 结构的基础上增加了排序的功能:

  • 如果内存中放得下,那么直接进行 快速排序
  • 如果内存中放不下,那么使用临时文件 + 外部 归并排序

Tuplesortstate 在 B-Tree 索引构建的代码中也有出现过,眼熟。之后单独分析它的功能,目前仅明确其用途:

  • tuplesort_begin_heap() 初始化元组缓存结构
  • tuplesort_puttupleslot() 将元组放入缓存结构
  • tuplesort_performsort() 对缓存结构中的元组进行排序
  • tuplesort_gettupleslot() 从缓存结构中获取元组
  • tuplesort_end() 释放缓存结构

Plan Node

排序节点继承自 Plan 结构体,并扩展了与排序相关的字段。主要包含进行排序的列数、进行排序的列索引、排序运算符及 NULL 值处理方式:

/* ----------------
 *      sort node
 * ----------------
 */
typedef struct Sort
{
    Plan        plan;
    int         numCols;        /* number of sort-key columns */
    AttrNumber *sortColIdx;     /* their indexes in the target list */
    Oid        *sortOperators;  /* OIDs of operators to sort them by */
    Oid        *collations;     /* OIDs of collations */
    bool       *nullsFirst;     /* NULLS FIRST/LAST directions */
} Sort;

Plan State

节点 state 继承自 ScanState。其中 ScanState 继承自 PlanState,扩展了与扫描相关的信息;另外 SortState 扩展了与 (并行) 排序相关的状态信息:

/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    ScanState   ss;             /* its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;

Initialization

在 ExecInitNode() 生命周期中被调用。构造 SortState 节点并初始化,然后递归调用 ExecInitNode()。排序状态 sort_Done 被初始化为 false。

同样,排序节点不需要初始化投影信息。返回的元组不做投影操作,直接返回。

/* ----------------------------------------------------------------
 *      ExecInitSort
 *
 *      Creates the run-time state information for the sort node
 *      produced by the planner and initializes its outer subtree.
 * ----------------------------------------------------------------
 */
SortState *
ExecInitSort(Sort *node, EState *estate, int eflags)
{
    SortState  *sortstate;

    SO1_printf("ExecInitSort: %s\n",
               "initializing sort node");

    /*
     * create state structure
     */
    sortstate = makeNode(SortState);
    sortstate->ss.ps.plan = (Plan *) node;
    sortstate->ss.ps.state = estate;
    sortstate->ss.ps.ExecProcNode = ExecSort;

    /*
     * We must have random access to the sort output to do backward scan or
     * mark/restore.  We also prefer to materialize the sort output if we
     * might be called on to rewind and replay it many times.
     */
    sortstate->randomAccess = (eflags & (EXEC_FLAG_REWIND |
                                         EXEC_FLAG_BACKWARD |
                                         EXEC_FLAG_MARK)) != 0;

    sortstate->bounded = false;
    sortstate->sort_Done = false;
    sortstate->tuplesortstate = NULL;

    /*
     * Miscellaneous initialization
     *
     * Sort nodes don't initialize their ExprContexts because they never call
     * ExecQual or ExecProject.
     */

    /*
     * initialize child nodes
     *
     * We shield the child node from the need to support REWIND, BACKWARD, or
     * MARK/RESTORE.
     */
    eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);

    outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);

    /*
     * Initialize scan slot and type.
     */
    ExecCreateScanSlotFromOuterPlan(estate, &sortstate->ss, &TTSOpsVirtual);

    /*
     * Initialize return slot and type. No need to initialize projection info
     * because this node doesn't do projections.
     */
    ExecInitResultTupleSlotTL(&sortstate->ss.ps, &TTSOpsMinimalTuple);
    sortstate->ss.ps.ps_ProjInfo = NULL;

    SO1_printf("ExecInitSort: %s\n",
               "sort node initialized");

    return sortstate;
}

Execution

在 ExecProcNode() 生命周期中被调用。当节点第一次被执行时:

  • 调用 tuplesort_begin_heap() 初始化缓存排序结构
  • 在一个死循环中不断对子节点调用 ExecProcNode() 获取元组,调用 tuplesort_puttupleslot() 将元组放入缓存排序结构中,直到子节点返回空元组
  • 调用 tuplesort_performsort() 完成排序
  • 将节点的 sort_Done 设置为 true

之后再进入节点时,只需要调用 tuplesort_gettupleslot() 从缓存排序结构中依次获取排序后的元组即可:

/* ----------------------------------------------------------------
 *      ExecSort
 *
 *      Sorts tuples from the outer subtree of the node using tuplesort,
 *      which saves the results in a temporary file or memory. After the
 *      initial call, returns a tuple from the file with each call.
 *
 *      Conditions:
 *        -- none.
 *
 *      Initial States:
 *        -- the outer child is prepared to return the first tuple.
 * ----------------------------------------------------------------
 */
static TupleTableSlot *
ExecSort(PlanState *pstate)
{
    SortState  *node = castNode(SortState, pstate);
    EState     *estate;
    ScanDirection dir;
    Tuplesortstate *tuplesortstate;
    TupleTableSlot *slot;

    CHECK_FOR_INTERRUPTS();

    /*
     * get state info from node
     */
    SO1_printf("ExecSort: %s\n",
               "entering routine");

    estate = node->ss.ps.state;
    dir = estate->es_direction;
    tuplesortstate = (Tuplesortstate *) node->tuplesortstate;

    /*
     * If first time through, read all tuples from outer plan and pass them to
     * tuplesort.c. Subsequent calls just fetch tuples from tuplesort.
     */

    if (!node->sort_Done)
    {
        Sort       *plannode = (Sort *) node->ss.ps.plan;
        PlanState  *outerNode;
        TupleDesc   tupDesc;

        SO1_printf("ExecSort: %s\n",
                   "sorting subplan");

        /*
         * Want to scan subplan in the forward direction while creating the
         * sorted data.
         */
        estate->es_direction = ForwardScanDirection;

        /*
         * Initialize tuplesort module.
         */
        SO1_printf("ExecSort: %s\n",
                   "calling tuplesort_begin");

        outerNode = outerPlanState(node);
        tupDesc = ExecGetResultType(outerNode);

        tuplesortstate = tuplesort_begin_heap(tupDesc,
                                              plannode->numCols,
                                              plannode->sortColIdx,
                                              plannode->sortOperators,
                                              plannode->collations,
                                              plannode->nullsFirst,
                                              work_mem,
                                              NULL,
                                              node->randomAccess);
        if (node->bounded)
            tuplesort_set_bound(tuplesortstate, node->bound);
        node->tuplesortstate = (void *) tuplesortstate;

        /*
         * Scan the subplan and feed all the tuples to tuplesort.
         */

        for (;;)
        {
            slot = ExecProcNode(outerNode);

            if (TupIsNull(slot))
                break;

            tuplesort_puttupleslot(tuplesortstate, slot);
        }

        /*
         * Complete the sort.
         */
        tuplesort_performsort(tuplesortstate);

        /*
         * restore to user specified direction
         */
        estate->es_direction = dir;

        /*
         * finally set the sorted flag to true
         */
        node->sort_Done = true;
        node->bounded_Done = node->bounded;
        node->bound_Done = node->bound;
        if (node->shared_info && node->am_worker)
        {
            TuplesortInstrumentation *si;

            Assert(IsParallelWorker());
            Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
            si = &node->shared_info->sinstrument[ParallelWorkerNumber];
            tuplesort_get_stats(tuplesortstate, si);
        }
        SO1_printf("ExecSort: %s\n", "sorting done");
    }

    SO1_printf("ExecSort: %s\n",
               "retrieving tuple from tuplesort");

    /*
     * Get the first or next tuple from tuplesort. Returns NULL if no more
     * tuples.  Note that we only rely on slot tuple remaining valid until the
     * next fetch from the tuplesort.
     */
    slot = node->ss.ps.ps_ResultTupleSlot;
    (void) tuplesort_gettupleslot(tuplesortstate,
                                  ScanDirectionIsForward(dir),
                                  false, slot, NULL);
    return slot;
}

Clean Up

在 ExecEndNode() 结构中被调用。其中调用 tuplesort_end() 释放缓存排序结构,并递归调用 ExecEndNode() 完成清理。

/* ----------------------------------------------------------------
 *      ExecEndSort(node)
 * ----------------------------------------------------------------
 */
void
ExecEndSort(SortState *node)
{
    SO1_printf("ExecEndSort: %s\n",
               "shutting down sort node");

    /*
     * clean out the tuple table
     */
    ExecClearTuple(node->ss.ss_ScanTupleSlot);
    /* must drop pointer to sort result tuple */
    ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);

    /*
     * Release tuplesort resources
     */
    if (node->tuplesortstate != NULL)
        tuplesort_end((Tuplesortstate *) node->tuplesortstate);
    node->tuplesortstate = NULL;

    /*
     * shut down the subplan
     */
    ExecEndNode(outerPlanState(node));

    SO1_printf("ExecEndSort: %s\n",
               "sort node shutdown");
}
Edit this page on GitHub
Prev
PostgreSQL - Executor: Sequential Scan
Next
PostgreSQL - Executor: Unique