VALUES nodes and expression initialization - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | VALUES nodes and expression initialization
Date | |
Msg-id | 20180129214652.pqr23ykzemohpkpw@alap3.anarazel.de
List | pgsql-hackers
Hi,

In contrast to most other nodes, nodeValuescan.c does expression initialization at "runtime" rather than at node initialization:

```c
/*
 * Get rid of any prior cycle's leftovers. We use ReScanExprContext
 * not just ResetExprContext because we want any registered shutdown
 * callbacks to be called.
 */
ReScanExprContext(econtext);

/*
 * Build the expression eval state in the econtext's per-tuple memory.
 * This is a tad unusual, but we want to delete the eval state again
 * when we move to the next row, to avoid growth of memory
 * requirements over a long values list.
 */
oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
```

This does make a good bit of sense for things like INSERT ... VALUES (...), (...), ... or even just a plain long VALUES (...), (...), ... statement (albeit the latter being a bit pointless). But when *joining* something with a VALUES(), that's far less beneficial: the same VALUES() node can end up being scanned over and over, doing a lot of redundant initialization.

I noticed this when JITing the regression tests, forcing every single expression to be JITed. So maybe, just maybe, not the most relevant case. But it's not particularly hard to think of scenarios where this leads to spending a good chunk of time in expression initialization, even leaving JIT aside.

The JIT case is easy enough to fix / work around; I'm however wondering whether we should do something about the repeated-scan case. It's a reasonably common pattern to join to VALUES to e.g. add additional information to the results of an aggregate.
The case that triggered me looking at this is:

```
regression[17061][1]=# explain select count(*) from tenk1 a,
    tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                QUERY PLAN                                               │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=1437.69..1437.70 rows=1 width=8)                                                       │
│   ->  Hash Join  (cost=394.91..1387.81 rows=19952 width=0)                                              │
│         Hash Cond: ("*VALUES*".column1 = b.unique2)                                                     │
│         ->  Nested Loop  (cost=0.29..718.84 rows=19952 width=4)                                         │
│               ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..269.93 rows=9976 width=4) │
│               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)                           │
│         ->  Hash  (cost=269.93..269.93 rows=9976 width=4)                                               │
│               ->  Index Only Scan using tenk1_unique2 on tenk1 b  (cost=0.29..269.93 rows=9976 width=4) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
```

Here materialization doesn't help, because the lateral dependency prevents the VALUES output from being materialized.

Does anybody think such cases are common enough that we should do something about the constant re-initialization?

Greetings,

Andres Freund