VALUES nodes and expression initialization - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | VALUES nodes and expression initialization
Date | |
Msg-id | 20180129214652.pqr23ykzemohpkpw@alap3.anarazel.de
List | pgsql-hackers
Hi,

In contrast to most other nodes, nodeValuescan.c does expression initialization at "runtime" rather than at node initialization:

```c
/*
 * Get rid of any prior cycle's leftovers. We use ReScanExprContext
 * not just ResetExprContext because we want any registered shutdown
 * callbacks to be called.
 */
ReScanExprContext(econtext);

/*
 * Build the expression eval state in the econtext's per-tuple memory.
 * This is a tad unusual, but we want to delete the eval state again
 * when we move to the next row, to avoid growth of memory
 * requirements over a long values list.
 */
oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
```

This does make a good bit of sense for things like INSERT ... VALUES (...), (...), ... or even just a plain long VALUES (...), (...), ... statement (albeit the latter being a bit pointless). But when *joining* something with a VALUES(), that's far less beneficial: the same VALUES() node can end up being scanned over and over, doing a lot of redundant initialization.

I noticed this when JITing the regression tests, forcing every single expression to be JITed. So maybe, just maybe, not the most relevant case. But it's not particularly hard to think of scenarios where this leads to spending a good chunk of time in expression initialization, even leaving JIT aside.

The JIT case is easy enough to fix / work around; I'm however wondering whether we should do something about the repeated-scan case. It's a reasonably common pattern to join to VALUES to e.g. add additional information to the results of an aggregate.
The case that triggered me looking at this is:

```
regression[17061][1]=# explain select count(*) from tenk1 a,
    tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                QUERY PLAN                                               │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=1437.69..1437.70 rows=1 width=8)                                                       │
│   ->  Hash Join  (cost=394.91..1387.81 rows=19952 width=0)                                              │
│         Hash Cond: ("*VALUES*".column1 = b.unique2)                                                     │
│         ->  Nested Loop  (cost=0.29..718.84 rows=19952 width=4)                                         │
│               ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..269.93 rows=9976 width=4) │
│               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)                           │
│         ->  Hash  (cost=269.93..269.93 rows=9976 width=4)                                               │
│               ->  Index Only Scan using tenk1_unique2 on tenk1 b  (cost=0.29..269.93 rows=9976 width=4) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
```

Here materialization doesn't help, because the lateral dependency prevents the VALUES output from being materialized.

Does anybody think such cases are common enough that we should do something about the constant re-initialization?

Greetings,

Andres Freund