Re: Improve EXPLAIN output for multicolumn B-Tree Index - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Improve EXPLAIN output for multicolumn B-Tree Index |
Date | |
Msg-id | CAEze2WgooitYVHO7tdVnewLM=hu8t4Eah2WYyAs25t5KCf7-oQ@mail.gmail.com Whole thread Raw |
In response to | Re: Improve EXPLAIN output for multicolumn B-Tree Index (Jelte Fennema-Nio <postgres@jeltef.nl>) |
Responses |
Re: Improve EXPLAIN output for multicolumn B-Tree Index
|
List | pgsql-hackers |
On Mon, 24 Jun 2024 at 11:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > +1 for the idea. > > On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > I think this is too easy to confuse with the pre-existing 'Filter' > > condition, which you'll find on indexes with INCLUDE-d columns or > > filters on non-index columns. > > Why not combine them? And both call them Filter? In a sense this > filtering acts very similar to INCLUDE based filtering (for btrees at > least). It does not really behave similar: index scan keys (such as the id3=101 scankey) don't require visibility checks in the btree code, while the Filter condition _does_ require a visibility check, and delegates the check to the table AM if the scan isn't Index-Only, or if the VM didn't show all-visible during the check. Furthermore, the index could use the scankey to improve the number of keys to scan using "skip scans"; by realising during a forward scan that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you can skip forward to (1, 3, _), rather than having to go through tuples (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for INCLUDE-d columns, because their datatypes and structure are opaque to the index AM; the AM cannot assume anything about or do anything with those values. > Although I might be wrong about that, because when I try to > confirm the same perf using the following script I do get quite > different timings (maybe you have an idea what's going on here). But > even if it does mean something slightly different perf wise, I think > using Filter for both is unlikely to confuse anyone. I don't want A to to be the plan, while showing B' to the user, as the performance picture for the two may be completely different. And, as I mentioned upthread, the differences between AMs in the (lack of) meaning in index column order also makes it quite wrong to generally separate prefixes equalities from the rest of the keys. > Since, while > allowed, it seems extremely unlikely in practice that someone will use > the same column as part of the indexed columns and as part of the > INCLUDE-d columns (why would you store the same info twice). Yeah, people don't generally include the same index column more than once in the same index. > CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3); > CREATE INDEX test_idx ON test(id1, id2, id3); > > QUERY PLAN > ─────────────────────────────────────── > Index Only Scan using test_idx_include on public.test [...] > Time: 7.139 ms > QUERY PLAN > ───────────────────────────────────── > Index Only Scan using test_idx on public.test (cost=0.42..2591.77 [...] > Time: 2.645 ms As you can see, there's a huge difference in performance. Putting both non-bound and "normal" filter clauses in the same Filter clause will make it more difficult to explain performance issues based on only the explain output. Kind regards, Matthias van de Meent Neon (https://neon.tech)
pgsql-hackers by date: