Re: Index Skip Scan - Mailing list pgsql-hackers
From | Jesper Pedersen |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 920e3aa1-c838-1608-79d7-392f200c98e0@redhat.com Whole thread Raw |
In response to | Re: Index Skip Scan (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
Hi Tomas, On 11/10/19 4:18 PM, Tomas Vondra wrote: > I've looked at the patch again - in general it seems in pretty good > shape, all the issues I found are mostly minor. > > Firstly, I'd like to point out that not all of the things I complained > about in my 2019/06/23 review got addressed. Those were mostly related > to formatting and code style, and the attached patch fixes some (but > maybe not all) of them. > Sorry about that ! > The patch also tweaks wording of some bits in the docs and comments that > I found unclear. Would be nice if a native speaker could take a look. > > A couple more comments: > > > 1) pathkey_is_unique > > The one additional issue I found is pathkey_is_unique - it's not really > explained what "unique" means and hot it's different from "redundant" > (which has quite a long explanation before pathkey_is_redundant). > > My understanding is that pathkey is "unique" when it's EC does not match > an EC of another pathkey in the list. But if that's the case, then the > function name is wrong - it does exactly the opposite (i.e. it returns > 'true' when the pathkey is *not* unique). > Yeah, you are correct - forgot to move that part from the _uniquekey version of the patch. > > 2) explain > > I wonder if we should print the "Skip scan" info always, or similarly to > "Inner Unique" which does this: > > /* try not to be too chatty about this in text mode */ > if (es->format != EXPLAIN_FORMAT_TEXT || > (es->verbose && ((Join *) plan)->inner_unique)) > ExplainPropertyBool("Inner Unique", > ((Join *) plan)->inner_unique, > es); > break; > > I'd do the same thing for skip scan - print it only in verbose mode, or > when using non-text output format. > I think it is of benefit to see if skip scan kicks in, but used your "Skip scan" suggestion. > > 3) There's an annoying limitation that for this to kick in, the order of > expressions in the DISTINCT clause has to match the index, i.e. with > index on (a,b,c) the skip scan will only kick in for queries using > > DISTINCT a > DISTINCT a,b > DISTINCT a,b,c > > but not e.g. DISTINCT a,c,b. I don't think there's anything forcing us > to sort result of DISTINCT in any particular case, except that we don't > consider the other orderings "interesting" so we don't really consider > using the index (so no chance of using the skip scan). > > That leads to pretty annoying speedups/slowdowns due to seemingly > irrelevant changes: > > -- everything great, a,b,c matches an index > test=# explain (analyze, verbose) select distinct a,b,c from t; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------- > > Index Only Scan using t_a_b_c_idx on public.t (cost=0.42..565.25 > rows=1330 width=12) (actual time=0.016..10.387 rows=1331 loops=1) > Output: a, b, c > Skip scan: true > Heap Fetches: 1331 > Planning Time: 0.106 ms > Execution Time: 10.843 ms > (6 rows) > > -- slow, mismatch with index > test=# explain (analyze, verbose) select distinct a,c,b from t; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------- > > HashAggregate (cost=22906.00..22919.30 rows=1330 width=12) (actual > time=802.067..802.612 rows=1331 loops=1) > Output: a, c, b > Group Key: t.a, t.c, t.b > -> Seq Scan on public.t (cost=0.00..15406.00 rows=1000000 > width=12) (actual time=0.010..355.361 rows=1000000 loops=1) > Output: a, b, c > Planning Time: 0.076 ms > Execution Time: 803.078 ms > (7 rows) > > -- fast again, the extra ordering allows using the index again > test=# explain (analyze, verbose) select distinct a,c,b from t order by > a,b,c; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------- > > Index Only Scan using t_a_b_c_idx on public.t (cost=0.42..565.25 > rows=1330 width=12) (actual time=0.035..12.120 rows=1331 loops=1) > Output: a, c, b > Skip scan: true > Heap Fetches: 1331 > Planning Time: 0.053 ms > Execution Time: 12.632 ms > (6 rows) > > > This is a more generic issue, not specific to this patch, of course. I > think we saw it with the incremental sort patch, IIRC. I wonder how > difficult would it be to fix this here (not necessarily in v1). > Yeah, I see it as separate to this patch as well. But definitely something that should be revisited. Thanks for your patch ! v29 using UniqueKey attached. Best regards, Jesper
Attachment
pgsql-hackers by date: