Re: Index Skip Scan - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 20191110211832.vv4jbscszz74jete@development Whole thread Raw |
In response to | Re: Index Skip Scan (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
Hi, I've looked at the patch again - in general it seems in pretty good shape, all the issues I found are mostly minor. Firstly, I'd like to point out that not all of the things I complained about in my 2019/06/23 review got addressed. Those were mostly related to formatting and code style, and the attached patch fixes some (but maybe not all) of them. The patch also tweaks wording of some bits in the docs and comments that I found unclear. Would be nice if a native speaker could take a look. A couple more comments: 1) pathkey_is_unique The one additional issue I found is pathkey_is_unique - it's not really explained what "unique" means and hot it's different from "redundant" (which has quite a long explanation before pathkey_is_redundant). My understanding is that pathkey is "unique" when it's EC does not match an EC of another pathkey in the list. But if that's the case, then the function name is wrong - it does exactly the opposite (i.e. it returns 'true' when the pathkey is *not* unique). 2) explain I wonder if we should print the "Skip scan" info always, or similarly to "Inner Unique" which does this: /* try not to be too chatty about this in text mode */ if (es->format != EXPLAIN_FORMAT_TEXT || (es->verbose && ((Join *) plan)->inner_unique)) ExplainPropertyBool("Inner Unique", ((Join *) plan)->inner_unique, es); break; I'd do the same thing for skip scan - print it only in verbose mode, or when using non-text output format. 3) There's an annoying limitation that for this to kick in, the order of expressions in the DISTINCT clause has to match the index, i.e. with index on (a,b,c) the skip scan will only kick in for queries using DISTINCT a DISTINCT a,b DISTINCT a,b,c but not e.g. DISTINCT a,c,b. I don't think there's anything forcing us to sort result of DISTINCT in any particular case, except that we don't consider the other orderings "interesting" so we don't really consider using the index (so no chance of using the skip scan). That leads to pretty annoying speedups/slowdowns due to seemingly irrelevant changes: -- everything great, a,b,c matches an index test=# explain (analyze, verbose) select distinct a,b,c from t; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using t_a_b_c_idx on public.t (cost=0.42..565.25 rows=1330 width=12) (actual time=0.016..10.387 rows=1331loops=1) Output: a, b, c Skip scan: true Heap Fetches: 1331 Planning Time: 0.106 ms Execution Time: 10.843 ms (6 rows) -- slow, mismatch with index test=# explain (analyze, verbose) select distinct a,c,b from t; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=22906.00..22919.30 rows=1330 width=12) (actual time=802.067..802.612 rows=1331 loops=1) Output: a, c, b Group Key: t.a, t.c, t.b -> Seq Scan on public.t (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..355.361 rows=1000000 loops=1) Output: a, b, c Planning Time: 0.076 ms Execution Time: 803.078 ms (7 rows) -- fast again, the extra ordering allows using the index again test=# explain (analyze, verbose) select distinct a,c,b from t order by a,b,c; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using t_a_b_c_idx on public.t (cost=0.42..565.25 rows=1330 width=12) (actual time=0.035..12.120 rows=1331loops=1) Output: a, c, b Skip scan: true Heap Fetches: 1331 Planning Time: 0.053 ms Execution Time: 12.632 ms (6 rows) This is a more generic issue, not specific to this patch, of course. I think we saw it with the incremental sort patch, IIRC. I wonder how difficult would it be to fix this here (not necessarily in v1). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: