Re: Index Skip Scan (new UniqueKeys) - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan (new UniqueKeys)
Date
Msg-id 20200714161852.s27ikts2dppvbm3h@localhost
Whole thread Raw
In response to RE: Index Skip Scan (new UniqueKeys)  (Floris Van Nee <florisvannee@Optiver.com>)
Responses RE: Index Skip Scan (new UniqueKeys)
List pgsql-hackers
> On Sun, Jul 12, 2020 at 12:48:47PM +0000, Floris Van Nee wrote:
> >
> > Good point, thanks for looking at this. With the latest planner version there
> > are indeed more possibilities to use skipping. It never occured to me that
> > some of those paths will still rely on index scan returning full data set. I'll look
> > in details and add verification to prevent putting something like this on top of
> > skip scan in the next version.
>
> I believe the required changes are something like in attached patch. There were a few things I've changed:
> - build_uniquekeys was constructing the list incorrectly. For a DISTINCT a,b, it would create two unique keys, one
witha and one with b. However, it should be one unique key with (a,b).
 

Yes, I've also noticed that while preparing fix for index scan not
covered by index and included it.

> - the uniquekeys that is built, still contains some redundant keys, that are normally eliminated from the path keys
lists.

I guess you're talking about:

+ if (EC_MUST_BE_REDUNDANT(ec))
+   continue;

Can you add some test cases to your changes to show the effect of it? It
seem to me redundant keys are already eliminated at this point by either
make_pathkeys_for_uniquekeys or even earlier for distinct on, but could
be I've missed something.

Along the lines I'm also curious about this part:

-    ListCell   *k;
-    List *exprs = NIL;
-
-    foreach(k, ec->ec_members)
-    {
-        EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
-        exprs = lappend(exprs, mem->em_expr);
-    }
-
-    result = lappend(result, makeUniqueKey(exprs, false, false));
+    EquivalenceMember *mem = (EquivalenceMember*) lfirst(list_head(ec->ec_members));

I'm curious about this myself, maybe someone can clarify. It looks like
generaly speaking there could be more than one member (if not
ec_has_volatile), which "representing knowledge that multiple items are
effectively equal". Is this information is not interesting enough to
preserve it in unique keys?

> - the distinct_pathkeys may be NULL, even though there's a possibility for skipping. But it wouldn't create the
uniquekeysin this case. This makes the planner not choose skip scans even though it could. For example in queries that
doSELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b; Since a is constant, it's eliminated from regular pathkeys.
 

What would be the point of skipping if it's a constant?

> - to combat the issues mentioned earlier, there's now a check in build_index_paths that checks if the query_pathkeys
matchesthe useful_pathkeys. Note that we have to use the path keys here rather than any of the unique keys. The unique
keysare only Expr nodes - they do not contain the necessary information about ordering. Due to elimination of some
constantpath keys, we have to search the attributes of the index to find the correct prefix to use in skipping.
 

IIUC here you mean this function, right?

+ prefix = find_index_prefix_for_pathkey(root,
+                                          index,
+                                          BackwardScanDirection,
+                                          llast_node(PathKey,
+                                          root->distinct_pathkeys));

Doesn't it duplicate the job already done in build_index_pathkeys by
building those pathkeys again? If yes, probably it's possible to reuse
useful_pathkeys. Not sure about unordered indexes, but looks like
query_pathkeys should also match in this case.

Will also look at the follow up questions in the next email.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit