Re: Index Skip Scan - Mailing list pgsql-hackers

From Jesper Pedersen
Subject Re: Index Skip Scan
Date
Msg-id 994cbf82-333c-56d7-2860-9c8c85aba790@redhat.com
Whole thread Raw
In response to Re: Index Skip Scan  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Hi David,

On 7/11/19 7:38 AM, David Rowley wrote:
> The UniqueKeys idea is quite separate from pathkeys.  Currently, a
> Path can have a List of PathKeys which define the order that the
> tuples will be read from the Plan node that's created from that Path.
> The idea with UniqueKeys is that a Path can also have a non-empty List
> of UniqueKeys to define that there will be no more than 1 row with the
> same value for the Paths UniqueKey column/exprs.
> 
> As of now, if you look at standard_qp_callback() sets
> root->query_pathkeys, the idea here would be that the callback would
> also set a new List field named "query_uniquekeys" based on the
> group_pathkeys when non-empty and !root->query->hasAggs, or by using
> the distinct clause if it's non-empty. Then in build_index_paths()
> around the call to match_pathkeys_to_index() we'll probably also want
> to check if the index can support UniqueKeys that would suit the
> query_uniquekeys that were set during standard_qp_callback().
> 
> As for setting query_uniquekeys in standard_qp_callback(), this should
> be simply a matter of looping over either group_pathkeys or
> distinct_pathkeys and grabbing the pk_eclass from each key and making
> a canonical UniqueKey from that. To have these canonical you'll need
> to have a new field in PlannerInfo named canon_uniquekeys which will
> do for UniqueKeys what canon_pathkeys does for PathKeys. So you'll
> need an equivalent of make_canonical_pathkey() in uniquekey.c
> 
> With this implementation, the code that the patch adds in
> create_distinct_paths() can mostly disappear. You'd only need to look
> for a path that uniquekeys_contained_in() matches the
> root->query_uniquekeys and then just leave it to
> set_cheapest(distinct_rel); to pick the cheapest path.
> 
> It would be wasted effort to create paths with UniqueKeys if there's
> multiple non-dead base rels in the query as the final rel in
> create_distinct_paths would be a join rel, so it might be worth
> checking that before creating such index paths in build_index_paths().
> However, down the line, we could carry the uniquekeys forward into the
> join paths if the join does not duplicate rows from the other side of
> the join... That's future stuff though, not for this patch, I don't
> think.
> 
> I think a UniqueKey can just be a struct similar to PathKey, e.g be
> located in pathnodes.h around where PathKey is defined.  Likely we'll
> need a uniquekeys.c file that has the equivalent to
> pathkeys_contained_in() ... uniquekeys_contained_in(), which return
> true if arg1 is a superset of arg2 rather than check for one set being
> a prefix of another. As you mention above: UniqueKeys { x, y } ==
> UniqueKeys { y, x }.  That superset check could perhaps be optimized
> by sorting UniqueKey lists in memory address order, that'll save
> having a nested loop, but likely that's not going to be required for a
> first cut version.  This would work since you'd want UniqueKeys to be
> canonical the same as PathKeys are (Notice that compare_pathkeys()
> only checks memory addresses of pathkeys and not equals()).
> 
> I think the UniqueKey struct would only need to contain an
> EquivalenceClass field. I think all the other stuff that's in PathKey
> is irrelevant to UniqueKey.
> 

Thanks for the feedback ! I'll work on these changes for the next 
uniquekey patch.

Best regards,
  Jesper



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: buildfarm's typedefs list has gone completely nutso
Next
From: Robert Haas
Date:
Subject: Re: accounting for memory used for BufFile during hash joins