Re: Converting SetOp to read its two inputs separately - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Converting SetOp to read its two inputs separately
Date
Msg-id 534399.1732909465@sss.pgh.pa.us
Whole thread Raw
In response to Re: Converting SetOp to read its two inputs separately  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Converting SetOp to read its two inputs separately
List pgsql-hackers
David Rowley <dgrowleyml@gmail.com> writes:
> Here's a quick review of all 5 patches together.

Thanks for looking at it!

> 1. In setop_load_group(), the primary concern with the result of
> setop_compare_slots() seems to be if the tuples match or not. If
> you're not too concerned with keeping the Assert checking for
> mis-sorts, then it could be much more efficient for many cases to
> check the final column first.

Hm.  setop_compare_slots() has to deliver a 3-way compare result for
its usage in setop_retrieve_sorted(), where we're actually doing
a merge.  We could potentially use a different subroutine within
setop_load_group(), but I'm not sure it's worth added complexity.

> 2. (related to #1) I'm unsure if it's also worth deforming all the
> needed attributes in one go rather than calling slot_getattr() each
> time, which will result in incrementally deforming the tuple.

Good point; probably worth doing given that the low-order
column is often going to determine the result.

>> /* XXX hack: force the tuple into minimal form */
>> /* XXX should not have to do this */

> nodeAgg.c seems to do this by using prepare_hash_slot() which deforms
> the heap tuple and sets the Datums verbatim rather than making copies
> of any byref ones.

I'll take a look at that, thanks for the pointer.

> 4. Since the EXPLAIN output is going to change for SetOps, what are
> your thoughts on being more explicit about the setop strategy being
> used?  Showing "SetOp Except" isn't that informative.  You've got to
> look at the non-text format to know it's using the merge method.  How
> about "MergeSetOp Except"?

I'm confused here.  explain.c already uses "SetOp" versus "HashSetOp"
to show the strategy.  Are you saying we should instead print
"MergeSetOp" and "HashSetOp"?  That seems like change for the sake
of change, really.

> I'm also happy if you just push the 0004 patch. Are you planning to
> backpatch that?

I've been debating whether to propose a back-patch of that.  It's
clearly a bug fix in the sense that the current code isn't doing
what was intended.  On the other hand, it isn't (I think) resulting
in any wrong answers, just plans that are a bit less efficient
than they could be, and perhaps some planning time wasted on making
Paths that can't be used.  And generally we refrain from making
unnecessary changes of plan choices in released branches, because
users value plan stability.  So on the whole I'm leaning towards
not back-patching, but I could be persuaded otherwise if anyone
feels strongly that we should.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Jim Jones
Date:
Subject: Re: Truncate logs by max_log_size
Next
From: Masahiko Sawada
Date:
Subject: Re: Count and log pages set all-frozen by vacuum