Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id CA+TgmoZb_PPyZ4W2MWsS+71aC2UFiVvKJ_H0oDqRzbeen2fMZg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Tue, Mar 6, 2018 at 8:34 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> One thing I've learned in my time working with PostgreSQL is that, if
> there's a known hole, someone's probably going to fall down it
> eventually.  I like working with PostgreSQL because we're pretty
> careful to not make holes that people can fall down, or if there is
> some hole that cannot be filled in, we try to put a fence around it
> with a sign, (e.g rename pg_xlog to pg_wal).  I'm not strongly opposed
> to your ideas, I probably don't have a complete understanding of the
> idea anyway. But from what I understand it looks like you want to take
> something that works quite well and make it work less well, and there
> appears not to be a good reason provided of why you want to do that.
>
> Is it because you want to simplify the patch due to concerns about it
> being too much logic to get right for PG11?

My understanding is that the patch as submitted is fundamentally
broken in multiple ways.

As Amit said a few emails upthread, "So while the patch's previous
approach to convert the query's constant value to the desired type was
wrong, this is wronger. :-("  I agree with that analysis.  As I tried
to explain in my last email, if you've got something like foo >
'blarfle'::type1 and foo > 'hoge'::type2, there may actually be no way
at all to determine which of those clauses is more restrictive.  The
fact that > was used in the query to compare foo with a value of type1
and, separately, with a value of type2 means that those operators
exist, but it does not follow that the opfamily provides an operator
which can compare type1 to type2.  As far as I can see, what this
means is that, in general, the approach the patch takes to eliminating
redundant clauses just doesn't work; and in the general case I don't
think there's much hope of saving it.  The question of whether the
patch does too much work at execution time or not is maybe arguable --
my position is that it does -- but working properly in the face of
cross-type comparisons is non-negotiable.

The use of evaluate_expr() is also completely wrong and has got to be
fixed.  I already wrote about that upthread so I won't repeat it here.
I'm pretty sure that the current design, if allowed to stand, would
have lots of bad consequences.

As I understand it, Amit is currently hacking on the patch to try to
fix these issues.  If he comes up with something that works properly
with cross-type comparisons and doesn't abuse evaluate_expr() but
still does more work than I'd ideally prefer at execution time, I'll
consider holding my nose and consider it anyway.  But considering the
amount of rework that I think is needed, I don't really see why we
wouldn't adopt a design that minimizes execution time work, too.

In short, I don't think I'm trying to make something that works quite
well work less well, because I don't think the patch as it stands can
be correctly described as working quite well.

> Let's say there was some view like:
>
> CREATE VIEW vw_ledger_2018 AS SELECT * FROM ledger WHERE postdate
> BETWEEN '2018-01-01' AND '2018-12-13';
>
> And a user comes along and does:
>
> SELECT * FROM vw_ledger_2018 WHERE postdate BETWEEN '2018-03-01' AND
> '2018-03-31'
>
> If we just take the first from each op strategy then we'll not have
> managed to narrow the case down to just the March partition. You might
> argue that this should be resolved at some higher level in the
> planner, but that does nothing for the run-time pruning case.

Yeah, that's a good example of how this could happen in real life.

So far I see three ways forward here:

1. If we've got multiple redundant quals, ignore all but one of them
for purposes of partition pruning.  Hope users don't get mad.

2. If we've got multiple redundant quals, do multiple checks.  Hope
this isn't annoyingly slow (or too much ugly code).

3. If we've got multiple redundant quals but no cross-type operators
are in use, evaluate all of the expressions and pick the highest or
lowest value as appropriate.  Otherwise fall back to #1 or #2.  For
bonus points, do this when cross-type operators ARE in use and the
additional cross-type operators that we need to figure out the highest
or lowest value, as appropriate, is also available.

I'm OK with any of those approaches; that is, I will not seek to block
a merely patch on the basis of which of those options it chooses.  I
think they are all defensible.  Options that are not OK include (a)
trying to cast a value of one type to another type, because that could
turn a query that would have simply returned no rows into an error
case or (b) supposing that all types in an opfamily are binary
coercible to each other, because that's just plain wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Charles Cui
Date:
Subject: Re: GSOC 2018 ideas
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping