Re: Parameterized paths vs index clauses extracted from OR clauses - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parameterized paths vs index clauses extracted from OR clauses
Date
Msg-id CA+Tgmoa+RSsRbePxfsxxbQOyj3haEVSZO77=hJCTJUE+RQfE1g@mail.gmail.com
Whole thread Raw
In response to Re: Parameterized paths vs index clauses extracted from OR clauses  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Mar 5, 2013 at 11:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Mar 5, 2013 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If "foo OR bar" is useful as an indexqual condition in the inner scan,
>>> that's one thing.  But if it isn't, the cycles expended to check it in
>>> the inner scan are possibly wasted, because we'll still have to check
>>> the full original OR clause later.  It's possible that the filter
>>> condition removes enough rows from the inner scan's result to justify
>>> the redundant checks, but it's at least as possible that it doesn't.
>
>> Yeah, that's pretty unappealing.  It probably doesn't matter much if
>> foo is just a column reference, but what if it's an expensive
>> function?  For that matter, what if it's a volatile function that we
>> can't execute twice without changing the results?
>
> Well, *that* worry at least is a nonissue: if the clause contains
> volatile functions then it's not a candidate to be an indexqual anyway.
> The code that pulls out these simplified OR clauses is only looking for
> clauses that can be shown to match existing indexes, so it won't pick
> anything like that.  But expensive functions could be a hazard.
>
>>> (Hm, maybe what we need is a marker for "enforce this clause
>>> only if you feel like it"?)
>
>> Not sure I get the parenthesized bit.
>
> I was thinking that we'd extract the simplified clause and then mark it
> as something to be used only if it can be used as an indexqual.
> However, on second thought that still leaves us with the problem that
> some parameterized paths yield more rows than others.

OK, that's what I was thinking, and why I was confused.

> Maybe that
> assumption simply has to go ...

That assumption is pretty darn important from a planning-time
perspective, though.  I think if we rip it out we'd better have some
idea what we're going to do instead.  This parameterized path stuff is
very cool, but just like any other planner trick it has to be worth
the cycles spent at runtime.

On further reflection, I'm wondering about this part from your original email:

> We do get as far as finding that out and
> building a bitmap scan path, but the path is marked as yielding 2918
> rows (the whole product table), not the 2 rows it actually will
> produce.  That's because the parameterized path code is designed to
> assume that all identically-parameterized scans will produce the same
> number of rows, and it's already computed that row estimate without
> the benefit of the extracted OR clause.

Can't we fix this in some more-localized way?  I mean, bitmap index
scans are a bit of a special case, because the individual quals get
their own plan tree nodes.   With a sequential scan, index scan, or
index-only scan, any scan of the same relation can incorporate all of
the same quals.  But if we implement a scan for a = 1 or b = 1 as a
bitmap-or of two bitmap-scans, then suddenly there's a node in the
tree that can only accommodate only the a = 1 qual and another that
can accommodate only the b = 1 qual.  Expecting those to have the the
same row-count is clearly nonsense, but it feels like an artifact of
the representational choice.  One can imagine a database where a
sequential scan with a complex filter condition is represented as a
filtering node atop a sequential-scan node rather than as a sequential
scan node with a filter condition glued on to it; conversely, one
could imagine a representation for bitmap scans where a bitmap-or
operation comes out as a single node with a more-complex internal
structure.  I'm not actually proposing that we change the
representational choices we've made here (though I wouldn't be averse
to it), but the point I'm making is that it seems like the behavior
that's causing the problem is a representational artifact, and
thinking about the problem that way might suggest a narrower fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: sql_drop Event Triggerg
Next
From: Bruce Momjian
Date:
Subject: Re: Materialized views WIP patch