Re: Use of additional index columns in rows filtering

From: Peter Geoghegan
Subject: Re: Use of additional index columns in rows filtering
Msg-id: CAH2-WzmAJsGqkfa0fEhk20NowmduwAim43ajBDRU_BAPaBB85w@mail.gmail.com
In response to: Re: Use of additional index columns in rows filtering (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Sun, Jul 23, 2023 at 5:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> Sorry, I think I meant 'cost estimates', not the selectivity estimates.
> If you convert the original "simple" clauses into the more complex list,
> presumably we'd cost that differently, right? I may be entirely wrong,
> but my intuition is that costing these tiny clauses will be much more
> difficult than costing the original clauses.

I think that that's definitely true (it is more difficult), but there may be a bigger picture.

> Right, I agree with this reasoning in principle.
>
> But I'm getting a bit lost regarding what's the proposed costing
> strategy.

To be clear, I don't have a clue how to better estimate the cardinality of multiple attributes from a composite index. The big and immediate change to SAOP costing with my patch is that genericcostestimate/btcostestimate can safely assume that visiting each leaf page more than once is a physical impossibility. Because it is. It is no longer necessary to treat SAOPs similarly to a nested loop join during costing, which is how it works today.

Now, whenever you add increasingly complicated clauses to a multi-column SAOP query (like the ones I've shown you), it makes sense for the cost to "saturate" at a certain point. That should be representative of the physical reality, for both CPU costs and I/O costs. Right now the worst case really is relevant to the average case, since the risk of the costs just exploding at runtime is very real.
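As a concrete illustration, here is a hypothetical sketch of the kind of multi-column SAOP query in question (the table, index, and values are invented for illustration; they're not the exact queries from earlier in the thread):

-- Composite index on (a, b), with a SAOP on each column.  Today's costing
-- scales the estimate with the number of array element combinations, as
-- if each one required its own descent of the index (the nested loop
-- join-like treatment mentioned above).  Once visiting a leaf page twice
-- is physically impossible, the true cost saturates: it can never exceed
-- the cost of reading each matching leaf page once.
CREATE TABLE saop_demo (a int, b int);
CREATE INDEX saop_demo_a_b_idx ON saop_demo (a, b);

SELECT *
FROM saop_demo
WHERE a = ANY ('{1, 2, 3}')
  AND b = ANY ('{10, 20, 30, 40}');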
If the only problem in this area were the repeated accesses to the same leaf pages (accesses that happen in very close succession anyway), then all of this would be a nice win, but not much more. It certainly wouldn't be expected to change the way we think about stuff that isn't directly and obviously relevant. But it's not just the index pages. Once you start to consider the interactions with filter/qpquals, it gets much more interesting. Now you're talking about completely avoiding physical I/Os for heap accesses, which has the potential to make a dramatic difference to some types of queries, particularly in the worst case.

> It's hard to follow threads spanning days, with various other
> distractions, etc.

I have to admit that my thinking on this very high level stuff is rather unrefined. As much as anything else, I'm trying to invent (or discover) a shared vocabulary for discussing these issues. I might have gone about it clumsily, at times. I appreciate being able to bounce this stuff off you.

> I don't have a very good idea how sensitive the cost is to selectivity
> changes, which I think is crucial for making judgments.

I'm not trying to find a way for the optimizer to make better judgments about costing with a multi-column index. What I'm suggesting (rather tentatively) is to find a way for it to make fewer judgments, or even no judgments at all. If you can find a way of reducing the number of possible choices without any real downside -- in particular by just not producing index paths that cannot possibly be a win in some cases -- then you reduce the number of bad choices. The challenge is making that kind of approach in the optimizer actually representative of the executor in the real world. The executor has to have robust performance under a variety of runtime conditions for any of this to make sense.

> > It's natural to think of things like this _bt_readpage optimization as
> > something that makes existing types of plan shapes run faster. But you
> > can also think of them as things that make new and fundamentally
> > better plan shapes feasible, by making risky things much less risky.
>
> That'd be an interesting optimization, but I don't think that matters
> for this patch, as it's not messing with index scan keys at all. I mean,
> it does not affect what scan keys get passed to the AM at all, or what
> scan keys are required. And it does not influence what the AM does. So
> all this seems interesting, but rather orthogonal to this patch.

Your patch is approximately the opposite of what I'm talking about, in terms of its structure. The important commonality is that each patch adds "superfluous" quals that can be very useful at runtime, under the right circumstances -- which can be hard to predict. Another similarity is that both patches inspire the same kind of lingering doubts about extreme cases -- cases where (somehow) the cost asymmetry that usually works in our favor doesn't apply.

My current plan is to post v1 of my patch early next week. It would be better to discuss this on the thread that I create for that patch. You're right that "exploiting index ordering" on the index AM side is totally unrelated to your patch. Sorry about that.

> I was rather thinking about maybe relaxing the rules about which index
> paths we create, to include indexes that might be interesting thanks to
> index-only filters (unless already picked thanks to sorting).

That seems like it would make sense. In general I think that we overuse and over-rely on bitmap index scans. We should try to chip away at the artificial advantages that bitmap index scans have, so that we can get the benefit of plain index scans more often; accessing the heap/VM inline opens up a lot of possibilities that bitmap scans will never be able to match. (BTW, I'm hoping that your work on index prefetching will help with that.)
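To illustrate, here is a hypothetical sketch of the kind of case where an index-only filter could make such a path interesting (the table, index, and clauses are invented, in the same spirit as the "Index Filter" line in the regression test diff below):

-- With an index on (a, b), "a > 100" can be used as an index qual, but
-- "b % 7 = 0" cannot.  Since the clause references only indexed columns,
-- it could still be evaluated against the index tuple, as an index-only
-- filter, before any heap access; rows that fail the clause never cost
-- us a heap fetch.
CREATE TABLE filter_demo (a int, b int, c text);
CREATE INDEX filter_demo_a_b_idx ON filter_demo (a, b);

SELECT *
FROM filter_demo
WHERE a > 100 AND b % 7 = 0;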
I see that your patch has this diff change (which is 1 out of only 2 or 3 plan changes needed by the patch):

--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1838,18 +1838,13 @@ DROP TABLE onek_with_null;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);
-                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
-   ->  BitmapOr
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 1))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 3))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+                              QUERY PLAN
+-----------------------------------------------------------------------
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: (thousand = 42)
+   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
+(4 rows)

That does seem like an improvement to me. But an even better plan is possible, or would be possible once this SAOP-OR-transformation patch is in place (if it were then combined with my own SAOP patch):

https://www.postgresql.org/message-id/flat/919bfbcb-f812-758d-d687-71f89f0d9a68%40postgrespro.ru#9d877caf48c4e331e507b5c63914228e

That could give us an index scan plan that is perfectly efficient. Like the new plan shown here, it would pin/lock a single leaf page from the tenk1_thous_tenthous index, once. But unlike the plan shown here, it would be able to terminate as soon as the index scan reached an index tuple where thousand=42 and tenthous>42. That makes a significant difference if we have to do heap accesses for those extra tuples.

Plus this hypothetical other plan of mine would be more robust: it would tolerate misestimates. It happens to be the case that there just aren't that many tuples with thousand=42; they take up only a fraction of one leaf page. But...why take a chance on that being true, if we don't have to? The old bitmap index scan plan already has this same advantage, because it managed to pass down specific "tenthous" index quals; it is robust in the very same way. It would be nice to have both advantages together.

(To be clear, I'm certainly not suggesting that the example argues against what you want to do here -- it's just an example that jumped out at me.)

Perhaps this example makes my confusion about the boundaries between each of our patches a bit more understandable. I was confused, and I still am. I look forward to being less confused at some point in the future.

--
Peter Geoghegan