Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From: Robert Haas
Subject: Re: anti-join chosen even when slower than old plan
Msg-id: AANLkTin2KxVit+eKb3Dc45ZitD5YACMvt9oVGe6K1vTP@mail.gmail.com
In response to: Re: anti-join chosen even when slower than old plan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: anti-join chosen even when slower than old plan (Kenneth Marshall <ktm@rice.edu>)
List: pgsql-performance
On Wed, Nov 10, 2010 at 6:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> Unfortunately, to know how much data we're going to grovel
>>> through, we need to know the plan; and to decide on the right
>>> plan, we need to know how much data we're going to grovel through.
>
>> And that's where they've been ending.
>
>> The only half-sane answer I've thought of is to apply a different
>> cost to full-table or full-index scans based on the ratio with
>> effective cache size.

Kevin, yes, good point.  Bravo!  Let's do that.  Details TBD, but
suppose effective_cache_size = 1GB.  What we know for sure is that a 4
GB table is not going to be fully cached but a 4 MB table may well be.
In fact, I think we should assume that the 4 MB table IS cached,
because the point is that if it's used at all, it soon will be.  It's
almost certainly a bad idea to build a plan around the idea of
minimizing reads from that 4 MB table in favor of doing a substantial
amount of additional work somewhere else.  I suppose this could break
down if you had hundreds and hundreds of 4 MB tables all of which were
accessed regularly, but that's an unusual situation, and anyway it's
not clear that assuming them all uncached is going to be any better
than assuming them all cached.
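
To make that a bit more concrete, the shape of the rule I have in mind
is something like the toy sketch below.  This is purely illustrative,
not planner code; the 10% cutoff, the linear falloff between the two
endpoints, and the assumed_cached_fraction() name are all invented
placeholders.

#include <stdio.h>

/*
 * Illustrative sketch only: assume a table is fully cached if it is much
 * smaller than effective_cache_size, fully uncached if it is at least as
 * large, with a linear falloff in between.
 */
static double
assumed_cached_fraction(double rel_size, double effective_cache_size)
{
    if (rel_size <= 0.1 * effective_cache_size)
        return 1.0;             /* small table: assume it's all cached */
    if (rel_size >= effective_cache_size)
        return 0.0;             /* bigger than the cache: assume none is */
    return (effective_cache_size - rel_size) / (0.9 * effective_cache_size);
}

int
main(void)
{
    double  ecs = 1024.0 * 1024 * 1024;     /* effective_cache_size = 1 GB */

    printf("4 MB table: %.2f assumed cached\n",
           assumed_cached_fraction(4.0 * 1024 * 1024, ecs));
    printf("4 GB table: %.2f assumed cached\n",
           assumed_cached_fraction(4096.0 * 1024 * 1024, ecs));
    return 0;
}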

> This might have some connection to some rather half-baked ideas I've
> been having in connection with the generalized-inner-indexscan problem.
> I don't have anything in the way of a coherent presentation to make yet,
> but the thing I'm being forced to realize is that sane modeling of a
> complex subplan that's on the inside of a nestloop join requires
> treating *every* scan type as having different costs "the first time"
> versus "during rescan".  If the total amount of data touched in the
> query is less than effective_cache_size, it's not unreasonable to
> suppose that I/O costs during rescan might be zero, even for a seqscan or
> a non-parameterized indexscan.  In fact, only parameterized indexscans
> would be able to touch pages they'd not touched the first time, and so
> they ought to have higher not lower rescan costs in this environment.
> But once the total data volume exceeds effective_cache_size, you have to
> do something different since you shouldn't any longer assume the data is
> all cached from the first scan.  (This isn't quite as hard as the case
> you're talking about, since I think the relevant data volume is the sum
> of the sizes of the tables used in the query; which is easy to
> estimate at the start of planning, unlike the portion of the tables
> that actually gets touched.)
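
If I'm following, the distinction you're describing looks roughly like
the toy sketch below.  None of this is real planner code; the 50%
charge for the parameterized case is just a placeholder for "costs more
than zero", and rescan_io_cost() is a made-up name.

#include <stdio.h>

typedef enum
{
    SCAN_SEQ,                   /* sequential scan */
    SCAN_INDEX,                 /* non-parameterized indexscan */
    SCAN_INDEX_PARAM            /* parameterized indexscan */
} ScanKind;

static double
rescan_io_cost(ScanKind kind, double first_scan_io_cost,
               double total_query_data, double effective_cache_size)
{
    if (total_query_data <= effective_cache_size)
    {
        /*
         * Everything the query touches should still be cached from the
         * first pass, so rescans cost nothing ... except that a
         * parameterized indexscan can visit pages it skipped before.
         */
        return (kind == SCAN_INDEX_PARAM) ? 0.5 * first_scan_io_cost : 0.0;
    }

    /* Too much data to stay cached: charge the full first-scan cost. */
    return first_scan_io_cost;
}

int
main(void)
{
    double  ecs = 1024.0 * 1024 * 1024;     /* 1 GB */

    printf("seqscan rescan, data fits:         %.1f\n",
           rescan_io_cost(SCAN_SEQ, 100.0, 512.0 * 1024 * 1024, ecs));
    printf("param indexscan rescan, data fits: %.1f\n",
           rescan_io_cost(SCAN_INDEX_PARAM, 100.0, 512.0 * 1024 * 1024, ecs));
    printf("seqscan rescan, data too big:      %.1f\n",
           rescan_io_cost(SCAN_SEQ, 100.0, 4096.0 * 1024 * 1024, ecs));
    return 0;
}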

Well, we don't want the costing model to have sharp edges.
effective_cache_size can't be taken as much more than an educated
guess, and what actually happens will depend a lot on what else is
going on on the system.  If only one query is running on a system at a
time and it is repeatedly seq-scanning a large table, the cost of
reading pages in will be very small until the table grows large enough
that you can't fit the whole thing in memory at once, and then will
abruptly go through the roof.  But realistically you're not going to
know exactly where that edge is going to be, because you can't predict
exactly how much concurrent activity there will be, for example, or
how much work_mem allocations will push out of the OS buffer cache.
So I'm thinking we should start the costs at something like 0.05/0.05
for tables that are much smaller than effective_cache_size and ramp up
to 4/1 for tables that are larger than effective_cache_size.  Maybe
just by linearly ramping up, although that has a certain feeling of
being mathematically unsound.
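
For illustration, and assuming 0.05/0.05 and 4/1 are meant as
random_page_cost/seq_page_cost pairs, a straight linear ramp might look
like the sketch below.  The clamping and the interpolation are
arbitrary choices on my part, not anything that exists today.

#include <stdio.h>

/*
 * Illustrative sketch only: linearly ramp the effective page costs from
 * 0.05/0.05 (table much smaller than effective_cache_size) up to the
 * default 4/1 (table at least as large as effective_cache_size).
 */
static void
ramped_page_costs(double rel_size, double effective_cache_size,
                  double *random_page_cost, double *seq_page_cost)
{
    double  frac = rel_size / effective_cache_size;

    if (frac > 1.0)
        frac = 1.0;             /* at or above cache size: full cost */

    /* linearly interpolate from the "cached" endpoint to the defaults */
    *random_page_cost = 0.05 + frac * (4.0 - 0.05);
    *seq_page_cost = 0.05 + frac * (1.0 - 0.05);
}

int
main(void)
{
    double  ecs = 1024.0 * 1024 * 1024;     /* 1 GB, as above */
    double  sizes[] = {4.0 * 1024 * 1024,       /* 4 MB */
                       512.0 * 1024 * 1024,     /* 512 MB */
                       4096.0 * 1024 * 1024};   /* 4 GB */
    int     i;

    for (i = 0; i < 3; i++)
    {
        double  rpc,
                spc;

        ramped_page_costs(sizes[i], ecs, &rpc, &spc);
        printf("%6.0f MB: random_page_cost %.2f, seq_page_cost %.2f\n",
               sizes[i] / (1024 * 1024), rpc, spc);
    }
    return 0;
}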

> An idea that isn't even half-baked yet is that once we had a cost model
> like that, we might be able to produce plans that are well-tuned for a
> heavily cached environment by applying the "rescan" cost model even to
> the first scan for a particular query.  So that might lead to some sort
> of "assume_cached" GUC parameter, and perhaps Kevin could tune his
> reporting queries by turning that off instead of messing with individual
> cost constants.

I think the real goal here should be to try to avoid needing a GUC.  A
lot of people could benefit if the system could make some attempt to
recognize on its own which queries are likely to be cached.  We
already have parameters you can hand-tune for each query as necessary.
Being able to set some parameters system-wide and then get sensible
behavior automatically would be much nicer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
