Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: anti-join chosen even when slower than old plan
Date
Msg-id 4CDD2DC1.80107@gmail.com
Whole thread Raw
In response to Re: anti-join chosen even when slower than old plan  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses Re: anti-join chosen even when slower than old plan  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List pgsql-performance
12.11.10 12:56, Cédric Villemain написав(ла):
> I supposed it was an answer to my mail but not sure... please keep
> CC'ed people, it is easier to follow threads (at least for me)
>
OK
> 2010/11/12 Vitalii Tymchyshyn<tivv00@gmail.com>:
>
>> I'd say there are two Qs here:
>>
>> 1) Modify costs based on information on how much of the table is in cache.
>> It would be great  if this can be done, but I'd prefer to have it as admin
>> knobs (because of plan stability). May be both admin and automatic ways can
>> be followed with some parallel (disableable) process modify knobs on admin
>> behalf. In this case different strategies to automatically modify knobs can
>> be applied.
>>
> OS cache is usualy stable enough to keep your plans stable too, I think.
>
Not if it is on edge. There are always edge cases where data fluctuates
near some threshold.
>
>> 2) Modify costs for part of table retrieval. Then you need to define "part".
>> Current ways are partitioning and partial indexes. Some similar to partial
>> index thing may be created, that has only "where" clause and no data. But
>> has statistics and knobs (and may be personal bufferspace if they are
>> introduced). I don't like to gather data about "last X percents" or like,
>> because it works only in clustering and it's hard for optimizer to decide if
>> it will be enough to scan only this percents for given query.
>>
> Modifying random_page_cost and sequential_page_cost thanks to
> statistics about cached blocks can be improved if we know the
> distribution.
>
> It does not mean : we know we have last 15% in cache, and we are goign
> to request those 15%.
>

You mean *_cost for the whole table, don't you? That is case (1) for me.
Case (2) is when different cost values are selected based on what
portion of table is requested in the query. E.g. when we have data for
the whole day in one table, data for the last hour is cached and all the
other data is not. Optimizer then may use different *_cost for query
that requires all the data and for query that requires only last hour
data. But, as I've said, that is much more complex task then (1).

Best regards, Vitalii Tymchyshyn


pgsql-performance by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: anti-join chosen even when slower than old plan
Next
From: Cédric Villemain
Date:
Subject: Re: anti-join chosen even when slower than old plan