Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
Date
Msg-id CAFiTN-sXB6RZWhaYySpFrPJe3-9f0KBu+zAu4OOGDM5_q+pxqA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
List pgsql-hackers
On Fri, Oct 6, 2017 at 9:21 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
>> <a.kuzmenkov@postgrespro.ru> wrote:
>>>
>>>> Analysis: The estimated value of the lossy_pages is way higher than
>>>> its actual value and reason is that the total_pages calculated by the
>>>> "Mackert and Lohman formula" is not correct.
>>>
>>>
>>> I think the problem might be that the total_pages includes cache effects and
>>> rescans. For bitmap entries we should use something like relation pages *
>>> selectivity.
>>
>> I have noticed that for the TPCH case if I use "pages * selectivity"
>> it give me better results, but IMHO directly multiplying the pages
>> with selectivity may not be the correct way to calculate the number of
>> heap pages it can only give the correct result when all the TID being
>> fetched are clustered.  But on the other hand "Mackert and Lohman
>> formula" formulae consider that all the TID's are evenly distributed
>> across the heap pages which can also give the wrong estimation like we
>> are seeing in our TPCH case.
>
> I agree with the point that the total_pages included the cache effects
> and rescan when loop_count > 1, that can be avoided if we always
> calculate heap_pages as it is calculated in the else part
> (loop_count=0).  Fortunately, in all the TPCH query plan what I posted
> up thread bitmap scan was never at the inner side of the NLJ so
> loop_count was always 0.  I will fix this.

I have fixed the issue. Now, for calculating the lossy pages it will
not consider the rescan.  As mentioned above it will not affect the
TPCH plan so haven't measured the performance again.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] [PATCH] Lockable views
Next
From: Andres Freund
Date:
Subject: [HACKERS] Re: [COMMITTERS] pgsql: Add configure infrastructure to detectsupport for C99's restric