Bitmap table scan cost per page formula - Mailing list pgsql-hackers

From Haisheng Yuan
Subject Bitmap table scan cost per page formula
Date
Msg-id CAPW_87H0X2ZAr_oZu6MTS62h=_oyi9NWU=jnPApcxjB8BfcK0w@mail.gmail.com
Whole thread Raw
Responses Re: Bitmap table scan cost per page formula  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Bitmap table scan cost per page formula  (Robert Haas <robertmhaas@gmail.com>)
Re: Bitmap table scan cost per page formula  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Hi hackers,

This is Haisheng Yuan from Greenplum Database.

We had some query in production showing that planner favors seqscan over bitmapscan, and the execution of seqscan is 5x slower than using bitmapscan, but the cost of bitmapscan is 2x the cost of seqscan. The statistics were updated and quite accurate. 

Bitmap table scan uses a formula to interpolate between random_page_cost and seq_page_cost to determine the cost per page. In Greenplum Database, the default value of random_page_cost is 100, the default value of seq_page_cost is 1. With the original cost formula, random_page_cost dominates in the final cost result, even the formula is declared to be non-linear. However, it is still more like linear, which can't reflect the real cost per page when a majority of pages are fetched. Therefore, the cost formula is updated to real non-linear function to reflect both random_page_cost and seq_page_cost for different percentage of pages fetched. 

Even though PostgreSQL has much smaller default value of random_page_cost (4), the same problem exists there if we change the default value. 

Old cost formula:
cost_per_page = random_page_cost - (random_page_cost - seq_page_cost) * sqrt(pages_fetched / T);
Inline image 1

New cost formula:
cost_per_page = seq_page_cost * pow(random_page_cost / seq_page_cost, 1 - sqrt(pages_fetched / T));
Inline image 2

Below is the graph (credit to Heikki) that plots the total estimated cost of a bitmap heap scan, where table size is 10000 pages, and random_page_cost=10 and seq_page_cost=1. X axis is the number of pages fetche. I.e. on the left, no pages are fetched, and on the right end, at 10000, all pages are fetched. The original formula is in black, the new formula in red, and the horizontal line, in blue, shows the cost of a Seq Scan.
Inline image 3


Thoughts? Any better ideas?

Thanks!
Haisheng Yuan
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] replace GrantObjectType with ObjectType