Re: Too many BitmapAnds in the wild - Mailing list pgsql-general

From Tom Lane
Subject Re: Too many BitmapAnds in the wild
Date
Msg-id 28692.1537222312@sss.pgh.pa.us
Whole thread Raw
In response to Too many BitmapAnds in the wild  (Seamus Abshere <seamus@abshere.net>)
List pgsql-general
Seamus Abshere <seamus@abshere.net> writes:
> hey,
> We see a fair number of incorrectly chosen BitmapAnd plans in the wild at Faraday... enough that googling the problem
endsup at our old posts to this mailing list 😀. An attractive solution was proposed by Jeff Janes [1] 

> - *cost += 0.1 * cpu_operator_cost * path->rows;
> + *cost += 6 * cpu_operator_cost * path->rows;

> It appears this constant hasn't been changed for 7 years [2].

AFAICT you are pointing at this:

cost_bitmap_tree_node(Path *path, Cost *cost, Selectivity *selec)
{
    if (IsA(path, IndexPath))
    {
        *cost = ((IndexPath *) path)->indextotalcost;
        *selec = ((IndexPath *) path)->indexselectivity;

        /*
         * Charge a small amount per retrieved tuple to reflect the costs of
         * manipulating the bitmap.  This is mostly to make sure that a bitmap
         * scan doesn't look to be the same cost as an indexscan to retrieve a
         * single tuple.
         */
        *cost += 0.1 * cpu_operator_cost * path->rows;
    }

which does not seem like the place to be putting your thumb on the scale
if you want to discourage bitmap ANDs.  That would increase the estimate
for plain bitmap scans as well as ANDs.  Moreover, there's no plausible
reasoning for this adjustment being more than a minimal one.

There's a different fudge factor in cost_bitmap_and_node (also in
cost_bitmap_or_node) that would probably be more plausible to twiddle:

     * The runtime cost of the BitmapAnd itself is estimated at 100x
     * cpu_operator_cost for each tbm_intersect needed.  Probably too small,
     * definitely too simplistic?

        if (l != list_head(path->bitmapquals))
            totalCost += 100.0 * cpu_operator_cost;

I'd be the first to say that that number has no experimental basis,
plus modeling it as a quasi-constant is theoretically wrong; surely
it ought to vary depending on how big we think the bitmap might be.

I'm not, however, very enamored of just replacing the "100" with some
other random constant without any evidence to back up the change.
Let's see some test cases, at least.

            regards, tom lane


pgsql-general by date:

Previous
From: marcelo
Date:
Subject: Re: Logical locking beyond pg_advisory
Next
From: Michael Paquier
Date:
Subject: Re: scram-sha-256 authentication broken in FIPS mode