Re: Bitmap scan is undercosted? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Bitmap scan is undercosted?
Msg-id CAMkU=1zVsx-D4NVnpUpWcRehEk8GxA9zdLxe9XWCak_gyvOY4w@mail.gmail.com
In response to Re: Bitmap scan is undercosted?  (Vitaliy Garnashevich <vgarnashevich@gmail.com>)
On Tue, Dec 5, 2017 at 11:06 PM, Vitaliy Garnashevich <vgarnashevich@gmail.com> wrote:


This is very cool, thanks.
 
I've tried to create a better test case:
- Increase shared_buffers and effective_cache_size to fit whole database, including indexes.
- Use random(), to avoid correlation between the filtered values.
- Make both columns of integer type, to avoid special cases with boolean (like the one that happened with CREATE STATISTICS).
- Flush OS disk cache and then try running the query several times, to get both cold-cache results and all-in-ram results.
- There are several tests, with different frequency of the selected values in the two columns: [1/10, 1/10], [1/50, 1/10], [1/100, 1/10].
- The cost is broken down by the contribution of each of its components: seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost. EXPLAIN is run once per component, each time with only that one component set to non-zero.

Where you have question marks, does that mean you could not force it into the plan you wanted with all but one of the settings being zero?

 
- The test was run on a DigitalOcean VM: Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-101-generic x86_64), 2 GB RAM, 2-core CPU, SSD; PostgreSQL 9.5.10.


shared_buffers = 1024MB
effective_cache_size = 1024MB

I would set this even higher.
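For instance (a rough guess for a 2 GB VM, not a tuned recommendation; effective_cache_size is only a planner hint, so overshooting does no harm):

effective_cache_size = 1536MB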

 

work_mem = 100MB

create table aaa as select floor(random()*10)::int num, (random()*10 < 1)::int flag from generate_series(1, 10000000) id;
create table aaa as select floor(random()*50)::int num, (random()*10 < 1)::int flag from generate_series(1, 10000000) id;
create table aaa as select floor(random()*100)::int num, (random()*10 < 1)::int flag from generate_series(1, 10000000) id;

create index i1 on aaa  (num);
create index i2 on aaa  (flag);

set enable_bitmapscan = on; set enable_indexscan = off;  set enable_seqscan = off;
set enable_bitmapscan = off; set enable_indexscan = on;  set enable_seqscan = off;
set enable_bitmapscan = off; set enable_indexscan = off;  set enable_seqscan = on;

set seq_page_cost = 1.0; set random_page_cost = 1.0; set cpu_tuple_cost = 0.01; set cpu_index_tuple_cost = 0.005; set cpu_operator_cost = 0.0025;

explain (analyze,verbose,costs,buffers) select * from aaa where num = 1 and flag = 1;
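Presumably each per-component run looked something like this (a sketch of the method described above; the particular combination shown, isolating random_page_cost, is my assumption):

set seq_page_cost = 0; set random_page_cost = 1.0; set cpu_tuple_cost = 0; set cpu_index_tuple_cost = 0; set cpu_operator_cost = 0;
explain (analyze,verbose,costs,buffers) select * from aaa where num = 1 and flag = 1;
-- repeat with a different single parameter left non-zero to get that component's share of the total cost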

One thing to try is to use explain (analyze, timing off), and then get the total execution time from the summary line at the end of the explain output, rather than from the "actual time" fields.  Collecting the times of each individual step in the execution can impose a lot of overhead, and some plans have more of this artificial overhead than others.  It might change the results, or it might not.  I know that sorts and hash joins are very sensitive to this, but I don't know about bitmap scans.
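For example (same query, keeping the buffer counts but dropping the per-node timing):

explain (analyze, timing off, costs, buffers) select * from aaa where num = 1 and flag = 1;
-- read the "Execution time: ..." summary line at the bottom of the output rather than the per-node "actual time" figures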

....

What seems odd to me is that in different kinds of tests (with different frequency of column values):

i1 Rows Removed by Filter = 900156, 179792, 89762 (decreased a lot)
i1 buffers = 46983, 44373, 39928 (decreased, but not a lot)

To filter out 89762 tuples, you first have to look them up in the table, and since they are randomly scattered that means you hit nearly every page in the table at least once.  In fact, I don't understand how the empirical number of buffer hits can be only 46983 in the first case, when it has to visit 1,000,000 rows (and reject 90% of them).  I'm guessing that it is because your newly created index is sorted by ctid order within a given index value, and that the scan holds a pin on the table page between visits, and so doesn't count as a hit if it already holds the pin.

You could try to create an empty table, create the indexes, then populate the table with your random select query, to see if that changes the buffer hit count.  (Note that this wouldn't change the cost estimates much even if it does change the measured number of buffer hits, because of effective_cache_size.  It knows you will be hitting ~47,000 pages ~25 times each, and so only charges you for the first time each one is hit.)
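A sketch of that variation, using your 1/100 data-generating expression (picked arbitrarily from the three):

create table aaa (num int, flag int);
create index i1 on aaa (num);
create index i2 on aaa (flag);
insert into aaa select floor(random()*100)::int, (random()*10 < 1)::int from generate_series(1, 10000000) id;
analyze aaa;  -- my addition: refresh stats so the planner sees the newly loaded data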
 
i1 best case time = 756.045, 127.814, 79.492 (decreased a lot, as well as probably average case too)
i1 cost estimates = 67358.15, 48809.34, 46544.80 (did not decrease a lot)

Right.  Your best case times are when the data is completely in cache.  But your cost estimates are dominated by *_page_cost, which are irrelevant when the data is entirely in cache.  You are telling it to estimate the worst-case costs, and it is doing that pretty well (within this one plan).
 

i2 Rows Removed by Filter = 900835, 980350, 991389
i2 buffers = 46985, 46985, 46987
i2 best case time = 377.554, 346.481, 387.874
i2 cost estimates = 39166.34, 39247.83, 40167.34

It's odd that the increase in actual execution time for "i1" was not reflected enough in the cost estimates.

No, that's entirely expected given your settings.  As long as you are charging disk-read costs for reading data from RAM, you will never get realistic cost estimates.  Remember, you aren't trying to tune your production server here, you are trying to get a test case that can be dissected.

Perhaps you think that effective_cache_size is supposed to fix this for you.  But it only accounts for blocks hit repeatedly within the same query.  It has no idea that you are running a bunch of other queries on the same data back to back, and so it will likely find that data already in memory from one query to the next.  That knowledge (currently) has to be baked into your *_page_cost setting.  There is also no way to say that data for one table is more likely to be found in cache than data for another table is.
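For instance, to make the estimates reflect the all-in-RAM case, the page costs can be pulled down toward the CPU costs (values here are illustrative only, not a recommendation):

set seq_page_cost = 0.1; set random_page_cost = 0.1;
explain (analyze, timing off, costs, buffers) select * from aaa where num = 1 and flag = 1;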
 

The cost didn't even go below the "i2" costs.

That is partially because i2 benefits from a spuriously large correlation due to an artifact of how stats are computed.  See another thread that has spun off of this one.  

If you want to see the effect of this on your cost estimates, you can do something like:

update pg_statistic set stanumbers2='{0}' where starelid='aaa'::regclass and staattnum=2;
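(That assumes the correlation statistic for the flag column is the one stored in slot 2 of its pg_statistic row; a later ANALYZE of the table will recompute and overwrite the value.)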
 
Cheers,

Jeff
