Thread: Slow query + why bitmap index scan??
This will be a simple question to answer. :-)

There is a single table:

    select count(*) from product_price_history  -- 12982555 rows

This table has exactly one index and one primary key constraint:

    CREATE INDEX idx_product_price_history_id_hdate
        ON product_price_history USING btree (id, hdate);

    ALTER TABLE product_price_history
        ADD CONSTRAINT pk_product_price_history PRIMARY KEY (hid);

No other constraints or indexes are defined on this table. Rows are never
updated or deleted, only inserted. It was VACUUMed and REINDEXed today.

Stats on the table: seq scans=13, index scans=108, table size=3770MB,
toast table size=8192 bytes, indexes size=666MB.

This query:

    select hid from product_price_history where id=35547581

returns 759 rows in 8837 msec! How can this be that slow???

The query plan is:

"Bitmap Heap Scan on product_price_history  (cost=13.90..1863.51 rows=472 width=8)"
"  Recheck Cond: (id = 35547581)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.78 rows=472 width=0)"
"        Index Cond: (id = 35547581)"

I don't understand why PostgreSQL uses a bitmap heap scan plus a bitmap
index scan. Why not just a regular index scan? Data in a btree index is
already sorted, so a normal index scan should take no more than a few
page reads. This should never take 8 seconds.

Thanks,

   Laszlo
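The plan shown above comes from a plain EXPLAIN, which only reports cost estimates. To see where the 8837 ms actually go, the query can be re-run under EXPLAIN ANALYZE; a sketch (the BUFFERS option requires PostgreSQL 9.0 or later):

```sql
-- A sketch: EXPLAIN ANALYZE executes the query and reports actual row
-- counts and timings per plan node; the BUFFERS option additionally
-- shows how many pages were shared-buffer hits versus disk reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT hid FROM product_price_history WHERE id = 35547581;
```

If most of the time is attributed to the Bitmap Heap Scan node with many buffer reads, that points at scattered heap pages rather than a bad plan.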
* Laszlo Nagy:

> This query:
>
>     select hid from product_price_history where id=35547581
>
> Returns 759 rows in 8837 msec! How can this be that slow???

If most records are on different heap pages, processing this query
requires many seeks. 11 ms per seek (8837 ms / 759 rows is about
11.6 ms) is not too bad if most of them are cache misses.

-- 
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
On 2011-01-12 14:42, Florian Weimer wrote:
> * Laszlo Nagy:
>
>> This query:
>>
>>     select hid from product_price_history where id=35547581
>>
>> Returns 759 rows in 8837 msec! How can this be that slow???
>
> If most records are on different heap pages, processing this query
> requires many seeks. 11ms per seek is not too bad if most of them are
> cache misses.

How about this:

    select id, hdate from product_price_history where id=35547581
    -- 759 rows, 8837 ms

Query time average: 3 sec. Query plan:

"Bitmap Heap Scan on product_price_history  (cost=13.91..1871.34 rows=474 width=16)"
"  Recheck Cond: (id = 35547582)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.79 rows=474 width=0)"
"        Index Cond: (id = 35547582)"

Why still the heap scan here? All fields in the query are in the index.
Wouldn't a simple index scan be faster? (This is only a theoretical
question; I'm just curious.)

My first idea to speed things up is to cluster this table regularly.
That would pack most of the matching rows into a few pages. Fewer page
reads means a faster query. Is it a good idea?

Another question: do you think that increasing shared_mem would make it
faster? Currently we have:

    shared_mem = 6GB
    work_mem = 512MB
    total system memory = 24GB

Total database size is about 30GB, but there are other programs running
on the system, and many other tables.

Thanks,

   Laszlo
On Wed, Jan 12, 2011 at 03:21:45PM +0100, Laszlo Nagy wrote:

> Why still the heap scan here? All fields in the query are in the
> index. Wouldn't a simple index scan be faster? (This is only a
> theoretical question, just I'm curious.)

Because of PostgreSQL's MVCC design, it must visit each heap tuple to
check its visibility in addition to looking it up in the index; the
index itself carries no visibility information.

> My first idea to speed things up is to cluster this table regularly.
> That would convert (most of the) rows into a few pages. Few page
> reads -> faster query. Is it a good idea?

Yes, clustering this table would greatly speed up this type of query.

> Another question. Do you think that increasing shared_mem would make
> it faster?

I doubt it. Clustering is your best option until we get indexes with
visibility information.

Cheers,
Ken
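Ken's suggestion, sketched concretely using the index name from the first message (CLUSTER ... USING is the syntax from PostgreSQL 8.4 on):

```sql
-- A sketch: rewrite the table in (id, hdate) index order, so that all
-- rows for a given id land on a few adjacent heap pages.
CLUSTER product_price_history USING idx_product_price_history_id_hdate;

-- CLUSTER rewrites the whole table, so refresh planner statistics:
ANALYZE product_price_history;
```

After this, the 759 rows for one id should be read with a handful of sequential page fetches instead of hundreds of random seeks.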
Laszlo Nagy <gandalf@shopzeus.com> wrote:

> shared_mem = 6GB
> work_mem = 512MB
> total system memory = 24GB

In addition to the good advice from Ken, I suggest that you set
effective_cache_size (if you haven't already). Add whatever the OS
shows as RAM used for cache to the shared_mem setting.

But yeah, for your immediate problem, if you can cluster the table on
the index involved, it will be much faster. Of course, if the table is
already in a useful order for some other query, that query might get
slower. And unlike some other products, CLUSTER in PostgreSQL doesn't
*maintain* that order for the data as new rows are added, so this
should probably become a weekly (or monthly, or some such) maintenance
operation.

-Kevin
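With the figures in this thread (6GB shared_buffers plus roughly 9GB of OS cache), Kevin's rule of thumb would come out to about 15GB. Since effective_cache_size is only a planner hint, it can even be tried per session before touching postgresql.conf; a sketch:

```sql
-- A sketch: effective_cache_size allocates nothing, it only tells the
-- planner how much data it may assume is cached. 15GB here is the sum
-- of the 6GB shared_buffers and the ~9GB OS cache from this thread.
SET effective_cache_size = '15GB';
SHOW effective_cache_size;
```

If plans for the slow queries don't change under the session-level setting, raising it permanently won't help either.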
On 2011-01-12 15:36, Kevin Grittner wrote:
> In addition to the good advice from Ken, I suggest that you set
> effective_cache_size (if you haven't already). Add whatever the OS
> shows as RAM used for cache to the shared_mem setting.

It was 1GB. Now I have changed it to 2GB. Although the OS shows 9GB of
inactive memory, we have many concurrent connections to the database
server. I hope it is okay to use 2GB.

> But yeah, for your immediate problem, if you can cluster the table
> on the index involved, it will be much faster. Of course, if the
> table is already in a useful order for some other query, that might
> get slower, and unlike some other products, CLUSTER in PostgreSQL
> doesn't *maintain* that order for the data as new rows are added --
> so this should probably become a weekly (or monthly or some such)
> maintenance operation.

Thank you! After clustering, queries are really fast. I don't worry
about other queries: this is the only way we use this table, namely to
get details for a given id value. I put the CLUSTER command into a cron
script that runs daily. The second run took 2 minutes, so I guess it
will be fine.

Thank you for your help.

   Laszlo
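Once a table has been clustered, PostgreSQL remembers which index was used, so the recurring job only needs the short form; a sketch for such a daily script:

```sql
-- A sketch: re-cluster using the index recorded by the earlier
-- CLUSTER ... USING run, then refresh planner statistics.
CLUSTER product_price_history;
ANALYZE product_price_history;

-- Note: CLUSTER takes an ACCESS EXCLUSIVE lock on the table for the
-- duration of the rewrite, so schedule the job in a quiet window.
```

Since rows are only ever appended here, each run mostly has to re-sort the day's new inserts, which is why the second run already finished in about 2 minutes.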
Laszlo Nagy <gandalf@shopzeus.com> wrote:
>> In addition to the good advice from Ken, I suggest that you set
>> effective_cache_size (if you haven't already). Add whatever the
>> OS shows as RAM used for cache to the shared_mem setting.
>
> It was 1GB. Now I changed to 2GB. Although the OS shows 9GB
> inactive memory, we have many concurrent connections to the
> database server. I hope it is okay to use 2GB.

effective_cache_size doesn't cause any RAM to be allocated; it's just
a hint to the costing routines. Higher values tend to favor index use,
while lower values tend to favor sequential scans. I suppose that if
you regularly have many large queries running at the same moment you
might not want to set it to the full amount of cache space available,
but I've usually had good luck setting it to the sum of the
shared_buffers space and the OS cache.

Since it only affects plan choice, not memory allocations, changing it
won't help if good plans are already being chosen.

-Kevin
2011/1/12 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> effective_cache_size doesn't cause any RAM to be allocated, it's
> just a hint to the costing routines. Higher values tend to favor
> index use, while lower values tend to favor sequential scans. I
> suppose that if you regularly have many large queries running at the
> same moment you might not want to set it to the full amount of cache
> space available, but I've usually had good luck setting to the sum
> of shared_buffers space and OS cache.

Which OS are you using? Is it Windows? If so, the current parameters
are not good, and Linux should not show 9GB of 'inactive' (?) memory.

> Since it only affects plan choice, not memory allocations, changing
> it won't help if good plans are already being chosen.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support