Thread: wildcard makes seq scan on prod db but not in test
Dear list, I have a table with a few million rows and this index: CREATE INDEX bond_item_common_x7 ON bond_item_common ((lower(original_filename))); There are about 2M rows on bonddump and 4M rows on bond90. bonddump is on a 8MB RAM machine, bond90 is on a 72MB RAM machine. The table is analyzed properly both places. I'm an index hint zealot, but aware of our different stances in the matter. :) Dropping the wildcard for the like, both databases uses the index. Is there a way to convince Postgres to try not to do full table scan as much? This is just one of several examples when it happily spends lots of time sequentially going thru tables. Thanks, Marcus psql (9.0.4) Type "help" for help. bonddump=# explain analyze select pic2.objectid bonddump-# from bond_item_common pic2 bonddump-# where bonddump-# lower(pic2.original_filename) like 'this is a test%' ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using bond_item_common_x7 on bond_item_common pic2 (cost=0.01..8.69 rows=208 width=4) (actual time=26.415..26.415 rows=0 loops=1) Index Cond: ((lower((original_filename)::text) >= 'this is a test'::text) AND (lower((original_filename)::text) < 'this is a tesu'::text)) Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text) Total runtime: 26.519 ms (4 rows) psql (9.0.4) bond90=> explain analyze select pic2.objectid bond90-> from bond_item_common pic2 bond90-> where bond90-> lower(pic2.original_filename) like 'this is a test%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Seq Scan on bond_item_common pic2 (cost=0.00..839226.81 rows=475 width=4) (actual time=10599.401..10599.401 rows=0 loops=1) Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text) Total runtime: 10599.425 ms (3 rows)
Marcus Engene <mengpg2@engene.se> writes: > There are about 2M rows on bonddump and 4M rows on bond90. > bonddump is on a 8MB RAM machine, bond90 is on a 72MB RAM machine. > The table is analyzed properly both places. I'll bet one database was initialized in C locale and the other not. regards, tom lane
Marcus Engene <mengpg2@engene.se> wrote: > I have a table with a few million rows and this index: > CREATE INDEX bond_item_common_x7 ON bond_item_common > ((lower(original_filename))); > Dropping the wildcard for the like, both databases uses the index. > > Is there a way to convince Postgres to try not to do full table > scan as much? That could be a difference is collations. What do you get from the query on this page for each database?: http://wiki.postgresql.org/wiki/Server_Configuration -Kevin
On 5/9/11 8:57 , Kevin Grittner wrote: > > That could be a difference is collations. What do you get from the > query on this page for each database?: > > http://wiki.postgresql.org/wiki/Server_Configuration > > -Kevin > > There's indeed a different collation. Why is this affecting? Can i force a column to be ascii? The (fast) test server: version | PostgreSQL 9.0.4 on x86_64-apple-darwin10.7.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit effective_cache_size | 512MB lc_collate | C lc_ctype | UTF-8 maintenance_work_mem | 128MB max_connections | 100 max_stack_depth | 2MB port | 5435 server_encoding | UTF8 shared_buffers | 512MB temp_buffers | 8192 TimeZone | Europe/Zurich wal_buffers | 1MB work_mem | 128MB (14 rows) The (slow) production server: version | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit checkpoint_completion_target | 0.9 checkpoint_segments | 64 effective_cache_size | 48GB lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 listen_addresses | localhost,10.0.0.3,74.50.57.76 maintenance_work_mem | 1GB max_connections | 600 max_stack_depth | 2MB port | 5435 server_encoding | UTF8 shared_buffers | 8GB temp_buffers | 32768 TimeZone | UTC work_mem | 128MB (16 rows) Thanks, Marcus
Marcus Engene <mengpg2@engene.se> wrote: > On 5/9/11 8:57 , Kevin Grittner wrote: >> >> That could be a difference is collations. What do you get from >> the query on this page for each database?: >> >> http://wiki.postgresql.org/wiki/Server_Configuration > There's indeed a different collation. Why is this affecting? If the index isn't sorted in an order which leaves the rows you are requesting near one another, it's not very useful for the query. Try this query on both: create temp table order_example (val text); insert into order_example values ('a z'),('ab'),('123'),(' 456'); select * from order_example order by val; > Can i force a column to be ascii? You don't need to do that; you can specify an opclass for the index to tell it that you don't want to order by the normal collation, but rather in a way which will allow the index to be useful for pattern matching: http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html > The (fast) test server: > effective_cache_size | 512MB > lc_collate | C > lc_ctype | UTF-8 > maintenance_work_mem | 128MB > max_connections | 100 > server_encoding | UTF8 > shared_buffers | 512MB > temp_buffers | 8192 > TimeZone | Europe/Zurich > wal_buffers | 1MB > The (slow) production server: > checkpoint_completion_target | 0.9 > checkpoint_segments | 64 > effective_cache_size | 48GB > lc_collate | en_US.UTF-8 > lc_ctype | en_US.UTF-8 > listen_addresses | localhost,10.0.0.3,74.50.57.76 > maintenance_work_mem | 1GB > max_connections | 600 > server_encoding | UTF8 > shared_buffers | 8GB > temp_buffers | 32768 > TimeZone | UTC As you've discovered, with that many differences, performance tests on one machine may have very little to do with actual performance on the other. -Kevin
On 5/9/11 9:59 , Kevin Grittner wrote: > > You don't need to do that; you can specify an opclass for the index > to tell it that you don't want to order by the normal collation, but > rather in a way which will allow the index to be useful for pattern > matching: > > http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html > -Kevin > > Hi, Thanks for the explanation. Works brilliantly! Best regards, Marcus For future googlers: http://www.postgresonline.com/journal/archives/78-Why-is-my-index-not-being-used.html drop index bond_item_common_x7; CREATE INDEX bond_item_common_x7 ON bond_item_common USING btree(lower(original_filename) varchar_pattern_ops); bond90=> explain analyze select pic2.objectid from bond_item_common pic2 where lower(pic2.original_filename) like 'this is a test%' ; QUERY PLAN --------------------------------------------------------------... Bitmap Heap Scan on bond_item_common pic2 (cost=705.84..82746.05 rows=23870 width=4) (actual time=0.015..0.015 rows=0 loops=1) Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text) -> Bitmap Index Scan on bond_item_common_x7 (cost=0.00..699.87 rows=23870 width=0) (actual time=0.014..0.014 rows=0 loops=1) Index Cond: ((lower((original_filename)::text) ~>=~ 'this is a test'::text) AND (lower((original_filename)::text) ~<~ 'this is a tesu'::text)) Total runtime: 0.033 ms