Thread: wildcard makes seq scan on prod db but not in test

wildcard makes seq scan on prod db but not in test

From

Marcus Engene

Date:

09 May 2011, 15:45:24

Dear list,

I have a table with a few million rows and this index:
CREATE INDEX bond_item_common_x7 ON bond_item_common
((lower(original_filename)));

There are about 2M rows on bonddump and 4M rows on bond90.

bonddump is on a 8MB RAM machine, bond90 is on a 72MB RAM machine.

The table is analyzed properly both places.

I'm an index hint zealot, but aware of our different stances in the
matter. :)

Dropping the wildcard for the like, both databases uses the index.

Is there a way to convince Postgres to try not to do full table scan as
much? This is just one of several examples when it happily spends lots
of time sequentially going thru tables.

Thanks,
Marcus




psql (9.0.4)
Type "help" for help.

bonddump=# explain analyze          select pic2.objectid
bonddump-#          from bond_item_common pic2
bonddump-#          where
bonddump-#              lower(pic2.original_filename) like 'this is a
test%' ;
                                                                   QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using bond_item_common_x7 on bond_item_common pic2
(cost=0.01..8.69 rows=208 width=4) (actual time=26.415..26.415 rows=0
loops=1)
    Index Cond: ((lower((original_filename)::text) >= 'this is a
test'::text) AND (lower((original_filename)::text) < 'this is a
tesu'::text))
    Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
  Total runtime: 26.519 ms
(4 rows)




psql (9.0.4)
bond90=> explain analyze          select pic2.objectid
bond90->          from bond_item_common pic2
bond90->          where
bond90->              lower(pic2.original_filename) like 'this is a test%' ;
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on bond_item_common pic2  (cost=0.00..839226.81 rows=475
width=4) (actual time=10599.401..10599.401 rows=0 loops=1)
    Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
  Total runtime: 10599.425 ms
(3 rows)

Re: wildcard makes seq scan on prod db but not in test

From

Tom Lane

Date:

09 May 2011, 16:08:30

Marcus Engene <mengpg2@engene.se> writes:
> There are about 2M rows on bonddump and 4M rows on bond90.
> bonddump is on a 8MB RAM machine, bond90 is on a 72MB RAM machine.
> The table is analyzed properly both places.

I'll bet one database was initialized in C locale and the other not.

            regards, tom lane

Re: wildcard makes seq scan on prod db but not in test

From

"Kevin Grittner"

Date:

09 May 2011, 16:10:32

Marcus Engene <mengpg2@engene.se> wrote:

> I have a table with a few million rows and this index:
> CREATE INDEX bond_item_common_x7 ON bond_item_common
> ((lower(original_filename)));

> Dropping the wildcard for the like, both databases uses the index.
>
> Is there a way to convince Postgres to try not to do full table
> scan as much?

That could be a difference is collations.  What do you get from the
query on this page for each database?:

http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin

Re: wildcard makes seq scan on prod db but not in test

From

Marcus Engene

Date:

09 May 2011, 16:19:14

On 5/9/11 8:57 , Kevin Grittner wrote:
>
> That could be a difference is collations.  What do you get from the
> query on this page for each database?:
>
> http://wiki.postgresql.org/wiki/Server_Configuration
>
> -Kevin
>
>
There's indeed a different collation. Why is this affecting? Can i force
a column to be ascii?

The (fast) test server:
  version              | PostgreSQL 9.0.4 on x86_64-apple-darwin10.7.0,
compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc.
build 5664), 64-bit
  effective_cache_size | 512MB
  lc_collate           | C
  lc_ctype             | UTF-8
  maintenance_work_mem | 128MB
  max_connections      | 100
  max_stack_depth      | 2MB
  port                 | 5435
  server_encoding      | UTF8
  shared_buffers       | 512MB
  temp_buffers         | 8192
  TimeZone             | Europe/Zurich
  wal_buffers          | 1MB
  work_mem             | 128MB
(14 rows)

The (slow) production server:
  version                      | PostgreSQL 9.0.4 on
x86_64-unknown-linux-gnu, compiled by
GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit
  checkpoint_completion_target | 0.9
  checkpoint_segments          | 64
  effective_cache_size         | 48GB
  lc_collate                   | en_US.UTF-8
  lc_ctype                     | en_US.UTF-8
  listen_addresses             | localhost,10.0.0.3,74.50.57.76
  maintenance_work_mem         | 1GB
  max_connections              | 600
  max_stack_depth              | 2MB
  port                         | 5435
  server_encoding              | UTF8
  shared_buffers               | 8GB
  temp_buffers                 | 32768
  TimeZone                     | UTC
  work_mem                     | 128MB
(16 rows)

Thanks,
Marcus

Re: wildcard makes seq scan on prod db but not in test

From

"Kevin Grittner"

Date:

09 May 2011, 17:20:15

Marcus Engene <mengpg2@engene.se> wrote:
> On 5/9/11 8:57 , Kevin Grittner wrote:
>>
>> That could be a difference is collations.  What do you get from
>> the query on this page for each database?:
>>
>> http://wiki.postgresql.org/wiki/Server_Configuration

> There's indeed a different collation. Why is this affecting?

If the index isn't sorted in an order which leaves the rows you are
requesting near one another, it's not very useful for the query.
Try this query on both:

create temp table order_example (val text);
insert into order_example values ('a  z'),('ab'),('123'),('   456');
select * from order_example order by val;

> Can i force a column to be ascii?

You don't need to do that; you can specify an opclass for the index
to tell it that you don't want to order by the normal collation, but
rather in a way which will allow the index to be useful for pattern
matching:

http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html

> The (fast) test server:

>   effective_cache_size | 512MB
>   lc_collate           | C
>   lc_ctype             | UTF-8
>   maintenance_work_mem | 128MB
>   max_connections      | 100
>   server_encoding      | UTF8
>   shared_buffers       | 512MB
>   temp_buffers         | 8192
>   TimeZone             | Europe/Zurich
>   wal_buffers          | 1MB

> The (slow) production server:

>   checkpoint_completion_target | 0.9
>   checkpoint_segments          | 64
>   effective_cache_size         | 48GB
>   lc_collate                   | en_US.UTF-8
>   lc_ctype                     | en_US.UTF-8
>   listen_addresses             | localhost,10.0.0.3,74.50.57.76
>   maintenance_work_mem         | 1GB
>   max_connections              | 600
>   server_encoding              | UTF8
>   shared_buffers               | 8GB
>   temp_buffers                 | 32768
>   TimeZone                     | UTC

As you've discovered, with that many differences, performance tests
on one machine may have very little to do with actual performance on
the other.

-Kevin

Re: wildcard makes seq scan on prod db but not in test

From

Marcus Engene

Date:

09 May 2011, 18:29:05

On 5/9/11 9:59 , Kevin Grittner wrote:
>
> You don't need to do that; you can specify an opclass for the index
> to tell it that you don't want to order by the normal collation, but
> rather in a way which will allow the index to be useful for pattern
> matching:
>
> http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html
> -Kevin
>
>

Hi,

Thanks for the explanation. Works brilliantly!

Best regards,
Marcus


For future googlers:

http://www.postgresonline.com/journal/archives/78-Why-is-my-index-not-being-used.html

drop index bond_item_common_x7;

CREATE INDEX bond_item_common_x7 ON bond_item_common USING
btree(lower(original_filename) varchar_pattern_ops);

bond90=> explain analyze
select pic2.objectid
from bond_item_common pic2
where
  lower(pic2.original_filename) like 'this is a test%' ;
  QUERY PLAN
--------------------------------------------------------------...
  Bitmap Heap Scan on bond_item_common pic2  (cost=705.84..82746.05
rows=23870 width=4) (actual time=0.015..0.015 rows=0 loops=1)
    Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
    ->  Bitmap Index Scan on bond_item_common_x7  (cost=0.00..699.87
rows=23870 width=0) (actual time=0.014..0.014 rows=0 loops=1)
          Index Cond: ((lower((original_filename)::text) ~>=~ 'this is a
test'::text) AND (lower((original_filename)::text) ~<~ 'this is a
tesu'::text))
  Total runtime: 0.033 ms