Thread: simple case using index on windows but not on linux

simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 04:48:12

I have a simple case, selecting on a LIKE where clause over a single
column that has an index on it.  On windows it uses the index - on
linux it does not.  I have exactly the same scema and data in each,
and I have run the necessary analyze commands on both.

Windows is running 8.1.4
Linux is running from RPM postgresql-server-8.1.4-1.FC5.1

There are 1 million rows in the table - a number I would expect to
lower the score of a sequential scan for the planner.  There is an
index on 'c_number'.

On windows I get this:

orderstest=# explain analyze select * from t_order where c_number like '0001%';
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_order_c_number on t_order  (cost=0.00..26.53
rows=928 width=43) (actual time=0.029..2.857 rows=1000 loops=1)
   Index Cond: (((c_number)::text >= '0001'::character varying) AND
((c_number)::text < '0002'::character varying))
   Filter: ((c_number)::text ~~ '0001%'::text)
 Total runtime: 4.572 ms
(4 rows)

Great - the index is used, and the query is lightning fast.

On Linux I get this:

orderstest=# explain analyze select c_number from t_order where
c_number like '0001%';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on t_order  (cost=0.00..20835.00 rows=983 width=11) (actual
time=1.364..1195.064 rows=1000 loops=1)
   Filter: ((c_number)::text ~~ '0001%'::text)
 Total runtime: 1197.312 ms
(3 rows)

I just can't use this level of performance in my application.

On my linux box, the only way I can get it to use the index is to use
the = operator.  If I use anything else, a seq scan is used.

Disabling sequence scans in the config has no effect.  It still does
not use the index for anything other than an = comparison.

Here is a dump of the table description:

orderstest=# \d t_order;
                   Table "public.t_order"
        Column         |          Type          | Modifiers
-----------------------+------------------------+-----------
 id                    | bigint                 | not null
 c_number              | character varying(255) |
 customer_id           | bigint                 |
 origincountry_id      | bigint                 |
 destinationcountry_id | bigint                 |
Indexes:
    "t_order_pkey" PRIMARY KEY, btree (id)
    "t_order_c_number" btree (c_number)
    "zzzz_3" btree (destinationcountry_id)
    "zzzz_4" btree (origincountry_id)
    "zzzz_5" btree (customer_id)
Foreign-key constraints:
    "fk9efdd3a33dbb666c" FOREIGN KEY (destinationcountry_id)
REFERENCES go_country(id)
    "fk9efdd3a37d3dd384" FOREIGN KEY (origincountry_id) REFERENCES
go_country(id)
    "fk9efdd3a38654c9d3" FOREIGN KEY (customer_id) REFERENCES t_party(id)

That dump is exactly the same on both machines.

The only major difference between the hardware is that the windows
machine has 2gb RAM and a setting of 10000 shared memory pages,
whereas the linux machine has 756Mb RAM and a setting of 3000 shared
memory pages (max. shared memory allocation of 32Mb).  I can't see any
other differences in configuration.

Disk throughput on both is reasonable (40Mb/second buffered reads)

Can anyone explain the difference in the planner behaviour on the two
systems, using what appears to be the same version of postgres?

--
Simon Godden

Re: simple case using index on windows but not on linux

From

Heikki Linnakangas

Date:

04 October 2006, 06:11:50

simon godden wrote:
> The only major difference between the hardware is that the windows
> machine has 2gb RAM and a setting of 10000 shared memory pages,
> whereas the linux machine has 756Mb RAM and a setting of 3000 shared
> memory pages (max. shared memory allocation of 32Mb).  I can't see any
> other differences in configuration.
You can increase the max shared memory size if you have root access. See

http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS

Scroll down for Linux-specific instructions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 06:19:09

(Sending again because I forgot to reply to all)

On 10/4/06, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> You can increase the max shared memory size if you have root access. See
>
> http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS
>
> Scroll down for Linux-specific instructions.

Thanks for the link.

Are you saying that the shared memory size is the issue here?  Please
can you explain how it would cause a seq scan rather than an index
scan.

I would like to understand the issue before making changes.

--
Simon Godden

Re: simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 06:40:54

On 10/4/06, Richard Huxton <dev@archonet.com> wrote:
>
> Issue "set enable_seqscan=false" and then run your explain analyse. If
> your query uses the index, what is the estimated cost? If the estimated
> cost is larger than a seq-scan that would indicate your configuration
> settings are badly out-of-range.

I did that and it still used seq-scan.

>
> If the index isn't used, then we have problem #3. I think this is what
> you are actually seeing. Your locale is something other than "C" and PG
> doesn't know how to use like with indexes. Read up on operator classes
> or change your locale.
> http://www.postgresql.org/docs/8.1/static/indexes-opclass.html
>

Aha - that sounds like it - this is the output from locale

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I guess it cannot determine the collating sequence?

I'm not too familiar with unix locale issues - does this output match
your problem description?

Can you explain how to change my locale to 'C'?  (I'm quite happy for
you to tell me to RTFM, as I know this is not a linux user mailing
list :)

--
Simon Godden

Re: simple case using index on windows but not on linux

From

Richard Huxton

Date:

04 October 2006, 06:51:00

simon godden wrote:
> (Sending again because I forgot to reply to all)
>
> On 10/4/06, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> You can increase the max shared memory size if you have root access. See
>>
>> http://www.postgresql.org/docs/8.1/interactive/kernel-resources.html#SYSVIPC-PARAMETERS
>>
>>
>> Scroll down for Linux-specific instructions.
>
> Thanks for the link.
>
> Are you saying that the shared memory size is the issue here?  Please
> can you explain how it would cause a seq scan rather than an index
> scan.
>
> I would like to understand the issue before making changes.

It *might* be shared-memory settings. It's almost certainly something to
do with setup. If you have the same data and the same query and can
reliably produce different results then something else must be different.

If you look at the explain output from both, PG knows the seq-scan is
going to be expensive (cost=20835) so the Linux box either
1. Doesn't have the index (and you say it does, so it's not this).
2. Thinks the index will be even more expensive.
3. Can't use the index at all.

Issue "set enable_seqscan=false" and then run your explain analyse. If
your query uses the index, what is the estimated cost? If the estimated
cost is larger than a seq-scan that would indicate your configuration
settings are badly out-of-range.

If the index isn't used, then we have problem #3. I think this is what
you are actually seeing. Your locale is something other than "C" and PG
doesn't know how to use like with indexes. Read up on operator classes
or change your locale.
http://www.postgresql.org/docs/8.1/static/indexes-opclass.html

--
   Richard Huxton
   Archonet Ltd

Re: simple case using index on windows but not on linux

From

Richard Huxton

Date:

04 October 2006, 07:36:57

simon godden wrote:
>> If the index isn't used, then we have problem #3. I think this is what
>> you are actually seeing. Your locale is something other than "C" and PG
>> doesn't know how to use like with indexes. Read up on operator classes
>> or change your locale.
>> http://www.postgresql.org/docs/8.1/static/indexes-opclass.html
>
> Aha - that sounds like it - this is the output from locale
>
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
..
> I guess it cannot determine the collating sequence?

It can, but isn't sure that it can rely on LIKE 'A%' being the same as
 >= 'A' and < 'B' (not always true). Re-creating the index with the
right opclass will tell it this is the case.

> I'm not too familiar with unix locale issues - does this output match
> your problem description?

OK - quick intro to locales. Create a file /tmp/sortthis containing the
following:
---begin file---
BBB
CCC
AAA
A CAT
A DOG
ACAT
---end file---
Now run "sort /tmp/sortthis". You'll probably see spaces get ignored.
Now run "LANG=C sort /tmp/sortthis". You'll probably see a traditional
ASCII ("C") sort. If not try LC_COLLATE rather than LANG.

> Can you explain how to change my locale to 'C'?  (I'm quite happy for
> you to tell me to RTFM, as I know this is not a linux user mailing
> list :)

You'll want to dump your databases and re-run initdb with a locale of
"C" (or no locale). See:
   http://www.postgresql.org/docs/8.1/static/app-initdb.html

That will mean all sorting will be on ASCII value. The problem is that
the database picks up the operating-system's default locale when you
install it from package. Not always what you want, but then until you
understand the implications you can't really decide one way or the other.

HTH

--
   Richard Huxton
   Archonet Ltd

Re: simple case using index on windows but not on linux

From

Richard Huxton

Date:

04 October 2006, 10:45:27

simon godden wrote:
> I did that, e.g. initdb --locale=C, re-created all my data and have
> exactly the same problem.
>
> I have two indexes, one with no options, and one with the varchar
> operator options.
>
> So the situation now is:
> If I do a like query it uses the index with the varchar options;
> If I do a = query, it uses the index with no options;
> If I do a < or > or any other operator, it reverts back to a seq-scan!
>
> I am on FC5 - any further ideas?  Did I need to do anything specific
> about collating sequence?  I thought that the --locale=C would set
> that for all options.

 From psql, a "show all" command will list all your config settings and
let you check the lc_xxx values are correct.

Make sure you've analysed the database after restoring, otherwise it
will have bad stats available.

--
   Richard Huxton
   Archonet Ltd

Re: simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 10:48:12

On 10/4/06, Richard Huxton <dev@archonet.com> wrote:
> simon godden wrote:
>
>  From psql, a "show all" command will list all your config settings and
> let you check the lc_xxx values are correct.

lc_collate is C, as are all the other lc settings.

I have run the analyze commands.

Still the same.

--
Simon Godden

Re: simple case using index on windows but not on linux

From

Richard Huxton

Date:

04 October 2006, 11:05:37

simon godden wrote:
> On 10/4/06, Richard Huxton <dev@archonet.com> wrote:
>> simon godden wrote:
>>
>>  From psql, a "show all" command will list all your config settings and
>> let you check the lc_xxx values are correct.
>
> lc_collate is C, as are all the other lc settings.
>
> I have run the analyze commands.
>
> Still the same.

Can you post EXPLAIN ANALYSE for the LIKE and <> queries that should be
using the index? With enable_seqscan on and off please.

--
   Richard Huxton
   Archonet Ltd

Re: simple case using index on windows but not on linux

From

"Dave Dutcher"

Date:

04 October 2006, 11:12:24

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
>
> lc_collate is C, as are all the other lc settings.
>
> I have run the analyze commands.
>
> Still the same.


That is strange.  I figured it had to be related to the locale and the LIKE
operator.  I'm not an expert on these locale issues, but I'd be curious to
see if it would start using an index if you added an index like this:

CREATE INDEX test_index ON t_order (c_number varchar_pattern_ops);

Dave

Re: simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 11:46:27

> Can you post EXPLAIN ANALYSE for the LIKE and <> queries that should be
> using the index? With enable_seqscan on and off please.
>

OK - I don't know what happened, but now my linux installation is
behaving like the windows one.  I honestly don't know what changed,
which I know doesn't help people determine the cause of my issue....

But I still have a problem with > and <, on both environments.

Now, both LIKE and = are using the index with no options on it.

But the other operators are not.

Firstly, with enable_seqscan on:

orderstest=# explain analyze select c_number from t_order where
c_number like '00001%';
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_order_c_number on t_order  (cost=0.00..3.01 rows=1
width=11) (actual time=0.167..0.610 rows=100 loops=1)
   Index Cond: (((c_number)::text >= '00001'::character varying) AND
((c_number)::text < '00002'::character varying))
   Filter: ((c_number)::text ~~ '00001%'::text)
 Total runtime: 0.921 ms
(4 rows)

orderstest=# explain analyze select c_number from t_order where
c_number > '0001';
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on t_order  (cost=0.00..18312.50 rows=878359 width=11)
(actual time=1.102..4364.704 rows=878000 loops=1)
   Filter: ((c_number)::text > '0001'::text)
 Total runtime: 6431.968 ms
(3 rows)

And now with enable_seqscan off:

orderstest=# explain analyze select c_number from t_order where
c_number like '00001%';
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_order_c_number on t_order  (cost=0.00..3.01 rows=1
width=11) (actual time=0.245..0.674 rows=100 loops=1)
   Index Cond: (((c_number)::text >= '00001'::character varying) AND
((c_number)::text < '00002'::character varying))
   Filter: ((c_number)::text ~~ '00001%'::text)
 Total runtime: 0.971 ms
(4 rows)

(Just the same)

orderstest=# explain analyze select c_number from t_order where
c_number > '0001';
                                                                 QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_order_c_number on t_order  (cost=0.00..22087.31
rows=878912 width=11) (actual time=0.230..3504.909 rows=878000
loops=1)
   Index Cond: ((c_number)::text > '0001'::text)
 Total runtime: 5425.931 ms
(3 rows)

(Now using the index but getting awful performance out of it - how's that?)

The difference seems to be whether it is treating the index condition
as 'character varying' or 'text'.

Basically, can I do > < >= <= on a varchar without causing a seq-scan?

--
Simon Godden

Re: simple case using index on windows but not on linux

From

"simon godden"

Date:

04 October 2006, 11:49:04

I think I am being stupid now.

The > query was returning so many rows (87% of the rows in the table)
that a seq-scan was of course the best way.

Sorry - all is now working and the problem was the locale issue.

Thanks so much for your help everyone.

--
Simon Godden