Thread: query optimizer questions

query optimizer questions

From
Robert Berger
Date:
Why do the following three queries use different scan types under 7.1.2?
Under
6.5.3 they all used index (and my system ran MUCH faster, since I often
use the trailing %)

I've done a vacuum analyze.
I installed 7.1.2 on RedHat 6.2 using the RPMs.


anderson=# explain select value from s_n_grantor where value='foobar';
NOTICE:  QUERY PLAN:

Index Scan using s_n_grantor_key on s_n_grantor  (cost=0.00..823.58
rows=247 width=12)

EXPLAIN
anderson=# explain select value from s_n_grantor where value like
'foobar';
NOTICE:  QUERY PLAN:

Seq Scan on s_n_grantor  (cost=0.00..8178.46 rows=247 width=12)

anderson=# explain select value from s_n_grantor where value like
'foobar%';
NOTICE:  QUERY PLAN:

Seq Scan on s_n_grantor  (cost=0.00..8178.46 rows=1 width=12)


EXPLAIN
anderson=# \d s_n_grantor_key
      Index "s_n_grantor_key"
 Attribute |         Type
-----------+-----------------------
 value     | character varying(80)
btree

anderson=# \d s_n_grantor
             Table "s_n_grantor"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 doc       | character varying(16) | not null
 entry     | smallint              | not null
 value     | character varying(80) |
Indices: s_n_grantor_key,
         s_n_grantor_pkey

Re: query optimizer questions

From
Peter Eisentraut
Date:
Robert Berger writes:

> Why do the following three queries use different scan types under 7.1.2?
> Under
> 6.5.3 they all used index (and my system ran MUCH faster, since I often
> use the trailing %)

In 7.1, LIKE optimization (i.e., possibly using an index) has been
disabled unless you are in the C locale because it does not work
correctly.  You could set your locale to C and re-initdb to get it to work
again.

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: query optimizer questions

From
Robert Berger
Date:
Thanks. How do I set the locale?

> In 7.1, LIKE optimization (i.e., possibly using an index) has been
> disabled unless you are in the C locale because it does not work
> correctly.  You could set your locale to C and re-initdb to get it to work
> again.
>
> --
> Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: query optimizer questions

From
Tom Lane
Date:
Robert Berger <rwb@vtiscan.com> writes:
> Why do the following three queries use different scan types under 7.1.2?

As Peter points out, index optimization of LIKE queries is disabled
unless your database is in C locale (if you're not sure,
contrib/pg_controldata can tell you what locale your database is using).

However, the estimated row counts seem peculiar to me in any case:

> anderson=# explain select value from s_n_grantor where value='foobar';
> NOTICE:  QUERY PLAN:

> Index Scan using s_n_grantor_key on s_n_grantor  (cost=0.00..823.58
> rows=247 width=12)

> EXPLAIN
> anderson=# explain select value from s_n_grantor where value like
> 'foobar';
> NOTICE:  QUERY PLAN:

> Seq Scan on s_n_grantor  (cost=0.00..8178.46 rows=247 width=12)

> anderson=# explain select value from s_n_grantor where value like
> 'foobar%';
> NOTICE:  QUERY PLAN:

> Seq Scan on s_n_grantor  (cost=0.00..8178.46 rows=1 width=12)

I'd expect the LIKE 'foobar' to produce the same estimate as = 'foobar',
because the LIKE estimator recognizes this as a fixed pattern.  But why
is that considered *less* selective than LIKE 'foobar%'?  Something
fishy about that.  Could we see the output of

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 's_n_grantor';

(If you get no output, it means you didn't VACUUM ANALYZE ...)

            regards, tom lane

Re: query optimizer questions

From
Allan Engelhardt
Date:
> Thanks. How do I set the locale?

Short answer:

  bash$ LANG=C initdb

Longer answer:

I find locale support in PostgreSQL confusing.  The locale of the current user at the time of initialising the database
determinesthe collating order (e.g. is 'a' < 'A').  This in turns affect things like indexing (as described below).
Thelocale of the account that is running the current  postmaster  defines format conversions (e.g. the currency symbol
into_char(1,'L99')).  The locale of the current process requesting a query defines how the user expects to see the
result,and is (of course) ignored by PostgreSQL. 

To make matters worse: On Red Hat 7.1 (US boxed set) the default locale is set in /etc/sysconf/i18n to en_US, i.e.
AmericanEnglish.  Fine, except some Americans may be surprised to learn that 'A' > 'a'.  When you start the postgresql
servicewithout doing an initdb first, the script will helpfully copy i18n to  ~postgres/initdb.i18n so you have a
recordof the locale (and can re-set it).  Except it never uses this file.  In fact, the postmaster service runs in the
defaultlocale  *for root*.  A previous version of RH did go through the trouble of sourcing initdb.i18n in
.bash_profile,but it does nothing for postmaster which is started as su -l postgres -s /bin/sh ... 

In order to start postmaster on RH7.1 with a locale that is different from the default (actually: different from root)
youshould create ~postgres/.profile with the content 

PGLIB=/usr/lib/pgsql
PGDATA=/var/lib/pgsql/data
export PGLIB PGDATA
[ -f $PGDATA/../initdb.i18n ] &&  source $PGDATA/../initdb.i18n

You may, of course, have to change the path. You may want to add that last line to your ~postgres/.bash_profile as
well,in order to avoid future confusion.... 

If you dont have initdb.i18n, just delete the content of the data directory and start the postgres service.  The result
shouldlook something like 

LANG="en_US"
export LANG LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC LC_CTYPE LC_TIME

for a default installation, but of course you want LANG="C"


Hope this makes sense to somebody....e-mail if I've managed to confuse anybody, including myself :-)


Allan.


Re: Re: query optimizer questions

From
Peter Eisentraut
Date:
Allan Engelhardt writes:

> I find locale support in PostgreSQL confusing.  The locale of the
> current user at the time of initialising the database determines the
> collating order (e.g. is 'a' < 'A').  This in turns affect things like
> indexing (as described below).

This is because if you were allowed to change this setting after the
database initialization, your indexes would get corrupted.  (Recall that
(some) indexes are sorted, and the index scanning code relies on indexes
being sorted a certain way.)

> The locale of the account that is running the current postmaster
> defines format conversions (e.g. the currency symbol in
> to_char(1,'L99')).

This is a courtesy because the above contraints do not apply to this
locale category.  I agree it's confusing, but I'm not sure what can be
done about it.

> The locale of the current process requesting a query defines how the
> user expects to see the result, and is (of course) ignored by
> PostgreSQL.

I guess it would make some sense if the client would pass along its locale
settings (in the allowable categories) to the server.

> To make matters worse: On Red Hat 7.1 (US boxed set) the default
> locale is set in /etc/sysconf/i18n to en_US, i.e. American English.
> Fine, except some Americans may be surprised to learn that 'A' > 'a'.

What's so surprising about that?  Yeah, ASCII order, blah.  Remember that
computers are for humans, not for computers.

Btw., actually the rules are more complicated:  ab < Ab < ac

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: query optimizer questions

From
Allan Engelhardt
Date:
Peter Eisentraut wrote:

> Allan Engelhardt writes:
>
> > I find locale support in PostgreSQL confusing.  The locale of the
> > current user at the time of initialising the database determines the
> > collating order (e.g. is 'a' < 'A').  This in turns affect things like
> > indexing (as described below).
>
> This is because if you were allowed to change this setting after the
> database initialization, your indexes would get corrupted.  (Recall that
> (some) indexes are sorted, and the index scanning code relies on indexes
> being sorted a certain way.)

Yes, I understand WHY it is, I just wish it was documented somewhere obvious.  But nevermind - that what these mailing
listsare for! 

Thanks, all.


Allan.


Re: Re: query optimizer questions

From
Robert Berger
Date:
> This is because if you were allowed to change this setting after the
>  database initialization

Why can't it be set at database creation time, so different databases
can have different locales? Why is it a global per-server setting?