Thread: another index question

another index question

From
Sean Harding
Date:
I have another query that I expected to use an index but is doing a
sequential scan:

SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (SELECT mesgnum
 FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1);

Here's the explain:

NOTICE:  QUERY PLAN:

Seq Scan on mesg_headers  (cost=0.00..46866049756.39 rows=374843 width=36)
  SubPlan
    ->  Materialize  (cost=125028.26..125028.26 rows=1 width=4)
          ->  Limit  (cost=125028.26..125028.26 rows=1 width=4)
                ->  Sort  (cost=125028.26..125028.26 rows=374843 width=4)
                      ->  Seq Scan on mesg_headers  (cost=0.00..81505.43 rows=374843 width=4)

EXPLAIN

There's an index on the mesgnum, and both queries independently use the
index:

email=# EXPLAIN SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (5,20);
NOTICE:  QUERY PLAN:

Index Scan using mesg_headers_pkey, mesg_headers_pkey on mesg_headers
(cost=0.00..9.98 rows=1 width=36)

EXPLAIN

email=# EXPLAIN SELECT mesgnum FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.47 rows=1 width=4)
  ->  Index Scan Backward using mesg_headers_pkey on mesg_headers
  (cost=0.00..550028.43 rows=374843 width=4)

EXPLAIN

So, if both of the queries use the index, why don't they use the index when
combined?

Thanks.

sean

--
Sean Harding sharding@dogcow.org        |"It's not a habit, it's cool
http://www.dogcow.org/sean/             | I feel alive."
                                        | --k's Choice

Re: another index question

From
Sean Harding
Date:
On Sun Apr 01 at 04:13:42 PM, Tom Lane wrote:

> Sean Harding <sharding@dogcow.org> writes:
> > So, if both of the queries use the index, why don't they use the index when
> > combined?
>
> This looks like a bug.  What version are you running, exactly?

Sorry. Should have mentioned that. It's 7.1RC1 on linux 2.4.2.

sean

--
Sean Harding sharding@dogcow.org        |"Oh, the bottle has been to me, My
http://www.dogcow.org/sean              | closest friend, my worst enemy."
                                        |          --Natalie Merchant

Re: another index question

From
Tom Lane
Date:
Sean Harding <sharding@dogcow.org> writes:
> So, if both of the queries use the index, why don't they use the index when
> combined?

This looks like a bug.  What version are you running, exactly?

            regards, tom lane

Re: another index question

From
Tom Lane
Date:
Sean Harding <sharding@dogcow.org> writes:
>> This looks like a bug.  What version are you running, exactly?

> Sorry. Should have mentioned that. It's 7.1RC1 on linux 2.4.2.

Hm.  Curious.  I can't reproduce the problem:

regression=# create table mesg_headers (mesgnum int, frm text,
regression(# rcpt text, subject text);
CREATE
regression=# create index mesg_msgnum on mesg_headers(mesgnum);
CREATE
regression=# explain
regression-# SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum IN (SELECT
 mesgnum
regression(# FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1);
NOTICE:  QUERY PLAN:

Seq Scan on mesg_headers  (cost=0.00..81.50 rows=1000 width=36)
  SubPlan
    ->  Materialize  (cost=0.06..0.06 rows=1 width=4)
          ->  Limit  (cost=0.00..0.06 rows=1 width=4)
                ->  Index Scan Backward using mesg_msgnum on mesg_headers  (cost
=0.00..59.00 rows=1000 width=4)

EXPLAIN
regression=#

Could you send me the full table schema ("pg_dump -s -t mesg_headers dbname"
would do nicely) as well as the statistics from

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'mesg_headers';

            regards, tom lane

Re: another index question

From
Tom Lane
Date:
OK, indeed this is a bug --- the planner has a bad interaction for LIMIT
clauses used in "IN" subselects.  Will fix.

You could avoid the bug, and get a better plan at the top level anyway,
by writing

SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum = (SELECT mesgnum
FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1);

Since you know the subselect is going to produce exactly one tuple,
there's no need to use IN.

            regards, tom lane

Re: another index question

From
"ADBAAMD"
Date:
Tom Lane wrote:

> OK, indeed this is a bug --- the planner has a bad interaction for LIMIT
> clauses used in "IN" subselects.  Will fix.
>
> You could avoid the bug, and get a better plan at the top level anyway,
> by writing
>
> SELECT frm,rcpt,subject FROM mesg_headers WHERE mesgnum = (SELECT mesgnum
> FROM mesg_headers ORDER BY mesgnum DESC LIMIT 1);
>
> Since you know the subselect is going to produce exactly one tuple,
> there's no need to use IN.

    I don't know if this is a generic database issue, an SQL one, of pgsql's;
but wasn't relational invented so that we shouldn't need to fine tune
access paths?

    More to the point, shouldn't the statistics on tables plus the syntax be
enough for the planner to get this (not so complex) query a good plan,
even if not optimal?

    And if needed shouldn't we get away with hints or something like that
instead of rewriting?

    Sorry for the basic questions, I'm trying to learn.  No criticism of
anyone's work intended, just a general question.



--
  _
/ \   Leandro Guimarães Faria Corsetti Dutra         +55 (11) 3040 8913
\ /   Amdocs at Bell Canada                          +1 (514) 786 87 47
  X    Support Center, São Paulo, Brazil          mailto:adbaamd@bell.ca
/ \   http://terravista.pt./Enseada/1989/    mailto:leandrod@amdocs.com



Re: another index question

From
Sean Harding
Date:
On Sun Apr 01 at 05:22:34 PM, Tom Lane wrote:

> Since you know the subselect is going to produce exactly one tuple,
> there's no need to use IN.

Thanks Tom. That definitely helps a lot.

sean

--
Sean Harding sharding@dogcow.org  | "Never before and never since, I promise,
http://www.dogcow.org/sean/       |  will the whole world be warm as this."
                                  |  -- 10,000 Maniacs

Re: How do I...

From
Paul Tomblin
Date:
How do I compare a char (bpchar?) to a varchar?  I tried using "=", and I
got:

    waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
    where b.id = a.icao;
    ERROR:  Unable to identify an operator '=' for types 'varchar' and 'bpchar'
            You will have to retype this query using an explicit cast

Why isn't this working?  Can somebody explain how to rephrase this?  It
doesn't work even if I try to cast the varchar to bpchar:

    waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
    where b.id::char = a.icao;
    ERROR:  Unable to identify an operator '=' for types 'varchar' and 'bpchar'
            You will have to retype this query using an explicit cast

Or the bpchar to varchar:

    waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
    where b.id = a.icao::varchar;
    ERROR:  Unable to identify an operator '=' for types 'varchar' and 'bpchar'
            You will have to retype this query using an explicit cast

Oh, and if I try and cast them both to varchar, the thing freaks out:

    waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
    where b.id::varchar = a.icao::varchar;
    pqReadData() -- backend closed the channel unexpectedly.
            This probably means the backend terminated abnormally
            before or while processing the request.
    We have lost the connection to the backend, so further processing is
    impossible.  Terminating.

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
"I know not with what weapons World War III will be fought, but World
War IV will be fought with sticks and stones."
        -- Albert Einstein

Re: another index question

From
Tom Lane
Date:
"ADBAAMD" <adba.amdocs@bell.ca> writes:
> More to the point, shouldn't the statistics on tables plus the syntax be
> enough for the planner to get this (not so complex) query a good plan,
> even if not optimal?

We don't generate very good plans for IN at the moment.  This is on the
TODO list.

            regards, tom lane

Re: How do I...

From
Tom Lane
Date:
Paul Tomblin <ptomblin@xcski.com> writes:
> Oh, and if I try and cast them both to varchar, the thing freaks out:

>     waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
>     where b.id::varchar = a.icao::varchar;
>     pqReadData() -- backend closed the channel unexpectedly.
>             This probably means the backend terminated abnormally
>             before or while processing the request.

What in the world...!?

What version are you running?  What is the full schema for both tables?
("pg_dump -s -t tablename dbname" is the best way to get the complete
schema info.)

            regards, tom lane

Re: How do I...

From
Tom Lane
Date:
Paul Tomblin <ptomblin@xcski.com> writes:
> How do I compare a char (bpchar?) to a varchar?  I tried using "=", and I
> got:

>     waypoint=> select b.id, a.icao,a.faa_host_id from dafif_arpt a, waypoint b
>     where b.id = a.icao;
>     ERROR:  Unable to identify an operator '=' for types 'varchar' and 'bpchar'
>             You will have to retype this query using an explicit cast

> Why isn't this working?

char and varchar are not directly comparable because they have different
ideas about whether trailing blanks are significant.  Postgres makes you
cast one or the other so that it knows which set of comparison rules to
apply.  Typically you probably want to cast the varchar side to char so
that trailing blanks are ignored.

> It doesn't work even if I try to cast the varchar to bpchar:

I believe you are also running into sundry bugs in 6.5's handling of
these casts.  Should be cleaned up in 7.0.*.

            regards, tom lane