Bug in query planer ? - Mailing list pgsql-bugs

From Clifford Wolf
Subject Bug in query planer ?
Date
Msg-id 200601311859.42586.clifford.wolf@linbit.com
Whole thread Raw
Responses Re: Bug in query planer ?
List pgsql-bugs
Hi,

philipp resiner wrote a mail about this problem yesterday. I've now traced =
it down to something that looks like a bug in the
query planer to me. Please have a look at this and let me know if this is a=
 bug or I am compleatly wrong..

(this is done right after a complete ANALYZE over the database, so the stat=
istics are up-to-date)

    sd-beta=3D> select n_tup_ins, n_tup_del from pg_stat_user_tables where rel=
name =3D 'contractelements';
     n_tup_ins | n_tup_del
    -----------+-----------
         91821 |         0
    (1 row)

    sd-beta=3D> select n_distinct, most_common_vals, most_common_freqs from pg=
_stats where tablename =3D 'contractelements' and attname =3D 'isactiv';
     n_distinct | most_common_vals |  most_common_freqs
    ------------+------------------+----------------------
              2 | {Y,N}            | {0.966467,0.0335333}
    (1 row)

    sd-beta=3D> explain analyze select 1 from contractelements where isActiv =
=3D 'Y';
                                                          QUERY PLAN
    --------------------------------------------------------------------------=
---------------------------------------------
     Seq Scan on contractelements  (cost=3D0.00..4963.76 rows=3D88742 width=3D=
0) (actual time=3D0.014..137.930 rows=3D88838 loops=3D1)
       Filter: ((isactiv)::text =3D 'Y'::text)
     Total runtime: 153.543 ms
    (3 rows)

The query planner estimates that isActiv =3D 'Y' will match 88742 rows. Thi=
s is reasonable (91821 * 0.966467 =3D 88741.966407) and
correct. However, the following case causes some troubles:

    sd-beta=3D> explain analyze select 1 from contractelements where upper(isA=
ctiv) =3D 'Y';
                                                         QUERY PLAN
    --------------------------------------------------------------------------=
-------------------------------------------
     Seq Scan on contractelements  (cost=3D0.00..5193.32 rows=3D459 width=3D0)=
 (actual time=3D0.030..198.493 rows=3D88838 loops=3D1)
       Filter: (upper((isactiv)::text) =3D 'Y'::text)
     Total runtime: 214.035 ms
    (3 rows)

Here we match on upper(isActiv) =3D 'Y' (which is totally braindead, but th=
e query is auto-generated by a customer-supplied
application, so I can not change it). Shouldn't the query planner execute u=
pper(isActiv) for both values in pg_stats
and so come to the same conclusion as in the first case?

It doesn't. Led by this misapprehension the query planner generates pretty =
creative, but unfortunately very suboptimal
query plans.

A 'CREATE INDEX clifford_temp ON contractelements ( upper(isActiv) )' follo=
wed by an 'ANALYZE contractelements' solves the
problem in this particular case. But this is not a solution to the problem =
in general..

Shouldn't the query planner be able to do the right thing without the index=
? Where does the magic 'rows=3D459' come from?

yours,
 - clifford

--=20
: Clifford Wolf            =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Tel +=
43-1-8178292-00 :
: LINBIT Information Technologies GmbH =A0 =A0 =A0 =A0 =A0Fax +43-1-8178292=
-82 :
: Sch=F6nbrunnerstr 244, 1120 Vienna, Austria =A0 =A0http://www.linbit.com :

pgsql-bugs by date:

Previous
From: Philipp Reisner
Date:
Subject: Fwd: Re: Endless loop in ExecNestLoop
Next
From: Michael Fuhr
Date:
Subject: Re: BUG #2224: unlogical syntax error