Re: 7.4.7: strange planner decision - Mailing list pgsql-general

From Roman Neuhauser
Subject Re: 7.4.7: strange planner decision
Date
Msg-id 20050713141427.GC63397@isis.sigpipe.cz
Whole thread Raw
In response to Re: 7.4.7: strange planner decision  (Roman Neuhauser <neuhauser@sigpipe.cz>)
List pgsql-general
# neuhauser@sigpipe.cz / 2005-07-13 15:58:09 +0200:
> # dev@archonet.com / 2005-07-13 14:09:34 +0100:
> > Roman Neuhauser wrote:
> > >    callrec32=# \d fix.files
> > >                  Table "fix.files"
> > >     Column |          Type          | Modifiers
> > >    --------+------------------------+-----------
> > >     dir    | character varying(255) |
> > >     base   | character varying(255) |
> > >    Indexes:
> > >        "base_storename_idx" btree (base, ((((dir)::text || '/'::text) ||
> > >        (base)::text)))
> > >        "ff_baseonly_idx" btree (base)
> > >        "ff_storename_idx" btree (((((dir)::text || '/'::text) ||
> > >        (base)::text)))
> > >
> > >    callrec32=# explain select fd.base from fix.dups fd join fix.files ff
> > >    using (base);
> > >                                     QUERY PLAN
> > >    ----------------------------------------------------------------------------
> > >     Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
> > >       Hash Cond: (("outer".base)::text = ("inner".base)::text)
> > >       ->  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458
> > >       width=41)
> > >       ->  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
> > >             ->  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160
> > >             width=44)
> > >    (5 rows)
>
> > What happens to the plan if you SET enable_seqscan=false; first? It's
> > presumably getting the row-estimate right, so unless there's terrible
> > correlation on "base" in the files table I can only assume it's getting
> > the cost estimates horribly wrong.
>
> callrec32=# SET enable_seqscan=false;
> SET
> callrec32=# explain select fd.base from fix.dups fd join fix.files ff using (base);
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..1066990.93 rows=176161 width=44)
>    ->  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 rows=176160 width=44)
>    ->  Index Scan using ff_baseonly_idx on files ff  (cost=0.00..6.01 rows=1 width=41)
>          Index Cond: (("outer".base)::text = (ff.base)::text)
> (4 rows)

    BTW, this query or its equivalent is what I'm really after:

    callrec32=# explain select c1.storename from fix.dups fd join calls
    c1 on (fd.base = basename(c1.storename) and c1.iscouple = '1') where
    not exists (select 1 from fix.files ff where c1.storename = ff.dir
    || '/' || ff.base);
                                                   QUERY PLAN
    --------------------------------------------------------------------------------------------------------
     Hash Join  (cost=7474.26..23127970.91 rows=2354719 width=60)
       Hash Cond: ((basename("outer".storename))::text = ("inner".base)::text)
       ->  Index Scan using calls2_iscouple_idx on calls c1  (cost=0.00..22982439.69 rows=2354719 width=60)
             Filter: ((iscouple = 1::smallint) AND (NOT (subplan)))
             SubPlan
               ->  Index Scan using ff_storename_idx on files ff  (cost=0.00..88570.16 rows=26393 width=0)
                     Index Cond: (($0)::text = (((dir)::text || '/'::text) || (base)::text))
       ->  Hash  (cost=5570.86..5570.86 rows=176160 width=44)
             ->  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 rows=176160 width=44)
    (9 rows)

    calls has (among others):

    "calls2_basename_storename_idx" btree (basename(storename), storename) WHERE (iscouple = (1)::smallint)
    "calls2_storename_idx" btree (storename) WHERE (iscouple = (1)::smallint)
    "calls2_iscouple_idx" btree (id) WHERE (iscouple = (1)::smallint)

    WHy does it use the calls2_iscouple_idx index when calls.id isn't
    used anywhere in the query? I would guess that calls2_storename_idx
    would actually be more useful.

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Array as parameter for plpgsql function
Next
From: Tom Lane
Date:
Subject: Re: 7.4.7: strange planner decision