Thread: Recent 7.4 change slowed down a query by a factor of 3

Recent 7.4 change slowed down a query by a factor of 3

From
Bruno Wolff III
Date:
The query below was running in a bit under 300ms on a version of 7.4
from less than a week ago until I updated to the version from last night.
Now it takes about 800ms using a significantly different plan.
The query is:
explain analyze
  select count(1)
    from
      (select distinct on (areaid) touched
        from crate
        order by areaid desc, touched desc)
        as current
    where touched >= localtimestamp + '10 year ago'
    group by touched >= localtimestamp + '2 year ago'
    order by touched >= localtimestamp + '2 year ago' desc;

I don't have the earlier version of 7.4 around, but I get the better plan
in 7.3.3.
                                version
------------------------------------------------------------------------
 PostgreSQL 7.4devel on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1756.33..1756.50 rows=67 width=19) (actual time=795.64..795.65 rows=2 loops=1)
   Sort Key: (touched >= (('now'::text)::timestamp(6) without time zone + '-2 years'::interval))
   ->  HashAggregate  (cost=1753.46..1754.30 rows=67 width=19) (actual time=795.48..795.48 rows=2 loops=1)
         ->  Subquery Scan current  (cost=1624.62..1737.38 rows=3216 width=19) (actual time=631.84..784.75 rows=5339
loops=1)
               Filter: (touched >= (('now'::text)::timestamp(6) without time zone + '-10 years'::interval))
               ->  Unique  (cost=1624.62..1705.22 rows=3216 width=19) (actual time=631.72..713.66 rows=5364 loops=1)
                     ->  Sort  (cost=1624.62..1664.92 rows=16119 width=19) (actual time=631.72..639.77 rows=16119
loops=1)
                           Sort Key: areaid, touched
                           ->  Seq Scan on crate  (cost=0.00..498.19 rows=16119 width=19) (actual time=0.02..48.85
rows=16119loops=1) 
 Total runtime: 800.88 msec
(10 rows)

                      Table "public.crate"
 Column  |            Type             |       Modifiers
---------+-----------------------------+------------------------
 areaid  | text                        | not null
 gameid  | text                        | not null
 rate    | integer                     | not null default 5000
 frq     | integer                     | not null default 0
 opp     | integer                     | not null default 0
 rmp     | integer                     | not null default 0
 trn     | integer                     | not null default 0
 rp      | text                        |
 gm      | text                        |
 touched | timestamp without time zone | not null default 'now'
Indexes:
    "crate_pkey" PRIMARY KEY btree (areaid, gameid),
    "crate_game" btree (gameid, areaid),
    "crate_touched" btree (areaid, touched)
Check Constraints:
    "rate_nonnegative" CHECK (rate >= 0),
    "rate_other_interested" CHECK ((frq > 0) OR (rate = 5000)),
    "frq_nonnegative" CHECK (frq >= 0),
    "opp_nonnegative" CHECK (opp >= 0),
    "rmp_nonnegative" CHECK (rmp >= 0),
    "trn_nonnegative" CHECK (trn >= 0)
Foreign Key Constraints:
    "bad_areaid" FOREIGN KEY (areaid) REFERENCES cname(areaid),
    "bad_gameid" FOREIGN KEY (gameid) REFERENCES games(gameid)

                               version
---------------------------------------------------------------------
 PostgreSQL 7.3.3 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

                                                                                 QUERY PLAN
                                                    

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1094.46..1094.87 rows=161 width=19) (actual time=274.17..274.18 rows=2 loops=1)
   Sort Key: (touched >= (('now'::text)::timestamp(6) without time zone + '-2 years'::interval))
   ->  Aggregate  (cost=1076.46..1088.55 rows=161 width=19) (actual time=263.78..274.09 rows=2 loops=1)
         ->  Group  (cost=1076.46..1084.52 rows=1612 width=19) (actual time=255.12..269.69 rows=5339 loops=1)
               ->  Sort  (cost=1076.46..1080.49 rows=1612 width=19) (actual time=255.11..258.09 rows=5339 loops=1)
                     Sort Key: (touched >= (('now'::text)::timestamp(6) without time zone + '-2 years'::interval))
                     ->  Subquery Scan current  (cost=0.00..990.59 rows=1612 width=19) (actual time=0.12..240.81
rows=5339loops=1) 
                           Filter: (touched >= (('now'::text)::timestamp(6) without time zone + '-10 years'::interval))
                           ->  Unique  (cost=0.00..990.59 rows=1612 width=19) (actual time=0.04..159.11 rows=5364
loops=1)
                                 ->  Index Scan Backward using crate_touched on crate  (cost=0.00..950.30 rows=16119
width=19)(actual time=0.04..82.15 rows=16119 loops=1) 
 Total runtime: 275.32 msec
(11 rows)

                      Table "public.crate"
 Column  |            Type             |       Modifiers
---------+-----------------------------+------------------------
 areaid  | text                        | not null
 gameid  | text                        | not null
 rate    | integer                     | not null default 5000
 frq     | integer                     | not null default 0
 opp     | integer                     | not null default 0
 rmp     | integer                     | not null default 0
 trn     | integer                     | not null default 0
 rp      | text                        |
 gm      | text                        |
 touched | timestamp without time zone | not null default 'now'
Indexes: crate_pkey primary key btree (areaid, gameid),
         crate_game btree (gameid, areaid),
         crate_touched btree (areaid, touched)
Check constraints: "trn_nonnegative" (trn >= 0)
                   "rmp_nonnegative" (rmp >= 0)
                   "opp_nonnegative" (opp >= 0)
                   "frq_nonnegative" (frq >= 0)
                   "rate_other_interested" ((frq > 0) OR (rate = 5000))
                   "rate_nonnegative" (rate >= 0)
Foreign Key constraints: bad_gameid FOREIGN KEY (gameid) REFERENCES games(gameid) ON UPDATE NO ACTION ON DELETE NO
ACTION,
                         bad_areaid FOREIGN KEY (areaid) REFERENCES cname(areaid) ON UPDATE NO ACTION ON DELETE NO
ACTION


Re: Recent 7.4 change slowed down a query by a factor of 3

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> The query below was running in a bit under 300ms on a version of 7.4
> from less than a week ago until I updated to the version from last night.
> Now it takes about 800ms using a significantly different plan.

Something fishy here.  Will it use the right plan if you set
enable_seqscan off?

I did

bogus=# create table crate(areaid text, touched timestamp);
CREATE TABLE
bogus=# create index crate_touched on crate(areaid, touched);
CREATE INDEX

and then explained your query:

 GroupAggregate  (cost=64.14..66.48 rows=67 width=40)
   ->  Sort  (cost=64.14..64.64 rows=200 width=40)
         Sort Key: (touched >= (('now'::text)::timestamp(6) without time zone + '-2 years'::interval))
         ->  Subquery Scan current  (cost=0.00..56.50 rows=200 width=40)
               Filter: (touched >= (('now'::text)::timestamp(6) without time zone + '-10 years'::interval))
               ->  Unique  (cost=0.00..54.50 rows=200 width=40)
                     ->  Index Scan Backward using crate_touched on crate  (cost=0.00..52.00 rows=1000 width=40)

which looks perfectly reasonable.  Obviously, with no data or statistics
the estimates are not to be trusted, but it sure looks to me like CVS
tip should still be able to generate the right plan.  Did you do a full
'make clean' and rebuild when you updated?

            regards, tom lane

Re: Recent 7.4 change slowed down a query by a factor of 3

From
Bruno Wolff III
Date:
On Wed, Jun 18, 2003 at 11:18:39 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> > The query below was running in a bit under 300ms on a version of 7.4
> > from less than a week ago until I updated to the version from last night.
> > Now it takes about 800ms using a significantly different plan.
>
> Something fishy here.  Will it use the right plan if you set
> enable_seqscan off?
>
> I did
>
> bogus=# create table crate(areaid text, touched timestamp);
> CREATE TABLE
> bogus=# create index crate_touched on crate(areaid, touched);
> CREATE INDEX
>
> and then explained your query:
>
>  GroupAggregate  (cost=64.14..66.48 rows=67 width=40)
>    ->  Sort  (cost=64.14..64.64 rows=200 width=40)
>          Sort Key: (touched >= (('now'::text)::timestamp(6) without time zone + '-2 years'::interval))
>          ->  Subquery Scan current  (cost=0.00..56.50 rows=200 width=40)
>                Filter: (touched >= (('now'::text)::timestamp(6) without time zone + '-10 years'::interval))
>                ->  Unique  (cost=0.00..54.50 rows=200 width=40)
>                      ->  Index Scan Backward using crate_touched on crate  (cost=0.00..52.00 rows=1000 width=40)
>
> which looks perfectly reasonable.  Obviously, with no data or statistics
> the estimates are not to be trusted, but it sure looks to me like CVS
> tip should still be able to generate the right plan.  Did you do a full
> 'make clean' and rebuild when you updated?

I did a make distclean. I didn't do an initdb as I was able to restart
the database without a problem. I also tried a simpler query just doing
the distinct on without a where clause and the backwards index scan still
wasn't used.

I will try an initdb and then if that doesn't change things I will fetch
a new copy of the code from CVS, do another initdb and see what happens.

Re: Recent 7.4 change slowed down a query by a factor of 3

From
Bruno Wolff III
Date:
On Wed, Jun 18, 2003 at 11:18:39 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> > The query below was running in a bit under 300ms on a version of 7.4
> > from less than a week ago until I updated to the version from last night.
> > Now it takes about 800ms using a significantly different plan.
>
> Something fishy here.  Will it use the right plan if you set
> enable_seqscan off?

This got it to use the backward index scan.

Re: Recent 7.4 change slowed down a query by a factor of 3

From
Bruno Wolff III
Date:
On Wed, Jun 18, 2003 at 11:18:39 -0400,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> > The query below was running in a bit under 300ms on a version of 7.4
> > from less than a week ago until I updated to the version from last night.
> > Now it takes about 800ms using a significantly different plan.
>
> Something fishy here.  Will it use the right plan if you set
> enable_seqscan off?

After doing an initdb I got the expected plan.

This is a static db used for dynamic web pages and the script that loads
the db does a vacumm analyze at the end. So the db should have had the
information needed to pick the correct plan. Possibly something changed
that affected the information needed for planning, but the value used
to indicate an initdb was needed wasn't changed.

In the future if I see odd stuff I will try doing an initdb before reporting
a potential problem. Thanks for your help.

Re: Recent 7.4 change slowed down a query by a factor of 3

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> After doing an initdb I got the expected plan.

Hm.  I'm not sure what happened there --- I don't recall that we made
any initdb-needing changes in the past week or so.  (If we did, we
should have forced initdb by incrementing catversion, but sometimes
people forget to do that.)  The only change I can think of that's
related at all is that the outer query's "group by foo order by foo desc"
should now only require one sort step not two (which is probably why
my test went for the Sort/GroupAggregate plan not the HashAgg/Sort
plan you showed).  But that shouldn't have affected the plan for the
inner SELECT DISTINCT query, AFAICS.  Odd.

Probably not worth spending time on though.

            regards, tom lane