GROUP BY with reasonable timings in PLAN but unreasonable execution time

From: Clem Dickey
Subject: GROUP BY with reasonable timings in PLAN but unreasonable execution time
Date: ,
Msg-id: iv0h49$1var$1@news.hub.org
(view: Whole thread, Raw)
Responses: Re: GROUP BY with reasonable timings in PLAN but unreasonable execution time  (Clem Dickey)
List: pgsql-performance


I have a query which seems to be taking an extraordinarily long time
(many minutes, at least) when seemingly equivalent queries have
different plans and execute in seconds. naturally, I'd like to know why.

Version is Postgresql 8.4.8. The table, "t", is

  Column |  Type   | Modifiers
--------+---------+-----------
  y      | integer | not null
  x      | integer | not null
  k      | integer | not null
  j      | integer | not null
  z      | integer | not null
Indexes:
     "t_pkey" PRIMARY KEY, btree (j, k, x, y, z)

The table population, in pseudocode, is this:
   for x in 0..9
     for y in 0..9999
       for z in 0..29
         INSERT INTO t VALUES(y,x,0,0,z)

So the table has 300000 entries, with j and k always 0.

The query is:

  SELECT *
    FROM (
     SELECT * FROM t GROUP BY j,k,x,z,y
    ) AS f
    NATURAL JOIN t;

The plan:

  Merge Join  (cost=44508.90..66677.96 rows=1 width=20)
   Merge Cond: ((public.t.j = public.t.j) AND (public.t.k = public.t.k)
                AND (public.t.x = public.t.x))
   Join Filter: ((public.t.y = public.t.y) AND (public.t.z = public.t.z))
   -> Group (cost=44508.90..49008.90 rows=30000 width=20)
      ->  Sort (cost=44508.90..45258.90 rows=300000 width=20)
         Sort Key: public.t.j, public.t.k, public.t.x, public.t.z,
                   public.t.y
         ->  Seq Scan on t  (cost=0.00..4911.00 rows=300000 width=20)
   ->  Index Scan using t_pkey on t  (cost=0.00..14877.18 rows=300000
                                      width=20)

This query runs at least 20 minutes, with postmaster CPU utilization at
99%, without completing. System is a 3.2GHz Zeon, 3GB memory, and not
much else running.

By contrast, placing an intermediate result in a table "u" provides a
result in about 3 seconds:

  CREATE TEMPORARY TABLE u AS SELECT * FROM t GROUP BY j,k,x,z,y;
  SELECT * FROM u NATURAL JOIN t;

Changing the order of the GROUP BY clause varies the plan, sometimes
yielding shorter execution times. For example, this ordering executes in
about 1.5 seconds:

  SELECT *
    FROM (
     SELECT * FROM t GROUP BY j,k,x,y,z
    ) AS f
    NATURAL JOIN t;

With 120 permutations, I didn't try them all.

I should note that the plans tend to have similar costs, so the query
planner presumably does not know that some permutations have
significantly greater execution times.

Clem Dickey


pgsql-performance by date:

From: Samuel Gendler
Date:
Subject: Re: Query in 9.0.2 not using index in 9.0.0 works fine
From: bakkiya
Date:
Subject: Re: 100% CPU Utilization when we run queries.