Bad query plans for queries on partitioned table - Mailing list pgsql-performance

From Julian Mehnle
Subject Bad query plans for queries on partitioned table
Date
Msg-id 200712041944.13543.julian@mehnle.net
Whole thread Raw
Responses Re: Bad query plans for queries on partitioned table
Re: Bad query plans for queries on partitioned table
List pgsql-performance
Hi all,

I have a large database with e-mail meta-data (no bodies) for over 100
million messages.  I am running PostgreSQL 8.2.4 on a server with 2GB of
RAM (shared_buffers = 240MB, temp_buffers = 128MB, work_mem = 256MB,
maintenance_work_mem = 256MB).  I have the data split in two separate
tables, "email" and "email_extras":

                      Table "public.email"
        Column       |            Type             | Modifiers
  -------------------+-----------------------------+-----------
   id                | bigint                      | not null
   load_id           | integer                     | not null
   ts                | timestamp without time zone | not null
   ip                | inet                        | not null
   mfrom             | text                        | not null
   helo              | text                        |

                   Table "public.email_extras"
        Column       |            Type             | Modifiers
  -------------------+-----------------------------+-----------
   id                | bigint                      | not null
   ts                | timestamp without time zone | not null
   size              | integer                     | not null
   hdr_from          | text                        |

Each of these tables has been partitioned equally based on the "ts"
(timestamp) field into two dozen or so tables, each covering one week of
messages.  For testing purposes, I have only one week's partition filled
for each of the "email" and "email_extras" tables (email_2007_week34
{,extras}).

Now if I perform the following simple join on the "email" and "email_
extras" tables ...

  SELECT
      count(*)
  FROM
      email
      INNER JOIN email_extras USING (id, ts)
  WHERE
      mfrom <> hdr_from;

then I get the following horrendously inefficient plan:

                                                          QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
   Aggregate  (cost=391396890.89..391396890.90 rows=1 width=0)
     ->  Merge Join  (cost=9338881.64..349156398.02 rows=16896197148 width=0)
           Merge Cond: ((public.email_extras.id = public.email.id) AND (public.email_extras.ts = public.email.ts))
           Join Filter: (public.email.mfrom <> public.email_extras.hdr_from)
           ->  Sort  (cost=4592966.95..4658121.33 rows=26061752 width=48)
                 Sort Key: public.email_extras.id, public.email_extras.ts
                 ->  Append  (cost=0.00..491969.52 rows=26061752 width=48)
                       ->  Seq Scan on email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week13_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week14_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week15_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week16_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week17_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week18_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week19_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week20_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week21_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week22_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week23_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week24_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week25_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week26_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week27_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week28_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week29_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week30_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week31_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week32_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week33_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week34_extras email_extras  (cost=0.00..491597.12 rows=26052512
width=33)
                       ->  Seq Scan on email_2007_week35_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week36_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week37_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week38_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week39_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
                       ->  Seq Scan on email_2007_week40_extras email_extras  (cost=0.00..13.30 rows=330 width=48)
           ->  Sort  (cost=4745914.69..4811071.87 rows=26062872 width=48)
                 Sort Key: public.email.id, public.email.ts
                 ->  Append  (cost=0.00..644732.72 rows=26062872 width=48)
                       ->  Seq Scan on email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week13 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week14 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week15 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week16 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week17 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week18 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week19 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week20 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week21 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week22 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week23 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week24 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week25 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week26 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week27 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week28 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week29 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week30 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week31 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week32 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week33 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week34 email  (cost=0.00..644349.12 rows=26052512 width=33)
                       ->  Seq Scan on email_2007_week35 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week36 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week37 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week38 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week39 email  (cost=0.00..13.70 rows=370 width=48)
                       ->  Seq Scan on email_2007_week40 email  (cost=0.00..13.70 rows=370 width=48)
  (68 rows)

However, if I restrict the query to just the partitions that actually do
have data in them ...

  SELECT
      count(*)
  FROM
      email_2007_week34
      INNER JOIN email_2007_week34_extras USING (id, ts)
  WHERE
      mfrom <> hdr_from;

then I get a much better plan that uses a hash join:

                                                               QUERY PLAN
  

------------------------------------------------------------------------------------------------------------------------------------
   Aggregate  (cost=4266338.94..4266338.95 rows=1 width=0)
     ->  Hash Join  (cost=1111362.80..4266336.07 rows=1145 width=0)
           Hash Cond: ((email_2007_week34.ts = email_2007_week34_extras.ts) AND (email_2007_week34.id =
email_2007_week34_extras.id))
           Join Filter: (email_2007_week34.mfrom <> email_2007_week34_extras.hdr_from)
           ->  Seq Scan on email_2007_week34  (cost=0.00..644349.12 rows=26052512 width=33)
           ->  Hash  (cost=491597.12..491597.12 rows=26052512 width=33)
                 ->  Seq Scan on email_2007_week34_extras  (cost=0.00..491597.12 rows=26052512 width=33)
  (7 rows)

Yes, I have `ANALYZE`d the database before running the queries.

How come the query planner gets thrown off that far by the simple table
partitioning?  What can I do to put the query planner back on the right
track?

Julian.

Attachment

pgsql-performance by date:

Previous
From: Gregory Stark
Date:
Subject: Re: RAID arrays and performance
Next
From: Julian Mehnle
Date:
Subject: Re: Bad query plans for queries on partitioned table