Thread: Partitioned/inherited tables with check constraints causing slower query plans

Hi,
I'm seeing poor query performance using partitioned tables with check
constraints, seems like the plan is much worse than when querying the
individual partitions manually.

select version(); --> PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu,
compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit

uname -a --> Linux  2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011
x86_64 GNU/Linux

(postgresql.conf included at the end)

Output of EXPLAIN ANALYZE follows, but here are the tables in question:

Here's the empty parent table:

 \d+ ircevents

  Column   |  Type   |       Modifiers
-----------+---------+------------------------
 buffer    | integer | not null
 id        | bigint  | not null  <----------------  microsecs since 1970-01-01
 type      | text    | not null
 highlight | boolean | not null default false
 json      | text    | not null
...
Child tables: ircevents_201008,
              ircevents_201009,
            ...
              ircevents_201211,
              ircevents_201212

And one example child table (they are all the same apart from
non-overlapping check constraints):

\d+ ircevents_201204
                    Table "public.ircevents_201204"
  Column   |  Type   |       Modifiers        | Storage  | Description
-----------+---------+------------------------+----------+-------------
 buffer    | integer | not null               | plain    |
 id        | bigint  | not null               | plain    |
 type      | text    | not null               | extended |
 highlight | boolean | not null default false | plain    |
 json      | text    | not null               | extended |
Indexes:
    "ircevents_201204_idx" UNIQUE, btree (buffer, id)
    "ircevents_201204_highlight_idx" btree (highlight) WHERE highlight
= true
Check constraints:
    "ircevents_201204_id_check" CHECK (id >= (date_part('epoch'::text,
'2012-04-01 00:00:00'::timestamp without time zone)::bigint * 1000000)
AND id < (date_part('epoch'::text, '2012-05-01 00:00:00'::timestamp
without time zone)::bigint * 1000000))
Inherits: ircevents


The tables experience heavy insert/select load for the month in
question, then less selects after that. update/delete to these tables
is very rare.
The ircevents_201204 table has ~200 million rows

Let's use a 20-day range spanning only one month:

ircevents=# select date_part('epoch', '2012-04-02'::timestamp without
time zone)::bigint * 1000000;
     ?column?
------------------
 1333317600000000
(1 row)

ircevents=# select date_part('epoch', '2012-04-22'::timestamp without
time zone)::bigint * 1000000;
     ?column?
------------------
 1335045600000000
(1 row)


The next two queries are the crux of the problem for me:


EXPLAIN ANALYZE
SELECT id, type, json
FROM ircevents
WHERE
buffer = 116780 AND
id BETWEEN 1333317600000000 AND 1335045600000000
ORDER BY id DESC limit 100;


    QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80506.28..80506.53 rows=100 width=134) (actual
time=0.179..0.196 rows=40 loops=1)
   ->  Sort  (cost=80506.28..80558.01 rows=20692 width=134) (actual
time=0.178..0.185 rows=40 loops=1)
         Sort Key: public.ircevents.id
         Sort Method: quicksort  Memory: 33kB
         ->  Result  (cost=0.00..79715.45 rows=20692 width=134)
(actual time=0.039..0.121 rows=40 loops=1)
               ->  Append  (cost=0.00..79715.45 rows=20692 width=134)
(actual time=0.037..0.111 rows=40 loops=1)
                     ->  Seq Scan on ircevents  (cost=0.00..0.00
rows=1 width=72) (actual time=0.001..0.001 rows=0 loops=1)
                           Filter: ((id >= 1333317600000000::bigint)
AND (id <= 1335045600000000::bigint) AND (buffer = 116780))
                     ->  Bitmap Heap Scan on ircevents_201204
ircevents  (cost=914.36..79715.45 rows=20691 width=134) (actual
time=0.035..0.103 rows=40 loops=1)
                           Recheck Cond: ((buffer = 116780) AND (id >=
1333317600000000::bigint) AND (id <= 1335045600000000::bigint))
                           ->  Bitmap Index Scan on
ircevents_201204_idx  (cost=0.00..909.18 rows=20691 width=0) (actual
time=0.023..0.023 rows=40 loops=1)
                                 Index Cond: ((buffer = 116780) AND
(id >= 1333317600000000::bigint) AND (id <= 1335045600000000::bigint))
 Total runtime: 0.243 ms
(13 rows)

(note that the above plan demonstrates that constraint exclusion is
active, since it only queries the empty parent table, and the
appropriate partitioned table)

Compare the cost of that vs. specifying the partitioned table manually:


EXPLAIN ANALYZE
SELECT id, type, json
FROM ircevents_201204
WHERE buffer = 116780 AND
id BETWEEN 1333317600000000 AND 1335045600000000
ORDER BY id DESC limit 100;


    QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..404.02 rows=100 width=134) (actual
time=0.024..0.071 rows=40 loops=1)
   ->  Index Scan Backward using ircevents_201204_idx on
ircevents_201204  (cost=0.00..83595.11 rows=20691 width=134) (actual
time=0.023..0.062 rows=40 loops=1)
         Index Cond: ((buffer = 116780) AND (id >=
1333317600000000::bigint) AND (id <= 1335045600000000::bigint))
 Total runtime: 0.102 ms
(4 rows)

Querying the partition directly uses an "index scan backward", which
seems the best approach.

I see similar plans if the id range spans multiple tables - and I get
a much more efficient plan if manually construct a query by UNIONing
all the relevant partitions together.


Is there anything I can do to make querying to parent table in this
fashion use "Index Scan Backward" on the appropriate partitions, and
thus be as fast as querying the partitions directly?

Thanks,

RJ

PS Here is my postgresql.conf: (the server has 16GB/ and two mirrored
pairs of disks with pg_xlog on a different pair to the data)

listen_addresses = '*'
max_connections = 20                    # (change requires restart)
shared_buffers = 3GB                    # min 128kB
work_mem = 50MB                         # min 64kB
maintenance_work_mem = 500MB            # min 1MB "50mb per gig, ish"
synchronous_commit = off                # synchronization level; on,
off, or local
wal_buffers = 16MB                      # min 32kB, -1 sets based on
shared_buffers
checkpoint_segments = 100               # in logfile segments, min 1, 16MB each
effective_cache_size = 8GB              # how much is the OS gonna use
for caching disk stuff?
log_min_duration_statement = 500        # -1 is disabled, 0 logs all statements
track_activities = on
track_counts = on
track_functions = pl                    # none, pl, all
update_process_title = on
datestyle = 'iso, mdy'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
default_text_search_config = 'pg_catalog.english'

Richard Jones <rj@metabrew.com> writes:
> I'm seeing poor query performance using partitioned tables with check
> constraints, seems like the plan is much worse than when querying the
> individual partitions manually.

> select version(); --> PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu,
> compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit

I get a reasonable-looking plan when I try to duplicate this issue in
9.1 branch tip.  I think the reason you're not getting the right
behavior is that you are missing this as-yet-unreleased patch:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=ef03b34550e3577c4be3baa25b70787f5646c57b
which means it can't figure out that the available index on the child
table produces the desired sort order.  If you're in a position to
compile from source, a current nightly snapshot of the 9.1 branch
ought to work for you; otherwise, wait for 9.1.4.

(Note: although that patch is a one-liner, I would *not* recommend
trying to just cherry-pick the patch by itself; I think it probably
interacts with other planner fixes made since 9.1.1.)

            regards, tom lane

On 4 May 2012 17:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I get a reasonable-looking plan when I try to duplicate this issue in
> 9.1 branch tip.  I think the reason you're not getting the right
> behavior is that you are missing this as-yet-unreleased patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=ef03b34550e3577c4be3baa25b70787f5646c57b
> which means it can't figure out that the available index on the child
> table produces the desired sort order.  If you're in a position to
> compile from source, a current nightly snapshot of the 9.1 branch
> ought to work for you; otherwise, wait for 9.1.4.


Thanks, this did the trick - here's the output when I switched to 9.1 snapshot:

ircevents=# select version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.4.5-8) 4.4.5, 64-bit
(1 row)

ircevents=# explain analyze SELECT id, type, json FROM ircevents WHERE
buffer = 116780 AND id BETWEEN 1325458800000000 AND 1330642800000000
ORDER BY id DESC LIMIT 100;

                QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------
 Limit  (cost=0.05..202.45 rows=100 width=135) (actual
time=176.429..237.766 rows=100 loops=1)
   ->  Result  (cost=0.05..68161.99 rows=33677 width=135) (actual
time=176.426..237.735 rows=100 loops=1)
         ->  Merge Append  (cost=0.05..68161.99 rows=33677 width=135)
(actual time=176.426..237.708 rows=100 loops=1)
               Sort Key: public.ircevents.id
               ->  Sort  (cost=0.01..0.02 rows=1 width=72) (actual
time=0.009..0.009 rows=0 loops=1)
                     Sort Key: public.ircevents.id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on ircevents  (cost=0.00..0.00
rows=1 width=72) (actual time=0.001..0.001 rows=0 loops=1)
                           Filter: ((id >= 1325458800000000::bigint)
AND (id <= 1330642800000000::bigint) AND (buffer = 116780))
               ->  Index Scan Backward using ircevents_201201_idx on
ircevents_201201 ircevents  (cost=0.00..8811.15 rows=2181 width=133)
(actual time=76.356..136.91
7 rows=12 loops=1)
                     Index Cond: ((buffer = 116780) AND (id >=
1325458800000000::bigint) AND (id <= 1330642800000000::bigint))
               ->  Index Scan Backward using ircevents_201202_idx on
ircevents_201202 ircevents  (cost=0.00..54963.83 rows=30613 width=135)
(actual time=47.333..48.0
25 rows=88 loops=1)
                     Index Cond: ((buffer = 116780) AND (id >=
1325458800000000::bigint) AND (id <= 1330642800000000::bigint))
               ->  Index Scan Backward using ircevents_201203_idx on
ircevents_201203 ircevents  (cost=0.00..3629.22 rows=882 width=134)
(actual time=52.724..52.724
rows=0 loops=1)
                     Index Cond: ((buffer = 116780) AND (id >=
1325458800000000::bigint) AND (id <= 1330642800000000::bigint))
 Total runtime: 237.889 ms
(16 rows)


So yes, it's using "index scan backwards" - and fixes my problem, thanks!

Bit reluctant to put the machine into production with a non-release
version of postgres, I'll wait for 9.1.4 to make an official
appearance.

Regards,
RJ