BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE - Mailing list pgsql-bugs
| From | PG Bug reporting form |
|---|---|
| Subject | BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE |
| Date | |
| Msg-id | 15779-fd2fb982346b7a22@postgresql.org Whole thread Raw |
| Responses |
Re: BUG #15779: Partition elimination doesn't work as expected whenusing PARTITION BY RANGE
|
| List | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 15779
Logged by: Damir Ciganović-Janković
Email address: damir.ciganovic.jankovic@gmail.com
PostgreSQL version: 11.2
Operating system: Windows, Centos 6/7
Description:
We have an issue that our queries are slow due to bad partition elimination
when executing queries for our partitioned tables, here is an example, I
will put all explain query outputs later for each explain separately:
------------
DROP TABLE IF EXISTS test;
DROP TYPE IF EXISTS test_enum;
CREATE TYPE test_enum AS ENUM ('FIRST', 'SECOND');
CREATE TABLE test (enum_col test_enum, timestamp_col timestamp, counter int)
PARTITION BY RANGE (enum_col, timestamp_col);
CREATE TABLE test_FIRST_1 PARTITION OF test FOR VALUES FROM ('FIRST',
'2019-01-01') TO ('FIRST', '2019-01-02');
CREATE TABLE test_FIRST_2 PARTITION OF test FOR VALUES FROM ('FIRST',
'2019-01-02') TO ('FIRST', '2019-01-03');
CREATE TABLE test_SECOND_1 PARTITION OF test FOR VALUES FROM ('SECOND',
'2019-01-01') TO ('SECOND', '2019-01-02');
CREATE TABLE test_SECOND_2 PARTITION OF test FOR VALUES FROM ('SECOND',
'2019-01-02') TO ('SECOND', '2019-01-03');
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col = 'FIRST';
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col IN ('FIRST', 'SECOND');
------------
This is what we get for first explain:
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
Append (cost=0.00..132.68 rows=36 width=16) (actual time=0.044..0.044
rows=0 loops=1)
-> Seq Scan on test_first_1 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.017..0.017 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
-> Seq Scan on test_first_2 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.010..0.010 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
-> Seq Scan on test_second_1 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.008..0.008 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
-> Seq Scan on test_second_2 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
As we can see, we got all partitions despite setting timestamp. When we add
an enum value, partition pruning is fine:
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col = 'FIRST';
Append (cost=0.00..37.76 rows=1 width=16) (actual time=0.015..0.015 rows=0
loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.014..0.014 rows=0 loops=1)
Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone) AND (enum_col = 'FIRST'::test_enum))
Another bug here is when we put both enum values in IN clause, we still go
through all partitions despite timestamp targeting only _1 partitions:
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col IN ('FIRST', 'SECOND');
Append (cost=0.00..151.02 rows=4 width=16) (actual time=0.083..0.083
rows=0 loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.045..0.045 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
-> Seq Scan on test_first_2 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.012..0.012 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
-> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.012..0.012 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
-> Seq Scan on test_second_2 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.011..0.011 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
We could fix this by adding check constraints. But I think that it should
work without adding constraints, that's why it seems like a bug to me.
------------
ALTER TABLE test_FIRST_1 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'FIRST'
AND timestamp_col >='2019-01-01' AND timestamp_col < '2019-01-02');
ALTER TABLE test_FIRST_2 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'FIRST'
AND timestamp_col >='2019-01-02' AND timestamp_col < '2019-01-03');
ALTER TABLE test_SECOND_1 ADD CONSTRAINT ck_enum_col CHECK(enum_col =
'SECOND' AND timestamp_col >='2019-01-01' AND timestamp_col <
'2019-01-02');
ALTER TABLE test_SECOND_2 ADD CONSTRAINT ck_enum_col CHECK(enum_col =
'SECOND' AND timestamp_col >='2019-01-02' AND timestamp_col <
'2019-01-03');
-- same explains
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col = 'FIRST';
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col IN ('FIRST', 'SECOND');
------------
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
Append (cost=0.00..66.34 rows=18 width=16) (actual time=0.034..0.034
rows=0 loops=1)
-> Seq Scan on test_first_1 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.018..0.018 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
-> Seq Scan on test_second_1 (cost=0.00..33.13 rows=9 width=16) (actual
time=0.014..0.014 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col = 'FIRST';
Append (cost=0.00..37.76 rows=1 width=16) (actual time=0.013..0.013 rows=0
loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.013..0.013 rows=0 loops=1)
Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone) AND (enum_col = 'FIRST'::test_enum))
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col IN ('FIRST', 'SECOND');
Append (cost=0.00..75.51 rows=2 width=16) (actual time=0.037..0.037 rows=0
loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.025..0.025 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
-> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.010..0.010 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
#######################################################
BONUS NOTE: In Postgres version 10.2 (we migrated from 10.2 to 11.2),
partition pruning were better in cases where we used no enum_col or multiple
enum_col values in where clause:
Only 2 partitions, instead of all four. Timestamp clause is still
respected
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
Append (cost=0.00..66.25 rows=18 width=16) (actual time=0.011..0.011
rows=0 loops=1)
-> Seq Scan on test_first_1 (cost=0.00..33.12 rows=9 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
-> Seq Scan on test_second_1 (cost=0.00..33.12 rows=9 width=16) (actual
time=0.001..0.001 rows=0 loops=1)
Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone)
Two partitions, instead of 1, this was fixed in
https://www.postgresql.org/message-id/2b20c220-50cb-3fae-da57-92e8cb3675dd%40lab.ntt.co.jp
Timestamp clause is respected in both versions
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col = 'FIRST';
Append (cost=0.00..75.50 rows=2 width=16) (actual time=0.007..0.007 rows=0
loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone) AND (enum_col = 'FIRST'::test_enum))
-> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.002..0.002 rows=0 loops=1)
Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without
time zone) AND (enum_col = 'FIRST'::test_enum))
Two partitions, instead of 4, timestamp clause is still respected.
EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
enum_col IN ('FIRST', 'SECOND');
Append (cost=0.00..75.50 rows=2 width=16) (actual time=0.004..0.004 rows=0
loops=1)
-> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.003..0.003 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
-> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual
time=0.001..0.001 rows=0 loops=1)
Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
(timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
pgsql-bugs by date: