Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key - Mailing list pgsql-bugs
From | Damir Ciganović-Janković |
---|---|
Subject | Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key |
Date | |
Msg-id | CA+-2sJ6dZaZRnnHfKPRAqnyYF+9Pz4EtDnwNsSJ45jc10hmsgw@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
|
List | pgsql-bugs |
> Hi.
>
> On 2018/08/16 23:05, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 15334
> > Logged by: Damir Ciganović-Janković
> > Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
> > PostgreSQL version: 10.2
> > Operating system: Linux (Centos), Windows
> > Description:
> >
> > Our tables are partitioned by partition key RANGE (resolution, timestamp).
> >
> > Timestamp is of type 'timestamp without time zone' and resolution is hour
> > own enum type which contains values 'HOUR' , 'DAY' and 'MONTH':
> > CREATE TYPE resolution AS ENUM (
> > 'HOUR',
> > 'DAY',
> > 'MONTH'
> > );
> >
> > Our table "my_report" has timestamp, resolution and count columns.
> >
> > This is the query I am executing:
> > select "my_report"."count" from my_report where ("my_report"."resolution" =
> > 'HOUR'::resolution and "my_report"."timestamp" >= timestamp '2018-08-16
> > 07:00:00' and "my_report"."timestamp" < timestamp '2018-08-16 10:00:00.0')
> > limit 10;
> > (NOTE: I simplified the query so real query and execution plan are
> > different, but I think you will understand me)
> >
> > I noticed that when doing this query via psql cmd I get this execution plan
> >
> > +++++
> > ...
> > -> Append (cost=0.00..159.76 rows=1 width=24) (actual time=0.585..0.585
> > rows=0 loops=1)
> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..159.76 rows=1
> > width=24) (actual time=0.584..0.584 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = 'HOUR'::resolution))
> > Rows Removed by Filter: 4672
> > ...
> > ++++++
> >
> > This is the correct behaviour, we picked the partition where our timestamps
> > are in range, and also resolution is HOUR.
> > Now, when executing this same query via our client (Java), but passing the
> > resolution as a parameter (and not hardcoding HOUR value in my query like
> > first time). I would expect that execution plan will be the same, but that
> > is not the case:
> >
> > ++++++
> > ...
> > -> Append (cost=0.00..372.24 rows=3 width=24) (actual time=1.117..1.117
> > rows=0 loops=1)
> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..183.12 rows=1
> > width=24) (actual time=0.589..0.589 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 4672
> > -> Seq Scan on my_report_day_201808 (cost=0.00..94.56 rows=1 width=24)
> > (actual time=0.265..0.265 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 2336
> > -> Seq Scan on my_report_month_201808 (cost=0.00..94.56 rows=1
> > width=24) (actual time=0.261..0.261 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 2336
> > ...
> > ++++++
> >
> > As we can see here, our execution will do seq scan on all my_report
> > partitions for that period (resolutions: day, month and hour) even though we
> > put HOUR as our query parameter. One thing that I observed is that insead of
> > (resolution = 'HOUR'::resolution)) like we got in the first example, filters
> > look like this (resolution = ('HOUR'::cstring)::resolution)).
>
> It's strange that the expression "('HOUR'::cstring)::resolution" got
> through without getting the following error
>
> ERROR: cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
>
> For example,
>
> select ('HOUR'::cstring)::resolution;
> ERROR: cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
>
> > It seems to me that PostgreSQL is not removing the (::cstring) part before
> > partition elimination so that it is forced to go through all of them
> > bassically ignoring the resolution value.
> >
> > I have a workaround now by putting exact value in the query (I wrote exact
> > string `"my_report"."resolution" = 'DAY'::resolution`) into my query, and
> > not passing the value as parameter, but this looks to me as a bug.
>
> You listed 10.2 as the PG version being used, which indeed had a bug that
> caused partition pruning to not work correctly with enum type partition
> key, which got fixed in 10.5 [1].
>
> I'm mystified as to why it worked at all if you're are using 10.2 (you
> said it worked correctly when running the command using psql with the
> hard-coded value of resolution.) So maybe, it's not the bug of 10.2 that
> got fixed in 10.5 is what's causing you the trouble, but then it's also
> not clear what the problem really is.
>
> Thanks,
> Amit
>
> [1] "Fix handling of partition keys whose data type uses a polymorphic
> btree operator class, such as arrays"
>
I know about the issue and the fix, but this is a different bug, it seems to me. I will try to explain the issue to be more clear.
This is the enum, table and its partitions:
+++++++++++++++++++++++++++++
create type ab as enum ('A', 'B');
create table test (key ab not null) partition by list (key);
create table test_a partition of test for values in ('A');
create table test_b partition of test for values in ('B');
We are using Java, so this is the code:
try (Connection connection = DriverManager.getConnection(...);
PreparedStatement stmt = connection.prepareStatement("EXPLAIN SELECT * FROM test WHERE key = ?::ab")) {
stmt.setString(1, "A");
try (ResultSet rs = stmt.executeQuery()) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
Simple select with enum value as a parameter. Note that we didn't put "::cstring" in query. This is the result of the EXPLAIN SELECT from the code:
Append (cost=0.00..109.25 rows=26 width=4)
-> Seq Scan on test_a (cost=0.00..54.63 rows=13 width=4)
Filter: (key = ('A'::cstring)::ab)
-> Seq Scan on test_b (cost=0.00..54.63 rows=13 width=4)
Filter: (key = ('A'::cstring)::ab)
I have put wireshark snoop in attachment where we can see that our java client is not the one adding the "::cstring" part into the code.
I hope this helps
Attachment
pgsql-bugs by date: