Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key - Mailing list pgsql-bugs

From Damir Ciganović-Janković
Subject Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
Date
Msg-id CA+-2sJ6dZaZRnnHfKPRAqnyYF+9Pz4EtDnwNsSJ45jc10hmsgw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
List pgsql-bugs
> Hi.
> On 2018/08/16 23:05, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> > 
> > Bug reference:      15334
> > Logged by:          Damir Ciganović-Janković
> > Email address:      damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
> > PostgreSQL version: 10.2
> > Operating system:   Linux (Centos), Windows
> > Description:        
> > 
> > Our tables are partitioned by partition key RANGE (resolution, timestamp).
> > 
> > Timestamp is of type 'timestamp without time zone' and resolution is hour
> > own enum type which contains values 'HOUR' , 'DAY' and 'MONTH':
> > CREATE TYPE resolution AS ENUM (
> >   'HOUR',
> >   'DAY',
> >   'MONTH'
> > );
> > 
> > Our table "my_report" has timestamp, resolution and count columns.
> > 
> > This is the query I am executing:
> > select "my_report"."count" from my_report where ("my_report"."resolution" =
> > 'HOUR'::resolution and "my_report"."timestamp" >= timestamp '2018-08-16
> > 07:00:00' and "my_report"."timestamp" < timestamp '2018-08-16 10:00:00.0')
> > limit 10; 
> > (NOTE: I simplified the query so real query and execution plan are
> > different, but I think you will understand me)
> > 
> > I noticed that when doing this query via psql cmd I get this execution plan
> > 
> > +++++
> > ...
> > ->  Append  (cost=0.00..159.76 rows=1 width=24) (actual time=0.585..0.585
> > rows=0 loops=1)
> >     ->  Seq Scan on my_report_hour_20180816  (cost=0.00..159.76 rows=1
> > width=24) (actual time=0.584..0.584 rows=0 loops=1)
> >         Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = 'HOUR'::resolution))
> >         Rows Removed by Filter: 4672
> > ...
> > ++++++
> > 
> > This is the correct behaviour, we picked the partition where our timestamps
> > are in range, and also resolution is HOUR.
> > Now, when executing this same query via our client (Java), but passing the
> > resolution as a parameter (and not hardcoding HOUR value in my query like
> > first time). I would expect that execution plan will be the same, but that
> > is not the case:
> > 
> > ++++++
> > ...
> > ->  Append  (cost=0.00..372.24 rows=3 width=24) (actual time=1.117..1.117
> > rows=0 loops=1)
> >     ->  Seq Scan on my_report_hour_20180816  (cost=0.00..183.12 rows=1
> > width=24) (actual time=0.589..0.589 rows=0 loops=1)
> >         Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> >         Rows Removed by Filter: 4672
> >    ->  Seq Scan on my_report_day_201808  (cost=0.00..94.56 rows=1 width=24)
> > (actual time=0.265..0.265 rows=0 loops=1)
> >         Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> >         Rows Removed by Filter: 2336
> >     ->  Seq Scan on my_report_month_201808  (cost=0.00..94.56 rows=1
> > width=24) (actual time=0.261..0.261 rows=0 loops=1)
> >         Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> >         Rows Removed by Filter: 2336
> > ...
> > ++++++
> > 
> > As we can see here, our execution will do seq scan on all my_report
> > partitions for that period (resolutions: day, month and hour) even though we
> > put HOUR as our query parameter. One thing that I observed is that insead of
> > (resolution = 'HOUR'::resolution)) like we got in the first example, filters
> > look like this (resolution = ('HOUR'::cstring)::resolution)).
> It's strange that the expression "('HOUR'::cstring)::resolution" got
> through without getting the following error
> ERROR:  cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
> For example,
> select ('HOUR'::cstring)::resolution;
> ERROR:  cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
> > It seems to me that PostgreSQL is not removing the (::cstring) part before
> > partition elimination so that it is forced to go through all of them
> > bassically ignoring the resolution value.
> > 
> > I have a workaround now by putting exact value in the query (I wrote exact
> > string `"my_report"."resolution" = 'DAY'::resolution`) into my query, and
> > not passing the value as parameter, but this looks to me as a bug. 
> You listed 10.2 as the PG version being used, which indeed had a bug that
> caused partition pruning to not work correctly with enum type partition
> key, which got fixed in 10.5 [1].
> I'm mystified as to why it worked at all if you're are using 10.2 (you
> said it worked correctly when running the command using psql with the
> hard-coded value of resolution.)  So maybe, it's not the bug of 10.2 that
> got fixed in 10.5 is what's causing you the trouble, but then it's also
> not clear what the problem really is.
> Thanks,
> Amit
> [1] "Fix handling of partition keys whose data type uses a polymorphic
> btree operator class, such as arrays"

I know about the issue and the fix, but this is a different bug, it seems to me. I will try to explain the issue to be more clear.
This is the enum, table and its partitions:
+++++++++++++++++++++++++++++
create type ab as enum ('A', 'B');
create table test (key ab not null) partition by list (key);
create table test_a partition of test for values in ('A');
create table test_b partition of test for values in ('B');

We are using Java, so this is the code:
try (Connection connection = DriverManager.getConnection(...);
    PreparedStatement stmt = connection.prepareStatement("EXPLAIN SELECT * FROM test WHERE key = ?::ab")) {

    stmt.setString(1, "A");
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

Simple select with enum value as a parameter. Note that we didn't put "::cstring" in query. This is the result of the EXPLAIN SELECT from the code:
Append  (cost=0.00..109.25 rows=26 width=4)
  ->  Seq Scan on test_a  (cost=0.00..54.63 rows=13 width=4)
        Filter: (key = ('A'::cstring)::ab)
  ->  Seq Scan on test_b  (cost=0.00..54.63 rows=13 width=4)
        Filter: (key = ('A'::cstring)::ab)
I have put wireshark snoop in attachment where we can see that our java client is not the one adding the "::cstring" part into the code.

I hope this helps

Attachment

pgsql-bugs by date:

Previous
From: Amit Langote
Date:
Subject: Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
Next
From: jimmy
Date:
Subject: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker