Home > mailing lists

Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key - Mailing list pgsql-bugs

From	Damir Ciganović-Janković
Subject	Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
Date	August 17, 2018 13:17:16
Msg-id	CA+-2sJ6dZaZRnnHfKPRAqnyYF+9Pz4EtDnwNsSJ45jc10hmsgw@mail.gmail.com Whole thread Raw
In response to	Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses	Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key
List	pgsql-bugs

Tree view

> Hi.

> On 2018/08/16 23:05, PG Bug reporting form wrote:

> > The following bug has been logged on the website:

> >

> > Bug reference: 15334

> > Logged by: Damir Ciganović-Janković

> > Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com

> > PostgreSQL version: 10.2

> > Operating system: Linux (Centos), Windows

> > Description:

> >

> > Our tables are partitioned by partition key RANGE (resolution, timestamp).

> >

> > Timestamp is of type 'timestamp without time zone' and resolution is hour

> > own enum type which contains values 'HOUR' , 'DAY' and 'MONTH':

> > CREATE TYPE resolution AS ENUM (

> > 'HOUR',

> > 'DAY',

> > 'MONTH'

> > );

> >

> > Our table "my_report" has timestamp, resolution and count columns.

> >

> > This is the query I am executing:

> > select "my_report"."count" from my_report where ("my_report"."resolution" =

> > 'HOUR'::resolution and "my_report"."timestamp" >= timestamp '2018-08-16

> > 07:00:00' and "my_report"."timestamp" < timestamp '2018-08-16 10:00:00.0')

> > limit 10;

> > (NOTE: I simplified the query so real query and execution plan are

> > different, but I think you will understand me)

> >

> > I noticed that when doing this query via psql cmd I get this execution plan

> >

> > +++++

> > ...

> > -> Append (cost=0.00..159.76 rows=1 width=24) (actual time=0.585..0.585

> > rows=0 loops=1)

> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..159.76 rows=1

> > width=24) (actual time=0.584..0.584 rows=0 loops=1)

> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without

> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time

> > zone) AND (resolution = 'HOUR'::resolution))

> > Rows Removed by Filter: 4672

> > ...

> > ++++++

> >

> > This is the correct behaviour, we picked the partition where our timestamps

> > are in range, and also resolution is HOUR.

> > Now, when executing this same query via our client (Java), but passing the

> > resolution as a parameter (and not hardcoding HOUR value in my query like

> > first time). I would expect that execution plan will be the same, but that

> > is not the case:

> >

> > ++++++

> > ...

> > -> Append (cost=0.00..372.24 rows=3 width=24) (actual time=1.117..1.117

> > rows=0 loops=1)

> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..183.12 rows=1

> > width=24) (actual time=0.589..0.589 rows=0 loops=1)

> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without

> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time

> > zone) AND (resolution = ('HOUR'::cstring)::resolution))

> > Rows Removed by Filter: 4672

> > -> Seq Scan on my_report_day_201808 (cost=0.00..94.56 rows=1 width=24)

> > (actual time=0.265..0.265 rows=0 loops=1)

> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without

> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time

> > zone) AND (resolution = ('HOUR'::cstring)::resolution))

> > Rows Removed by Filter: 2336

> > -> Seq Scan on my_report_month_201808 (cost=0.00..94.56 rows=1

> > width=24) (actual time=0.261..0.261 rows=0 loops=1)

> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without

> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time

> > zone) AND (resolution = ('HOUR'::cstring)::resolution))

> > Rows Removed by Filter: 2336

> > ...

> > ++++++

> >

> > As we can see here, our execution will do seq scan on all my_report

> > partitions for that period (resolutions: day, month and hour) even though we

> > put HOUR as our query parameter. One thing that I observed is that insead of

> > (resolution = 'HOUR'::resolution)) like we got in the first example, filters

> > look like this (resolution = ('HOUR'::cstring)::resolution)).

> It's strange that the expression "('HOUR'::cstring)::resolution" got

> through without getting the following error

> ERROR: cannot cast type cstring to resolution

> LINE 1: select ('HOUR'::cstring)::resolution;

> For example,

> select ('HOUR'::cstring)::resolution;

> ERROR: cannot cast type cstring to resolution

> LINE 1: select ('HOUR'::cstring)::resolution;

> > It seems to me that PostgreSQL is not removing the (::cstring) part before

> > partition elimination so that it is forced to go through all of them

> > bassically ignoring the resolution value.

> >

> > I have a workaround now by putting exact value in the query (I wrote exact

> > string `"my_report"."resolution" = 'DAY'::resolution`) into my query, and

> > not passing the value as parameter, but this looks to me as a bug.

> You listed 10.2 as the PG version being used, which indeed had a bug that

> caused partition pruning to not work correctly with enum type partition

> key, which got fixed in 10.5 [1].

> I'm mystified as to why it worked at all if you're are using 10.2 (you

> said it worked correctly when running the command using psql with the

> hard-coded value of resolution.) So maybe, it's not the bug of 10.2 that

> got fixed in 10.5 is what's causing you the trouble, but then it's also

> not clear what the problem really is.

> Thanks,

> Amit

> [1] "Fix handling of partition keys whose data type uses a polymorphic

> btree operator class, such as arrays"

> https://www.postgresql.org/docs/devel/static/release-10-5.html

I know about the issue and the fix, but this is a different bug, it seems to me. I will try to explain the issue to be more clear.

This is the enum, table and its partitions:

+++++++++++++++++++++++++++++

create type ab as enum ('A', 'B');

create table test (key ab not null) partition by list (key);

create table test_a partition of test for values in ('A');

create table test_b partition of test for values in ('B');

+++++++++++++++++++++++++++++

We are using Java, so this is the code:

+++++++++++++++++++++++++++++

try (Connection connection = DriverManager.getConnection(...);

PreparedStatement stmt = connection.prepareStatement("EXPLAIN SELECT * FROM test WHERE key = ?::ab")) {

stmt.setString(1, "A");

try (ResultSet rs = stmt.executeQuery()) {

while (rs.next()) {

System.out.println(rs.getString(1));

}

+++++++++++++++++++++++++++++

Simple select with enum value as a parameter. Note that we didn't put "::cstring" in query. This is the result of the EXPLAIN SELECT from the code:

+++++++++++++++++++++++++++++

Append (cost=0.00..109.25 rows=26 width=4)

-> Seq Scan on test_a (cost=0.00..54.63 rows=13 width=4)

Filter: (key = ('A'::cstring)::ab)

-> Seq Scan on test_b (cost=0.00..54.63 rows=13 width=4)

Filter: (key = ('A'::cstring)::ab)

+++++++++++++++++++++++++++++

I have put wireshark snoop in attachment where we can see that our java client is not the one adding the "::cstring" part into the code.

I hope this helps

Attachment

dump.pcap

pgsql-bugs by date:

From: Amit Langote
Date: 17 August 2018, 08:06:47
Subject: Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key

From: jimmy
Date: 17 August 2018, 14:20:46
Subject: Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker

Re: BUG #15334: Partition elimination not working as expected whenusing enum as partition key - Mailing list pgsql-bugs

Attachment

Previous

Next