Home > mailing lists

Thread: Multi-Column List Partitioning

Multi-Column List Partitioning

From

Nitin Jadhav

Date:

06 May 2021, 14:02:45

Hi,

While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple column based LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that 'Multi-Column List Partitioning' can be benefited to the Postgres users in future.

I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks are still pending. I would like to know your thoughts about this, So that I can continue the work with improvising the current patch.

Following things are handled in the patch.

1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can mention the list of columns separated by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.

Whereas list_of_columns can be

a. (value [,...])

b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.

Ex-1:

CREATE TABLE t1(a int) PARTITION BY LIST(a);

CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:

CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above syntax. Also modified the searching logic to route the tuple to the appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,

Nitin Jadhav

Attachment

v0_multi_column_list_partitioning.patch

Re: Multi-Column List Partitioning

From

Jeevan Ladhe

Date:

07 May 2021, 14:05:51

> While reviewing one of the 'Table partitioning' related patches,

> I found that Postgres does not support multiple column based LIST

> partitioning. Based on this understanding, I have started working on

> this feature. I also feel that 'Multi-Column List Partitioning' can

> be benefited to the Postgres users in future.

+1 for the feature. I also think this can help users deal with some

useful cases.

> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,

> 2),(2, 10);

IMHO, listing every single tuple like this might be a bit cumbersome for

the user. What about something like this:

...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set

is the list for values of column A and second list is for column B. We

can treat these lists as A X B possible values or simply (a1, b1), (a2,

b2) internally. However I see other proprietary databases already have

syntax something similar that you are proposing here. So, I leave it

open for the thoughts from experts. Also, though what I propose might be

easy from a user perspective, but might not be that easy for

implementation, given that for a larger number of columns in partition list

e.g. A X B X C X D lists become unmanageable.

I did not review the patch in detail, but a quick look at it leaves me

with following comments:

1.

> + * list. Then this function will continue the serach and return the index of

Typo:

s/serach/search

2.

A compiler warning:

partprune.c: In function ‘get_matching_list_bounds’:

partprune.c:2731:20: error: passing argument 5 of ‘partition_list_bsearch’ makes pointer from integer without a cast [-Werror=int-conversion]

2731 | nvalues, value, &is_equal);

| ^~~~~

| |

| Datum {aka long unsigned int}

In file included from partprune.c:53:

../../../src/include/partitioning/partbounds.h:120:32: note: expected ‘Datum *’ {aka ‘long unsigned int *’} but argument is of type ‘Datum’ {aka ‘long unsigned int’}

120 | int nvalues, Datum *value, bool *is_equal);

| ~~~~~~~^~~~~

3.

And, a server crash with following case:

postgres=# CREATE TABLE t1 (a int) PARTITION BY LIST (a);

CREATE TABLE

postgres=# CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES IN (1, 2, 3);

CREATE TABLE

postgres=# \d+ t1p1

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Failed.

!?>

Stacktrace:

(gdb) bt

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

#1 0x00007f5d273c5859 in __GI_abort () at abort.c:79

#2 0x000055779d2eb69d in ExceptionalCondition (conditionName=0x55779d4978d8 "ptr == NULL || nodeTag(ptr) == type", errorType=0x55779d4978c3 "FailedAssertion",

fileName=0x55779d4978a0 "../../../src/include/nodes/nodes.h", lineNumber=603) at assert.c:69

#3 0x000055779d03a684 in castNodeImpl (type=T_Const, ptr=0x55779e457b18) at ../../../src/include/nodes/nodes.h:603

#4 0x000055779d04368a in get_qual_for_list (parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:4155

#5 0x000055779d03ac60 in get_qual_from_partbound (rel=0x7f5d1df82570, parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:272

#6 0x000055779d2cf630 in generate_partition_qual (rel=0x7f5d1df82570) at partcache.c:379

#7 0x000055779d2cf468 in get_partition_qual_relid (relid=32771) at partcache.c:308

#8 0x000055779d2592bf in pg_get_partition_constraintdef (fcinfo=0x55779e44ee50) at ruleutils.c:2019

#9 0x000055779cec7221 in ExecInterpExpr (state=0x55779e44dfb0, econtext=0x55779e407fe8, isnull=0x7ffddf9b109f) at execExprInterp.c:744

#10 0x000055779cec954f in ExecInterpExprStillValid (state=0x55779e44dfb0, econtext=0x55779e407fe8, isNull=0x7ffddf9b109f) at execExprInterp.c:1819

#11 0x000055779cf1d58a in ExecEvalExprSwitchContext (state=0x55779e44dfb0, econtext=0x55779e407fe8, isNull=0x7ffddf9b109f)

at ../../../src/include/executor/executor.h:338

#12 0x000055779cf1d602 in ExecProject (projInfo=0x55779e44dfa8) at ../../../src/include/executor/executor.h:372

#13 0x000055779cf1db2f in ExecNestLoop (pstate=0x55779e407ed0) at nodeNestloop.c:241

#14 0x000055779cedf136 in ExecProcNodeFirst (node=0x55779e407ed0) at execProcnode.c:462

#15 0x000055779ced3053 in ExecProcNode (node=0x55779e407ed0) at ../../../src/include/executor/executor.h:257

#16 0x000055779ced5a87 in ExecutePlan (estate=0x55779e407c80, planstate=0x55779e407ed0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,

direction=ForwardScanDirection, dest=0x55779e425a88, execute_once=true) at execMain.c:1551

#17 0x000055779ced372d in standard_ExecutorRun (queryDesc=0x55779e453520, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361

#18 0x000055779ced353c in ExecutorRun (queryDesc=0x55779e453520, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305

#19 0x000055779d13d287 in PortalRunSelect (portal=0x55779e398800, forward=true, count=0, dest=0x55779e425a88) at pquery.c:912

#20 0x000055779d13cec0 in PortalRun (portal=0x55779e398800, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55779e425a88, altdest=0x55779e425a88,

qc=0x7ffddf9b14f0) at pquery.c:756

#21 0x000055779d1361ce in exec_simple_query (

query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n inhdetachpending,\n pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class c JOIN pg_catalo"...) at postgres.c:1214

#22 0x000055779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710, dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at postgres.c:4476

#23 0x000055779d0674d3 in BackendRun (port=0x55779e358380) at postmaster.c:4488

#24 0x000055779d066d8c in BackendStartup (port=0x55779e358380) at postmaster.c:4210

#25 0x000055779d062f9b in ServerLoop () at postmaster.c:1742

#26 0x000055779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at postmaster.c:1414

#27 0x000055779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209

Regards,

Jeevan Ladhe

On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

Hi,

While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple column based LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that 'Multi-Column List Partitioning' can be benefited to the Postgres users in future.

I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks are still pending. I would like to know your thoughts about this, So that I can continue the work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can mention the list of columns separated by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above syntax. Also modified the searching logic to route the tuple to the appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

08 May 2021, 08:42:20

Thanks Jeevan for looking into this thread.

> I did not review the patch in detail, but a quick look at it leaves me

> with following comments:

I will incorporate these changes.

> ...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set

> is the list for values of column A and second list is for column B. We

> can treat these lists as A X B possible values or simply (a1, b1), (a2,

> b2) internally. However I see other proprietary databases already have

> syntax something similar that you are proposing here. So, I leave it

> open for the thoughts from experts. Also, though what I propose might be

> easy from a user perspective, but might not be that easy for

> implementation, given that for a larger number of columns in partition list

> e.g. A X B X C X D lists become unmanageable.

I feel this is also not easy from a user's perspective. For example for a partition

with 2 partition keys (a,b) for values like (1,1), (1,2), (1,3), (1,4),(1,5). This

would be converted to (1,1,1,1,1), (1,2,3,4,5). It is difficult to match the values

of column 'a' to 'b'. Anyways let's wait for the other's opinion about this.

Thanks & Regards,

Nitin Jadhav

On Fri, May 7, 2021 at 7:36 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:

> While reviewing one of the 'Table partitioning' related patches,
> I found that Postgres does not support multiple column based LIST
> partitioning. Based on this understanding, I have started working on
> this feature. I also feel that 'Multi-Column List Partitioning' can
> be benefited to the Postgres users in future.

+1 for the feature. I also think this can help users deal with some
useful cases.

> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
> 2),(2, 10);

IMHO, listing every single tuple like this might be a bit cumbersome for
the user. What about something like this:

...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
is the list for values of column A and second list is for column B. We
can treat these lists as A X B possible values or simply (a1, b1), (a2,
b2) internally. However I see other proprietary databases already have
syntax something similar that you are proposing here. So, I leave it
open for the thoughts from experts. Also, though what I propose might be
easy from a user perspective, but might not be that easy for
implementation, given that for a larger number of columns in partition list
e.g. A X B X C X D lists become unmanageable.

I did not review the patch in detail, but a quick look at it leaves me
with following comments:

1.
> + * list. Then this function will continue the serach and return the index of
Typo:
s/serach/search

2.
A compiler warning:
partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:20: error: passing argument 5 of ‘partition_list_bsearch’ makes pointer from integer without a cast [-Werror=int-conversion]
2731 | nvalues, value, &is_equal);
| ^~~~~
| |
| Datum {aka long unsigned int}
In file included from partprune.c:53:
../../../src/include/partitioning/partbounds.h:120:32: note: expected ‘Datum *’ {aka ‘long unsigned int *’} but argument is of type ‘Datum’ {aka ‘long unsigned int’}
120 | int nvalues, Datum *value, bool *is_equal);
| ~~~~~~~^~~~~

3.
And, a server crash with following case:
postgres=# CREATE TABLE t1 (a int) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES IN (1, 2, 3);
CREATE TABLE
postgres=# \d+ t1p1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

Stacktrace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f5d273c5859 in __GI_abort () at abort.c:79
#2 0x000055779d2eb69d in ExceptionalCondition (conditionName=0x55779d4978d8 "ptr == NULL || nodeTag(ptr) == type", errorType=0x55779d4978c3 "FailedAssertion",
fileName=0x55779d4978a0 "../../../src/include/nodes/nodes.h", lineNumber=603) at assert.c:69
#3 0x000055779d03a684 in castNodeImpl (type=T_Const, ptr=0x55779e457b18) at ../../../src/include/nodes/nodes.h:603
#4 0x000055779d04368a in get_qual_for_list (parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:4155
#5 0x000055779d03ac60 in get_qual_from_partbound (rel=0x7f5d1df82570, parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:272
#6 0x000055779d2cf630 in generate_partition_qual (rel=0x7f5d1df82570) at partcache.c:379
#7 0x000055779d2cf468 in get_partition_qual_relid (relid=32771) at partcache.c:308
#8 0x000055779d2592bf in pg_get_partition_constraintdef (fcinfo=0x55779e44ee50) at ruleutils.c:2019
#9 0x000055779cec7221 in ExecInterpExpr (state=0x55779e44dfb0, econtext=0x55779e407fe8, isnull=0x7ffddf9b109f) at execExprInterp.c:744
#10 0x000055779cec954f in ExecInterpExprStillValid (state=0x55779e44dfb0, econtext=0x55779e407fe8, isNull=0x7ffddf9b109f) at execExprInterp.c:1819
#11 0x000055779cf1d58a in ExecEvalExprSwitchContext (state=0x55779e44dfb0, econtext=0x55779e407fe8, isNull=0x7ffddf9b109f)
at ../../../src/include/executor/executor.h:338
#12 0x000055779cf1d602 in ExecProject (projInfo=0x55779e44dfa8) at ../../../src/include/executor/executor.h:372
#13 0x000055779cf1db2f in ExecNestLoop (pstate=0x55779e407ed0) at nodeNestloop.c:241
#14 0x000055779cedf136 in ExecProcNodeFirst (node=0x55779e407ed0) at execProcnode.c:462
#15 0x000055779ced3053 in ExecProcNode (node=0x55779e407ed0) at ../../../src/include/executor/executor.h:257
#16 0x000055779ced5a87 in ExecutePlan (estate=0x55779e407c80, planstate=0x55779e407ed0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x55779e425a88, execute_once=true) at execMain.c:1551
#17 0x000055779ced372d in standard_ExecutorRun (queryDesc=0x55779e453520, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#18 0x000055779ced353c in ExecutorRun (queryDesc=0x55779e453520, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305
#19 0x000055779d13d287 in PortalRunSelect (portal=0x55779e398800, forward=true, count=0, dest=0x55779e425a88) at pquery.c:912
#20 0x000055779d13cec0 in PortalRun (portal=0x55779e398800, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55779e425a88, altdest=0x55779e425a88,
qc=0x7ffddf9b14f0) at pquery.c:756
#21 0x000055779d1361ce in exec_simple_query (
query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n inhdetachpending,\n pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class c JOIN pg_catalo"...) at postgres.c:1214
#22 0x000055779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710, dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at postgres.c:4476
#23 0x000055779d0674d3 in BackendRun (port=0x55779e358380) at postmaster.c:4488
#24 0x000055779d066d8c in BackendStartup (port=0x55779e358380) at postmaster.c:4210
#25 0x000055779d062f9b in ServerLoop () at postmaster.c:1742
#26 0x000055779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at postmaster.c:1414
#27 0x000055779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209

Regards,
Jeevan Ladhe

On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
Hi,

While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple column based LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that 'Multi-Column List Partitioning' can be benefited to the Postgres users in future.

I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks are still pending. I would like to know your thoughts about this, So that I can continue the work with improvising the current patch.

Following things are handled in the patch.
1. Syntax

CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);

Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can mention the list of columns separated by comma.

CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.

Whereas list_of_columns can be
a. (value [,...])
b. (value [,...]) [,...]

I would like to list a few examples here for better understanding.
Ex-1:
CREATE TABLE t1(a int) PARTITION BY LIST(a);
CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);

Ex-2:
CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Please share if any changes are required in the above syntax.

2. Modified transformation logic to support above syntax.

3. Modified the data structures to store the information caused by above syntax. Also modified the searching logic to route the tuple to the appropriate partition.

4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and SELECT are working fine.

Following items are pending and I am working on it.

1. Handling of 'NULL' values.

2. Support multi column case in partition pruning.

3. Add test cases to the regression test suite.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

21 May 2021, 04:02:21

Hello Nitin,

On Thu, May 6, 2021 at 11:03 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Hi,
>
> While reviewing one of the 'Table partitioning' related patches, I found that Postgres does not support multiple
columnbased LIST partitioning. Based on this understanding, I have started working on this feature. I also feel that
'Multi-ColumnList Partitioning' can be benefited to the Postgres users in future. 

Yes, it would be nice to have this.  Thanks for picking this up.

> I am attaching the WIP patch for this feature here. It supports 'Multi-Column List Partitioning', however some tasks
arestill pending. I would like to know your thoughts about this, So that I can continue the work with improvising the
currentpatch. 
>
> Following things are handled in the patch.
> 1. Syntax
>
> CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);
>
> Earlier there was no provision to mention multiple columns as part of the 'list_of_columns' clause. Now we can
mentionthe list of columns separated by comma. 
>
> CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN list_of_values.
>
> Whereas list_of_columns can be
> a. (value [,...])
> b. (value [,...]) [,...]
>
> I would like to list a few examples here for better understanding.
> Ex-1:
> CREATE TABLE t1(a int) PARTITION BY LIST(a);
> CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);
>
> Ex-2:
> CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 10);

Hmm, why not have parentheses around these lists, that is: (
(list_of_values) [, ...] )

So your example would look like this:

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
2), (2, 10));

IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.

I will now take a look at the patch itself.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

21 May 2021, 12:16:50

On Fri, May 21, 2021 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I will now take a look at the patch itself.

Some quick observations:

* I get a lot of instances of the following 2 warnings when compiling
the patched code:

Warning #1:

partprune.c: In function ‘get_matching_list_bounds’:
partprune.c:2731:11: warning: passing argument 5 of
‘partition_list_bsearch’ makes pointer from integer without a cast
[enabled by default]
           nvalues, value, &is_equal);
           ^
In file included from partprune.c:53:0:
../../../src/include/partitioning/partbounds.h:117:12: note: expected
‘Datum *’ but argument is of type ‘Datum’
 extern int partition_list_bsearch(FmgrInfo *partsupfunc,

Warning #2:

partprune.c:2781:12: warning: incompatible integer to pointer
conversion passing 'Datum'
      (aka 'unsigned long') to parameter of type 'Datum *' (aka
'unsigned long *'); take the
      address with & [-Wint-conversion]

          value, &is_equal);

          ^~~~~

          &
../../../src/include/partitioning/partbounds.h:120:32: note: passing
argument to parameter 'value'
      here
  ...int nvalues, Datum *value, bool *is_equal);

* I think this code:

===
        /* Get the only column's name in case we need to output an error */
        if (key->partattrs[0] != 0)
            colname = get_attname(RelationGetRelid(parent),
                                  key->partattrs[0], false);
        else
            colname = deparse_expression((Node *) linitial(partexprs),

deparse_context_for(RelationGetRelationName(parent),

RelationGetRelid(parent)),
                                         false, false);
        /* Need its type data too */
        coltype = get_partition_col_typid(key, 0);
        coltypmod = get_partition_col_typmod(key, 0);
        partcollation = get_partition_col_collation(key, 0);
===

belongs in the new function transformPartitionListBounds that you
added, because without doing so, any errors having to do with
partitioning columns other than the first one will report the first
column's name in the error message:

postgres=# create table foo (a bool, b bool) partition by list (a, b);
CREATE TABLE

-- this is fine!
postgres=# create table foo_true_true partition of foo for values in (1, true);
ERROR:  specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (1, true);

-- not this!
postgres=# create table foo_true_true partition of foo for values in (true, 1);
ERROR:  specified value cannot be cast to type boolean for column "a"
LINE 1: ...able foo_true_true partition of foo for values in (true, 1);

* The following prototype of transformPartitionListBounds() means that
all values in a given bound list are analyzed with the first
partitioning column's colname, type, typmod, etc., which is wrong:

+static List *
+transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec,
+                            char *colname, Oid coltype, int32 coltypmod,
+                            Oid partcollation, int partnatts)
+{

An example of wrong behavior because of that:

postgres=# create table foo (a bool, b text) partition by list (a, b);
CREATE TABLE
Time: 3.967 ms
postgres=# create table foo_true_true partition of foo for values in
(true, 'whatever');
ERROR:  invalid input syntax for type boolean: "whatever"
LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'...

"whatever" should've been accepted but because it's checked with a's
type, it is wrongly flagged.

Please take a look at how transformPartitionRangeBound() handles this,
especially how it uses the correct partitioning column's info to
analyze the corresponding bound value expression.

I will continue looking next week.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

23 May 2021, 09:49:07

> Yes, it would be nice to have this.  Thanks for picking this up.

Thanks for confirming.

> Some quick observations:

Thanks for providing the comments. I will handle these cases.

> Hmm, why not have parentheses around these lists, that is: (
> (list_of_values) [, ...] )
>
> So your example would look like this:
>
> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
> 2), (2, 10));

I am ok with this syntax. This would be more appropriate.

> IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
> understand from this syntax that the partition constraint is something
> like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> row-wise comparison.

Thanks for suggesting to use row-wise comparison. I have few queries
with respect to handling of NULL values.

1. What should be the partition constraint for the above case. AFAIK,
row-wise comparison wont work with NULL values as shown in [1]. I mean
two rows are considered equal if all their corresponding members are
non-null and equal. The rows are unequal if any corresponding members
are non-null and unequal. Otherwise the result of the row comparison
is unknown (null). So we should generate different types of
constraints for NULL values.

Ex:
CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
(NULL, 1), (NULL, NULL);

As per my knowledge, we should consider creating partition constraints
for the above example as given below.

(a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
1)) OR ((a is NULL) AND (b is NULL)).

Kindly correct me if I am wrong.

2. In the current code we don't put the NULL value in the 'datums'
field of 'PartitionBoundInfoData' structure [2]. Since there can be
only one NULL value, we directly store the corresponding index value
in the 'null_index' field. Now we have to handle multiple NULL values
in case of Multi-Column List Partitioning. So the question is how to
handle this scenario. Following are the 2 approaches to handle this.

Approach-1:
Add another field 'bool  **isnull' in [2] and mark the corresponding
element to TRUE if it has NULL value and the corresponding location in
'datums' contains empty/No value. For example, If a partition bound is
(1, NULL), then

datums[0][0] = 1
datums[0][1] = Not assigned any value
isnull[0][0] = FALSE
is null[0][1] = TRUE

So now we have an entry in the 'datums'  field for a bound containing
NULL value, so we should handle this in all the scenarios where we are
manipulating 'datums' in order to support NULL values and avoid crash.

Approach-2:
Don't add the bound information to 'datums' field of [2] if any of the
value is NULL. Store this information separately in the structures
mentioned in [3] and process accordingly.

I feel approach-1 is the better solution as this requires less code
changes and easy to implement than approach-2. Kindly share your
thoughts about the approaches and please share if you have any better
solution than the above 2.

[1]:
postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2);
 ?column?
----------
 t
(1 row)

postgres@15890=#SELECT ROW(1, 2) = ROW(1, 1);
 ?column?
----------
 f
(1 row)

postgres@15890=#SELECT ROW(1, NULL) = ROW(1, NULL);
 ?column?
----------

(1 row)

postgres@15890=#SELECT ROW(1, 2) = ROW(1, NULL);
 ?column?
----------

(1 row)

[2] :
typedef struct PartitionBoundInfoData
{
    char        strategy;       /* hash, list or range? */
    int         ndatums;        /* Length of the datums[] array */
    Datum     **datums;
    PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
                                     * NULL for hash and list partitioned
                                     * tables */
    int         nindexes;       /* Length of the indexes[] array */
    int        *indexes;        /* Partition indexes */
    int         null_index;     /* Index of the null-accepting partition; -1
                                 * if there isn't one */
    int         default_index;  /* Index of the default partition; -1 if there
                                 * isn't one */
} PartitionBoundInfoData;

[3]:
typedef struct NullBoundDatumInfo
{
    Datum *datum;
    int         col_index;
    int.         bound_index;
} NullBoundDatumInfo;

typedef struct NullBoundIsNullInfo
{
    int      col_index;
    int.     bound_index;
} NullBoundIsNullInfo;

Add 2 fields of type 'NullBoundDatumInfo' and 'NullBoundIsNullInfo' to
the structure [2].

--
Thanks & Regards,
Nitin Jadhav

On Fri, May 21, 2021 at 5:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, May 21, 2021 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I will now take a look at the patch itself.
>
> Some quick observations:
>
> * I get a lot of instances of the following 2 warnings when compiling
> the patched code:
>
> Warning #1:
>
> partprune.c: In function ‘get_matching_list_bounds’:
> partprune.c:2731:11: warning: passing argument 5 of
> ‘partition_list_bsearch’ makes pointer from integer without a cast
> [enabled by default]
>            nvalues, value, &is_equal);
>            ^
> In file included from partprune.c:53:0:
> ../../../src/include/partitioning/partbounds.h:117:12: note: expected
> ‘Datum *’ but argument is of type ‘Datum’
>  extern int partition_list_bsearch(FmgrInfo *partsupfunc,
>
> Warning #2:
>
> partprune.c:2781:12: warning: incompatible integer to pointer
> conversion passing 'Datum'
>       (aka 'unsigned long') to parameter of type 'Datum *' (aka
> 'unsigned long *'); take the
>       address with & [-Wint-conversion]
>
>           value, &is_equal);
>
>           ^~~~~
>
>           &
> ../../../src/include/partitioning/partbounds.h:120:32: note: passing
> argument to parameter 'value'
>       here
>   ...int nvalues, Datum *value, bool *is_equal);
>
> * I think this code:
>
> ===
>         /* Get the only column's name in case we need to output an error */
>         if (key->partattrs[0] != 0)
>             colname = get_attname(RelationGetRelid(parent),
>                                   key->partattrs[0], false);
>         else
>             colname = deparse_expression((Node *) linitial(partexprs),
>
> deparse_context_for(RelationGetRelationName(parent),
>
> RelationGetRelid(parent)),
>                                          false, false);
>         /* Need its type data too */
>         coltype = get_partition_col_typid(key, 0);
>         coltypmod = get_partition_col_typmod(key, 0);
>         partcollation = get_partition_col_collation(key, 0);
> ===
>
> belongs in the new function transformPartitionListBounds that you
> added, because without doing so, any errors having to do with
> partitioning columns other than the first one will report the first
> column's name in the error message:
>
> postgres=# create table foo (a bool, b bool) partition by list (a, b);
> CREATE TABLE
>
> -- this is fine!
> postgres=# create table foo_true_true partition of foo for values in (1, true);
> ERROR:  specified value cannot be cast to type boolean for column "a"
> LINE 1: ...able foo_true_true partition of foo for values in (1, true);
>
> -- not this!
> postgres=# create table foo_true_true partition of foo for values in (true, 1);
> ERROR:  specified value cannot be cast to type boolean for column "a"
> LINE 1: ...able foo_true_true partition of foo for values in (true, 1);
>
> * The following prototype of transformPartitionListBounds() means that
> all values in a given bound list are analyzed with the first
> partitioning column's colname, type, typmod, etc., which is wrong:
>
> +static List *
> +transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec,
> +                            char *colname, Oid coltype, int32 coltypmod,
> +                            Oid partcollation, int partnatts)
> +{
>
> An example of wrong behavior because of that:
>
> postgres=# create table foo (a bool, b text) partition by list (a, b);
> CREATE TABLE
> Time: 3.967 ms
> postgres=# create table foo_true_true partition of foo for values in
> (true, 'whatever');
> ERROR:  invalid input syntax for type boolean: "whatever"
> LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'...
>
> "whatever" should've been accepted but because it's checked with a's
> type, it is wrongly flagged.
>
> Please take a look at how transformPartitionRangeBound() handles this,
> especially how it uses the correct partitioning column's info to
> analyze the corresponding bound value expression.
>
> I will continue looking next week.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

26 May 2021, 04:57:04

On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
> > understand from this syntax that the partition constraint is something
> > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > row-wise comparison.
>
> Thanks for suggesting to use row-wise comparison.

Actually, I was just describing how the *users* may want to visualize
the partition constraint...

> I have few queries
> with respect to handling of NULL values.
>
> 1. What should be the partition constraint for the above case. AFAIK,
> row-wise comparison wont work with NULL values as shown in [1]. I mean
> two rows are considered equal if all their corresponding members are
> non-null and equal. The rows are unequal if any corresponding members
> are non-null and unequal. Otherwise the result of the row comparison
> is unknown (null). So we should generate different types of
> constraints for NULL values.
>
> Ex:
> CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> (NULL, 1), (NULL, NULL);
>
> As per my knowledge, we should consider creating partition constraints
> for the above example as given below.
>
> (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
> 1)) OR ((a is NULL) AND (b is NULL)).

Yeah, something like that should do the trick.

Again, I was not actually suggesting that you write code to implement
the constraint using something like RowCompareExpr, only that the
users might want to view the constraint as doing row-wise comparison
of the partitioning columns and the specified value lists.

> 2. In the current code we don't put the NULL value in the 'datums'
> field of 'PartitionBoundInfoData' structure [2]. Since there can be
> only one NULL value, we directly store the corresponding index value
> in the 'null_index' field. Now we have to handle multiple NULL values
> in case of Multi-Column List Partitioning. So the question is how to
> handle this scenario. Following are the 2 approaches to handle this.
>
> Approach-1:
> Add another field 'bool  **isnull' in [2] and mark the corresponding
> element to TRUE if it has NULL value and the corresponding location in
> 'datums' contains empty/No value. For example, If a partition bound is
> (1, NULL), then
>
> datums[0][0] = 1
> datums[0][1] = Not assigned any value
> isnull[0][0] = FALSE
> is null[0][1] = TRUE
>
> So now we have an entry in the 'datums'  field for a bound containing
> NULL value, so we should handle this in all the scenarios where we are
> manipulating 'datums' in order to support NULL values and avoid crash.
>
> Approach-2:
> Don't add the bound information to 'datums' field of [2] if any of the
> value is NULL. Store this information separately in the structures
> mentioned in [3] and process accordingly.
>
> I feel approach-1 is the better solution as this requires less code
> changes and easy to implement than approach-2. Kindly share your
> thoughts about the approaches and please share if you have any better
> solution than the above 2.

Approach 1 sounds better.  It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all.  For now, it might be better to not allocate the
isnull array except for list partitioning.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email.  Also, please make sure to run `make check`
successfully before posting the patch. :)

Thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

03 June 2021, 14:45:09

> Approach 1 sounds better.  It sounds like approach 1 might help us
> implement support for allowing NULLs in range partition bounds in the
> future, if at all.  For now, it might be better to not allocate the
> isnull array except for list partitioning.

Thanks for confirming.

> I'll wait for you to post a new patch addressing at least the comments
> in my earlier email.  Also, please make sure to run `make check`
> successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres@17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres@17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres@17503=#\d+ t
                                      Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |
     |              |
 b      | text    |           |          |         | extended |
     |              |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres@17503=#\d+ t1
                                            Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage  |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer |           |          |         | plain    |
     |              |
 b      | text    |           |          |         | extended |
     |              |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing  a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

Pending items:
1. Support of partition pruning for multi-column list partitioning.
2. Support of partition wise join for multi-column list partitioning.

I will continue to work on the above 2 items.
Kindly let me know if I am missing something.

Thanks & Regards,
Nitin Jadhav


On Wed, May 26, 2021 at 10:27 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > > IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
> > > understand from this syntax that the partition constraint is something
> > > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > > row-wise comparison.
> >
> > Thanks for suggesting to use row-wise comparison.
>
> Actually, I was just describing how the *users* may want to visualize
> the partition constraint...
>
> > I have few queries
> > with respect to handling of NULL values.
> >
> > 1. What should be the partition constraint for the above case. AFAIK,
> > row-wise comparison wont work with NULL values as shown in [1]. I mean
> > two rows are considered equal if all their corresponding members are
> > non-null and equal. The rows are unequal if any corresponding members
> > are non-null and unequal. Otherwise the result of the row comparison
> > is unknown (null). So we should generate different types of
> > constraints for NULL values.
> >
> > Ex:
> > CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> > CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> > (NULL, 1), (NULL, NULL);
> >
> > As per my knowledge, we should consider creating partition constraints
> > for the above example as given below.
> >
> > (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
> > 1)) OR ((a is NULL) AND (b is NULL)).
>
> Yeah, something like that should do the trick.
>
> Again, I was not actually suggesting that you write code to implement
> the constraint using something like RowCompareExpr, only that the
> users might want to view the constraint as doing row-wise comparison
> of the partitioning columns and the specified value lists.
>
> > 2. In the current code we don't put the NULL value in the 'datums'
> > field of 'PartitionBoundInfoData' structure [2]. Since there can be
> > only one NULL value, we directly store the corresponding index value
> > in the 'null_index' field. Now we have to handle multiple NULL values
> > in case of Multi-Column List Partitioning. So the question is how to
> > handle this scenario. Following are the 2 approaches to handle this.
> >
> > Approach-1:
> > Add another field 'bool  **isnull' in [2] and mark the corresponding
> > element to TRUE if it has NULL value and the corresponding location in
> > 'datums' contains empty/No value. For example, If a partition bound is
> > (1, NULL), then
> >
> > datums[0][0] = 1
> > datums[0][1] = Not assigned any value
> > isnull[0][0] = FALSE
> > is null[0][1] = TRUE
> >
> > So now we have an entry in the 'datums'  field for a bound containing
> > NULL value, so we should handle this in all the scenarios where we are
> > manipulating 'datums' in order to support NULL values and avoid crash.
> >
> > Approach-2:
> > Don't add the bound information to 'datums' field of [2] if any of the
> > value is NULL. Store this information separately in the structures
> > mentioned in [3] and process accordingly.
> >
> > I feel approach-1 is the better solution as this requires less code
> > changes and easy to implement than approach-2. Kindly share your
> > thoughts about the approaches and please share if you have any better
> > solution than the above 2.
>
> Approach 1 sounds better.  It sounds like approach 1 might help us
> implement support for allowing NULLs in range partition bounds in the
> future, if at all.  For now, it might be better to not allocate the
> isnull array except for list partitioning.
>
> I'll wait for you to post a new patch addressing at least the comments
> in my earlier email.  Also, please make sure to run `make check`
> successfully before posting the patch. :)
>
> Thanks.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Attachment

v1_multi_column_list_partitioning.patch

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

11 June 2021, 03:37:54

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > I'll wait for you to post a new patch addressing at least the comments
> > in my earlier email.  Also, please make sure to run `make check`
> > successfully before posting the patch. :)
>
> I have fixed all of the review comments given by you and Jeevan in the
> attached patch and also the attached patch contains more changes
> compared to the previous patch. Following are the implementation
> details.

Thanks for the updated version.

> 1. Regarding syntax, the existing syntax will work fine for the
> single-column list partitioning. However I have used the new syntax
> for the multi-column list partitioning as we discussed earlier. I have
> used a combination of 'AND' and 'OR' logic for the partition
> constraints as given in the below example.
>
> postgres@17503=#create table t(a int, b text) partition by list(a,b);
> CREATE TABLE
> postgres@17503=#create table t1 partition of t for values in ((1,'a'),
> (NULL,'b'));
> CREATE TABLE
> postgres@17503=#\d+ t
>                                       Partitioned table "public.t"
>  Column |  Type   | Collation | Nullable | Default | Storage  |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | integer |           |          |         | plain    |
>      |              |
>  b      | text    |           |          |         | extended |
>      |              |
> Partition key: LIST (a, b)
> Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>
> postgres@17503=#\d+ t1
>                                             Table "public.t1"
>  Column |  Type   | Collation | Nullable | Default | Storage  |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | integer |           |          |         | plain    |
>      |              |
>  b      | text    |           |          |         | extended |
>      |              |
> Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
> AND (b = 'b'::text)))
> Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

> 2. In the existing code, NULL values were handled differently. It was
> not added to the 'datums' variable, rather used to store the partition
> index directly in the 'null_index' variable. Now there is a
> possibility of multiple NULL values, hence introducing  a new member
> 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
> whether the corresponding element in the 'datums' is NULL. Now
> 'null_index' cannot be used directly to store the partition index, so
> removed it and made the necessary changes in multiple places.
>
> 3. I have added test cases for 'create table' and 'insert' statements
> related to multi-column list partitioning and these are working fine
> with 'make check'.
>
> 4. Handled the partition pruning code to accommodate these changes for
> single-column list partitioning. However it is pending for
> multi-column list partitioning.
>
> 5. I have done necessary changes in partition wise join related code
> to accommodate for single-column list partitioning. However it is
> pending for multi-column list partitioning.
>
> Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me.  I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case.  The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function.  I'm talking about the code
fragment starting with this comment:

          /*
           * Once we find the matching for the first column but if it does not
           * match for the any of the other columns, then the binary search
           * will not work in all the cases. We should traverse just below
           * and above the mid index until we find the match or we reach the
           * first mismatch.
           */

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns.  But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers.  Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

    /*
     * For hash partitioning we can only perform pruning based on equality
     * clauses to the partition key or IS NULL clauses.  We also can only
     * prune if we got values for all keys.
     */
    if (nvalues + bms_num_members(nullkeys) == partnatts)
    {
        /* code to compute matching hash bound offset */
    }
    else
    {
        /* Report all valid offsets into the boundinfo->indexes array. */
        result->bound_offsets = bms_add_range(NULL, 0,
                                              boundinfo->nindexes - 1);
    }

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation?  That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted.  That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully.   For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

v1_delta_amit.patch

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

11 June 2021, 06:26:05

On Fri, Jun 11, 2021 at 12:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I will look at other parts of the patch next week hopefully.   For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

One more:

* Add all columns of newly added test query in insert.sql to the order
by clause to get predictably ordered output

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Zhihong Yu

Date:

11 June 2021, 17:32:25

On Thu, Jun 10, 2021 at 8:38 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > I'll wait for you to post a new patch addressing at least the comments
> > in my earlier email. Also, please make sure to run `make check`
> > successfully before posting the patch. :)
>
> I have fixed all of the review comments given by you and Jeevan in the
> attached patch and also the attached patch contains more changes
> compared to the previous patch. Following are the implementation
> details.

Thanks for the updated version.

> 1. Regarding syntax, the existing syntax will work fine for the
> single-column list partitioning. However I have used the new syntax
> for the multi-column list partitioning as we discussed earlier. I have
> used a combination of 'AND' and 'OR' logic for the partition
> constraints as given in the below example.
>
> postgres@17503=#create table t(a int, b text) partition by list(a,b);
> CREATE TABLE
> postgres@17503=#create table t1 partition of t for values in ((1,'a'),
> (NULL,'b'));
> CREATE TABLE
> postgres@17503=#\d+ t
> Partitioned table "public.t"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> b | text | | | | extended |
> | |
> Partition key: LIST (a, b)
> Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>
> postgres@17503=#\d+ t1
> Table "public.t1"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> b | text | | | | extended |
> | |
> Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
> AND (b = 'b'::text)))
> Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

> 2. In the existing code, NULL values were handled differently. It was
> not added to the 'datums' variable, rather used to store the partition
> index directly in the 'null_index' variable. Now there is a
> possibility of multiple NULL values, hence introducing a new member
> 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
> whether the corresponding element in the 'datums' is NULL. Now
> 'null_index' cannot be used directly to store the partition index, so
> removed it and made the necessary changes in multiple places.
>
> 3. I have added test cases for 'create table' and 'insert' statements
> related to multi-column list partitioning and these are working fine
> with 'make check'.
>
> 4. Handled the partition pruning code to accommodate these changes for
> single-column list partitioning. However it is pending for
> multi-column list partitioning.
>
> 5. I have done necessary changes in partition wise join related code
> to accommodate for single-column list partitioning. However it is
> pending for multi-column list partitioning.
>
> Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

/*
* Once we find the matching for the first column but if it does not
* match for the any of the other columns, then the binary search
* will not work in all the cases. We should traverse just below
* and above the mid index until we find the match or we reach the
* first mismatch.
*/

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
* prune if we got values for all keys.
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
/* code to compute matching hash bound offset */
}
else
{
/* Report all valid offsets into the boundinfo->indexes array. */
result->bound_offsets = bms_add_range(NULL, 0,
boundinfo->nindexes - 1);
}

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Hi, Amit:

+ * isnulls is an array of boolean-tuples with key->partnatts booleans values

+ * each. Currently only used for list partitioning, it stores whether a

I think 'booleans' should be 'boolean'.

The trailing word 'each' is unnecessary.

Cheers

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

25 August 2021, 12:42:09

> The new list bound binary search and related comparison support
> function look a bit too verbose to me.  I was expecting
> partition_list_bsearch() to look very much like
> partition_range_datum_bsearch(), but that is not the case.  The
> special case code that you wrote in partition_list_bsearch() seems
> unnecessary, at least in that function.  I'm talking about the code
> fragment starting with this comment:
>
> I will look at other parts of the patch next week hopefully.   For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

> I guess you're perhaps trying to address the case where the caller
> does not specify the values for all of the partition key columns,
> which can happen when the partition pruning code needs to handle a set
> of clauses matching only some of the partition key columns.  But
> that's a concern of the partition pruning code and so the special case
> should be handled there (if at all), not in the binary search function
> that is shared with other callers.  Regarding that, I'm wondering if
> we should require clauses matching all of the partition key columns to
> be found for the pruning code to call the binary search, so do
> something like get_matching_hash_bounds() does:
>
> Do you think that trying to match list partitions even with fewer keys
> is worth the complexity of the implementation?  That is, is the use
> case to search for only a subset of partition key columns common
> enough with list partitioning?
>
> If we do decide to implement the special case, remember that to do
> that efficiently, we'd need to require that the subset of matched key
> columns constitutes a prefix, because of the way the datums are
> sorted.  That is, match all partitions when the query only contains a
> clause for b when the partition key is (a, b, c), but engage the
> special case of pruning if the query contains clauses for a, or for a
> and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1.  Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.



On Fri, Jun 11, 2021 at 10:57 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Thu, Jun 10, 2021 at 8:38 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> Hi Nitin,
>>
>> On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
>> <nitinjadhavpostgres@gmail.com> wrote:
>> > > I'll wait for you to post a new patch addressing at least the comments
>> > > in my earlier email.  Also, please make sure to run `make check`
>> > > successfully before posting the patch. :)
>> >
>> > I have fixed all of the review comments given by you and Jeevan in the
>> > attached patch and also the attached patch contains more changes
>> > compared to the previous patch. Following are the implementation
>> > details.
>>
>> Thanks for the updated version.
>>
>> > 1. Regarding syntax, the existing syntax will work fine for the
>> > single-column list partitioning. However I have used the new syntax
>> > for the multi-column list partitioning as we discussed earlier. I have
>> > used a combination of 'AND' and 'OR' logic for the partition
>> > constraints as given in the below example.
>> >
>> > postgres@17503=#create table t(a int, b text) partition by list(a,b);
>> > CREATE TABLE
>> > postgres@17503=#create table t1 partition of t for values in ((1,'a'),
>> > (NULL,'b'));
>> > CREATE TABLE
>> > postgres@17503=#\d+ t
>> >                                       Partitioned table "public.t"
>> >  Column |  Type   | Collation | Nullable | Default | Storage  |
>> > Compression | Stats target | Description
>> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>> >  a      | integer |           |          |         | plain    |
>> >      |              |
>> >  b      | text    |           |          |         | extended |
>> >      |              |
>> > Partition key: LIST (a, b)
>> > Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>> >
>> > postgres@17503=#\d+ t1
>> >                                             Table "public.t1"
>> >  Column |  Type   | Collation | Nullable | Default | Storage  |
>> > Compression | Stats target | Description
>> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
>> >  a      | integer |           |          |         | plain    |
>> >      |              |
>> >  b      | text    |           |          |         | extended |
>> >      |              |
>> > Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>> > Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
>> > AND (b = 'b'::text)))
>> > Access method: heap
>>
>> The constraint expressions seem to come out correctly, though I
>> haven't checked your implementation closely yet.
>>
>> > 2. In the existing code, NULL values were handled differently. It was
>> > not added to the 'datums' variable, rather used to store the partition
>> > index directly in the 'null_index' variable. Now there is a
>> > possibility of multiple NULL values, hence introducing  a new member
>> > 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
>> > whether the corresponding element in the 'datums' is NULL. Now
>> > 'null_index' cannot be used directly to store the partition index, so
>> > removed it and made the necessary changes in multiple places.
>> >
>> > 3. I have added test cases for 'create table' and 'insert' statements
>> > related to multi-column list partitioning and these are working fine
>> > with 'make check'.
>> >
>> > 4. Handled the partition pruning code to accommodate these changes for
>> > single-column list partitioning. However it is pending for
>> > multi-column list partitioning.
>> >
>> > 5. I have done necessary changes in partition wise join related code
>> > to accommodate for single-column list partitioning. However it is
>> > pending for multi-column list partitioning.
>> >
>> > Kindly review the patch and let me know if any changes are required.
>>
>> The new list bound binary search and related comparison support
>> function look a bit too verbose to me.  I was expecting
>> partition_list_bsearch() to look very much like
>> partition_range_datum_bsearch(), but that is not the case.  The
>> special case code that you wrote in partition_list_bsearch() seems
>> unnecessary, at least in that function.  I'm talking about the code
>> fragment starting with this comment:
>>
>>           /*
>>            * Once we find the matching for the first column but if it does not
>>            * match for the any of the other columns, then the binary search
>>            * will not work in all the cases. We should traverse just below
>>            * and above the mid index until we find the match or we reach the
>>            * first mismatch.
>>            */
>>
>> I guess you're perhaps trying to address the case where the caller
>> does not specify the values for all of the partition key columns,
>> which can happen when the partition pruning code needs to handle a set
>> of clauses matching only some of the partition key columns.  But
>> that's a concern of the partition pruning code and so the special case
>> should be handled there (if at all), not in the binary search function
>> that is shared with other callers.  Regarding that, I'm wondering if
>> we should require clauses matching all of the partition key columns to
>> be found for the pruning code to call the binary search, so do
>> something like get_matching_hash_bounds() does:
>>
>>     /*
>>      * For hash partitioning we can only perform pruning based on equality
>>      * clauses to the partition key or IS NULL clauses.  We also can only
>>      * prune if we got values for all keys.
>>      */
>>     if (nvalues + bms_num_members(nullkeys) == partnatts)
>>     {
>>         /* code to compute matching hash bound offset */
>>     }
>>     else
>>     {
>>         /* Report all valid offsets into the boundinfo->indexes array. */
>>         result->bound_offsets = bms_add_range(NULL, 0,
>>                                               boundinfo->nindexes - 1);
>>     }
>>
>> Do you think that trying to match list partitions even with fewer keys
>> is worth the complexity of the implementation?  That is, is the use
>> case to search for only a subset of partition key columns common
>> enough with list partitioning?
>>
>> If we do decide to implement the special case, remember that to do
>> that efficiently, we'd need to require that the subset of matched key
>> columns constitutes a prefix, because of the way the datums are
>> sorted.  That is, match all partitions when the query only contains a
>> clause for b when the partition key is (a, b, c), but engage the
>> special case of pruning if the query contains clauses for a, or for a
>> and b.
>>
>> I will look at other parts of the patch next week hopefully.   For
>> now, attached is a delta patch that applies on top of your v1, which
>> does:
>>
>> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
>> * Make qsort_partition_list_value_cmp simply call
>> partition_lbound_datum_cmp() instead of having its own logic to
>> compare input bounds
>> * Move partition_lbound_datum_cmp() into partbounds.c as a static
>> function (export seems unnecessary)
>> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com
>
>
> Hi, Amit:
>
> + * isnulls is an array of boolean-tuples with key->partnatts booleans values
> + * each.  Currently only used for list partitioning, it stores whether a
>
> I think 'booleans' should be 'boolean'.
> The trailing word 'each' is unnecessary.
>
> Cheers

Attachment

v2-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Zhihong Yu

Date:

25 August 2021, 21:10:33

On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

> The new list bound binary search and related comparison support
> function look a bit too verbose to me. I was expecting
> partition_list_bsearch() to look very much like
> partition_range_datum_bsearch(), but that is not the case. The
> special case code that you wrote in partition_list_bsearch() seems
> unnecessary, at least in that function. I'm talking about the code
> fragment starting with this comment:
>
> I will look at other parts of the patch next week hopefully. For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

> I guess you're perhaps trying to address the case where the caller
> does not specify the values for all of the partition key columns,
> which can happen when the partition pruning code needs to handle a set
> of clauses matching only some of the partition key columns. But
> that's a concern of the partition pruning code and so the special case
> should be handled there (if at all), not in the binary search function
> that is shared with other callers. Regarding that, I'm wondering if
> we should require clauses matching all of the partition key columns to
> be found for the pruning code to call the binary search, so do
> something like get_matching_hash_bounds() does:
>
> Do you think that trying to match list partitions even with fewer keys
> is worth the complexity of the implementation? That is, is the use
> case to search for only a subset of partition key columns common
> enough with list partitioning?
>
> If we do decide to implement the special case, remember that to do
> that efficiently, we'd need to require that the subset of matched key
> columns constitutes a prefix, because of the way the datums are
> sorted. That is, match all partitions when the query only contains a
> clause for b when the partition key is (a, b, c), but engage the
> special case of pruning if the query contains clauses for a, or for a
> and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1. Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.

Hi,

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+ isDuplicate = checkForDuplicates(result, values);
+ if (isDuplicate)
+ continue;

It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

+ //TODO: Handle for multi-column cases
+ for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

27 August 2021, 07:24:09

> + * isnulls is an array of boolean-tuples with key->partnatts booleans values
> + * each.  Currently only used for list partitioning, it stores whether a
>
> I think 'booleans' should be 'boolean'.
> The trailing word 'each' is unnecessary.

> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax

> +       isDuplicate = checkForDuplicates(result, values);
> +       if (isDuplicate)
> +           continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from
checkForDuplicates().

I agree that isDuplicate is not required.
Thanks for sharing the comments. I will take care of these comments in
the next patch.

> +       //TODO: Handle for multi-column cases
> +       for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?

Yes. The code changes related to partition-wise join are in progress.
I will handle these in the next patch.

Thanks & Regards,
Nitin Jadhav

On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
>>
>> > The new list bound binary search and related comparison support
>> > function look a bit too verbose to me.  I was expecting
>> > partition_list_bsearch() to look very much like
>> > partition_range_datum_bsearch(), but that is not the case.  The
>> > special case code that you wrote in partition_list_bsearch() seems
>> > unnecessary, at least in that function.  I'm talking about the code
>> > fragment starting with this comment:
>> >
>> > I will look at other parts of the patch next week hopefully.   For
>> > now, attached is a delta patch that applies on top of your v1, which
>> > does:
>> >
>> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
>> > * Make qsort_partition_list_value_cmp simply call
>> > partition_lbound_datum_cmp() instead of having its own logic to
>> > compare input bounds
>> > * Move partition_lbound_datum_cmp() into partbounds.c as a static
>> > function (export seems unnecessary)
>> > * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index
>>
>> Yes. You are right. The extra code added in partition_list_bsearch()
>> is not required and thanks for sharing the delta patch. It looks good
>> to me and I have incorporated the changes in the attached patch.
>>
>> > I guess you're perhaps trying to address the case where the caller
>> > does not specify the values for all of the partition key columns,
>> > which can happen when the partition pruning code needs to handle a set
>> > of clauses matching only some of the partition key columns.  But
>> > that's a concern of the partition pruning code and so the special case
>> > should be handled there (if at all), not in the binary search function
>> > that is shared with other callers.  Regarding that, I'm wondering if
>> > we should require clauses matching all of the partition key columns to
>> > be found for the pruning code to call the binary search, so do
>> > something like get_matching_hash_bounds() does:
>> >
>> > Do you think that trying to match list partitions even with fewer keys
>> > is worth the complexity of the implementation?  That is, is the use
>> > case to search for only a subset of partition key columns common
>> > enough with list partitioning?
>> >
>> > If we do decide to implement the special case, remember that to do
>> > that efficiently, we'd need to require that the subset of matched key
>> > columns constitutes a prefix, because of the way the datums are
>> > sorted.  That is, match all partitions when the query only contains a
>> > clause for b when the partition key is (a, b, c), but engage the
>> > special case of pruning if the query contains clauses for a, or for a
>> > and b.
>>
>> Thanks for the suggestion. Below is the implementation details for the
>> partition pruning for multi column list partitioning.
>>
>> In the existing code (For single column list partitioning)
>> 1. In gen_partprune_steps_internal(), we try to match the where
>> clauses provided by the user with the partition key data using
>> match_clause_to_partition_key(). Based on the match, this function can
>> return many values like PARTCLAUSE_MATCH_CLAUSE,
>> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
>> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
>> gen_prune_steps_from_opexps() (strategy-2) which generate and return a
>> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
>> clauses that have been matched to the partition key and it also takes
>> care handling prefix of the partition keys.
>> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
>> gen_prune_step_op() (strategy-1) which generates single
>> PartitionPruneStepOp since the earlier list partitioning supports
>> single column and there can be only one NULL value. In
>> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
>> partition index which accepts null and we used to return from here.
>>
>> In case of multi column list partitioning, we have columns more than
>> one and hence there is a possibility of more than one NULL values in
>> the where clauses. The above mentioned steps are modified like below.
>>
>> 1.  Modified the match_clause_to_partition_key() to generate an object
>> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
>> case of clauses related to NULL. The information required to generate
>> PartClauseInfo is populated here like the constant expression
>> consisting of (Datum) 0, op_strategy, op_is_ne, etc.
>> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
>> (gen_prune_steps_from_opexps) to generate partition pruning steps.
>> This function takes care of generating a list of pruning steps if
>> there are multiple clauses and also takes care of handling prefixes.
>> 3. Modified perform_pruning_base_step() to generate the datum values
>> and isnulls data of the where clauses. In case if any of the key
>> contains NULL value then the corresponding datum value is 0.
>> 4. Modified get_matching_list_bounds() to generate the minimum offset
>> and/or maximum offset of the matched values based on the difference
>> operation strategies. Now since the NULL containing bound values are
>> part of 'boundinfo', changed the code accordingly to include the NULL
>> containing partitions or not in different scenarios like
>> InvalidStrategy, etc.
>>
>> I have done some cosmetic changes to
>> v1_multi_column_list_partitioning.patch. So all the above code changes
>> related to partition pruning are merged with the previous patch and
>> also included the delta patch shared by you. Hence sharing a single
>> patch.
>>
>> Kindly have a look and share your thoughts.
>>
>>
> Hi,
>
> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax
>
> +       isDuplicate = checkForDuplicates(result, values);
> +       if (isDuplicate)
> +           continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from
checkForDuplicates().
>
> +       //TODO: Handle for multi-column cases
> +       for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?
>
> Cheers

Re: Multi-Column List Partitioning

From

Rajkumar Raghuwanshi

Date:

30 August 2021, 07:51:21

Hi Nitin.

I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where update row movement is not working properly.

Please find the test case below.

postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c);
CREATE TABLE
postgres=# create table p01 partition of p0 for values in ((1,1,true));
CREATE TABLE
postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
CREATE TABLE
postgres=# insert into p0 values (1,'1',true);
INSERT 0 1
postgres=# insert into p0 values (1,NULL,false);
INSERT 0 1
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | 1 | t
p02 | 1 | | f
(2 rows)

postgres=# update p0 set b = NULL;
UPDATE 2
postgres=# select tableoid::regclass,* from p0;
tableoid | a | b | c
----------+---+---+---
p01 | 1 | | t
p02 | 1 | | f
(2 rows)

I think this update should fail as there is no partition satisfying update row (1,NULL,true).

Thanks & Regards,

Rajkumar Raghuwanshi

On Fri, Aug 27, 2021 at 12:53 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

> + * isnulls is an array of boolean-tuples with key->partnatts booleans values
> + * each. Currently only used for list partitioning, it stores whether a
>
> I think 'booleans' should be 'boolean'.
> The trailing word 'each' is unnecessary.

> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax

> + isDuplicate = checkForDuplicates(result, values);
> + if (isDuplicate)
> + continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

I agree that isDuplicate is not required.
Thanks for sharing the comments. I will take care of these comments in
the next patch.

> + //TODO: Handle for multi-column cases
> + for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?

Yes. The code changes related to partition-wise join are in progress.
I will handle these in the next patch.

Thanks & Regards,
Nitin Jadhav

On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
>>
>> > The new list bound binary search and related comparison support
>> > function look a bit too verbose to me. I was expecting
>> > partition_list_bsearch() to look very much like
>> > partition_range_datum_bsearch(), but that is not the case. The
>> > special case code that you wrote in partition_list_bsearch() seems
>> > unnecessary, at least in that function. I'm talking about the code
>> > fragment starting with this comment:
>> >
>> > I will look at other parts of the patch next week hopefully. For
>> > now, attached is a delta patch that applies on top of your v1, which
>> > does:
>> >
>> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
>> > * Make qsort_partition_list_value_cmp simply call
>> > partition_lbound_datum_cmp() instead of having its own logic to
>> > compare input bounds
>> > * Move partition_lbound_datum_cmp() into partbounds.c as a static
>> > function (export seems unnecessary)
>> > * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index
>>
>> Yes. You are right. The extra code added in partition_list_bsearch()
>> is not required and thanks for sharing the delta patch. It looks good
>> to me and I have incorporated the changes in the attached patch.
>>
>> > I guess you're perhaps trying to address the case where the caller
>> > does not specify the values for all of the partition key columns,
>> > which can happen when the partition pruning code needs to handle a set
>> > of clauses matching only some of the partition key columns. But
>> > that's a concern of the partition pruning code and so the special case
>> > should be handled there (if at all), not in the binary search function
>> > that is shared with other callers. Regarding that, I'm wondering if
>> > we should require clauses matching all of the partition key columns to
>> > be found for the pruning code to call the binary search, so do
>> > something like get_matching_hash_bounds() does:
>> >
>> > Do you think that trying to match list partitions even with fewer keys
>> > is worth the complexity of the implementation? That is, is the use
>> > case to search for only a subset of partition key columns common
>> > enough with list partitioning?
>> >
>> > If we do decide to implement the special case, remember that to do
>> > that efficiently, we'd need to require that the subset of matched key
>> > columns constitutes a prefix, because of the way the datums are
>> > sorted. That is, match all partitions when the query only contains a
>> > clause for b when the partition key is (a, b, c), but engage the
>> > special case of pruning if the query contains clauses for a, or for a
>> > and b.
>>
>> Thanks for the suggestion. Below is the implementation details for the
>> partition pruning for multi column list partitioning.
>>
>> In the existing code (For single column list partitioning)
>> 1. In gen_partprune_steps_internal(), we try to match the where
>> clauses provided by the user with the partition key data using
>> match_clause_to_partition_key(). Based on the match, this function can
>> return many values like PARTCLAUSE_MATCH_CLAUSE,
>> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
>> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
>> gen_prune_steps_from_opexps() (strategy-2) which generate and return a
>> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
>> clauses that have been matched to the partition key and it also takes
>> care handling prefix of the partition keys.
>> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
>> gen_prune_step_op() (strategy-1) which generates single
>> PartitionPruneStepOp since the earlier list partitioning supports
>> single column and there can be only one NULL value. In
>> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
>> partition index which accepts null and we used to return from here.
>>
>> In case of multi column list partitioning, we have columns more than
>> one and hence there is a possibility of more than one NULL values in
>> the where clauses. The above mentioned steps are modified like below.
>>
>> 1. Modified the match_clause_to_partition_key() to generate an object
>> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
>> case of clauses related to NULL. The information required to generate
>> PartClauseInfo is populated here like the constant expression
>> consisting of (Datum) 0, op_strategy, op_is_ne, etc.
>> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
>> (gen_prune_steps_from_opexps) to generate partition pruning steps.
>> This function takes care of generating a list of pruning steps if
>> there are multiple clauses and also takes care of handling prefixes.
>> 3. Modified perform_pruning_base_step() to generate the datum values
>> and isnulls data of the where clauses. In case if any of the key
>> contains NULL value then the corresponding datum value is 0.
>> 4. Modified get_matching_list_bounds() to generate the minimum offset
>> and/or maximum offset of the matched values based on the difference
>> operation strategies. Now since the NULL containing bound values are
>> part of 'boundinfo', changed the code accordingly to include the NULL
>> containing partitions or not in different scenarios like
>> InvalidStrategy, etc.
>>
>> I have done some cosmetic changes to
>> v1_multi_column_list_partitioning.patch. So all the above code changes
>> related to partition pruning are merged with the previous patch and
>> also included the delta patch shared by you. Hence sharing a single
>> patch.
>>
>> Kindly have a look and share your thoughts.
>>
>>
> Hi,
>
> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax
>
> + isDuplicate = checkForDuplicates(result, values);
> + if (isDuplicate)
> + continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().
>
> + //TODO: Handle for multi-column cases
> + for (j = 0; j < 1; j++)
>
> Is this part going to be updated in the next patch?
>
> Cheers

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

31 August 2021, 04:06:22

On Mon, Aug 30, 2021 at 4:51 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> Hi Nitin.
>
> I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where
updaterow movement is not working properly.
 
> Please find the test case below.
>
> postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c);
> CREATE TABLE
> postgres=# create table p01 partition of p0 for values in ((1,1,true));
> CREATE TABLE
> postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
> CREATE TABLE
> postgres=# insert into p0 values (1,'1',true);
> INSERT 0 1
> postgres=# insert into p0 values (1,NULL,false);
> INSERT 0 1
> postgres=# select tableoid::regclass,* from p0;
>  tableoid | a | b | c
> ----------+---+---+---
>  p01      | 1 | 1 | t
>  p02      | 1 |   | f
> (2 rows)
>
> postgres=# update p0 set b = NULL;
> UPDATE 2
> postgres=# select tableoid::regclass,* from p0;
>  tableoid | a | b | c
> ----------+---+---+---
>  p01      | 1 |   | t
>  p02      | 1 |   | f
> (2 rows)
>
> I think this update should fail as there is no partition satisfying update row (1,NULL,true).

Yeah, contrary to my earlier assessment, it seems the partition
constraint on each of those partitions fails to explicitly include an
IS NOT NULL test for each column that has a non-NULL value assigned.
So, for example, the constraint of p01 should actually be:

(a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
NOT NULL) AND (c = true)

As per the patch's current implementation, tuple (1, NULL, true)
passes p01's partition constraint, because only (b = 1) is not
sufficient to reject a NULL value being assigned to b.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

31 August 2021, 11:02:59

> I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where
updaterow movement is not working properly.
 
> Please find the test case below.

Thanks for testing and sharing the details of the issue.

> Yeah, contrary to my earlier assessment, it seems the partition
> constraint on each of those partitions fails to explicitly include an
> IS NOT NULL test for each column that has a non-NULL value assigned.
> So, for example, the constraint of p01 should actually be:
>
> (a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
> NOT NULL) AND (c = true)

Yes. It should add an IS NOT NULL test for each column. I have
modified the patch accordingly and verified with the test case shared
by Rajkumar.

> + * isnulls is an array of boolean-tuples with key->partnatts booleans values
> + * each.  Currently only used for list partitioning, it stores whether a
>
> I think 'booleans' should be 'boolean'.
> The trailing word 'each' is unnecessary.
>
> bq. Supported new syantx to allow mentioning multiple key information.
>
> syantx -> syntax
>
> +       isDuplicate = checkForDuplicates(result, values);
> +       if (isDuplicate)
> +           continue;
>
> It seems the variable isDuplicate is not needed. The if statement can directly check the return value from
checkForDuplicates().

The attached patch also fixes the above comments.

Thanks & Regards,
Nitin Jadhav
On Tue, Aug 31, 2021 at 9:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Aug 30, 2021 at 4:51 PM Rajkumar Raghuwanshi
> <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> >
> > Hi Nitin.
> >
> > I have been testing these patches. Patches applied cleanly on the head. While testing I found below a case where
updaterow movement is not working properly.
 
> > Please find the test case below.
> >
> > postgres=# create table p0 (a int, b text, c bool) partition by list (a,b,c);
> > CREATE TABLE
> > postgres=# create table p01 partition of p0 for values in ((1,1,true));
> > CREATE TABLE
> > postgres=# create table p02 partition of p0 for values in ((1,NULL,false));
> > CREATE TABLE
> > postgres=# insert into p0 values (1,'1',true);
> > INSERT 0 1
> > postgres=# insert into p0 values (1,NULL,false);
> > INSERT 0 1
> > postgres=# select tableoid::regclass,* from p0;
> >  tableoid | a | b | c
> > ----------+---+---+---
> >  p01      | 1 | 1 | t
> >  p02      | 1 |   | f
> > (2 rows)
> >
> > postgres=# update p0 set b = NULL;
> > UPDATE 2
> > postgres=# select tableoid::regclass,* from p0;
> >  tableoid | a | b | c
> > ----------+---+---+---
> >  p01      | 1 |   | t
> >  p02      | 1 |   | f
> > (2 rows)
> >
> > I think this update should fail as there is no partition satisfying update row (1,NULL,true).
>
> Yeah, contrary to my earlier assessment, it seems the partition
> constraint on each of those partitions fails to explicitly include an
> IS NOT NULL test for each column that has a non-NULL value assigned.
> So, for example, the constraint of p01 should actually be:
>
> (a IS NOT NULL) AND (a = 1) AND (b IS NOT NULL) AND (b = 1) AND (c IS
> NOT NULL) AND (c = true)
>
> As per the patch's current implementation, tuple (1, NULL, true)
> passes p01's partition constraint, because only (b = 1) is not
> sufficient to reject a NULL value being assigned to b.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Attachment

v3-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

01 September 2021, 05:31:07

Hi Nitin,

On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> The attached patch also fixes the above comments.

I noticed that multi-column list partitions containing NULLs don't
work correctly with partition pruning yet.

create table p0 (a int, b text, c bool) partition by list (a, b, c);
create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
create table p02 partition of p0 for values in ((1, NULL, false));
explain select * from p0 where a is null;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
   Filter: (a IS NULL)
(2 rows)

I guess that may be due to the following newly added code being incomplete:

+/*
+ * get_partition_bound_null_index
+ *
+ * Returns the partition index of the partition bound which accepts NULL.
+ */
+int
+get_partition_bound_null_index(PartitionBoundInfo boundinfo)
+{
+   int i = 0;
+   int j = 0;
+
+   if (!boundinfo->isnulls)
+       return -1;

-           if (!val->constisnull)
-               count++;
+   for (i = 0; i < boundinfo->ndatums; i++)
+   {
+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)
+       {
+           if (boundinfo->isnulls[i][j])
+               return boundinfo->indexes[i];
        }
    }

+   return -1;
+}

Maybe this function needs to return a "bitmapset" of indexes, because
multiple partitions can now contain NULL values.

Some other issues I noticed and suggestions for improvement:

+/*
+ * checkForDuplicates
+ *
+ * Returns TRUE if the list bound element is already present in the list of
+ * list bounds, FALSE otherwise.
+ */
+static bool
+checkForDuplicates(List *source, List *searchElem)

This function name may be too generic.  Given that it is specific to
implementing list bound de-duplication, maybe the following signature
is more appropriate:

static bool
checkListBoundDuplicated(List *list_bounds, List *new_bound)

Also, better if the function comment mentions those parameter names, like:

"Returns TRUE if the list bound element 'new_bound' is already present
in the target list 'list_bounds', FALSE otherwise."

+/*
+ * transformPartitionListBounds
+ *
+ * Converts the expressions of list partition bounds from the raw grammar
+ * representation.

A sentence about the result format would be helpful, like:

The result is a List of Lists of Const nodes to account for the
partition key possibly containing more than one column.

+   int             i = 0;
+   int             j = 0;

Better to initialize such loop counters closer to the loop.

+           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
+           colname[i] = get_attname(RelationGetRelid(parent),
+                                    key->partattrs[i], false);

The palloc in the 1st statement is wasteful, because the 2nd statement
overwrites its pointer by the pointer to the string palloc'd by
get_attname().

+           ListCell   *cell2 = NULL;

No need to explicitly initialize the loop variable.

+           RowExpr     *rowexpr = NULL;
+
+           if (!IsA(expr, RowExpr))
+               ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                       errmsg("Invalid list bound specification"),
+                       parser_errposition(pstate, exprLocation((Node
*) spec))));
+
+           rowexpr = (RowExpr *) expr;

It's okay to assign rowexpr at the top here instead of the dummy
NULL-initialization and write the condition as:

    if (!IsA(rowexpr, RowExpr))

+       if (isDuplicate)
+           continue;
+
+       result = lappend(result, values);

I can see you copied this style from the existing code, but how about
writing this simply as:

    if (!isDuplicate)
        result = lappend(result, values);

-/* One value coming from some (index'th) list partition */
+/* One bound of a list partition */
 typedef struct PartitionListValue
 {
    int         index;
-   Datum       value;
+   Datum      *values;
+   bool       *isnulls;
 } PartitionListValue;

Given that this is a locally-defined struct, I wonder if it makes
sense to rename the struct while we're at it.  Call it, say,
PartitionListBound?

Also, please keep part of the existing comment that says that the
bound belongs to index'th partition.

Will send more comments in a bit...

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

03 September 2021, 13:46:23

On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > The attached patch also fixes the above comments.
>
> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
>                        QUERY PLAN
> --------------------------------------------------------
>  Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>    Filter: (a IS NULL)
> (2 rows)
>
> I guess that may be due to the following newly added code being incomplete:
>
> +/*
> + * get_partition_bound_null_index
> + *
> + * Returns the partition index of the partition bound which accepts NULL.
> + */
> +int
> +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
> +{
> +   int i = 0;
> +   int j = 0;
> +
> +   if (!boundinfo->isnulls)
> +       return -1;
>
> -           if (!val->constisnull)
> -               count++;
> +   for (i = 0; i < boundinfo->ndatums; i++)
> +   {
> +       //TODO: Handle for multi-column cases
> +       for (j = 0; j < 1; j++)
> +       {
> +           if (boundinfo->isnulls[i][j])
> +               return boundinfo->indexes[i];
>         }
>     }
>
> +   return -1;
> +}
>
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.
>
> Some other issues I noticed and suggestions for improvement:
>
> +/*
> + * checkForDuplicates
> + *
> + * Returns TRUE if the list bound element is already present in the list of
> + * list bounds, FALSE otherwise.
> + */
> +static bool
> +checkForDuplicates(List *source, List *searchElem)
>
> This function name may be too generic.  Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)
>
> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."
>
> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.
>
> +   int             i = 0;
> +   int             j = 0;
>
> Better to initialize such loop counters closer to the loop.
>
> +           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> +           colname[i] = get_attname(RelationGetRelid(parent),
> +                                    key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().
>
> +           ListCell   *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.
>
> +           RowExpr     *rowexpr = NULL;
> +
> +           if (!IsA(expr, RowExpr))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +                       errmsg("Invalid list bound specification"),
> +                       parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> +           rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
>     if (!IsA(rowexpr, RowExpr))
>
> +       if (isDuplicate)
> +           continue;
> +
> +       result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
>     if (!isDuplicate)
>         result = lappend(result, values);
>
> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
>  typedef struct PartitionListValue
>  {
>     int         index;
> -   Datum       value;
> +   Datum      *values;
> +   bool       *isnulls;
>  } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it.  Call it, say,
> PartitionListBound?
>
> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.
>
> Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
  */

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

-   PartitionListValue *all_values;
+   PartitionListValue **all_values;
...
-   all_values = (PartitionListValue *)
-       palloc(ndatums * sizeof(PartitionListValue));
+   ndatums = get_list_datum_count(boundspecs, nparts);
+   all_values = (PartitionListValue **)
+       palloc(ndatums * sizeof(PartitionListValue *));

I don't see the need to redefine all_values's pointer type.  No need
to palloc PartitionListValue repeatedly for every datum as done
further down as follows:

+           all_values[j] = (PartitionListValue *)
palloc(sizeof(PartitionListValue));

You do need the following two though:

+           all_values[j]->values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));

If you change the above the way I suggest, you'd also need to revert
the following change:

-   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
+   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
              qsort_partition_list_value_cmp, (void *) key);

+       int         orig_index = all_values[i]->index;
+       boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));

Missing a newline between these two statements.

BTW, I noticed that the boundDatums variable is no longer used in
create_list_bounds.  I traced back its origin and found that a recent
commit 53d86957e98 introduced it to implement an idea to reduce the
finer-grained pallocs that were being done in create_list_bounds().  I
don't think that this patch needs to throw away that work.  You can
make it work as the attached delta patch that applies on top of v3.
Please check.

@@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
    if (b1->nindexes != b2->nindexes)
        return false;

-   if (b1->null_index != b2->null_index)
+   if (get_partition_bound_null_index(b1) !=
get_partition_bound_null_index(b2))

As mentioned in the last message, this bit in partition_bounds_equal()
needs to be comparing "bitmapsets" of null bound indexes, that is
after fixing get_partition_bound_null_index() as previously mentioned.

But...

@@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
                 * context.  datumIsEqual() should be simple enough to be
                 * safe.
                 */
-               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+               if (b1->isnulls)
+                   b1_isnull = b1->isnulls[i][j];
+               if (b2->isnulls)
+                   b2_isnull = b2->isnulls[i][j];
+
+               /*
+                * If any of the partition bound has NULL value, then check
+                * equality for the NULL value instead of comparing the datums
+                * as it does not contain valid value in case of NULL.
+                */
+               if (b1_isnull || b2_isnull)
+               {
+                   if (b1_isnull != b2_isnull)
+                       return false;
+               }

...if you have this in the main loop, I don't think we need the above
code stanza which appears to implement a short-cut for this long-form
logic.

+               (key->strategy != PARTITION_STRATEGY_LIST ||
+                !src->isnulls[i][j]))

I think it's better to write this condition as follows just like the
accompanying condition involving src->kind:

    (src->nulls == NULL || !src->isnulls[i][j])

(Skipped looking at merge_list_bounds() and related changes for now as
I see a lot of TODOs remain to be done.)

In check_new_partition_bound():

+                       Datum      *values = (Datum *)
palloc0(key->partnatts * sizeof(Datum));
+                       bool       *isnulls = (bool *)
palloc0(key->partnatts * sizeof(bool));

Doesn't seem like a bad idea to declare these as:

    Datum    values[PARTITION_MAX_KEYS];
    bool        isnulls[PARTITION_MAX_KEYS];


I looked at get_qual_for_list_multi_column() and immediately thought
that it may be a bad idea.  I think it's better to integrate the logic
for multi-column case into the existing function even if that makes
the function appear more complex.  Having two functions with the same
goal and mostly the same code is not a good idea mainly because it
becomes a maintenance burden.

I have attempted a rewrite such that get_qual_for_list() now handles
both the single-column and multi-column cases.  Changes included in
the delta patch.  The patch updates some outputs of the newly added
tests for multi-column list partitions, because the new code emits the
IS NOT NULL tests a bit differently than
get_qual_for_list_mutli_column() would.  Notably, the old approach
would emit IS NOT NULL for every non-NULL datum matched to a given
column, not just once for the column.  However, the patch makes a few
other tests fail, mainly because I had to fix
partition_bound_accepts_nulls() to handle the multi-column case,
though didn't bother to update all callers of it to also handle the
multi-column case correctly.  I guess that's a TODO you're going to
deal with at some point anyway. :)

I still have more than half of v3 left to look at, so will continue
looking.   In the meantime, please check the changes I suggested,
including the delta patch, and let me know your thoughts.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

v3_delta_amit.patch

Re: Multi-Column List Partitioning

From

Rajkumar Raghuwanshi

Date:

13 September 2021, 10:17:36

On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.

--inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
update mlparted_tab mlp set c = 'xxx'
from
(select a from some_tab union all select a+1 from some_tab) ss (a)
where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
ERROR: negative bitmapset member not allowed

--partition_join.sql is crashing with enable_partitionwise_join set to true.
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
ANALYZE plt1_adv;
CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
ANALYZE plt2_adv;
-- inner join
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

--stack-trace
Core was generated by `postgres: edb regression [local] EXPLAIN '.
Program terminated with signal 6, Aborted.
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
#1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
#2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
at assert.c:69
#3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
#4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
#5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
#6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
parts2=0x7fffd67751a8) at joinrels.c:1644
#7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
at joinrels.c:1402
#8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
at joinrels.c:926
#9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
#10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
#11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
#12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
#13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
#14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
#15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
#16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
#17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
#18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
#19 0x0000000000844ce9 in planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
#20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
#21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
params=0x0, queryEnv=0x0) at explain.c:397
#22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
#23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
#24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
#25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
#26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
#27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
#28 0x0000000000978aa5 in exec_simple_query (

Thanks & Regards,

Rajkumar Raghuwanshi

On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > The attached patch also fixes the above comments.
>
> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> I guess that may be due to the following newly added code being incomplete:
>
> +/*
> + * get_partition_bound_null_index
> + *
> + * Returns the partition index of the partition bound which accepts NULL.
> + */
> +int
> +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
> +{
> + int i = 0;
> + int j = 0;
> +
> + if (!boundinfo->isnulls)
> + return -1;
>
> - if (!val->constisnull)
> - count++;
> + for (i = 0; i < boundinfo->ndatums; i++)
> + {
> + //TODO: Handle for multi-column cases
> + for (j = 0; j < 1; j++)
> + {
> + if (boundinfo->isnulls[i][j])
> + return boundinfo->indexes[i];
> }
> }
>
> + return -1;
> +}
>
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.
>
> Some other issues I noticed and suggestions for improvement:
>
> +/*
> + * checkForDuplicates
> + *
> + * Returns TRUE if the list bound element is already present in the list of
> + * list bounds, FALSE otherwise.
> + */
> +static bool
> +checkForDuplicates(List *source, List *searchElem)
>
> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)
>
> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."
>
> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.
>
> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.
>
> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().
>
> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.
>
> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))
>
> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);
>
> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?
>
> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.
>
> Will send more comments in a bit...

+ * partition_bound_accepts_nulls
+ *
+ * Returns TRUE if partition bound has NULL value, FALSE otherwise.
*/

I suggest slight rewording, as follows:

"Returns TRUE if any of the partition bounds contains a NULL value,
FALSE otherwise."

- PartitionListValue *all_values;
+ PartitionListValue **all_values;
...
- all_values = (PartitionListValue *)
- palloc(ndatums * sizeof(PartitionListValue));
+ ndatums = get_list_datum_count(boundspecs, nparts);
+ all_values = (PartitionListValue **)
+ palloc(ndatums * sizeof(PartitionListValue *));

I don't see the need to redefine all_values's pointer type. No need
to palloc PartitionListValue repeatedly for every datum as done
further down as follows:

+ all_values[j] = (PartitionListValue *)
palloc(sizeof(PartitionListValue));

You do need the following two though:

+ all_values[j]->values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+ all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));

If you change the above the way I suggest, you'd also need to revert
the following change:

- qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
+ qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
qsort_partition_list_value_cmp, (void *) key);

+ int orig_index = all_values[i]->index;
+ boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));

Missing a newline between these two statements.

BTW, I noticed that the boundDatums variable is no longer used in
create_list_bounds. I traced back its origin and found that a recent
commit 53d86957e98 introduced it to implement an idea to reduce the
finer-grained pallocs that were being done in create_list_bounds(). I
don't think that this patch needs to throw away that work. You can
make it work as the attached delta patch that applies on top of v3.
Please check.

@@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
if (b1->nindexes != b2->nindexes)
return false;

- if (b1->null_index != b2->null_index)
+ if (get_partition_bound_null_index(b1) !=
get_partition_bound_null_index(b2))

As mentioned in the last message, this bit in partition_bounds_equal()
needs to be comparing "bitmapsets" of null bound indexes, that is
after fixing get_partition_bound_null_index() as previously mentioned.

But...

@@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
*parttyplen, bool *parttypbyval,
* context. datumIsEqual() should be simple enough to be
* safe.
*/
- if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+ if (b1->isnulls)
+ b1_isnull = b1->isnulls[i][j];
+ if (b2->isnulls)
+ b2_isnull = b2->isnulls[i][j];
+
+ /*
+ * If any of the partition bound has NULL value, then check
+ * equality for the NULL value instead of comparing the datums
+ * as it does not contain valid value in case of NULL.
+ */
+ if (b1_isnull || b2_isnull)
+ {
+ if (b1_isnull != b2_isnull)
+ return false;
+ }

...if you have this in the main loop, I don't think we need the above
code stanza which appears to implement a short-cut for this long-form
logic.

+ (key->strategy != PARTITION_STRATEGY_LIST ||
+ !src->isnulls[i][j]))

I think it's better to write this condition as follows just like the
accompanying condition involving src->kind:

(src->nulls == NULL || !src->isnulls[i][j])

(Skipped looking at merge_list_bounds() and related changes for now as
I see a lot of TODOs remain to be done.)

In check_new_partition_bound():

+ Datum *values = (Datum *)
palloc0(key->partnatts * sizeof(Datum));
+ bool *isnulls = (bool *)
palloc0(key->partnatts * sizeof(bool));

Doesn't seem like a bad idea to declare these as:

Datum values[PARTITION_MAX_KEYS];
bool isnulls[PARTITION_MAX_KEYS];

I looked at get_qual_for_list_multi_column() and immediately thought
that it may be a bad idea. I think it's better to integrate the logic
for multi-column case into the existing function even if that makes
the function appear more complex. Having two functions with the same
goal and mostly the same code is not a good idea mainly because it
becomes a maintenance burden.

I have attempted a rewrite such that get_qual_for_list() now handles
both the single-column and multi-column cases. Changes included in
the delta patch. The patch updates some outputs of the newly added
tests for multi-column list partitions, because the new code emits the
IS NOT NULL tests a bit differently than
get_qual_for_list_mutli_column() would. Notably, the old approach
would emit IS NOT NULL for every non-NULL datum matched to a given
column, not just once for the column. However, the patch makes a few
other tests fail, mainly because I had to fix
partition_bound_accepts_nulls() to handle the multi-column case,
though didn't bother to update all callers of it to also handle the
multi-column case correctly. I guess that's a TODO you're going to
deal with at some point anyway. :)

I still have more than half of v3 left to look at, so will continue
looking. In the meantime, please check the changes I suggested,
including the delta patch, and let me know your thoughts.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

01 October 2021, 13:55:44

Hi,

On Mon, Sep 13, 2021 at 7:17 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> On PG head + Nitin's v3 patch + Amit's Delta patch.  Make check is failing with below errors.

Thanks Rajkumar for testing.

> --inherit.sql is failing with error :"ERROR:  negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
>   (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR:  negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.

Here's a v2 of the delta patch that should fix both of these test
failures.  As I mentioned in my last reply, my delta patch fixed what
I think were problems in Nitin's v3 patch but were not complete by
themselves.  Especially, I hadn't bothered to investigate various /*
TODO: handle multi-column list partitioning */ sites to deal with my
own changes.

In the attached updated version, I've dealt with some of those such
that at least the existing cases exercising partition pruning and
partition wise joins now pass.

I thought about sending a v4 of the main patch with my proposed
changes so far integrated, but decided to just post a delta_v2 for
now.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

v3_delta_amit_v2.patch

Re: Multi-Column List Partitioning

From

Zhihong Yu

Date:

01 October 2021, 15:09:07

On Fri, Oct 1, 2021 at 6:56 AM Amit Langote <amitlangote09@gmail.com> wrote:

Hi,

On Mon, Sep 13, 2021 at 7:17 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.

Thanks Rajkumar for testing.

> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.

Here's a v2 of the delta patch that should fix both of these test
failures. As I mentioned in my last reply, my delta patch fixed what
I think were problems in Nitin's v3 patch but were not complete by
themselves. Especially, I hadn't bothered to investigate various /*
TODO: handle multi-column list partitioning */ sites to deal with my
own changes.

In the attached updated version, I've dealt with some of those such
that at least the existing cases exercising partition pruning and
partition wise joins now pass.

I thought about sending a v4 of the main patch with my proposed
changes so far integrated, but decided to just post a delta_v2 for
now.

--
Amit Langote
EDB: http://www.enterprisedb.com

Hi,

+ for (i = 0; i < partnatts; i++)
+ {
+ if (outer_isnull[i])

+ outer_has_null = true;

We can come out of the loop once outer_has_null is assigned.

Similar comment for the loop inside if (inner_isnull).

+ * For the multi-column case, we must make an BoolExpr that

an BoolExpr -> a BoolExpr

Cheers

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

02 October 2021, 20:23:26

> > On PG head + Nitin's v3 patch + Amit's Delta patch.  Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures.  As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves.  Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
>                        QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>  Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.


> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.


> This function name may be too generic.  Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.


> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.


> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.


> +   int             i = 0;
> +   int             j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.


> +           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> +           colname[i] = get_attname(RelationGetRelid(parent),
> +                                    key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.


> +           ListCell   *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.


> +           RowExpr     *rowexpr = NULL;
> +
> +           if (!IsA(expr, RowExpr))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +                       errmsg("Invalid list bound specification"),
> +                       parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> +           rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
>    if (!IsA(rowexpr, RowExpr))

Fixed.


> +       if (isDuplicate)
> +           continue;
> +
> +       result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
>    if (!isDuplicate)
>        result = lappend(result, values);

This looks good. I have changed in the patch.


> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
>    int         index;
> -   Datum       value;
> +   Datum      *values;
> +   bool       *isnulls;
>  } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it.  Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.


> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>  */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.


> -   PartitionListValue *all_values;
> +   PartitionListValue **all_values;
> ...
> -   all_values = (PartitionListValue *)
> -       palloc(ndatums * sizeof(PartitionListValue));
> +   ndatums = get_list_datum_count(boundspecs, nparts);
> +   all_values = (PartitionListValue **)
> +       palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type.  No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> +           all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> +           all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> +           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> -   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> +   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>              qsort_partition_list_value_cmp, (void *) key);
>
> +       int         orig_index = all_values[i]->index;
> +       boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.


> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
>    if (b1->nindexes != b2->nindexes)
>        return false;
>
> -   if (b1->null_index != b2->null_index)
> +   if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
>                 * context.  datumIsEqual() should be simple enough to be
>                 * safe.
>                 */
> -               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> +               if (b1->isnulls)
> +                   b1_isnull = b1->isnulls[i][j];
> +               if (b2->isnulls)
> +                   b2_isnull = b2->isnulls[i][j];
> +
> +               /*
> +                * If any of the partition bound has NULL value, then check
> +                * equality for the NULL value instead of comparing the datums
> +                * as it does not contain valid value in case of NULL.
> +                */
> +               if (b1_isnull || b2_isnull)
> +               {
> +                   if (b1_isnull != b2_isnull)
> +                       return false;
> +               }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.


> +               (key->strategy != PARTITION_STRATEGY_LIST ||
> +                !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
>    (src->nulls == NULL || !src->isnulls[i][j])

Fixed.


> In check_new_partition_bound():
>
> +                       Datum      *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> +                       bool       *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
>     Datum    values[PARTITION_MAX_KEYS];
>    bool        isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea.  I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex.  Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases.  Changes included in
> the delta patch.  The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would.  Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column.  However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly.  I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.


Thanks & Regards,
Nitin Jadhav


On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch.  Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR:  negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
>   (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR:  negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4,
6,8, 9); 
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4,
6,7, 9); 
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b <
10ORDER BY t1.a; 
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN                                      '.
> Program terminated with signal 6, Aborted.
> #0  0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
openssl-libs-1.0.2k-19.el7.x86_64pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64 
> (gdb) bt
> #0  0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1  0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2  0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3
"FailedAssertion",fileName=0xcbd2fe "partbounds.c", lineNumber=1957) 
>     at assert.c:69
> #3  0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4  0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738,
outer_rel=0x19b37c0,inner_rel=0x1922938, jointype=JOIN_INNER, 
>     outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5  0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738,
outer_rel=0x19b37c0,inner_rel=0x1922938, jointype=JOIN_INNER, 
>     outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6  0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938,
joinrel=0x1ab7f30,parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0, 
>     parts2=0x7fffd67751a8) at joinrels.c:1644
> #7  0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
parent_sjinfo=0x7fffd67752a0,parent_restrictlist=0x1ab3318) 
>     at joinrels.c:1402
> #8  0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938,
joinrel=0x1ab7f30,sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318) 
>     at joinrels.c:926
> #9  0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970,
other_rels=0x1ab2990)at joinrels.c:312 
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at
allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>,
qp_extra=0x7fffd6775630)at planmain.c:276 
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false,
tuple_fraction=0)at planner.c:1025 
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
>     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv
t2ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at
planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
>     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv
t2ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at
planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
>     query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv
t2ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at
postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
>     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2
ON(t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", 
>     params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at
explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
>     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2
ON(t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", 
>     readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378,
qc=0x7fffd6775f90)at utility.c:845 
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
>     queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2
ON(t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", 
>     readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378,
qc=0x7fffd6775f90)at utility.c:527 
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true,
dest=0x197c378,qc=0x7fffd6775f90) at pquery.c:1147 
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x1a0c0b8,altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758 
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> >                        QUERY PLAN
>> > --------------------------------------------------------
>> >  Seq Scan on p01 p0  (cost=0.00..22.50 rows=6 width=37)
>> >    Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > +   int i = 0;
>> > +   int j = 0;
>> > +
>> > +   if (!boundinfo->isnulls)
>> > +       return -1;
>> >
>> > -           if (!val->constisnull)
>> > -               count++;
>> > +   for (i = 0; i < boundinfo->ndatums; i++)
>> > +   {
>> > +       //TODO: Handle for multi-column cases
>> > +       for (j = 0; j < 1; j++)
>> > +       {
>> > +           if (boundinfo->isnulls[i][j])
>> > +               return boundinfo->indexes[i];
>> >         }
>> >     }
>> >
>> > +   return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic.  Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > +   int             i = 0;
>> > +   int             j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > +           colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > +           colname[i] = get_attname(RelationGetRelid(parent),
>> > +                                    key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > +           ListCell   *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > +           RowExpr     *rowexpr = NULL;
>> > +
>> > +           if (!IsA(expr, RowExpr))
>> > +               ereport(ERROR,
>> > +                       (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > +                       errmsg("Invalid list bound specification"),
>> > +                       parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > +           rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> >     if (!IsA(rowexpr, RowExpr))
>> >
>> > +       if (isDuplicate)
>> > +           continue;
>> > +
>> > +       result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> >     if (!isDuplicate)
>> >         result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> >  typedef struct PartitionListValue
>> >  {
>> >     int         index;
>> > -   Datum       value;
>> > +   Datum      *values;
>> > +   bool       *isnulls;
>> >  } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it.  Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>>   */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> -   PartitionListValue *all_values;
>> +   PartitionListValue **all_values;
>> ...
>> -   all_values = (PartitionListValue *)
>> -       palloc(ndatums * sizeof(PartitionListValue));
>> +   ndatums = get_list_datum_count(boundspecs, nparts);
>> +   all_values = (PartitionListValue **)
>> +       palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type.  No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> +           all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> +           all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> +           all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> -   qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> +   qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>>               qsort_partition_list_value_cmp, (void *) key);
>>
>> +       int         orig_index = all_values[i]->index;
>> +       boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds.  I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds().  I
>> don't think that this patch needs to throw away that work.  You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>>     if (b1->nindexes != b2->nindexes)
>>         return false;
>>
>> -   if (b1->null_index != b2->null_index)
>> +   if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>>                  * context.  datumIsEqual() should be simple enough to be
>>                  * safe.
>>                  */
>> -               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> +               if (b1->isnulls)
>> +                   b1_isnull = b1->isnulls[i][j];
>> +               if (b2->isnulls)
>> +                   b2_isnull = b2->isnulls[i][j];
>> +
>> +               /*
>> +                * If any of the partition bound has NULL value, then check
>> +                * equality for the NULL value instead of comparing the datums
>> +                * as it does not contain valid value in case of NULL.
>> +                */
>> +               if (b1_isnull || b2_isnull)
>> +               {
>> +                   if (b1_isnull != b2_isnull)
>> +                       return false;
>> +               }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> +               (key->strategy != PARTITION_STRATEGY_LIST ||
>> +                !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>>     (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> +                       Datum      *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> +                       bool       *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>>     Datum    values[PARTITION_MAX_KEYS];
>>     bool        isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea.  I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex.  Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases.  Changes included in
>> the delta patch.  The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would.  Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column.  However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly.  I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking.   In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Attachment

v4-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Rajkumar Raghuwanshi

Date:

07 October 2021, 10:34:52

Thanks Nitin,

v4 patches applied cleanly and make check is passing now. While testing further I observed that if multiple values are given for a single

column list partition it is not giving error instead it is changing values itself. Please find the example below.

postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE
postgres=# \d+ plt1;
Partitioned table "public.plt1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | character varying | | | | extended | | |
Partition key: LIST (b)
Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')

I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.
also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Thanks & Regards,

Rajkumar Raghuwanshi

On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

07 October 2021, 12:34:16

Thanks Rajkumar for testing.

> I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.

I also agree that it should throw an error in the above case. Fixed the issue in the attached patch. Also added related test cases to the regression test suite.

> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Now throwing errors in the initial stage, this case doesn't arise.

Please share if you find any other issues.

Thanks & Regards,

Nitin Jadhav

On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

Thanks Nitin,

v4 patches applied cleanly and make check is passing now. While testing further I observed that if multiple values are given for a single
column list partition it is not giving error instead it is changing values itself. Please find the example below.

postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE
postgres=# \d+ plt1;
Partitioned table "public.plt1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | character varying | | | | extended | | |
Partition key: LIST (b)
Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')

I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.
also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Thanks & Regards,
Rajkumar Raghuwanshi

On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Attachment

v5-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Rajkumar Raghuwanshi

Date:

11 October 2021, 05:35:50

Thanks for the patch, it applied cleanly and fixed the reported issue. I observed another case where

In case of multi-col list partition on the same column query is not picking partition wise join. Is this expected?

CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt1;
CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt2;
SET enable_partitionwise_join TO true;

EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;

postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;
QUERY PLAN
--------------------------------------------
Hash Join
Hash Cond: ((t1.c)::text = (t2.c)::text)
-> Append
-> Seq Scan on plt1_p1 t1_1
-> Seq Scan on plt1_p2 t1_2
-> Seq Scan on plt1_p3 t1_3
-> Hash
-> Append
-> Seq Scan on plt2_p1 t2_1
-> Seq Scan on plt2_p2 t2_2
-> Seq Scan on plt2_p3 t2_3
(11 rows)

Thanks & Regards,

Rajkumar Raghuwanshi

On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

Thanks Rajkumar for testing.

> I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.

I also agree that it should throw an error in the above case. Fixed the issue in the attached patch. Also added related test cases to the regression test suite.

> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Now throwing errors in the initial stage, this case doesn't arise.

Please share if you find any other issues.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Thanks Nitin,

v4 patches applied cleanly and make check is passing now. While testing further I observed that if multiple values are given for a single
column list partition it is not giving error instead it is changing values itself. Please find the example below.

postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE
postgres=# \d+ plt1;
Partitioned table "public.plt1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | character varying | | | | extended | | |
Partition key: LIST (b)
Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')

I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.
also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Thanks & Regards,
Rajkumar Raghuwanshi

On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Rajkumar Raghuwanshi

Date:

11 October 2021, 10:42:32

Hi Nitin,

While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.

SET enable_partitionwise_join TO on;
CREATE TABLE plt1 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),(NULL,NULL));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
INSERT INTO plt1 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3,7,8,9);
INSERT INTO plt1 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i % 11 IN (3);
ANALYSE plt1;
CREATE TABLE plt2 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3);
INSERT INTO plt2 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i % 11 IN (3);
ANALYSE plt2;

EXPLAIN (COSTS OFF)
SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);

postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q
[edb@localhost bin]$ gdb -q -c data/core.66926 postgres
Reading symbols from /home/edb/WORK/pg_src/PG_TEMP/postgresql/inst/bin/postgres...done.
[New LWP 66926]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: edb postgres [local] EXPLAIN '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
1221 if (rel->pathlist == NIL)
(gdb) bt
#0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
#1 0x000000000089341c in is_dummy_partition (rel=0x2f86e88, part_index=2) at partbounds.c:1959
#2 0x0000000000891d38 in merge_list_bounds (partnatts=2, partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88, inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at partbounds.c:1325
#3 0x0000000000891991 in partition_bounds_merge (partnatts=2, partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88, inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at partbounds.c:1198
#4 0x000000000082cc5a in compute_partition_bounds (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8, parts1=0x7ffea91f8cc0,
parts2=0x7ffea91f8cb8) at joinrels.c:1644
#5 0x000000000082c474 in try_partitionwise_join (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8, parent_restrictlist=0x2fae650)
at joinrels.c:1402
#6 0x000000000082b6e2 in populate_joinrel_with_paths (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, sjinfo=0x2f7dfa8, restrictlist=0x2fae650) at joinrels.c:926
#7 0x000000000082b135 in make_join_rel (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368) at joinrels.c:760
#8 0x000000000082a643 in make_rels_by_clause_joins (root=0x2f9e910, old_rel=0x2f86e88, other_rels_list=0x2f90148, other_rels=0x2f90160) at joinrels.c:312
#9 0x000000000082a119 in join_search_one_level (root=0x2f9e910, level=3) at joinrels.c:123
#10 0x000000000080cd97 in standard_join_search (root=0x2f9e910, levels_needed=3, initial_rels=0x2f90148) at allpaths.c:3020
#11 0x000000000080cd10 in make_rel_from_joinlist (root=0x2f9e910, joinlist=0x2fd7550) at allpaths.c:2951
#12 0x000000000080899a in make_one_rel (root=0x2f9e910, joinlist=0x2fd7550) at allpaths.c:228
#13 0x000000000084516a in query_planner (root=0x2f9e910, qp_callback=0x84ad85 <standard_qp_callback>, qp_extra=0x7ffea91f9140) at planmain.c:276
#14 0x000000000084788d in grouping_planner (root=0x2f9e910, tuple_fraction=0) at planner.c:1447
#15 0x0000000000846f56 in subquery_planner (glob=0x2fa0c08, parse=0x2f56d30, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
#16 0x000000000084578b in standard_planner (parse=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at planner.c:406
#17 0x0000000000845536 in planner (parse=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at planner.c:277
#18 0x0000000000978faf in pg_plan_query (querytree=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at postgres.c:847
#19 0x0000000000693e50 in ExplainOneQuery (query=0x2f56d30, cursorOptions=2048, into=0x0, es=0x2fa0920,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", params=0x0, queryEnv=0x0) at explain.c:397
#20 0x00000000006939a5 in ExplainQuery (pstate=0x2f9e0a0, stmt=0x2f56b50, params=0x0, dest=0x2f9e008) at explain.c:281
#21 0x0000000000981de8 in standard_ProcessUtility (pstmt=0x2fd2220,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:862
#22 0x0000000000981585 in ProcessUtility (pstmt=0x2fd2220,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:527
#23 0x00000000009801ba in PortalRunUtility (portal=0x2f10180, pstmt=0x2fd2220, isTopLevel=true, setHoldSnapshot=true, dest=0x2f9e008, qc=0x7ffea91f9aa0) at pquery.c:1155
#24 0x000000000097ff20 in FillPortalStore (portal=0x2f10180, isTopLevel=true) at pquery.c:1028
#25 0x000000000097f883 in PortalRun (portal=0x2f10180, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2fd2310, altdest=0x2fd2310, qc=0x7ffea91f9c60) at pquery.c:760
#26 0x00000000009795d1 in exec_simple_query (
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);") at postgres.c:1214
#27 0x000000000097da8d in PostgresMain (dbname=0x2ed8068 "postgres", username=0x2ed8048 "edb") at postgres.c:4497
#28 0x00000000008b9699 in BackendRun (port=0x2ecfd00) at postmaster.c:4560

Thanks & Regards,

Rajkumar Raghuwanshi

On Mon, Oct 11, 2021 at 11:05 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

Thanks for the patch, it applied cleanly and fixed the reported issue. I observed another case where
In case of multi-col list partition on the same column query is not picking partition wise join. Is this expected?

CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt1;
CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt2;
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;

postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;
QUERY PLAN
--------------------------------------------
Hash Join
Hash Cond: ((t1.c)::text = (t2.c)::text)
-> Append
-> Seq Scan on plt1_p1 t1_1
-> Seq Scan on plt1_p2 t1_2
-> Seq Scan on plt1_p3 t1_3
-> Hash
-> Append
-> Seq Scan on plt2_p1 t2_1
-> Seq Scan on plt2_p2 t2_2
-> Seq Scan on plt2_p3 t2_3
(11 rows)

Thanks & Regards,
Rajkumar Raghuwanshi

On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
Thanks Rajkumar for testing.

> I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.

I also agree that it should throw an error in the above case. Fixed the issue in the attached patch. Also added related test cases to the regression test suite.

> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Now throwing errors in the initial stage, this case doesn't arise.

Please share if you find any other issues.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Thanks Nitin,

v4 patches applied cleanly and make check is passing now. While testing further I observed that if multiple values are given for a single
column list partition it is not giving error instead it is changing values itself. Please find the example below.

postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE
postgres=# \d+ plt1;
Partitioned table "public.plt1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | character varying | | | | extended | | |
Partition key: LIST (b)
Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')

I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.
also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Thanks & Regards,
Rajkumar Raghuwanshi

On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

15 October 2021, 08:38:31

Hi Rajkumar,

On Mon, Oct 11, 2021 at 2:36 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> Thanks for the patch, it applied cleanly and fixed the reported issue.  I observed another case where
> In case of multi-col list partition on the same column query is not picking partition wise join. Is this expected?
>
> CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
> CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
> CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
> CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
> INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT  IN
(0,10);
> ANALYSE plt1;
> CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
> CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
> CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
> CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
> INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT  IN
(0,10);
> ANALYSE plt2;
> SET enable_partitionwise_join TO true;
> EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;
>
> postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;
>                  QUERY PLAN
> --------------------------------------------
>  Hash Join
>    Hash Cond: ((t1.c)::text = (t2.c)::text)
>    ->  Append
>          ->  Seq Scan on plt1_p1 t1_1
>          ->  Seq Scan on plt1_p2 t1_2
>          ->  Seq Scan on plt1_p3 t1_3
>    ->  Hash
>          ->  Append
>                ->  Seq Scan on plt2_p1 t2_1
>                ->  Seq Scan on plt2_p2 t2_2
>                ->  Seq Scan on plt2_p3 t2_3
> (11 rows)

Interesting test case.

I think this might be an *existing* limitation of the code that
compares join clauses against the partition key(s) to determine if
partition-wise join should be considered.  The clause t1.c = t2.c
should have been matched with both of the partition keys (c, c), but
it is not given the way have_partkey_equi_join() is currently coded.
I suspect you'd get the same behavior if you'd used a RANGE
partitioned table with keys (c, c).  Not sure though if it'd be
worthwhile to fix that coding to cater to this odd partition key
setting.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

22 October 2021, 09:49:39

> While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.

Thanks for sharing. I have fixed the issue in the attached patch.

Thanks & Regards,

Nitin Jadhav

On Mon, Oct 11, 2021 at 4:12 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Nitin,

While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.

SET enable_partitionwise_join TO on;
CREATE TABLE plt1 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),(NULL,NULL));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
INSERT INTO plt1 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3,7,8,9);
INSERT INTO plt1 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i % 11 IN (3);
ANALYSE plt1;
CREATE TABLE plt2 (c varchar, d varchar) PARTITION BY LIST(c,d);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3);
INSERT INTO plt2 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i % 11 IN (3);
ANALYSE plt2;

EXPLAIN (COSTS OFF)
SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);

postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> \q
[edb@localhost bin]$ gdb -q -c data/core.66926 postgres
Reading symbols from /home/edb/WORK/pg_src/PG_TEMP/postgresql/inst/bin/postgres...done.
[New LWP 66926]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: edb postgres [local] EXPLAIN '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
1221 if (rel->pathlist == NIL)
(gdb) bt
#0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
#1 0x000000000089341c in is_dummy_partition (rel=0x2f86e88, part_index=2) at partbounds.c:1959
#2 0x0000000000891d38 in merge_list_bounds (partnatts=2, partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88, inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at partbounds.c:1325
#3 0x0000000000891991 in partition_bounds_merge (partnatts=2, partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88, inner_rel=0x2fd4368, jointype=JOIN_LEFT,
outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at partbounds.c:1198
#4 0x000000000082cc5a in compute_partition_bounds (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8, parts1=0x7ffea91f8cc0,
parts2=0x7ffea91f8cb8) at joinrels.c:1644
#5 0x000000000082c474 in try_partitionwise_join (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8, parent_restrictlist=0x2fae650)
at joinrels.c:1402
#6 0x000000000082b6e2 in populate_joinrel_with_paths (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, sjinfo=0x2f7dfa8, restrictlist=0x2fae650) at joinrels.c:926
#7 0x000000000082b135 in make_join_rel (root=0x2f9e910, rel1=0x2f86e88, rel2=0x2fd4368) at joinrels.c:760
#8 0x000000000082a643 in make_rels_by_clause_joins (root=0x2f9e910, old_rel=0x2f86e88, other_rels_list=0x2f90148, other_rels=0x2f90160) at joinrels.c:312
#9 0x000000000082a119 in join_search_one_level (root=0x2f9e910, level=3) at joinrels.c:123
#10 0x000000000080cd97 in standard_join_search (root=0x2f9e910, levels_needed=3, initial_rels=0x2f90148) at allpaths.c:3020
#11 0x000000000080cd10 in make_rel_from_joinlist (root=0x2f9e910, joinlist=0x2fd7550) at allpaths.c:2951
#12 0x000000000080899a in make_one_rel (root=0x2f9e910, joinlist=0x2fd7550) at allpaths.c:228
#13 0x000000000084516a in query_planner (root=0x2f9e910, qp_callback=0x84ad85 <standard_qp_callback>, qp_extra=0x7ffea91f9140) at planmain.c:276
#14 0x000000000084788d in grouping_planner (root=0x2f9e910, tuple_fraction=0) at planner.c:1447
#15 0x0000000000846f56 in subquery_planner (glob=0x2fa0c08, parse=0x2f56d30, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
#16 0x000000000084578b in standard_planner (parse=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at planner.c:406
#17 0x0000000000845536 in planner (parse=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at planner.c:277
#18 0x0000000000978faf in pg_plan_query (querytree=0x2f56d30,
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", cursorOptions=2048, boundParams=0x0) at postgres.c:847
#19 0x0000000000693e50 in ExplainOneQuery (query=0x2f56d30, cursorOptions=2048, into=0x0, es=0x2fa0920,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", params=0x0, queryEnv=0x0) at explain.c:397
#20 0x00000000006939a5 in ExplainQuery (pstate=0x2f9e0a0, stmt=0x2f56b50, params=0x0, dest=0x2f9e008) at explain.c:281
#21 0x0000000000981de8 in standard_ProcessUtility (pstmt=0x2fd2220,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:862
#22 0x0000000000981585 in ProcessUtility (pstmt=0x2fd2220,
queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:527
#23 0x00000000009801ba in PortalRunUtility (portal=0x2f10180, pstmt=0x2fd2220, isTopLevel=true, setHoldSnapshot=true, dest=0x2f9e008, qc=0x7ffea91f9aa0) at pquery.c:1155
#24 0x000000000097ff20 in FillPortalStore (portal=0x2f10180, isTopLevel=true) at pquery.c:1028
#25 0x000000000097f883 in PortalRun (portal=0x2f10180, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2fd2310, altdest=0x2fd2310, qc=0x7ffea91f9c60) at pquery.c:760
#26 0x00000000009795d1 in exec_simple_query (
query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);") at postgres.c:1214
#27 0x000000000097da8d in PostgresMain (dbname=0x2ed8068 "postgres", username=0x2ed8048 "edb") at postgres.c:4497
#28 0x00000000008b9699 in BackendRun (port=0x2ecfd00) at postmaster.c:4560

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Oct 11, 2021 at 11:05 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Thanks for the patch, it applied cleanly and fixed the reported issue. I observed another case where
In case of multi-col list partition on the same column query is not picking partition wise join. Is this expected?

CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt1;
CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN (('0001','0001'),('0002','0002'),('0003','0003'));
CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN (('0004','0004'),('0005','0005'),('0006','0006'));
CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
ANALYSE plt2;
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;

postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN plt2 t2 ON t1.c = t2.c;
QUERY PLAN
--------------------------------------------
Hash Join
Hash Cond: ((t1.c)::text = (t2.c)::text)
-> Append
-> Seq Scan on plt1_p1 t1_1
-> Seq Scan on plt1_p2 t1_2
-> Seq Scan on plt1_p3 t1_3
-> Hash
-> Append
-> Seq Scan on plt2_p1 t2_1
-> Seq Scan on plt2_p2 t2_2
-> Seq Scan on plt2_p3 t2_3
(11 rows)

Thanks & Regards,
Rajkumar Raghuwanshi

On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
Thanks Rajkumar for testing.

> I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.

I also agree that it should throw an error in the above case. Fixed the issue in the attached patch. Also added related test cases to the regression test suite.

> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Now throwing errors in the initial stage, this case doesn't arise.

Please share if you find any other issues.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Thanks Nitin,

v4 patches applied cleanly and make check is passing now. While testing further I observed that if multiple values are given for a single
column list partition it is not giving error instead it is changing values itself. Please find the example below.

postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
CREATE TABLE
postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN (('0001','0001'),('0002','0002'));
CREATE TABLE
postgres=# \d+ plt1;
Partitioned table "public.plt1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | character varying | | | | extended | | |
Partition key: LIST (b)
Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')

I think it should throw an error as the partition by list has only 1 column but we are giving 2 values.
also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’ instead of ('0001','0001').

Thanks & Regards,
Rajkumar Raghuwanshi

On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres@gmail.com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Attachment

v6-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Zhihong Yu

Date:

22 October 2021, 10:50:33

On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

> While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.

Thanks for sharing. I have fixed the issue in the attached patch.

Thanks & Regards,
Nitin Jadhav

Hi,

+isListBoundDuplicated(List *list_bounds, List *new_bound)

+ Const *value1 = castNode(Const, list_nth(elem, i));
+ Const *value2 = castNode(Const, list_nth(new_bound, i));

Should the upper bound for index i take into account the length of new_bound ?

If the length of new_bound is always the same as that for elem, please add an assertion.

For transformPartitionListBounds():

+ deparse_expression((Node *) list_nth(partexprs, j),
+ deparse_context_for(RelationGetRelationName(parent),
+ RelationGetRelid(parent)),

Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local variables) outside the loop.

+get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)

get_list_datum_count -> get_list_datums_count

For partition_bounds_equal():

+ if (b1->isnulls)
+ b1_isnull = b1->isnulls[i][j];
+ if (b2->isnulls)
+ b2_isnull = b2->isnulls[i][j];

Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from previous iteration) ?

Cheers

Re: Multi-Column List Partitioning

From

Zhihong Yu

Date:

22 October 2021, 16:01:35

On Fri, Oct 22, 2021 at 3:50 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.

Thanks for sharing. I have fixed the issue in the attached patch.

Thanks & Regards,
Nitin Jadhav

Hi,

+isListBoundDuplicated(List *list_bounds, List *new_bound)

+ Const *value1 = castNode(Const, list_nth(elem, i));
+ Const *value2 = castNode(Const, list_nth(new_bound, i));

Should the upper bound for index i take into account the length of new_bound ?
If the length of new_bound is always the same as that for elem, please add an assertion.

For transformPartitionListBounds():
+ deparse_expression((Node *) list_nth(partexprs, j),
+ deparse_context_for(RelationGetRelationName(parent),
+ RelationGetRelid(parent)),

Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local variables) outside the loop.

+get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)

get_list_datum_count -> get_list_datums_count

For partition_bounds_equal():

+ if (b1->isnulls)
+ b1_isnull = b1->isnulls[i][j];
+ if (b2->isnulls)
+ b2_isnull = b2->isnulls[i][j];

Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from previous iteration) ?

Cheers

Hi,

Continuing review.

+ * For the multi-column case, we must make an BoolExpr that

an BoolExpr -> a BoolExpr

In get_qual_for_list(), it would be better if repetitive code can be extracted into a helper method:

+ if (val->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ key_is_null[j] = true;
+
+ nulltest->arg = keyCol[j];
+ nulltest->nulltesttype = IS_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+
+ if (key->partnatts > 1)
+ and_args = lappend(and_args, nulltest);
+ else
+ is_null_test = (Expr *) nulltest;
+ }
+ else
+ {
+ if (key->partnatts > 1)
+ {
+ Expr *opexpr =
+ make_partition_op_expr(key, j,
+ BTEqualStrategyNumber,
+ keyCol[j],
+ (Expr *) val);
+ and_args = lappend(and_args, opexpr);
+ }
+ else
+ datum_elem = (Expr *) val;
+ }

For match_clause_to_partition_key():

+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ {
+ *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+ else

Since the if block ends with return, the 'else' is not needed - else block can be indented to the left.

get_min_and_max_off(): I think get_min_and_max_offset as method name would be more informative.

+ Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
+ boundinfo->datums[off],
+ boundinfo->isnulls[off],
+ values, isnulls, nvalues));

If the 'while (off >= 1)' loop exits without modifying off, is the above assertion always true (can boundinfo->datums[off] be accessed without checking bound) ?

Cheers

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

29 October 2021, 08:10:04

Hi Nitin,

On Fri, Oct 22, 2021 at 6:48 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Thanks for sharing. I have fixed the issue in the attached patch.

I noticed that there's no commitfest entry for this.  Will you please
add this to the next one?

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

02 November 2021, 13:33:37

Thanks for reviewing.

> +isListBoundDuplicated(List *list_bounds, List *new_bound)
>
> +           Const   *value1 = castNode(Const, list_nth(elem, i));
> +           Const   *value2 = castNode(Const, list_nth(new_bound, i));
>
> Should the upper bound for index i take into account the length of new_bound ?
> If the length of new_bound is always the same as that for elem, please add an assertion.

The length of 'elem' should be same as length of 'new_bound'. Added
assert statement for the same.


> For transformPartitionListBounds():
> +               deparse_expression((Node *) list_nth(partexprs, j),
> +                                  deparse_context_for(RelationGetRelationName(parent),
> +                                                      RelationGetRelid(parent)),
>
> Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local
variables)outside the loop.
 

I don't think this is an issue as 'RelationGetRelationName' and
'RelationGetRelid' are macros. Please let me know if your opinion is
different.


> +get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)
>
> get_list_datum_count -> get_list_datums_count

There was a function earlier with the name
'get_non_null_list_datum_count()'. So now this has changed to
'get_list_datum_count()' as we are not separating the non null datums
from the list. The new name is inline with the old function name which
was already accepted by the community. So I feel it is better to not
change.


> For partition_bounds_equal():
>
> +               if (b1->isnulls)
> +                   b1_isnull = b1->isnulls[i][j];
> +               if (b2->isnulls)
> +                   b2_isnull = b2->isnulls[i][j];
>
> Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from
previousiteration) ?
 

Nice catch. Fixed.


> In get_qual_for_list(), it would be better if repetitive code can be extracted into a helper method:

I have removed the repetitive code and made a common function named
'get_qual_for_list_datums()'.


> For match_clause_to_partition_key():
>
> +       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> +       {
> +           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> +           return PARTCLAUSE_MATCH_NULLNESS;
> +       }
> +       else
>
> Since the if block ends with return, the 'else' is not needed - else block can be indented to the left.

Fixed.


> get_min_and_max_off(): I think get_min_and_max_offset as method name would be more informative.

Fixed.


> +   Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
> +                                          boundinfo->datums[off],
> +                                          boundinfo->isnulls[off],
> +                                          values, isnulls, nvalues));
>
> If the 'while (off >= 1)' loop exits without modifying off, is the above assertion always true (can
boundinfo->datums[off]be accessed without checking bound) ?
 

Yes. The assertion holds good even though the control doesn't enter
the loop. In that case the 'off' can be directly considered as minoff
or maxoff. Since we are considering it as valid, the assertion is
needed.

Thanks & Regards,
Nitin Jadhav

On Fri, Oct 22, 2021 at 9:30 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Fri, Oct 22, 2021 at 3:50 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>>
>>
>>
>> On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
>>>
>>> > While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find
testcase & stack-trace below.
 
>>>
>>> Thanks for sharing. I have fixed the issue in the attached patch.
>>>
>>> Thanks & Regards,
>>> Nitin Jadhav
>>>
>>>>>>>>
>>>>>>>>
>> Hi,
>>
>> +isListBoundDuplicated(List *list_bounds, List *new_bound)
>>
>> +           Const   *value1 = castNode(Const, list_nth(elem, i));
>> +           Const   *value2 = castNode(Const, list_nth(new_bound, i));
>>
>> Should the upper bound for index i take into account the length of new_bound ?
>> If the length of new_bound is always the same as that for elem, please add an assertion.
>>
>> For transformPartitionListBounds():
>> +               deparse_expression((Node *) list_nth(partexprs, j),
>> +                                  deparse_context_for(RelationGetRelationName(parent),
>> +                                                      RelationGetRelid(parent)),
>>
>> Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local
variables)outside the loop.
 
>>
>> +get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)
>>
>> get_list_datum_count -> get_list_datums_count
>>
>> For partition_bounds_equal():
>>
>> +               if (b1->isnulls)
>> +                   b1_isnull = b1->isnulls[i][j];
>> +               if (b2->isnulls)
>> +                   b2_isnull = b2->isnulls[i][j];
>>
>> Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from
previousiteration) ?
 
>>
>> Cheers
>
>
> Hi,
> Continuing review.
>
> +            * For the multi-column case, we must make an BoolExpr that
>
> an BoolExpr -> a BoolExpr
>
> In get_qual_for_list(), it would be better if repetitive code can be extracted into a helper method:
>
> +               if (val->constisnull)
> +               {
> +                   NullTest   *nulltest = makeNode(NullTest);
> +
> +                   key_is_null[j] = true;
> +
> +                   nulltest->arg = keyCol[j];
> +                   nulltest->nulltesttype = IS_NULL;
> +                   nulltest->argisrow = false;
> +                   nulltest->location = -1;
> +
> +                   if (key->partnatts > 1)
> +                       and_args = lappend(and_args, nulltest);
> +                   else
> +                       is_null_test = (Expr *) nulltest;
> +               }
> +               else
> +               {
> +                   if (key->partnatts > 1)
> +                   {
> +                       Expr *opexpr =
> +                           make_partition_op_expr(key, j,
> +                                                  BTEqualStrategyNumber,
> +                                                  keyCol[j],
> +                                                  (Expr *) val);
> +                       and_args = lappend(and_args, opexpr);
> +                   }
> +                   else
> +                       datum_elem = (Expr *) val;
> +               }
>
> For match_clause_to_partition_key():
>
> +       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> +       {
> +           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> +           return PARTCLAUSE_MATCH_NULLNESS;
> +       }
> +       else
>
> Since the if block ends with return, the 'else' is not needed - else block can be indented to the left.
>
> get_min_and_max_off(): I think get_min_and_max_offset as method name would be more informative.
>
> +   Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
> +                                          boundinfo->datums[off],
> +                                          boundinfo->isnulls[off],
> +                                          values, isnulls, nvalues));
>
> If the 'while (off >= 1)' loop exits without modifying off, is the above assertion always true (can
boundinfo->datums[off]be accessed without checking bound) ?
 
>
> Cheers

Attachment

v7-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

02 November 2021, 13:35:48

> I noticed that there's no commitfest entry for this.  Will you please
> add this to the next one?

I have added it to Nov commitfest.

Thanks & Regards,
Nitin Jadhav

On Fri, Oct 29, 2021 at 1:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Nitin,
>
> On Fri, Oct 22, 2021 at 6:48 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Thanks for sharing. I have fixed the issue in the attached patch.
>
> I noticed that there's no commitfest entry for this.  Will you please
> add this to the next one?
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

03 December 2021, 13:55:31

Hi,

Few comments for v7 patch, note that I haven't been through the
previous discussion, if any of the review comments that has been
already discussed & overridden, then please ignore here too:


partbounds.c: In function ‘get_qual_for_list.isra.18’:
partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
         datumCopy(bound_info->datums[i][j],
                   ~~~~~~~~~~^~~~~~~~
partbounds.c:4335:21: note: ‘boundinfo’ was declared here
  PartitionBoundInfo boundinfo;
                     ^~~~~~~~~
partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
   bool    *inner_isnull;
            ^~~~~~~~~~~~
partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
   bool    *outer_isnull;
           ^~~~~~~~~~~~

Got these warnings with gcc -O2 compilation.
----

 /*
+ * isListBoundDuplicated
+ *
+ * Returns TRUE if the list bound element 'new_bound' is already present
+ * in the target list 'list_bounds', FALSE otherwise.
+ */
+static bool
+isListBoundDuplicated(List *list_bounds, List *new_bound)
+{
+ ListCell   *cell = NULL;
+
+ foreach(cell, list_bounds)
+ {
+ int i;
+ List   *elem = lfirst(cell);
+ bool isDuplicate = true;
+
+ Assert(list_length(elem) == list_length(new_bound));
+
+ for (i = 0; i < list_length(elem); i++)
+ {
+ Const   *value1 = castNode(Const, list_nth(elem, i));
+ Const   *value2 = castNode(Const, list_nth(new_bound, i));
+
+ if (!equal(value1, value2))
+ {
+ isDuplicate = false;
+ break;
+ }
+ }
+
+ if (isDuplicate)
+ return true;
+ }
+
+ return false;
+}

This function is unnecessarily complicated, I think you can avoid
inner for loops; simply replace for-loop-block with  "if
(equal(lfirst(cell), new_bound)) return true".
----

+ char   **colname = (char **) palloc0(partnatts * sizeof(char *));
+ Oid    *coltype = palloc0(partnatts * sizeof(Oid));
+ int32    *coltypmod = palloc0(partnatts * sizeof(int));
+ Oid    *partcollation = palloc0(partnatts * sizeof(Oid));
+
This allocation seems to be worthless, read ahead.
----

+ for (i = 0; i < partnatts; i++)
+ {
+ if (key->partattrs[i] != 0)
+ colname[i] = get_attname(RelationGetRelid(parent),
+ key->partattrs[i], false);
+ else
+ {
+ colname[i] =
+ deparse_expression((Node *) list_nth(partexprs, j),
+    deparse_context_for(RelationGetRelationName(parent),
+    RelationGetRelid(parent)),
+    false, false);
+ ++j;
+ }
+
+ coltype[i] = get_partition_col_typid(key, i);
+ coltypmod[i] = get_partition_col_typmod(key, i);
+ partcollation[i] = get_partition_col_collation(key, i);
+ }

I think there is no need for this separate loop inside
transformPartitionListBounds, you can do that same in the next loop as
well. And instead of  get_partition_col_* calling and storing, simply
use that directly as an argument to transformPartitionBoundValue().
----

+
+ if (IsA(expr, RowExpr) &&
+ partnatts != list_length(((RowExpr *) expr)->args))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("Must specify exactly one value per partitioning column"),
+ parser_errposition(pstate, exprLocation((Node *) spec))));
+

I think this should be inside the "else" block after "!IsA(rowexpr,
RowExpr)" error and you can avoid IsA() check too.
----

-               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+               if (b1->isnulls)
+                   b1_isnull = b1->isnulls[i][j];
+               if (b2->isnulls)
+                   b2_isnull = b2->isnulls[i][j];
+
+               /*
+                * If any of the partition bound has NULL value, then check
+                * equality for the NULL value instead of comparing the datums
+                * as it does not contain valid value in case of NULL.
+                */
+               if (b1_isnull || b2_isnull)
+               {
+                   if (b1_isnull != b2_isnull)
+                       return false;
+               }
+               else if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],

Looks difficult to understand at first glance, how about the following:

if (b1->isnulls != b2->isnulls)
    return false;

if (b1->isnulls)
{
    if (b1->isnulls[i][j] != b2->isnulls[i][j])
        return false;
    if (b1->isnulls[i][j])
        continue;
}

See how range partitioning infinite values are handled. Also, place
this before the comment block that was added for the "!datumIsEqual()"
case.
----

+       if (src->isnulls)
+           dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);
...
+           if (src->isnulls)
+               dest->isnulls[i][j] = src->isnulls[i][j];
+
Nothing wrong with this but if we could have checked "dest->isnulls"
instead of "src->isnulls" would be much better.
----

-           if (dest->kind == NULL ||
-               dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
+           if ((dest->kind == NULL ||
+                dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE) &&
+               (key->strategy != PARTITION_STRATEGY_LIST ||
+                (src->isnulls == NULL || !src->isnulls[i][j])))
                dest->datums[i][j] = datumCopy(src->datums[i][j],
                                               byval, typlen);
Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.
----

+       for (i = 0; i < partnatts; i++)
+       {
+           if (outer_isnull[i])
+           {
+               outer_has_null = true;
+               if (outer_map.merged_indexes[outer_index] == -1)
+                   consider_outer_null = true;
+           }
+       }
+
+       for (i = 0; i < partnatts; i++)
+       {
+           if (inner_isnull[i])
+           {
+               inner_has_null = true;
+               if (inner_map.merged_indexes[inner_index] == -1)
+                   consider_inner_null = true;
+           }
+       }

Can't be a single loop?
----

It would be helpful if you could run pgindent on your patch if not done already.
----

That's all for now, I am yet to finish the complete patch reading and
understand the code flow, but I am out of time now.

Regards,
Amul

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

06 December 2021, 13:58:26

Thank you for reviewing the patch.

> partbounds.c: In function ‘get_qual_for_list.isra.18’:
> partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>         datumCopy(bound_info->datums[i][j],
>                   ~~~~~~~~~~^~~~~~~~
> partbounds.c:4335:21: note: ‘boundinfo’ was declared here
>  PartitionBoundInfo boundinfo;
>                     ^~~~~~~~~
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>   bool    *inner_isnull;
>            ^~~~~~~~~~~~
> partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>   bool    *outer_isnull;
>           ^~~~~~~~~~~~

Fixed.

> This function is unnecessarily complicated, I think you can avoid
> inner for loops; simply replace for-loop-block with  "if
> (equal(lfirst(cell), new_bound)) return true".

Thank you for the suggestion. Fixed.

> + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
> + Oid    *coltype = palloc0(partnatts * sizeof(Oid));
> + int32    *coltypmod = palloc0(partnatts * sizeof(int));
> + Oid    *partcollation = palloc0(partnatts * sizeof(Oid));
> +
> This allocation seems to be worthless, read ahead.
>
> I think there is no need for this separate loop inside
> transformPartitionListBounds, you can do that same in the next loop as
> well. And instead of  get_partition_col_* calling and storing, simply
> use that directly as an argument to transformPartitionBoundValue().

Yes. The loop can be avoided and content of the above loop can be
included in the next loop but the next loop iterates over a list of
multi column datums. For each iteration, we need the information of
all the columns. The above data (colname, coltype, coltypmod and
partcollation) remains same for each iteration of the loop, If we
modify as suggested, then the function to fetch these information has
to be called every-time. To avoid this situation I have made a
separate loop outside which only runs as many number of columns and
stores in a variable which can be reused later. Please let me correct
if I am wrong.

> I think this should be inside the "else" block after "!IsA(rowexpr,
> RowExpr)" error and you can avoid IsA() check too.

This is required to handle the situation when one partition key is
mentioned and multiple values are provided in the partition bound
specification.

> Looks difficult to understand at first glance, how about the following:
>
> if (b1->isnulls != b2->isnulls)
>    return false;
>
> if (b1->isnulls)
> {
>    if (b1->isnulls[i][j] != b2->isnulls[i][j])
>        return false;
>    if (b1->isnulls[i][j])
>        continue;
> }
>
> See how range partitioning infinite values are handled. Also, place
> this before the comment block that was added for the "!datumIsEqual()"
> case.

Fixed. I feel the 'continue' block is not required and hence removed it.

> Nothing wrong with this but if we could have checked "dest->isnulls"
> instead of "src->isnulls" would be much better.

Here we are copying the data from 'src' to 'dest'. If there is no data
in 'src', it is unnecessary to copy. Hence checking 'src'.

> Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.

Fixed.

> Can't be a single loop?

Yes. Fixed.



On Fri, Dec 3, 2021 at 7:26 PM Amul Sul <sulamul@gmail.com> wrote:
>
> Hi,
>
> Few comments for v7 patch, note that I haven't been through the
> previous discussion, if any of the review comments that has been
> already discussed & overridden, then please ignore here too:
>
>
> partbounds.c: In function ‘get_qual_for_list.isra.18’:
> partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>          datumCopy(bound_info->datums[i][j],
>                    ~~~~~~~~~~^~~~~~~~
> partbounds.c:4335:21: note: ‘boundinfo’ was declared here
>   PartitionBoundInfo boundinfo;
>                      ^~~~~~~~~
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>    bool    *inner_isnull;
>             ^~~~~~~~~~~~
> partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>    bool    *outer_isnull;
>            ^~~~~~~~~~~~
>
> Got these warnings with gcc -O2 compilation.
> ----
>
>  /*
> + * isListBoundDuplicated
> + *
> + * Returns TRUE if the list bound element 'new_bound' is already present
> + * in the target list 'list_bounds', FALSE otherwise.
> + */
> +static bool
> +isListBoundDuplicated(List *list_bounds, List *new_bound)
> +{
> + ListCell   *cell = NULL;
> +
> + foreach(cell, list_bounds)
> + {
> + int i;
> + List   *elem = lfirst(cell);
> + bool isDuplicate = true;
> +
> + Assert(list_length(elem) == list_length(new_bound));
> +
> + for (i = 0; i < list_length(elem); i++)
> + {
> + Const   *value1 = castNode(Const, list_nth(elem, i));
> + Const   *value2 = castNode(Const, list_nth(new_bound, i));
> +
> + if (!equal(value1, value2))
> + {
> + isDuplicate = false;
> + break;
> + }
> + }
> +
> + if (isDuplicate)
> + return true;
> + }
> +
> + return false;
> +}
>
> This function is unnecessarily complicated, I think you can avoid
> inner for loops; simply replace for-loop-block with  "if
> (equal(lfirst(cell), new_bound)) return true".
> ----
>
> + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
> + Oid    *coltype = palloc0(partnatts * sizeof(Oid));
> + int32    *coltypmod = palloc0(partnatts * sizeof(int));
> + Oid    *partcollation = palloc0(partnatts * sizeof(Oid));
> +
> This allocation seems to be worthless, read ahead.
> ----
>
> + for (i = 0; i < partnatts; i++)
> + {
> + if (key->partattrs[i] != 0)
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
> + else
> + {
> + colname[i] =
> + deparse_expression((Node *) list_nth(partexprs, j),
> +    deparse_context_for(RelationGetRelationName(parent),
> +    RelationGetRelid(parent)),
> +    false, false);
> + ++j;
> + }
> +
> + coltype[i] = get_partition_col_typid(key, i);
> + coltypmod[i] = get_partition_col_typmod(key, i);
> + partcollation[i] = get_partition_col_collation(key, i);
> + }
>
> I think there is no need for this separate loop inside
> transformPartitionListBounds, you can do that same in the next loop as
> well. And instead of  get_partition_col_* calling and storing, simply
> use that directly as an argument to transformPartitionBoundValue().
> ----
>
> +
> + if (IsA(expr, RowExpr) &&
> + partnatts != list_length(((RowExpr *) expr)->args))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Must specify exactly one value per partitioning column"),
> + parser_errposition(pstate, exprLocation((Node *) spec))));
> +
>
> I think this should be inside the "else" block after "!IsA(rowexpr,
> RowExpr)" error and you can avoid IsA() check too.
> ----
>
> -               if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> +               if (b1->isnulls)
> +                   b1_isnull = b1->isnulls[i][j];
> +               if (b2->isnulls)
> +                   b2_isnull = b2->isnulls[i][j];
> +
> +               /*
> +                * If any of the partition bound has NULL value, then check
> +                * equality for the NULL value instead of comparing the datums
> +                * as it does not contain valid value in case of NULL.
> +                */
> +               if (b1_isnull || b2_isnull)
> +               {
> +                   if (b1_isnull != b2_isnull)
> +                       return false;
> +               }
> +               else if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>
> Looks difficult to understand at first glance, how about the following:
>
> if (b1->isnulls != b2->isnulls)
>     return false;
>
> if (b1->isnulls)
> {
>     if (b1->isnulls[i][j] != b2->isnulls[i][j])
>         return false;
>     if (b1->isnulls[i][j])
>         continue;
> }
>
> See how range partitioning infinite values are handled. Also, place
> this before the comment block that was added for the "!datumIsEqual()"
> case.
> ----
>
> +       if (src->isnulls)
> +           dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);
> ...
> +           if (src->isnulls)
> +               dest->isnulls[i][j] = src->isnulls[i][j];
> +
> Nothing wrong with this but if we could have checked "dest->isnulls"
> instead of "src->isnulls" would be much better.
> ----
>
> -           if (dest->kind == NULL ||
> -               dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
> +           if ((dest->kind == NULL ||
> +                dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE) &&
> +               (key->strategy != PARTITION_STRATEGY_LIST ||
> +                (src->isnulls == NULL || !src->isnulls[i][j])))
>                 dest->datums[i][j] = datumCopy(src->datums[i][j],
>                                                byval, typlen);
> Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.
> ----
>
> +       for (i = 0; i < partnatts; i++)
> +       {
> +           if (outer_isnull[i])
> +           {
> +               outer_has_null = true;
> +               if (outer_map.merged_indexes[outer_index] == -1)
> +                   consider_outer_null = true;
> +           }
> +       }
> +
> +       for (i = 0; i < partnatts; i++)
> +       {
> +           if (inner_isnull[i])
> +           {
> +               inner_has_null = true;
> +               if (inner_map.merged_indexes[inner_index] == -1)
> +                   consider_inner_null = true;
> +           }
> +       }
>
> Can't be a single loop?
> ----
>
> It would be helpful if you could run pgindent on your patch if not done already.
> ----
>
> That's all for now, I am yet to finish the complete patch reading and
> understand the code flow, but I am out of time now.
>
> Regards,
> Amul

Attachment

v8-0001-multi-column-list-partitioning.patch

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

06 December 2021, 14:52:04

On Mon, Dec 6, 2021 at 7:27 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> Thank you for reviewing the patch.
>
> > partbounds.c: In function ‘get_qual_for_list.isra.18’:
> > partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> >         datumCopy(bound_info->datums[i][j],
> >                   ~~~~~~~~~~^~~~~~~~
> > partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> >  PartitionBoundInfo boundinfo;
> >                     ^~~~~~~~~
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> >   bool    *inner_isnull;
> >            ^~~~~~~~~~~~
> > partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> >   bool    *outer_isnull;
> >           ^~~~~~~~~~~~
>
> Fixed.
>
> > This function is unnecessarily complicated, I think you can avoid
> > inner for loops; simply replace for-loop-block with  "if
> > (equal(lfirst(cell), new_bound)) return true".
>
> Thank you for the suggestion. Fixed.
>
> > + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
> > + Oid    *coltype = palloc0(partnatts * sizeof(Oid));
> > + int32    *coltypmod = palloc0(partnatts * sizeof(int));
> > + Oid    *partcollation = palloc0(partnatts * sizeof(Oid));
> > +
> > This allocation seems to be worthless, read ahead.
> >
> > I think there is no need for this separate loop inside
> > transformPartitionListBounds, you can do that same in the next loop as
> > well. And instead of  get_partition_col_* calling and storing, simply
> > use that directly as an argument to transformPartitionBoundValue().
>
> Yes. The loop can be avoided and content of the above loop can be
> included in the next loop but the next loop iterates over a list of
> multi column datums. For each iteration, we need the information of
> all the columns. The above data (colname, coltype, coltypmod and
> partcollation) remains same for each iteration of the loop, If we
> modify as suggested, then the function to fetch these information has
> to be called every-time. To avoid this situation I have made a
> separate loop outside which only runs as many number of columns and
> stores in a variable which can be reused later. Please let me correct
> if I am wrong.
>

Ok, colname can be fetched in advance but I don't think it worth it to
fetch coltype, coltypmod & partcollation;  and, store in the
explicitly allocated memory, instead, you can directly call
get_partition_col_* inline functions.

> > I think this should be inside the "else" block after "!IsA(rowexpr,
> > RowExpr)" error and you can avoid IsA() check too.
>
> This is required to handle the situation when one partition key is
> mentioned and multiple values are provided in the partition bound
> specification.
>
> > Looks difficult to understand at first glance, how about the following:
> >
> > if (b1->isnulls != b2->isnulls)
> >    return false;
> >
> > if (b1->isnulls)
> > {
> >    if (b1->isnulls[i][j] != b2->isnulls[i][j])
> >        return false;
> >    if (b1->isnulls[i][j])
> >        continue;
> > }
> >
> > See how range partitioning infinite values are handled. Also, place
> > this before the comment block that was added for the "!datumIsEqual()"
> > case.
>
> Fixed. I feel the 'continue' block is not required and hence removed it.
>
> > Nothing wrong with this but if we could have checked "dest->isnulls"
> > instead of "src->isnulls" would be much better.
>
> Here we are copying the data from 'src' to 'dest'. If there is no data
> in 'src', it is unnecessary to copy. Hence checking 'src'.
>

I am not sure how that makes a difference since you do allocate 'dest'
based on 'src'; anyway, I leave that choice to you.

> > Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.
>
> Fixed.
>
> > Can't be a single loop?
>
> Yes. Fixed.
>

Thanks, will have a look.

Regards,
Amul

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

09 December 2021, 05:54:05

Hi Nitin,

Was looking at warnings generated by v8:

partbounds.c:971:17: warning: unused variable 'b1_isnull' [-Wunused-variable]
                                bool        b1_isnull = false;
                                            ^
partbounds.c:972:17: warning: unused variable 'b2_isnull' [-Wunused-variable]
                                bool        b2_isnull = false;

And it seems they've resulted from the above change:

On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > Looks difficult to understand at first glance, how about the following:
> >
> > if (b1->isnulls != b2->isnulls)
> >    return false;

I don't think having this block is correct, because this says that two
PartitionBoundInfos can't be "logically" equal unless their isnulls
pointers are the same, which is not the case unless they are
physically the same PartitionBoundInfo.  What this means for its only
caller compute_partition_bounds() is that it now always needs to
perform partition_bounds_merge() for a pair of list-partitioned
relations, even if they have exactly the same bounds.

So, I'd suggest removing the block.

> > if (b1->isnulls)
> > {
> >    if (b1->isnulls[i][j] != b2->isnulls[i][j])
> >        return false;
> >    if (b1->isnulls[i][j])
> >        continue;
> > }
> >
> > See how range partitioning infinite values are handled. Also, place
> > this before the comment block that was added for the "!datumIsEqual()"
> > case.
>
> Fixed. I feel the 'continue' block is not required and hence removed it.

Actually, you should've kept the continue block as Amul suggested and
remove the "else" from the following:

                /* < the long comment snipped >*/
                else if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
                                  parttypbyval[j], parttyplen[j]))
                    return false;

because with this, list bounds will never be passed to datumIsEqual()
for comparison, even if both are non-NULL.

IOW, the block of code should look as follows, including the comments:

                /*
                 * If the bound datums can be NULL, check that the datums on
                 * both sides are either both NULL or not NULL.
                 */
                if (b1->isnulls)
                {
                    if (b1->isnulls[i][j] != b2->isnulls[i][j])
                        return false;

                    /* Must not pass NULL datums to datumIsEqual(). */
                    if (b1->isnulls[i][j])
                        continue;
                }

                /* < the long comment snipped >*/
                if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
                                  parttypbyval[j], parttyplen[j]))
                    return false;

Also, please remove the declarations of b1_isnull and b2_isnull to get
rid of the warnings.

--
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

09 December 2021, 05:56:04

On Thu, Dec 9, 2021 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Nitin,
>
> Was looking at warnings generated by v8:
>
> partbounds.c:971:17: warning: unused variable 'b1_isnull' [-Wunused-variable]
>                                 bool        b1_isnull = false;
>                                             ^
> partbounds.c:972:17: warning: unused variable 'b2_isnull' [-Wunused-variable]
>                                 bool        b2_isnull = false;
>
> And it seems they've resulted from the above change:

I meant: "below change:".

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

09 December 2021, 06:11:19

On Thu, Dec 9, 2021 at 11:24 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
[....]
> On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > > Looks difficult to understand at first glance, how about the following:
> > >
> > > if (b1->isnulls != b2->isnulls)
> > >    return false;
>
> I don't think having this block is correct, because this says that two
> PartitionBoundInfos can't be "logically" equal unless their isnulls
> pointers are the same, which is not the case unless they are
> physically the same PartitionBoundInfo.  What this means for its only
> caller compute_partition_bounds() is that it now always needs to
> perform partition_bounds_merge() for a pair of list-partitioned
> relations, even if they have exactly the same bounds.
>
> So, I'd suggest removing the block.
>

Agreed, I too realized the same; the check is incorrect and have noted
it for the next post. But note that, we need a kind of check here otherwise,
how could two bounds be equal if one has nulls and the other doesn't.
Also, we would have a segmentation fault in the next block while
accessing b2->isnulls if that is null.

I would suggest check like this:

if ((b1->isnulls == NULL)  != (b2->isnulls == NULL))
return false;

OR

if ((b1->isnulls)  ^ (b2->isnulls))
return false;

Regards,
Amul

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

09 December 2021, 06:32:56

On Thu, Dec 9, 2021 at 3:12 PM Amul Sul <sulamul@gmail.com> wrote:
> On Thu, Dec 9, 2021 at 11:24 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> [....]
> > On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
> > <nitinjadhavpostgres@gmail.com> wrote:
> > > > Looks difficult to understand at first glance, how about the following:
> > > >
> > > > if (b1->isnulls != b2->isnulls)
> > > >    return false;
> >
> > I don't think having this block is correct, because this says that two
> > PartitionBoundInfos can't be "logically" equal unless their isnulls
> > pointers are the same, which is not the case unless they are
> > physically the same PartitionBoundInfo.  What this means for its only
> > caller compute_partition_bounds() is that it now always needs to
> > perform partition_bounds_merge() for a pair of list-partitioned
> > relations, even if they have exactly the same bounds.
> >
> > So, I'd suggest removing the block.
> >
>
> Agreed, I too realized the same; the check is incorrect and have noted
> it for the next post. But note that, we need a kind of check here otherwise,
> how could two bounds be equal if one has nulls and the other doesn't.

We check partition strategy at the top and that ensures that isnulls
fields should either be both NULL or not, same as the block above that
checks 'kind'.  Maybe adding an Assert inside the block makes sense,
like this:

                /*
                 * If the bound datums can be NULL, check that the datums on
                 * both sides are either both NULL or not NULL.
                 */
                if (b1->isnulls != NULL)
                {
                    /*
                     * Both bound collections have the same partition strategy,
                     * so the other side must allow NULL datums as well.
                     */
                    Assert(b2->isnulls != NULL);

                    if (b1->isnulls[i][j] != b2->isnulls[i][j])
                        return false;

                    /* Must not pass NULL datums to datumIsEqual(). */
                    if (b1->isnulls[i][j])
                        continue;
                }

                /* < the long comment snipped >*/
                if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
                                  parttypbyval[j], parttyplen[j]))
                    return false;

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

09 December 2021, 07:13:19

On Thu, Dec 9, 2021 at 12:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 3:12 PM Amul Sul <sulamul@gmail.com> wrote:
> > On Thu, Dec 9, 2021 at 11:24 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > [....]
> > > On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
> > > <nitinjadhavpostgres@gmail.com> wrote:
> > > > > Looks difficult to understand at first glance, how about the following:
> > > > >
> > > > > if (b1->isnulls != b2->isnulls)
> > > > >    return false;
> > >
> > > I don't think having this block is correct, because this says that two
> > > PartitionBoundInfos can't be "logically" equal unless their isnulls
> > > pointers are the same, which is not the case unless they are
> > > physically the same PartitionBoundInfo.  What this means for its only
> > > caller compute_partition_bounds() is that it now always needs to
> > > perform partition_bounds_merge() for a pair of list-partitioned
> > > relations, even if they have exactly the same bounds.
> > >
> > > So, I'd suggest removing the block.
> > >
> >
> > Agreed, I too realized the same; the check is incorrect and have noted
> > it for the next post. But note that, we need a kind of check here otherwise,
> > how could two bounds be equal if one has nulls and the other doesn't.
>
> We check partition strategy at the top and that ensures that isnulls
> fields should either be both NULL or not, same as the block above that
> checks 'kind'.  Maybe adding an Assert inside the block makes sense,
> like this:
>
>                 /*
>                  * If the bound datums can be NULL, check that the datums on
>                  * both sides are either both NULL or not NULL.
>                  */
>                 if (b1->isnulls != NULL)
>                 {
>                     /*
>                      * Both bound collections have the same partition strategy,
>                      * so the other side must allow NULL datums as well.
>                      */
>                     Assert(b2->isnulls != NULL);
>

Make sense, thanks!

Regards,
Amul

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

09 December 2021, 13:03:38

On Thu, Dec 9, 2021 at 12:43 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 12:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > On Thu, Dec 9, 2021 at 3:12 PM Amul Sul <sulamul@gmail.com> wrote:
> > > On Thu, Dec 9, 2021 at 11:24 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > >
> > > [....]
> > > > On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
> > > > <nitinjadhavpostgres@gmail.com> wrote:
> > > > > > Looks difficult to understand at first glance, how about the following:
> > > > > >
> > > > > > if (b1->isnulls != b2->isnulls)
> > > > > >    return false;
> > > >
> > > > I don't think having this block is correct, because this says that two
> > > > PartitionBoundInfos can't be "logically" equal unless their isnulls
> > > > pointers are the same, which is not the case unless they are
> > > > physically the same PartitionBoundInfo.  What this means for its only
> > > > caller compute_partition_bounds() is that it now always needs to
> > > > perform partition_bounds_merge() for a pair of list-partitioned
> > > > relations, even if they have exactly the same bounds.
> > > >
> > > > So, I'd suggest removing the block.
> > > >
> > >
> > > Agreed, I too realized the same; the check is incorrect and have noted
> > > it for the next post. But note that, we need a kind of check here otherwise,
> > > how could two bounds be equal if one has nulls and the other doesn't.
> >
> > We check partition strategy at the top and that ensures that isnulls
> > fields should either be both NULL or not, same as the block above that
> > checks 'kind'.  Maybe adding an Assert inside the block makes sense,
> > like this:
> >
> >                 /*
> >                  * If the bound datums can be NULL, check that the datums on
> >                  * both sides are either both NULL or not NULL.
> >                  */
> >                 if (b1->isnulls != NULL)
> >                 {
> >                     /*
> >                      * Both bound collections have the same partition strategy,
> >                      * so the other side must allow NULL datums as well.
> >                      */
> >                     Assert(b2->isnulls != NULL);
> >
>
> Make sense, thanks!
>

In addition to Amit's suggestions, here are a few more:

+   char          **colname = (char **) palloc0(partnatts * sizeof(char *));
+   Oid            *coltype = palloc0(partnatts * sizeof(Oid));
+   int32          *coltypmod = palloc0(partnatts * sizeof(int));
+   Oid            *partcollation = palloc0(partnatts * sizeof(Oid));
+

None of them really needed to be palloc0; also, as described
previously you can avoid the last three by using get_partition_col_*
directly.
---

+           i = 0;
+           foreach(cell2, rowexpr->args)
+           {

It's up to you, rather than using a separate index variable and
incrementing that at the end, I think we can use
foreach_current_index(cell2) which would look much nicer.
---

+           all_values[j].values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+           all_values[j].isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));
+           all_values[j].index = i;

palloc0 is unnecessary for the "values".
---

        dest->datums[i] = &boundDatums[i * natts];
+       if (src->isnulls)
+           dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);

I think you can allocate memory for isnulls the same way you do
allocate boundDatums and just do the memcpy.
---

+       for (i = 0; i < partnatts; i++)
+       {
+           if (outer_isnull && outer_isnull[i])
+           {
+               outer_has_null = true;
+               if (outer_map.merged_indexes[outer_index] == -1)
+                   consider_outer_null = true;
+           }

I am wondering why you are not breaking the loop once you set
consider_outer_null?
Note that if you do that then you need a separate loop for the
inner_isnull part.
---

@@ -1351,14 +1431,30 @@ merge_list_bounds(FmgrInfo *partsupfunc, Oid
*partcollation,
            /* A list value missing from the inner side. */
            Assert(outer_pos < outer_bi->ndatums);

-           /*
-            * If the inner side has the default partition, or this is an
-            * outer join, try to assign a merged partition to the outer
-            * partition (see process_outer_partition()).  Otherwise, the
-            * outer partition will not contribute to the result.
-            */
-           if (inner_has_default || IS_OUTER_JOIN(jointype))
+           if (outer_has_null || inner_has_null)
            {
+               if (consider_outer_null || consider_inner_null)
+               {
+                   /* Merge the NULL partitions. */
+                   merged_index = merge_null_partitions(&outer_map, &inner_map,
+                                                        consider_outer_null,
+                                                        consider_inner_null,
+                                                        outer_index,
inner_index,
+                                                        jointype, &next_index);
+

I have doubts about the condition that allows reaching
merge_null_partitions() but I am not sure I am correct. I think if the
list values missing from the __inner side__ then we might need to
check only "inner_has_null" & "consider_inner_null" and merge the
same, but why is this code also checking "outer_has_null" &
"consider_outer_null". Correct me if I am missing something.
---

+           if (isnulls && isnulls[i])
+               cmpval = 0;     /* NULL "=" NULL */
+           else
+               cmpval = 1;     /* NULL ">" not-NULL */
+       }
+       else if (isnulls && isnulls[i])
+           cmpval = -1;        /* not-NULL "<" NULL */

I really doubt this assumption is correct; aren't those strict operators?
---

+get_list_partbound_value_string(List *bound_value)
+{
+   StringInfo      buf = makeStringInfo();
+   StringInfo      boundconstraint = makeStringInfo();

boundconstraint should be declared inside "if (ncols > 1)" block.
---

+   foreach(cell, bound_value)
+   {
+       Const      *val = castNode(Const, lfirst(cell));
+
+       appendStringInfoString(buf, sep);
+       get_const_expr(val, &context, -1);
+       sep = ", ";
+       ncols++;
+   }

I think no need to increment ncols every time, you have a list and you
can get that. Also, I think since you have ncols already, you can
prepend and append parenthesis before and after so that you can avoid
extra StringInfo.
---

 typedef struct PartitionBoundInfoData
 {
    char        strategy;       /* hash, list or range? */
+   int         partnatts;      /* number of partition key columns */
    int         ndatums;        /* Length of the datums[] array */
    Datum     **datums;
+   bool      **isnulls;

Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
added that for partition_bound_accepts_nulls(), but we can easily get
that value from the partitioning key & pass an additional argument.
Also, no information about the length of the "isnulls" array.
---

I think it would be helpful if you could split the patch: one for
multi-value list partitioning and another for the partition wise join, thanks.

Regards,
Amul

Re: Multi-Column List Partitioning

From

Ashutosh Sharma

Date:

13 December 2021, 14:36:52

Hi,

Is this okay?

postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
CREATE TABLE

postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4, 5, 6));
CREATE TABLE

postgres=# \d t1
Partitioned table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
Partition key: LIST (a, a, a)
Number of partitions: 1 (Use \d+ to list them.)

--

Also, getting some compiler warnings when building the source. please check.

--

With Regards,

Ashutosh Sharma.

On Mon, Dec 6, 2021 at 7:27 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:

Thank you for reviewing the patch.

> partbounds.c: In function ‘get_qual_for_list.isra.18’:
> partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
> datumCopy(bound_info->datums[i][j],
> ~~~~~~~~~~^~~~~~~~
> partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> PartitionBoundInfo boundinfo;
> ^~~~~~~~~
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> bool *inner_isnull;
> ^~~~~~~~~~~~
> partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> bool *outer_isnull;
> ^~~~~~~~~~~~

Fixed.

> This function is unnecessarily complicated, I think you can avoid
> inner for loops; simply replace for-loop-block with "if
> (equal(lfirst(cell), new_bound)) return true".

Thank you for the suggestion. Fixed.

> + char **colname = (char **) palloc0(partnatts * sizeof(char *));
> + Oid *coltype = palloc0(partnatts * sizeof(Oid));
> + int32 *coltypmod = palloc0(partnatts * sizeof(int));
> + Oid *partcollation = palloc0(partnatts * sizeof(Oid));
> +
> This allocation seems to be worthless, read ahead.
>
> I think there is no need for this separate loop inside
> transformPartitionListBounds, you can do that same in the next loop as
> well. And instead of get_partition_col_* calling and storing, simply
> use that directly as an argument to transformPartitionBoundValue().

Yes. The loop can be avoided and content of the above loop can be
included in the next loop but the next loop iterates over a list of
multi column datums. For each iteration, we need the information of
all the columns. The above data (colname, coltype, coltypmod and
partcollation) remains same for each iteration of the loop, If we
modify as suggested, then the function to fetch these information has
to be called every-time. To avoid this situation I have made a
separate loop outside which only runs as many number of columns and
stores in a variable which can be reused later. Please let me correct
if I am wrong.

> I think this should be inside the "else" block after "!IsA(rowexpr,
> RowExpr)" error and you can avoid IsA() check too.

This is required to handle the situation when one partition key is
mentioned and multiple values are provided in the partition bound
specification.

> Looks difficult to understand at first glance, how about the following:
>
> if (b1->isnulls != b2->isnulls)
> return false;
>
> if (b1->isnulls)
> {
> if (b1->isnulls[i][j] != b2->isnulls[i][j])
> return false;
> if (b1->isnulls[i][j])
> continue;
> }
>
> See how range partitioning infinite values are handled. Also, place
> this before the comment block that was added for the "!datumIsEqual()"
> case.

Fixed. I feel the 'continue' block is not required and hence removed it.

> Nothing wrong with this but if we could have checked "dest->isnulls"
> instead of "src->isnulls" would be much better.

Here we are copying the data from 'src' to 'dest'. If there is no data
in 'src', it is unnecessary to copy. Hence checking 'src'.

> Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.

Fixed.

> Can't be a single loop?

Yes. Fixed.

On Fri, Dec 3, 2021 at 7:26 PM Amul Sul <sulamul@gmail.com> wrote:
>
> Hi,
>
> Few comments for v7 patch, note that I haven't been through the
> previous discussion, if any of the review comments that has been
> already discussed & overridden, then please ignore here too:
>
>
> partbounds.c: In function ‘get_qual_for_list.isra.18’:
> partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
> datumCopy(bound_info->datums[i][j],
> ~~~~~~~~~~^~~~~~~~
> partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> PartitionBoundInfo boundinfo;
> ^~~~~~~~~
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> bool *inner_isnull;
> ^~~~~~~~~~~~
> partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> bool *outer_isnull;
> ^~~~~~~~~~~~
>
> Got these warnings with gcc -O2 compilation.
> ----
>
> /*
> + * isListBoundDuplicated
> + *
> + * Returns TRUE if the list bound element 'new_bound' is already present
> + * in the target list 'list_bounds', FALSE otherwise.
> + */
> +static bool
> +isListBoundDuplicated(List *list_bounds, List *new_bound)
> +{
> + ListCell *cell = NULL;
> +
> + foreach(cell, list_bounds)
> + {
> + int i;
> + List *elem = lfirst(cell);
> + bool isDuplicate = true;
> +
> + Assert(list_length(elem) == list_length(new_bound));
> +
> + for (i = 0; i < list_length(elem); i++)
> + {
> + Const *value1 = castNode(Const, list_nth(elem, i));
> + Const *value2 = castNode(Const, list_nth(new_bound, i));
> +
> + if (!equal(value1, value2))
> + {
> + isDuplicate = false;
> + break;
> + }
> + }
> +
> + if (isDuplicate)
> + return true;
> + }
> +
> + return false;
> +}
>
> This function is unnecessarily complicated, I think you can avoid
> inner for loops; simply replace for-loop-block with "if
> (equal(lfirst(cell), new_bound)) return true".
> ----
>
> + char **colname = (char **) palloc0(partnatts * sizeof(char *));
> + Oid *coltype = palloc0(partnatts * sizeof(Oid));
> + int32 *coltypmod = palloc0(partnatts * sizeof(int));
> + Oid *partcollation = palloc0(partnatts * sizeof(Oid));
> +
> This allocation seems to be worthless, read ahead.
> ----
>
> + for (i = 0; i < partnatts; i++)
> + {
> + if (key->partattrs[i] != 0)
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
> + else
> + {
> + colname[i] =
> + deparse_expression((Node *) list_nth(partexprs, j),
> + deparse_context_for(RelationGetRelationName(parent),
> + RelationGetRelid(parent)),
> + false, false);
> + ++j;
> + }
> +
> + coltype[i] = get_partition_col_typid(key, i);
> + coltypmod[i] = get_partition_col_typmod(key, i);
> + partcollation[i] = get_partition_col_collation(key, i);
> + }
>
> I think there is no need for this separate loop inside
> transformPartitionListBounds, you can do that same in the next loop as
> well. And instead of get_partition_col_* calling and storing, simply
> use that directly as an argument to transformPartitionBoundValue().
> ----
>
> +
> + if (IsA(expr, RowExpr) &&
> + partnatts != list_length(((RowExpr *) expr)->args))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Must specify exactly one value per partitioning column"),
> + parser_errposition(pstate, exprLocation((Node *) spec))));
> +
>
> I think this should be inside the "else" block after "!IsA(rowexpr,
> RowExpr)" error and you can avoid IsA() check too.
> ----
>
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
> + else if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>
> Looks difficult to understand at first glance, how about the following:
>
> if (b1->isnulls != b2->isnulls)
> return false;
>
> if (b1->isnulls)
> {
> if (b1->isnulls[i][j] != b2->isnulls[i][j])
> return false;
> if (b1->isnulls[i][j])
> continue;
> }
>
> See how range partitioning infinite values are handled. Also, place
> this before the comment block that was added for the "!datumIsEqual()"
> case.
> ----
>
> + if (src->isnulls)
> + dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);
> ...
> + if (src->isnulls)
> + dest->isnulls[i][j] = src->isnulls[i][j];
> +
> Nothing wrong with this but if we could have checked "dest->isnulls"
> instead of "src->isnulls" would be much better.
> ----
>
> - if (dest->kind == NULL ||
> - dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
> + if ((dest->kind == NULL ||
> + dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE) &&
> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + (src->isnulls == NULL || !src->isnulls[i][j])))
> dest->datums[i][j] = datumCopy(src->datums[i][j],
> byval, typlen);
> Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be unnecessary.
> ----
>
> + for (i = 0; i < partnatts; i++)
> + {
> + if (outer_isnull[i])
> + {
> + outer_has_null = true;
> + if (outer_map.merged_indexes[outer_index] == -1)
> + consider_outer_null = true;
> + }
> + }
> +
> + for (i = 0; i < partnatts; i++)
> + {
> + if (inner_isnull[i])
> + {
> + inner_has_null = true;
> + if (inner_map.merged_indexes[inner_index] == -1)
> + consider_inner_null = true;
> + }
> + }
>
> Can't be a single loop?
> ----
>
> It would be helpful if you could run pgindent on your patch if not done already.
> ----
>
> That's all for now, I am yet to finish the complete patch reading and
> understand the code flow, but I am out of time now.
>
> Regards,
> Amul

Re: Multi-Column List Partitioning

From

Amit Langote

Date:

20 December 2021, 13:34:11

Hi,

On Mon, Dec 13, 2021 at 11:37 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi,
>
> Is this okay?
>
> postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
> CREATE TABLE
>
> postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4, 5, 6));
> CREATE TABLE
>
> postgres=# \d t1
>            Partitioned table "public.t1"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition key: LIST (a, a, a)
> Number of partitions: 1 (Use \d+ to list them.)

I'd say it's not okay for a user to expect this to work sensibly, and
I don't think it would be worthwhile to write code to point that out
to the user if that is what you were implying.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: Multi-Column List Partitioning

From

Ashutosh Sharma

Date:

21 December 2021, 05:46:51

On Mon, Dec 20, 2021 at 7:04 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi,

On Mon, Dec 13, 2021 at 11:37 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi,
>
> Is this okay?
>
> postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
> CREATE TABLE
>
> postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4, 5, 6));
> CREATE TABLE
>
> postgres=# \d t1
> Partitioned table "public.t1"
> Column | Type | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
> a | integer | | |
> b | integer | | |
> Partition key: LIST (a, a, a)
> Number of partitions: 1 (Use \d+ to list them.)

I'd say it's not okay for a user to expect this to work sensibly, and
I don't think it would be worthwhile to write code to point that out
to the user if that is what you were implying.

OK. As you wish.

--

With Regards,

Ashutosh Sharma.

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

23 December 2021, 13:02:44

On Tue, Dec 21, 2021 at 6:34 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> ---
>
> > +           if (isnulls && isnulls[i])
> > +               cmpval = 0;     /* NULL "=" NULL */
> > +           else
> > +               cmpval = 1;     /* NULL ">" not-NULL */
> > +       }
> > +       else if (isnulls && isnulls[i])
> > +           cmpval = -1;        /* not-NULL "<" NULL */
> >
> > I really doubt this assumption is correct; aren't those strict operators?
>
> Now there are possibilities of multiple NULL values. We should have a
> mechanism to sort it when the bound values contain Non NULL and NULL
> values. As per the above logic we put the NULL values at the end.
> Please let me know if I am wrong.

Ok, but I am not sure about the comparison approach, let's see what
others think.

> ---
[...]
>
> > typedef struct PartitionBoundInfoData
> > {
> >    char        strategy;       /* hash, list or range? */
> > +   int         partnatts;      /* number of partition key columns */
> >    int         ndatums;        /* Length of the datums[] array */
> >    Datum     **datums;
> > +   bool      **isnulls;
> >
> > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > added that for partition_bound_accepts_nulls(), but we can easily get
> > that value from the partitioning key & pass an additional argument.
> > Also, no information about the length of the "isnulls" array.
>
> This is required during merge_list_bounds(). AFAIK partition key
> information is not available here.
>

You can get that as an argument, see merge_range_bounds().

> > I think it would be helpful if you could split the patch: one for
> > multi-value list partitioning and another for the partition wise join, thanks.
>
> I have split the patch into 2 patches. One is for the multi column
> list partitioning core changes and the other is for partition-wise
> join support. Each patch has its respective test cases in the
> regression suit and regression tests run successfully on each patch.
> Kindly let me know if any other changes are required here.
>

Thanks, for the slit that is much helpful, I have a few comments for
the 0001 patch as follow:

+ char   **colname = (char **) palloc0(partnatts * sizeof(char *));

palloc0 is unnecessary.
---

+ foreach(cell2, rowexpr->args)
+ {
+ int idx = foreach_current_index(cell2);
+ Node    *expr = lfirst(cell2);
+ Const    *val =
+ transformPartitionBoundValue(pstate, expr, colname[i],
+ get_partition_col_typid(key, idx),
+ get_partition_col_typmod(key, idx),
+ get_partition_col_collation(key, idx));
+
+ values = lappend(values, val);
+ }

Array index for colname should be "idx".
---

        result->scan_default = partition_bound_has_default(boundinfo);
+
        return result;
...

        /* Always include the default partition if any. */
        result->scan_default = partition_bound_has_default(boundinfo);
-
        return result;

...
            else
                result->scan_default = partition_bound_has_default(boundinfo);
+
            return result;
...

-               /* Add columns specified to SET NULL or SET DEFAULT if
provided. */
+               /*
+                * Add columns specified to SET NULL or SET DEFAULT if
+                * provided.
+                */

spurious change -- look like something not related to your patch.
--

-        * For range partitioning, we must only perform pruning with values
-        * for either all partition keys or a prefix thereof.
+        * For range partitioning and list partitioning, we must only perform
+        * pruning with values for either all partition keys or a prefix
+        * thereof.
         */
-       if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+       if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
+                               context->strategy == PARTITION_STRATEGY_LIST))
            break;

I think this is not true for multi-value list partitions, we might
still want prune partitions for e.g. (100, IS NULL, 20).  Correct me
if I am missing something here.
---

        /*
-        * For range partitioning, if we have no clauses for the current key,
-        * we can't consider any later keys either, so we can stop here.
+        * For range partitioning and list partitioning, if we have no clauses
+        * for the current key, we can't consider any later keys either, so we
+        * can stop here.
         */
-       if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+       if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
+            part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
            clauselist == NIL)
            break

Similarly, why would this be true for list partitioning? How can we
prune partitions if values is for e.g. (100, <not given>, 20).
--

-       if (bms_is_member(keyno, opstep->nullkeys))
+       if (bms_is_member(keyno, opstep->nullkeys) &&
+           context->strategy != PARTITION_STRATEGY_LIST)
            continue;
Will that prune for all NULL partitioning key values?
---

+                           appendStringInfoString
+                               (buf,
get_list_partbound_value_string(lfirst(cell)));

Formatting is not quite right.
--

+/*
+ * get_min_and_max_offset
+ *
+ * Fetches the minimum and maximum offset of the matching partitions.
+ */

...

+/*
+ * get_min_or_max_off
+ *
+ * Fetches either minimum or maximum offset of the matching partitions
+ * depending on the value of is_min parameter.
+ */

I am not sure we really have to have separate functions but if needed
then I would prefer to have a separate function for each min and max
rather than combining.
---

+       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+       {
+           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
+           return PARTCLAUSE_MATCH_NULLNESS;
+       }
+
+       expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
true, false);
+       partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+
+       partclause->keyno = partkeyidx;
+       partclause->expr = (Expr *) expr;
+       partclause->is_null = true;
+
+       if (nulltest->nulltesttype == IS_NOT_NULL)
+       {
+           partclause->op_is_ne = true;
+           partclause->op_strategy = InvalidStrategy;
+       }
+       else
+       {
+           partclause->op_is_ne = false;
+           partclause->op_strategy = BTEqualStrategyNumber;
+       }

-       return PARTCLAUSE_MATCH_NULLNESS;
+       *pc = partclause;
+       return PARTCLAUSE_MATCH_CLAUSE;

I still believe considering NULL value for match clause is not a
fundamentally correct thing. And that is only for List partitioning
which isn't aligned with the other partitioning.
---

Regards,
Amul

Re: Multi-Column List Partitioning

From

Nitin Jadhav

Date:

29 December 2021, 13:58:19

Thanks for reviewing.

> > > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > > added that for partition_bound_accepts_nulls(), but we can easily get
> > > that value from the partitioning key & pass an additional argument.
> > > Also, no information about the length of the "isnulls" array.
> >
> > This is required during merge_list_bounds(). AFAIK partition key
> > information is not available here.
> >
>
> You can get that as an argument, see merge_range_bounds().

Fixed.
---

> + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
>
> palloc0 is unnecessary.

Fixed.
---

> + foreach(cell2, rowexpr->args)
> + {
> + int idx = foreach_current_index(cell2);
> + Node    *expr = lfirst(cell2);
> + Const    *val =
> + transformPartitionBoundValue(pstate, expr, colname[i],
> + get_partition_col_typid(key, idx),
> + get_partition_col_typmod(key, idx),
> + get_partition_col_collation(key, idx));
> +
> + values = lappend(values, val);
> + }
>
> Array index for colname should be "idx".

Fixed.
---

>        result->scan_default = partition_bound_has_default(boundinfo);
> +
>        return result;
> ...
>
>        /* Always include the default partition if any. */
>        result->scan_default = partition_bound_has_default(boundinfo);
> -
>        return result;
>
> ...
>            else
>                result->scan_default = partition_bound_has_default(boundinfo);
> +
>            return result;
> ...
>
> -               /* Add columns specified to SET NULL or SET DEFAULT if
> provided. */
> +               /*
> +                * Add columns specified to SET NULL or SET DEFAULT if
> +                * provided.
> +                */
>
> spurious change -- look like something not related to your patch.

Fixed.
---

> -        * For range partitioning, we must only perform pruning with values
> -        * for either all partition keys or a prefix thereof.
> +        * For range partitioning and list partitioning, we must only perform
> +        * pruning with values for either all partition keys or a prefix
> +        * thereof.
>         */
> -       if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
> +       if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
> +                               context->strategy == PARTITION_STRATEGY_LIST))
>            break;
>
> I think this is not true for multi-value list partitions, we might
> still want prune partitions for e.g. (100, IS NULL, 20).  Correct me
> if I am missing something here.

AFAIK, the above condition/comments says that, either we should
include all keys or prefixes of the partition keys to get the
partition pruning results. For example if we have a table with 2
columns and both are present in the partition key. Let the column
names be 'a' and 'b'.

SELECT * FROM table WHERE a=1 AND b=1;  - This query works for pruning
and it refers to a comment which says all partition keys are included.
SELECT * FROM table WHERE b=1;  - Here partition pruning does not work
as it does not contain prefix of the partition keys.
SELECT * FROM table WHERE a=1; - This query works fine as column 'a'
is prefix of partition keys.

Please let me know if you need more information.
---

> -        * For range partitioning, if we have no clauses for the current key,
> -        * we can't consider any later keys either, so we can stop here.
> +        * For range partitioning and list partitioning, if we have no clauses
> +        * for the current key, we can't consider any later keys either, so we
> +        * can stop here.
>         */
> -       if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
> +       if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
> +            part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
>           clauselist == NIL)
>            break
>
> Similarly, why would this be true for list partitioning? How can we
> prune partitions if values is for e.g. (100, <not given>, 20).

The above description holds good for this also. As per the current
design, partition pruning is not applicable for the above example.
Kindly confirm whether we should support such scenarios.
---

> -       if (bms_is_member(keyno, opstep->nullkeys))
> +       if (bms_is_member(keyno, opstep->nullkeys) &&
> +           context->strategy != PARTITION_STRATEGY_LIST)
>            continue;
> Will that prune for all NULL partitioning key values?

Yes. This allows pruning with NULL values for list partitioning.
---

> +                           appendStringInfoString
> +                               (buf,
> get_list_partbound_value_string(lfirst(cell)));
>
> Formatting is not quite right.

Fixed.
---

> +/*
> + * get_min_and_max_offset
> + *
> + * Fetches the minimum and maximum offset of the matching partitions.
> + */
>
> ...
>
> +/*
> + * get_min_or_max_off
> + *
> + * Fetches either minimum or maximum offset of the matching partitions
> + * depending on the value of is_min parameter.
> + */
>
> I am not sure we really have to have separate functions but if needed
> then I would prefer to have a separate function for each min and max
> rather than combining.

If we don't make a separate function, then we have to include this
code in get_matching_list_bounds() which is already a big function. I
just made a separate function to not increase the complexity of
get_matching_list_bounds() and most of the code present in
get_min_or_max_off() is common for min and max calculation. If we make
it separate then there might be a lot of duplications. Please let me
know if you still feel if any action is required.
---

> +       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> +       {
> +           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> +           return PARTCLAUSE_MATCH_NULLNESS;
> +       }
> +
> +       expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> true, false);
> +       partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> +
> +       partclause->keyno = partkeyidx;
> +       partclause->expr = (Expr *) expr;
> +       partclause->is_null = true;
> +
> +       if (nulltest->nulltesttype == IS_NOT_NULL)
> +       {
> +           partclause->op_is_ne = true;
> +           partclause->op_strategy = InvalidStrategy;
> +       }
> +       else
> +       {
> +           partclause->op_is_ne = false;
> +           partclause->op_strategy = BTEqualStrategyNumber;
> +       }
>
> -       return PARTCLAUSE_MATCH_NULLNESS;
> +       *pc = partclause;
> +       return PARTCLAUSE_MATCH_CLAUSE;
>
> I still believe considering NULL value for match clause is not a
> fundamentally correct thing. And that is only for List partitioning
> which isn't aligned with the other partitioning.

As other partitions which support multiple partition keys (Range
partitioning) do not support NULL values. This feature supports
multiple partition keys with list partitioning and it also supports
NULL values. With the existing design, I have tried to support this
feature with minimal changes as possible. If this is not the right
approach to support NULL values, I would like to know how we can
support multiple NULL values. Kindly provide more information.

Thanks & Regards,
Nitin Jadhav




On Thu, Dec 23, 2021 at 6:33 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Tue, Dec 21, 2021 at 6:34 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> >
> > ---
> >
> > > +           if (isnulls && isnulls[i])
> > > +               cmpval = 0;     /* NULL "=" NULL */
> > > +           else
> > > +               cmpval = 1;     /* NULL ">" not-NULL */
> > > +       }
> > > +       else if (isnulls && isnulls[i])
> > > +           cmpval = -1;        /* not-NULL "<" NULL */
> > >
> > > I really doubt this assumption is correct; aren't those strict operators?
> >
> > Now there are possibilities of multiple NULL values. We should have a
> > mechanism to sort it when the bound values contain Non NULL and NULL
> > values. As per the above logic we put the NULL values at the end.
> > Please let me know if I am wrong.
>
> Ok, but I am not sure about the comparison approach, let's see what
> others think.
>
> > ---
> [...]
> >
> > > typedef struct PartitionBoundInfoData
> > > {
> > >    char        strategy;       /* hash, list or range? */
> > > +   int         partnatts;      /* number of partition key columns */
> > >    int         ndatums;        /* Length of the datums[] array */
> > >    Datum     **datums;
> > > +   bool      **isnulls;
> > >
> > > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > > added that for partition_bound_accepts_nulls(), but we can easily get
> > > that value from the partitioning key & pass an additional argument.
> > > Also, no information about the length of the "isnulls" array.
> >
> > This is required during merge_list_bounds(). AFAIK partition key
> > information is not available here.
> >
>
> You can get that as an argument, see merge_range_bounds().
>
> > > I think it would be helpful if you could split the patch: one for
> > > multi-value list partitioning and another for the partition wise join, thanks.
> >
> > I have split the patch into 2 patches. One is for the multi column
> > list partitioning core changes and the other is for partition-wise
> > join support. Each patch has its respective test cases in the
> > regression suit and regression tests run successfully on each patch.
> > Kindly let me know if any other changes are required here.
> >
>
> Thanks, for the slit that is much helpful, I have a few comments for
> the 0001 patch as follow:
>
> + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
>
> palloc0 is unnecessary.
> ---
>
> + foreach(cell2, rowexpr->args)
> + {
> + int idx = foreach_current_index(cell2);
> + Node    *expr = lfirst(cell2);
> + Const    *val =
> + transformPartitionBoundValue(pstate, expr, colname[i],
> + get_partition_col_typid(key, idx),
> + get_partition_col_typmod(key, idx),
> + get_partition_col_collation(key, idx));
> +
> + values = lappend(values, val);
> + }
>
> Array index for colname should be "idx".
> ---
>
>         result->scan_default = partition_bound_has_default(boundinfo);
> +
>         return result;
> ...
>
>         /* Always include the default partition if any. */
>         result->scan_default = partition_bound_has_default(boundinfo);
> -
>         return result;
>
> ...
>             else
>                 result->scan_default = partition_bound_has_default(boundinfo);
> +
>             return result;
> ...
>
> -               /* Add columns specified to SET NULL or SET DEFAULT if
> provided. */
> +               /*
> +                * Add columns specified to SET NULL or SET DEFAULT if
> +                * provided.
> +                */
>
> spurious change -- look like something not related to your patch.
> --
>
> -        * For range partitioning, we must only perform pruning with values
> -        * for either all partition keys or a prefix thereof.
> +        * For range partitioning and list partitioning, we must only perform
> +        * pruning with values for either all partition keys or a prefix
> +        * thereof.
>          */
> -       if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
> +       if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
> +                               context->strategy == PARTITION_STRATEGY_LIST))
>             break;
>
> I think this is not true for multi-value list partitions, we might
> still want prune partitions for e.g. (100, IS NULL, 20).  Correct me
> if I am missing something here.
> ---
>
>         /*
> -        * For range partitioning, if we have no clauses for the current key,
> -        * we can't consider any later keys either, so we can stop here.
> +        * For range partitioning and list partitioning, if we have no clauses
> +        * for the current key, we can't consider any later keys either, so we
> +        * can stop here.
>          */
> -       if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
> +       if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
> +            part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
>             clauselist == NIL)
>             break
>
> Similarly, why would this be true for list partitioning? How can we
> prune partitions if values is for e.g. (100, <not given>, 20).
> --
>
> -       if (bms_is_member(keyno, opstep->nullkeys))
> +       if (bms_is_member(keyno, opstep->nullkeys) &&
> +           context->strategy != PARTITION_STRATEGY_LIST)
>             continue;
> Will that prune for all NULL partitioning key values?
> ---
>
> +                           appendStringInfoString
> +                               (buf,
> get_list_partbound_value_string(lfirst(cell)));
>
> Formatting is not quite right.
> --
>
> +/*
> + * get_min_and_max_offset
> + *
> + * Fetches the minimum and maximum offset of the matching partitions.
> + */
>
> ...
>
> +/*
> + * get_min_or_max_off
> + *
> + * Fetches either minimum or maximum offset of the matching partitions
> + * depending on the value of is_min parameter.
> + */
>
> I am not sure we really have to have separate functions but if needed
> then I would prefer to have a separate function for each min and max
> rather than combining.
> ---
>
> +       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> +       {
> +           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> +           return PARTCLAUSE_MATCH_NULLNESS;
> +       }
> +
> +       expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> true, false);
> +       partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> +
> +       partclause->keyno = partkeyidx;
> +       partclause->expr = (Expr *) expr;
> +       partclause->is_null = true;
> +
> +       if (nulltest->nulltesttype == IS_NOT_NULL)
> +       {
> +           partclause->op_is_ne = true;
> +           partclause->op_strategy = InvalidStrategy;
> +       }
> +       else
> +       {
> +           partclause->op_is_ne = false;
> +           partclause->op_strategy = BTEqualStrategyNumber;
> +       }
>
> -       return PARTCLAUSE_MATCH_NULLNESS;
> +       *pc = partclause;
> +       return PARTCLAUSE_MATCH_CLAUSE;
>
> I still believe considering NULL value for match clause is not a
> fundamentally correct thing. And that is only for List partitioning
> which isn't aligned with the other partitioning.
> ---
>
> Regards,
> Amul

Attachment

Re: Multi-Column List Partitioning

From

Amul Sul

Date:

03 January 2022, 04:03:37

On Wed, Dec 29, 2021 at 7:26 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
>
> > -        * For range partitioning, we must only perform pruning with values
> > -        * for either all partition keys or a prefix thereof.
> > +        * For range partitioning and list partitioning, we must only perform
> > +        * pruning with values for either all partition keys or a prefix
> > +        * thereof.
> >         */
> > -       if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
> > +       if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
> > +                               context->strategy == PARTITION_STRATEGY_LIST))
> >            break;
> >
> > I think this is not true for multi-value list partitions, we might
> > still want prune partitions for e.g. (100, IS NULL, 20).  Correct me
> > if I am missing something here.
>
> AFAIK, the above condition/comments says that, either we should
> include all keys or prefixes of the partition keys to get the
> partition pruning results. For example if we have a table with 2
> columns and both are present in the partition key. Let the column
> names be 'a' and 'b'.
>
> SELECT * FROM table WHERE a=1 AND b=1;  - This query works for pruning
> and it refers to a comment which says all partition keys are included.
> SELECT * FROM table WHERE b=1;  - Here partition pruning does not work
> as it does not contain prefix of the partition keys.
> SELECT * FROM table WHERE a=1; - This query works fine as column 'a'
> is prefix of partition keys.
>
> Please let me know if you need more information.

That what I was assuming is not correct. The dependency of the prefix
is true for the range partitioning but why should that be in the case
of list partitioning? I think all partitioning keys in the list will
not be dependent on each other, AFAICU. If you prune list partitions
based on the b=1 value that still is correct & gives the correct
result, correct me If I am wrong.

> ---
>
> > +/*
> > + * get_min_and_max_offset
> > + *
> > + * Fetches the minimum and maximum offset of the matching partitions.
> > + */
> >
> > ...
> >
> > +/*
> > + * get_min_or_max_off
> > + *
> > + * Fetches either minimum or maximum offset of the matching partitions
> > + * depending on the value of is_min parameter.
> > + */
> >
> > I am not sure we really have to have separate functions but if needed
> > then I would prefer to have a separate function for each min and max
> > rather than combining.
>
> If we don't make a separate function, then we have to include this
> code in get_matching_list_bounds() which is already a big function. I
> just made a separate function to not increase the complexity of
> get_matching_list_bounds() and most of the code present in
> get_min_or_max_off() is common for min and max calculation. If we make
> it separate then there might be a lot of duplications. Please let me
> know if you still feel if any action is required.

Hmm, ok, I personally didn't like to have two functions one gives max
and min and the other gives only max or min, the other could have
different opinions.

How about keeping only one function say, get_min_max_off() and based
on the argument e.g. minoff & maxoff fetch the value, I mean e.g. if
minoff is not null then fetch the value otherwise skip that, same for
maxoff too.

> ---
>
> > +       if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> > +       {
> > +           *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> > +           return PARTCLAUSE_MATCH_NULLNESS;
> > +       }
> > +
> > +       expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> > true, false);
> > +       partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> > +
> > +       partclause->keyno = partkeyidx;
> > +       partclause->expr = (Expr *) expr;
> > +       partclause->is_null = true;
> > +
> > +       if (nulltest->nulltesttype == IS_NOT_NULL)
> > +       {
> > +           partclause->op_is_ne = true;
> > +           partclause->op_strategy = InvalidStrategy;
> > +       }
> > +       else
> > +       {
> > +           partclause->op_is_ne = false;
> > +           partclause->op_strategy = BTEqualStrategyNumber;
> > +       }
> >
> > -       return PARTCLAUSE_MATCH_NULLNESS;
> > +       *pc = partclause;
> > +       return PARTCLAUSE_MATCH_CLAUSE;
> >
> > I still believe considering NULL value for match clause is not a
> > fundamentally correct thing. And that is only for List partitioning
> > which isn't aligned with the other partitioning.
>
> As other partitions which support multiple partition keys (Range
> partitioning) do not support NULL values. This feature supports
> multiple partition keys with list partitioning and it also supports
> NULL values. With the existing design, I have tried to support this
> feature with minimal changes as possible. If this is not the right
> approach to support NULL values, I would like to know how we can
> support multiple NULL values. Kindly provide more information.

I haven't studied the whole partition pruning code and don't know the
complete code flow, but AFAICU, this is not the correct way to handle null
value.

Regards,
Amul

Re: Multi-Column List Partitioning

From

Julien Rouhaud

Date:

14 January 2022, 09:49:15

Hi,

The cfbot reports some clang warning on the last version of the patchset:

https://cirrus-ci.com/task/6721617647632384

[16:35:24.444] partprune.c:2775:8: error: explicitly assigning value of variable of type 'int' to itself
[-Werror,-Wself-assign]
[16:35:24.444]                         off = off;
[16:35:24.444]

A quick look at the patch seems to indicate that it's harmless dead code that
should be removed, and there are no other failure reported by the cfbot, but I
definitely didn't read the patch carefully so a confirmation (and an updated
patch) would be welcome.

However I see that Amul raised some concerns, so I will change the patch status
to Waiting on Author.  Feel free to switch it back to Needs Review if you think
it's more appropriate.