POC: Comparison of partitioning key values - Mailing list pgsql-hackers

From John Mikk
Subject POC: Comparison of partitioning key values
Date
Msg-id CADY9qXf_RsNNfZ88qYpzW7-3kreQq2fd+6Gezbm85pm_-FZHGQ@mail.gmail.com
Whole thread
Responses Re: POC: Comparison of partitioning key values
List pgsql-hackers
Hackers, Hi!

When executing the SQL script, we discovered strange behavior — one of the partitions cannot be created.

```sql
drop table if exists grid2 cascade;

create table grid2(x bigint, y bigint) partition by range (x,y);

create table g2_part1 partition of grid2 for values from (1,3) to (7,11);
create table g2_part2 partition of grid2 for values from (7,11) to (13,15);
create table g2_part3 partition of grid2 for values from (13,15) to (15,17);
create table g2_part4 partition of grid2 for values from (15,17) to (19,21);
--
create table g2_part5 partition of grid2 for values from (5,15) to (13,17);
-- [42P17] ERROR: partition "g2_part5" would overlap partition "g2_part1"
```

Why is that?

According to the documentation, the row comparison rule for tables applies (subsection 9.25.5).
However, in the case of row comparison, it is possible to override comparison operators, which is not the case when defining partitions.
There is no way to override the comparison operator for partition ranges. And is this general approach always correct in the case of partitioning?

For example, the following approach seems quite valid:

If we look at the relative positions of the ranges for each partitioning key (of a single table) on a plane, assuming that the from and to values of the partitioning key in for values are points on the main diagonal of a rectangle representing possible key values for the partition, then there appear to be no obstacles to creating the last partition in the provided SQL script.

Illustrative Python script:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()

def key(point, radius=0.1, color='red', ax=ax):
circle = patches.Circle(point, radius=radius, color=color, alpha=1)
ax.add_patch(circle)
return

def range(lower, upper, label='', facecolor='blue', edgecolor='red', alpha=0.5, ax=ax):
x1, y1 = lower
x2, y2 = upper

x_min, y_min = (min(x1, x2), min(y1, y2))
width, height = (abs(x2 - x1), abs(y2 - y1))

rectangle = patches.Rectangle(
(x_min, y_min), width, height,
facecolor=facecolor, edgecolor=edgecolor, alpha=alpha
)

ax.add_patch(rectangle)

if label:
text_x, text_y = (x_min, y_min)
ax.text(text_x, text_y, label, fontsize=8, color='white', ha='left', va='bottom')

return rectangle

# Range, Key. -------------------------
range( (1, 3, ), (7, 11, ), 'g2_part1')
range( (7, 11, ), (13, 15, ), 'g2_part2')
range( (13, 15, ), (15, 17, ), 'g2_part3')
range( (15, 17, ), (19, 21, ), 'g2_part4')
range( (5, 15, ), (13, 17, ), 'g2_part5', 'black')

# Show. -------------------------------
plt.xlim(0, 25)
plt.ylim(0, 25)
plt.gca().set_aspect('equal')
plt.grid(True, alpha=0.3)
plt.show()
```

Let us provide a generalized reasoning:

Suppose we have a range-partitioned table with a partitioning key consisting of n attributes.
Let us consider the from and to values of the partitioning key from the for values clause as points on the main diagonal of an n-dimensional parallelepiped of permissible key values for a specific partition.
Of course, our set is finite, but this perspective on the partitioning key allows us to use a fact from multidimensional geometry:

Let the first parallelepiped be defined by the main diagonal points
A = (a_1, a_2, ..., a_n) and A' = (a_1', a_2', ..., a_n'), where a_i < a_i' for all i,
and the second parallelepiped be defined by the main diagonal points
B = (b_1, b_2, ..., b_n) and B' = (b_1', b_2', ..., b_n'), where b_i < b_i' for all i.

In this case, the following theorem (statement) holds true:

Two parallelepipeds do not intersect if and only if there exists at least one coordinate k (from 1 to n) for which the projection intervals on this axis do not intersect; that is, the intersection of
[a_k, a_k'] and [b_k, b_k'] is an empty set, or equivalently, the condition (a_k' < b_k) OR (b_k' < a_k) holds for one of the k values.

In other words, this fact establishes: first, when two figures do not intersect and are separated by a hyperplane x_k = const; and second, the necessary and sufficient condition for the figures to intersect, given that all projections intersect. The latter remains valid for a parallelepiped reduced to a point and can determine whether a new key belongs to a particular partition.

The experimental patch I am proposing (v1-0001-partition-by-range.patch) introduces changes to the source code based on the described approach, so that the partition from the example can be created.

Moreover, the algorithm for determining the partition of a new key during an insert operation is even more mysterious in the current implementation; my patch corrects this.

In the proposed patch, writing to output directly via fprintf is hardcoded. This allows obtaining a plain text list of existing and added ranges after each operation for the illustrative Python script.

A segmentation fault will occur during update and delete operations.

Or would adding an override for the range comparison operator be a more flexible and correct approach?

I would like to hear your opinion, dear hackers!
Attachment

pgsql-hackers by date:

Previous
From: "Matheus Alcantara"
Date:
Subject: Re: docs: Include database collation check on SQL from alter_collation.sgml
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired