Thread: parse partition strategy string in gram.y

parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

21 October 2022, 09:32:16

Hello

I've had this patch sitting in a local branch for way too long.  It's a
trivial thing but for some reason it bothered me: we let the partition 
strategy flow into the backend as a string and only parse it into the
catalog 1-char version quite late.

This patch makes gram.y responsible for parsing it and passing it down
as a value from a new enum, which looks more neat.  Because it's an
enum, some "default:" cases can be removed in a couple of places.  I
also added a new elog() in case the catalog contents becomes broken.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)

Attachment

0001-have-gram.y-resolve-partition-strategy-names.patch

Re: parse partition strategy string in gram.y

From

Japin Li

Date:

21 October 2022, 10:05:18

On Fri, 21 Oct 2022 at 17:32, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Hello
>
> I've had this patch sitting in a local branch for way too long.  It's a
> trivial thing but for some reason it bothered me: we let the partition 
> strategy flow into the backend as a string and only parse it into the
> catalog 1-char version quite late.
>
> This patch makes gram.y responsible for parsing it and passing it down
> as a value from a new enum, which looks more neat.  Because it's an
> enum, some "default:" cases can be removed in a couple of places.  I
> also added a new elog() in case the catalog contents becomes broken.

Does there an error about forget the LIST partition?

+/*
+ * Parse a user-supplied partition strategy string into parse node
+ * PartitionStrategy representation, or die trying.
+ */
+static PartitionStrategy
+parsePartitionStrategy(char *strategy)
+{
+       if (pg_strcasecmp(strategy, "range") == 0)       <-- it should be list
+               return PARTITION_STRATEGY_RANGE;         <-- PARTITION_STRATEGY_LIST
+       else if (pg_strcasecmp(strategy, "hash") == 0)
+               return PARTITION_STRATEGY_HASH;
+       else if (pg_strcasecmp(strategy, "range") == 0)
+               return PARTITION_STRATEGY_RANGE;
+       ereport(ERROR,
+                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                        errmsg("unrecognized partitioning strategy \"%s\"",
+                                       strategy)));
+}
+


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

21 October 2022, 10:12:11

On 2022-Oct-21, Japin Li wrote:

> Does there an error about forget the LIST partition?

Of course.
https://cirrus-ci.com/build/4721735111540736

This is what you get for moving cases around at the last minute ...

Fixed, thanks.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Attachment

v2-0001-have-gram.y-resolve-partition-strategy-names.patch

Re: parse partition strategy string in gram.y

From

Japin Li

Date:

21 October 2022, 10:22:44

On Fri, 21 Oct 2022 at 18:12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Oct-21, Japin Li wrote:
>
>> Does there an error about forget the LIST partition?
>
> Of course.
> https://cirrus-ci.com/build/4721735111540736
>
> This is what you get for moving cases around at the last minute ...
>

Is there any way to get the regression tests diffs from Cirrus CI?
I did not find the diffs in [1].

[1] https://cirrus-ci.com/build/4721735111540736

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

21 October 2022, 10:26:48

On 2022-Oct-21, Japin Li wrote:

> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].

I think they should be somewhere in the artifacts, but I'm not sure.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

21 October 2022, 10:38:21

headerscheck fail, fixed here.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)

Attachment

v3-0001-have-gram.y-resolve-partition-strategy-names.patch

Re: parse partition strategy string in gram.y

From

Justin Pryzby

Date:

21 October 2022, 12:34:35

On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].
> 
> [1] https://cirrus-ci.com/build/4721735111540736

They're called "main".
I'm planning on submitting a patch to rename it to "regress", someday.
See also: https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

-- 
Justin

Re: parse partition strategy string in gram.y

From

Japin Li

Date:

21 October 2022, 13:46:18

On Fri, 21 Oct 2022 at 20:34, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
>> Is there any way to get the regression tests diffs from Cirrus CI?
>> I did not find the diffs in [1].
>> 
>> [1] https://cirrus-ci.com/build/4721735111540736
>
> They're called "main".
> I'm planning on submitting a patch to rename it to "regress", someday.
> See also: https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

Oh, thank you very much!  I find it in testrun/build/testrun/main/regress [1].

[1] https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Re: parse partition strategy string in gram.y

From

"Finnerty, Jim"

Date:

24 October 2022, 14:32:02

Is there a reason why HASH partitioning does not currently support range partition bounds, where the values in the
partitionbounds would refer to the hashed value?
 

The advantage of hash  partition bounds is that they are not domain-specific, as they are for ordinary RANGE
partitions,but they are more flexible than MODULUS/REMAINDER partition bounds.
 

On 10/21/22, 9:48 AM, "Japin Li" <japinli@hotmail.com> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 



    On Fri, 21 Oct 2022 at 20:34, Justin Pryzby <pryzby@telsasoft.com> wrote:
    > On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
    >> Is there any way to get the regression tests diffs from Cirrus CI?
    >> I did not find the diffs in [1].
    >>
    >> [1] https://cirrus-ci.com/build/4721735111540736
    >
    > They're called "main".
    > I'm planning on submitting a patch to rename it to "regress", someday.
    > See also: https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

    Oh, thank you very much!  I find it in testrun/build/testrun/main/regress [1].

    [1]
https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

    --
    Regrads,
    Japin Li.
    ChengDu WenWu Information Technology Co.,Ltd.

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

24 October 2022, 16:13:17

On 2022-Oct-24, Finnerty, Jim wrote:

> Is there a reason why HASH partitioning does not currently support
> range partition bounds, where the values in the partition bounds would
> refer to the hashed value?

Just lack of an implementation, I suppose.

> The advantage of hash partition bounds is that they are not
> domain-specific, as they are for ordinary RANGE partitions, but they
> are more flexible than MODULUS/REMAINDER partition bounds.

Well, modulus/remainder is what we have.  If you have ideas for a
different implementation, let's hear them.  I suppose we would have to
know about both the user interface and how it would internally, from two
perspectives: how does tuple routing work (ie. how to match a tuple's
values to a set of bound values), and how does partition pruning work
(ie. how do partition bounds match a query's restriction clauses).

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: parse partition strategy string in gram.y

From

Tom Lane

Date:

25 October 2022, 00:50:24

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-Oct-24, Finnerty, Jim wrote:
>> The advantage of hash partition bounds is that they are not
>> domain-specific, as they are for ordinary RANGE partitions, but they
>> are more flexible than MODULUS/REMAINDER partition bounds.

I'm more than a bit skeptical of that claim.  Under what
circumstances (other than a really awful hash function,
perhaps) would it make sense to not use equi-sized hash
partitions?  If you can predict that more stuff is going
to go into one partition than another, then you need to
fix your hash function, not invent more complication for
the core partitioning logic.

            regards, tom lane

Re: parse partition strategy string in gram.y

From

"Finnerty, Jim"

Date:

25 October 2022, 14:18:51

It will often happen that some hash keys are more frequently referenced than others.  Consider a scenario where
customer_idis the hash key, and one customer is very large in terms of their activity, like IBM, and other keys have
muchless activity.  This asymmetry creates a noisy neighbor problem.  Some partitions may have more than one noisy
neighbor,and in general it would be more flexible to be able to divide the work evenly in terms of activity instead of
evenlywith respect to the encoding of the keys.

On 10/24/22, 8:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.

    Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
    > On 2022-Oct-24, Finnerty, Jim wrote:
    >> The advantage of hash partition bounds is that they are not
    >> domain-specific, as they are for ordinary RANGE partitions, but they
    >> are more flexible than MODULUS/REMAINDER partition bounds.

    I'm more than a bit skeptical of that claim.  Under what
    circumstances (other than a really awful hash function,
    perhaps) would it make sense to not use equi-sized hash
    partitions?  

<snip>

                            regards, tom lane

Re: parse partition strategy string in gram.y

From

"Finnerty, Jim"

Date:

25 October 2022, 18:36:27

Or if you know the frequencies of the highly frequent values of the partitioning key at the time the partition bounds
aredefined, you could define hash ranges that contain approximately the same number of rows in each partition.  A
parallelsequential scan of all partitions would then perform better because data skew is minimized.

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

25 October 2022, 23:15:32

On 2022-Oct-25, Finnerty, Jim wrote:

> Or if you know the frequencies of the highly frequent values of the
> partitioning key at the time the partition bounds are defined, you
> could define hash ranges that contain approximately the same number of
> rows in each partition.  A parallel sequential scan of all partitions
> would then perform better because data skew is minimized. 

This sounds very much like list partitioning to me.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The problem with the future is that it keeps turning into the present"
(Hobbes)

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

25 October 2022, 23:23:36

On 2022-Oct-26, Alvaro Herrera wrote:

> On 2022-Oct-25, Finnerty, Jim wrote:
> 
> > Or if you know the frequencies of the highly frequent values of the
> > partitioning key at the time the partition bounds are defined, you
> > could define hash ranges that contain approximately the same number of
> > rows in each partition.  A parallel sequential scan of all partitions
> > would then perform better because data skew is minimized. 
> 
> This sounds very much like list partitioning to me.

... or maybe you mean "if the value is X then use this specific
partition, otherwise use hash partitioning".  It's a bit like
multi-level partitioning, but not really.

(You could test this idea by using two levels, list partitioning on top
with a default partition which is in turn partitioned by hash; but this
is unlikely to work well for large scale in practice.  Or does it?)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Entristecido, Wutra                     (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"

Re: parse partition strategy string in gram.y

From

Alvaro Herrera

Date:

03 November 2022, 15:42:17

Pushed this.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/