Thread: Custom tuplesorts for extensions

Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

23 June 2022, 08:50:42

Hackers,

Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

0001-Remove-Tuplesortstate.copytup-v1.patch
It's unclear for me how do we split functionality between
Tuplesortstate.copytup() function and tuplesort_put*() functions. For
instance, copytup_index() and copytup_datum() return error while
tuplesort_putindextuplevalues() and tuplesort_putdatum() do their
work. The patch removes Tuplesortstate.copytup() altogether, putting
their functions to tuplesort_put*().

0002-Tuplesortstate.getdatum1-method-v1.patch
0003-Put-abbreviation-logic-into-puttuple_common-v1.patch
The tuplesort_put*() functions contains common part related to dealing
with abbreviation. The 0002 extracts logic of getting value of
SortTuple.datum1 into Tuplesortstate.getdatum1() function. Thanks to
this new interface function, 0003 puts abbreviation logic into
puttuple().

0004-Move-freeing-memory-away-from-writetup-v1.patch
Assuming that SortTuple.tuple is always just a single chunk of memory,
we can put memory counting logic away from Tuplesortstate.writetup().
This makes Tuplesortstate.getdatum1() easier to implement without
knowledge of tuplesort.c guts.

0005-Reorganize-data-structures-v1.patch
This commit splits the "public" part of Tuplesortstate into
TuplesortOps, which is intended to be exposed outside tuplesort.c.

0006-Split-tuplesortops.c-v1.patch
This patch finally splits tuplesortops.c from tuplesort.c. tuplesort.c
leaves which generic routines for tuple sort, while tuplesortops.c
provides the implementation for particular tuple formats.

------
Regards,
Alexander Korotkov

Attachment

Re: Custom tuplesorts for extensions

From

Pavel Borisov

Date:

23 June 2022, 10:09:13

I've bumped into this case in RUM extension. The need to build it with tuplesort changes in different PG versions led me to reluctantly including different tuplesort.c versions into the extension code. So I totally support the intention of this patch and I'm planning to invest some time to review it.

Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Custom tuplesorts for extensions

From

Pavel Borisov

Date:

23 June 2022, 11:26:04

Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

I still have one doubt about the thing: the compatibility with previous PG versions requires me to support code paths that I already added into RUM extension. I won't be able to drop it from extension for quite long time in the future. It could be avoided if we backpatch this, which seems doubtful to me provided the volume of code changes.

If we just change this thing since say v16 this will only help to extensions that doesn't support earlier PG versions. I still consider the change beneficial but wonder do you have some view on how should it be managed in existing extensions to benefit them?

Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Custom tuplesorts for extensions

From

Maxim Orlov

Date:

23 June 2022, 12:12:27

Hi!

I've reviewed the patchset and noticed some minor issues:

- extra semicolon in macro (lead to warnings)

- comparison of var isWorker should be done in different way

Here is an upgraded version of the patchset.

Overall, I consider this patchset useful. Any opinions?

Best regards,

Maxim Orlov.

Attachment

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

23 June 2022, 13:14:59

Hi, Pavel!

Thank you for your feedback.

On Thu, Jun 23, 2022 at 2:26 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> Some PostgreSQL extensions need to sort their pieces of data. Then it
>> worth to re-use our tuplesort. But despite our tuplesort having
>> extensibility, it's hidden inside tuplesort.c. There are at least a
>> couple of examples of how extensions deal with that.
>>
>> 1. RUM table access method: https://github.com/postgrespro/rum
>> RUM repository contains a copy of tuplesort.c for each major
>> PostgreSQL release. A reliable solution, but this is not how things
>> are intended to work, right?
>> 2. OrioleDB table access method: https://github.com/orioledb/orioledb
>> OrioleDB runs on patches PostgreSQL. It contains a patch, which just
>> exposes all the guts of tuplesort.c to the tuplesort.h
>> https://github.com/orioledb/postgres/commit/d42755f52c
>>
>> I think we need a proper way to let extension re-use our core
>> tuplesort facility. The attached patchset is intended to do this the
>> right way. Patches don't revise all the comments and lack code
>> beautification. The intention behind publishing this revision is to
>> verify the direction and get some feedback for further work.
>
>
> I still have one doubt about the thing: the compatibility with previous PG versions requires me to support code paths
thatI already added into RUM extension. I won't be able to drop it from extension for quite long time in the future. It
couldbe avoided if we  backpatch this, which seems doubtful to me provided the volume of code changes. 
>
> If we just change this thing since say v16 this will only help to extensions that doesn't support earlier PG
versions.I still consider the change beneficial but wonder do you have some view on how should it be managed in
existingextensions to benefit them? 

I don't think there is a way to help extensions with earlier PG
versions. This is a significant patchset, which shouldn't be a subject
for backpatch. The existing extensions will benefit by simplification
of maintenance for PG 16+ releases. I think that's all we can do.

------
Regards,
Alexander Korotkov

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

23 June 2022, 13:19:54

Hi, Maxim!

On Thu, Jun 23, 2022 at 3:12 PM Maxim Orlov <orlovmg@gmail.com> wrote:
> I've reviewed the patchset and noticed some minor issues:
> - extra semicolon in macro (lead to warnings)
> - comparison of var isWorker should be done in different way
>
> Here is an upgraded version of the patchset.

Thank you for fixing this.

> Overall, I consider this patchset useful. Any opinions?

Thank you.

------
Regards,
Alexander Korotkov

Re: Custom tuplesorts for extensions

From

Maxim Orlov

Date:

06 July 2022, 13:00:04

Hi!

Overall patch looks good let's mark it as ready for committer, shall we?

Best regards,

Maxim Orlov.

Re: Custom tuplesorts for extensions

From

Matthias van de Meent

Date:

06 July 2022, 14:41:44

On Thu, 23 Jun 2022 at 14:12, Maxim Orlov <orlovmg@gmail.com> wrote:
>
> Hi!
>
> I've reviewed the patchset and noticed some minor issues:
> - extra semicolon in macro (lead to warnings)
> - comparison of var isWorker should be done in different way
>
> Here is an upgraded version of the patchset.
>
> Overall, I consider this patchset useful. Any opinions?

All of the patches are currently missing descriptive commit messages,
which I think is critical for getting this committed. As for per-patch
comments:

0001: This patch removes copytup, but it is not quite clear why -
please describe the reasoning in the commit message.

0002: getdatum1 needs comments on what it does. From the name, it
would return the datum1 from a sorttuple, but that's not what it does;
a better name would be putdatum1 or populatedatum1.

0003: in the various tuplesort_put*tuple[values] functions, the datum1
field is manually extracted. Shouldn't we use the getdatum1 functions
from 0002 instead? We could use either them directly to allow
inlining, or use the indirection through tuplesortstate.

0004: Needs a commit message, but otherwise seems fine.

0005:
> +struct TuplesortOps

This struct has no comment on what it is. Something like "Public
interface of tuplesort operators, containing data directly accessable
to users of tuplesort" should suffice, but feel free to update the
wording.

> +    void *arg;
> +};

This field could use a comment on what it is used for, and how to use it.

> +struct Tuplesortstate
> +{
> +    TuplesortOps ops;

This field needs a comment too.

0006: Needs a commit message, but otherwise seems fine.

Kind regards,

Matthias van de Meent

Re: Custom tuplesorts for extensions

From

Andres Freund

Date:

06 July 2022, 15:01:39

Hi,

I think this needs to be evaluated for performance...

I agree with the nearby comment that the commits need a bit of justification
at least to review them.


On 2022-06-23 15:12:27 +0300, Maxim Orlov wrote:
> From 03b78cdade3b86a0e97723721fa1d0bd64d0c7df Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Tue, 21 Jun 2022 13:28:27 +0300
> Subject: [PATCH v2 1/6] Remove Tuplesortstate.copytup

Yea. I was recently complaining about the pointlessness of copytup.


> From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Tue, 21 Jun 2022 14:03:13 +0300
> Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method

Hm. This adds a bunch of indirect function calls were there previously
weren't.


> From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Tue, 21 Jun 2022 14:13:56 +0300
> Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common()

There's definitely a lot of redundancy removed... But the list of branches in
puttuple_common() grew.  Perhaps we instead can add a few flags to
putttuple_common() that determine whether abbreviation should happen, so that
we only do the work necessary for the "type" of sort?



> From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Wed, 22 Jun 2022 00:14:51 +0300
> Subject: [PATCH v2 4/6] Move freeing memory away from writetup()

Seems to do more than just moving freeing around?

> @@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
>  static void
>  puttuple_common(Tuplesortstate *state, SortTuple *tuple)
>  {
> +    MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
> +
>      Assert(!LEADER(state));
>
> +    if (tuple->tuple != NULL)
> +        USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
> +

Adding even more branches into common code...



> From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Wed, 22 Jun 2022 18:11:26 +0300
> Subject: [PATCH v2 5/6] Reorganize data structures

Hard to know what this is trying to achieve.

> -struct Tuplesortstate
> +struct TuplesortOps
>  {
> -    TupSortStatus status;        /* enumerated value as shown above */
> -    int            nKeys;            /* number of columns in sort key */
> -    int            sortopt;        /* Bitmask of flags used to setup sort */
> -    bool        bounded;        /* did caller specify a maximum number of
> -                                 * tuples to return? */
> -    bool        boundUsed;        /* true if we made use of a bounded heap */
> -    int            bound;            /* if bounded, the maximum number of tuples */
> -    bool        tuples;            /* Can SortTuple.tuple ever be set? */
> -    int64        availMem;        /* remaining memory available, in bytes */
> -    int64        allowedMem;        /* total memory allowed, in bytes */
> -    int            maxTapes;        /* max number of input tapes to merge in each
> -                                 * pass */
> -    int64        maxSpace;        /* maximum amount of space occupied among sort
> -                                 * of groups, either in-memory or on-disk */
> -    bool        isMaxSpaceDisk; /* true when maxSpace is value for on-disk
> -                                 * space, false when it's value for in-memory
> -                                 * space */
> -    TupSortStatus maxSpaceStatus;    /* sort status when maxSpace was reached */
>      MemoryContext maincontext;    /* memory context for tuple sort metadata that
>                                   * persists across multiple batches */
>      MemoryContext sortcontext;    /* memory context holding most sort data */
>      MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
> -    LogicalTapeSet *tapeset;    /* logtape.c object for tapes in a temp file */
>
>      /*
>       * These function pointers decouple the routines that must know what kind

To me it seems odd to have memory contexts and similar things in a
datastructure calls *Ops.


> From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Wed, 22 Jun 2022 21:48:05 +0300
> Subject: [PATCH v2 6/6] Split tuplesortops.c

I strongly suspect this will cause a slowdown. There was potential for
cross-function optimization that's now removed.


Greetings,

Andres Freund

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

06 July 2022, 17:45:44

.Hi!

On Wed, Jul 6, 2022 at 6:01 PM Andres Freund <andres@anarazel.de> wrote:
> I think this needs to be evaluated for performance...

Surely, this needs.

> I agree with the nearby comment that the commits need a bit of justification
> at least to review them.

Will do this.
> > From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Tue, 21 Jun 2022 14:03:13 +0300
> > Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method
>
> Hm. This adds a bunch of indirect function calls were there previously
> weren't.

Yep.  I think it worth changing this function to deal with many
SortTuple's at once.

> > From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Tue, 21 Jun 2022 14:13:56 +0300
> > Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common()
>
> There's definitely a lot of redundancy removed... But the list of branches in
> puttuple_common() grew.  Perhaps we instead can add a few flags to
> putttuple_common() that determine whether abbreviation should happen, so that
> we only do the work necessary for the "type" of sort?

Good point, will refactor that.

> > From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 00:14:51 +0300
> > Subject: [PATCH v2 4/6] Move freeing memory away from writetup()
>
> Seems to do more than just moving freeing around?

Yes, it also move memory accounting from tuplesort_put*() to
puttuple_common().  Will revise this.

> > @@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
> >  static void
> >  puttuple_common(Tuplesortstate *state, SortTuple *tuple)
> >  {
> > +     MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
> > +
> >       Assert(!LEADER(state));
> >
> > +     if (tuple->tuple != NULL)
> > +             USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
> > +
>
> Adding even more branches into common code...

I will see how to reduce branching here.

> > From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 18:11:26 +0300
> > Subject: [PATCH v2 5/6] Reorganize data structures
>
> Hard to know what this is trying to achieve.

Split the public interface part out of Tuplesortstate.

> > -struct Tuplesortstate
> > +struct TuplesortOps
> >  {
> > -     TupSortStatus status;           /* enumerated value as shown above */
> > -     int                     nKeys;                  /* number of columns in sort key */
> > -     int                     sortopt;                /* Bitmask of flags used to setup sort */
> > -     bool            bounded;                /* did caller specify a maximum number of
> > -                                                              * tuples to return? */
> > -     bool            boundUsed;              /* true if we made use of a bounded heap */
> > -     int                     bound;                  /* if bounded, the maximum number of tuples */
> > -     bool            tuples;                 /* Can SortTuple.tuple ever be set? */
> > -     int64           availMem;               /* remaining memory available, in bytes */
> > -     int64           allowedMem;             /* total memory allowed, in bytes */
> > -     int                     maxTapes;               /* max number of input tapes to merge in each
> > -                                                              * pass */
> > -     int64           maxSpace;               /* maximum amount of space occupied among sort
> > -                                                              * of groups, either in-memory or on-disk */
> > -     bool            isMaxSpaceDisk; /* true when maxSpace is value for on-disk
> > -                                                              * space, false when it's value for in-memory
> > -                                                              * space */
> > -     TupSortStatus maxSpaceStatus;   /* sort status when maxSpace was reached */
> >       MemoryContext maincontext;      /* memory context for tuple sort metadata that
> >                                                                * persists across multiple batches */
> >       MemoryContext sortcontext;      /* memory context holding most sort data */
> >       MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
> > -     LogicalTapeSet *tapeset;        /* logtape.c object for tapes in a temp file */
> >
> >       /*
> >        * These function pointers decouple the routines that must know what kind
>
> To me it seems odd to have memory contexts and similar things in a
> datastructure calls *Ops.

Yep, it worth renaming TuplesortOps into TuplesortPublic or something.

> > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Wed, 22 Jun 2022 21:48:05 +0300
> > Subject: [PATCH v2 6/6] Split tuplesortops.c
>
> I strongly suspect this will cause a slowdown. There was potential for
> cross-function optimization that's now removed.

I wonder how can cross-function optimizations bypass function
pointers.  Is it possible?

------
Regards,
Alexander Korotkov

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

07 July 2022, 07:54:30

On Wed, Jul 6, 2022 at 8:45 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
> > > From: Alexander Korotkov <akorotkov@postgresql.org>
> > > Date: Wed, 22 Jun 2022 21:48:05 +0300
> > > Subject: [PATCH v2 6/6] Split tuplesortops.c
> >
> > I strongly suspect this will cause a slowdown. There was potential for
> > cross-function optimization that's now removed.
>
> I wonder how can cross-function optimizations bypass function
> pointers.  Is it possible?

Oh, this is not just functions called by pointer.  This is also
puttuple_common() etc.  OK, this needs to be checked.

------
Regards,
Alexander Korotkov

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

12 July 2022, 08:22:53

Hi, Matthias!

The revised patchset is attached.

On Wed, Jul 6, 2022 at 5:41 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> All of the patches are currently missing descriptive commit messages,
> which I think is critical for getting this committed. As for per-patch
> comments:
>
> 0001: This patch removes copytup, but it is not quite clear why -
> please describe the reasoning in the commit message.

Because spit of logic between Tuplesortstate.copytup() function and
tuplesort_put*() functions is unclear.  It doesn't look like we need
an abstraction here, while all the work could be done in
tuplesort_put*().

> 0002: getdatum1 needs comments on what it does. From the name, it
> would return the datum1 from a sorttuple, but that's not what it does;
> a better name would be putdatum1 or populatedatum1.
>
> 0003: in the various tuplesort_put*tuple[values] functions, the datum1
> field is manually extracted. Shouldn't we use the getdatum1 functions
> from 0002 instead? We could use either them directly to allow
> inlining, or use the indirection through tuplesortstate.

getdatum1() was a bad name.  Actually it restores original datum1
during rollback of abbreviations.  I've replaced it with
removeabbrev(), which seems name to me and process many SortTuple's
during one call.

> 0004: Needs a commit message, but otherwise seems fine.

Commit message is added.

> 0005:
> > +struct TuplesortOps
>
> This struct has no comment on what it is. Something like "Public
> interface of tuplesort operators, containing data directly accessable
> to users of tuplesort" should suffice, but feel free to update the
> wording.
>
> > +    void *arg;
> > +};
>
> This field could use a comment on what it is used for, and how to use it.
>
> > +struct Tuplesortstate
> > +{
> > +    TuplesortOps ops;
>
> This field needs a comment too.
>
> 0006: Needs a commit message, but otherwise seems fine.

TuplesortOps was renamed to TuplesortPublic.  Comments and commit
messages are added.

There are some places, which potentially could cause a slowdown.  I'm
going to make some experiments with that.

------
Regards,
Alexander Korotkov

Attachment

Re: Custom tuplesorts for extensions

From

John Naylor

Date:

21 July 2022, 03:44:26

On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> There are some places, which potentially could cause a slowdown. I'm
> going to make some experiments with that.

I haven't looked at the patches, so I don't know of a specific place to look for a slowdown, but I thought it might help to perform the same query tests as my most recent test for evaluating qsort variants (some description in [1]), and here is the spreadsheet. Overall, the differences look like noise. A few cases with unabbreviatable text look a bit faster with the patch. I'm not sure if that's a real difference, but in any case I don't see a slowdown anywhere.

[1] https://www.postgresql.org/message-id/CAFBsxsHeTACMP1JVQ%2Bm35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ%40mail.gmail.com
--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

master-vs-custom-ext-v2.ods

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

21 July 2022, 14:40:19

Hi, John!

On Thu, Jul 21, 2022 at 6:44 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
> On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > There are some places, which potentially could cause a slowdown.  I'm
> > going to make some experiments with that.
>
> I haven't looked at the patches, so I don't know of a specific place to look for a slowdown, but I thought it might
helpto perform the same query tests as my most recent test for evaluating qsort variants (some description in [1]), and
hereis the spreadsheet. Overall, the differences look like noise. A few cases with unabbreviatable text look a bit
fasterwith the patch. I'm not sure if that's a real difference, but in any case I don't see a slowdown anywhere. 
>
> [1] https://www.postgresql.org/message-id/CAFBsxsHeTACMP1JVQ%2Bm35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ%40mail.gmail.com

Great, thank you very much for the feedback!

------
Regards,
Alexander Korotkov

Re: Custom tuplesorts for extensions

From

Pavel Borisov

Date:

22 July 2022, 15:56:20

I've looked through the updated patch. Overall it looks good enough.

Some minor things:

- PARALLEL_SORT macro is based on coordinate struct instead of state struct. In some calls(i.e. from _bt_spools_heapscan) coordinate could appear to be NULL, which can be a segfault on items dereference inside the macro.

- state->worker and coordinate->isWorker a little bit differ in semantics i.e.:

..............................................worker............... leader

state -> worker........................ >=0.....................-1

coordinate ->isWorker............. 1..........................0

- in tuplesort_begin_index_btree I suppose it should be base->nKeys instead of state->nKeys

- Cfbot reports gcc warnings due to mixed code and declarations. So I used this to beautify code in tuplesortvariants.c a little. (This is added as a separate patch 0007)

All these things are corrected/done in a new version 3 of a patchset (PFA). For me, the patchset seems like a long-needed thing to support PostgreSQL extensibility. Overall corrections in v3 are minor, so I'd like to mark the patch as RfC if there are no objections.

Best regards,
Pavel Borisov

Supabase.

Attachment

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

24 July 2022, 12:24:42

Hi, Pavel!

Thank you for your review and corrections.

On Fri, Jul 22, 2022 at 6:57 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've looked through the updated patch. Overall it looks good enough.
>
> Some minor things:
>
> - PARALLEL_SORT macro is based on coordinate struct instead of state struct. In some calls(i.e. from
_bt_spools_heapscan) coordinate could appear to be NULL, which can be a segfault on items dereference inside the macro. 
>
> - state->worker and coordinate->isWorker a little bit differ in semantics i.e.:
> ..............................................worker............... leader
> state -> worker........................  >=0.....................-1
> coordinate ->isWorker............. 1..........................0
>
> - in tuplesort_begin_index_btree I suppose it should be base->nKeys instead of state->nKeys

Perfect, thank you!

> - Cfbot reports gcc warnings due to mixed code and declarations. So I used this to beautify code in
tuplesortvariants.ca little. (This is added as a separate patch 0007) 

It appears that warnings were caused by the extra semicolon in
TuplesortstateGetPublic() macro.  I've removed that semicolon, and I
don't think we need a beautification patch.  Also, please note that
there is no point to add indentation, which doesn't survive pgindent.

> All these things are corrected/done in a new version 3 of a patchset (PFA). For me, the patchset seems like a
long-neededthing to support PostgreSQL extensibility. Overall corrections in v3 are minor, so I'd like to mark the
patchas RfC if there are no objections. 

Thank you.  I've also revised the comments in the top of tuplesort.c
and tuplesortvariants.c.  The revised patchset is attached.

Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run
tests to check if the patchset causes a performance regression.  The
script and results are present in the "tuplesort_patch_test.zip"
archive.  The final comparison is given in the result/final_table.txt.
In short, they repeat each test 10 times and there is no difference
exceeding the random variation.

------
Regards,
Alexander Korotkov

Attachment

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

24 July 2022, 21:52:49

On Sun, Jul 24, 2022 at 3:24 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run
> tests to check if the patchset causes a performance regression.  The
> script and results are present in the "tuplesort_patch_test.zip"
> archive.  The final comparison is given in the result/final_table.txt.
> In short, they repeat each test 10 times and there is no difference
> exceeding the random variation.

I see the last revision passed cfbot without warnings.  I've added the
meta information to commit messages.  Also, I've re-run through the
thread and it seems all the comments are addressed.  I'm going to push
this if there are no objections.

------
Regards,
Alexander Korotkov

Attachment

Re: Custom tuplesorts for extensions

From

Justin Pryzby

Date:

24 July 2022, 23:23:05

Note that 0001+0002 (without the others) incurs warnings:

$ time { make -j4 clean; make -j4; } >/dev/null
tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable]

(I wondered in the past if cfbot should try to test for clean builds of subsets
of patchsets, and it came up recently with the JSON patches.)

Also, this comment has some bad indentation:

* Set state to be consistent with never trying abbreviation.

-- 
Justin

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

25 July 2022, 09:30:24

Hi Justin!

On Mon, Jul 25, 2022 at 2:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Note that 0001+0002 (without the others) incurs warnings:
>
> $ time { make -j4 clean; make -j4; } >/dev/null
> tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable]
> tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable]
> tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable]
> tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable]
>
> (I wondered in the past if cfbot should try to test for clean builds of subsets
> of patchsets, and it came up recently with the JSON patches.)
>
> Also, this comment has some bad indentation:
>
> * Set state to be consistent with never trying abbreviation.

Thank you for caching this.  Fixed in the revision attached.

Testing subsets of patchsets in cfbot looks like a good idea to me.
However, I'm not sure if we always require subsets to be consistent.

------
Regards,
Alexander Korotkov

Attachment

Re: Custom tuplesorts for extensions

From

Pavel Borisov

Date:

25 July 2022, 13:00:49

Thank you for caching this. Fixed in the revision attached.

Testing subsets of patchsets in cfbot looks like a good idea to me.
However, I'm not sure if we always require subsets to be consistent.

Hi, hackers!

I've looked through a new v6 of a patchset and find it ok. When applied 0001+0002 only I don't see warnings anymore. Build and tests are successful and Cfbot also looks good. I've marked the patch as RfC.

Best regards,
Pavel Borisov

Re: Custom tuplesorts for extensions

From

Alexander Korotkov

Date:

27 July 2022, 05:36:56

On Mon, Jul 25, 2022 at 4:01 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> Thank you for caching this.  Fixed in the revision attached.
>>
>> Testing subsets of patchsets in cfbot looks like a good idea to me.
>> However, I'm not sure if we always require subsets to be consistent.
>
>
> Hi, hackers!
>
> I've looked through a new v6 of a patchset and find it ok. When applied 0001+0002 only I don't see warnings anymore.
Buildand tests are successful and Cfbot also looks good. I've marked the patch as RfC.
 

Thank you, pushed!

------
Regards,
Alexander Korotkov