Thread: Custom tuplesorts for extensions
Hackers, Some PostgreSQL extensions need to sort their pieces of data. Then it worth to re-use our tuplesort. But despite our tuplesort having extensibility, it's hidden inside tuplesort.c. There are at least a couple of examples of how extensions deal with that. 1. RUM table access method: https://github.com/postgrespro/rum RUM repository contains a copy of tuplesort.c for each major PostgreSQL release. A reliable solution, but this is not how things are intended to work, right? 2. OrioleDB table access method: https://github.com/orioledb/orioledb OrioleDB runs on patches PostgreSQL. It contains a patch, which just exposes all the guts of tuplesort.c to the tuplesort.h https://github.com/orioledb/postgres/commit/d42755f52c I think we need a proper way to let extension re-use our core tuplesort facility. The attached patchset is intended to do this the right way. Patches don't revise all the comments and lack code beautification. The intention behind publishing this revision is to verify the direction and get some feedback for further work. 0001-Remove-Tuplesortstate.copytup-v1.patch It's unclear for me how do we split functionality between Tuplesortstate.copytup() function and tuplesort_put*() functions. For instance, copytup_index() and copytup_datum() return error while tuplesort_putindextuplevalues() and tuplesort_putdatum() do their work. The patch removes Tuplesortstate.copytup() altogether, putting their functions to tuplesort_put*(). 0002-Tuplesortstate.getdatum1-method-v1.patch 0003-Put-abbreviation-logic-into-puttuple_common-v1.patch The tuplesort_put*() functions contains common part related to dealing with abbreviation. The 0002 extracts logic of getting value of SortTuple.datum1 into Tuplesortstate.getdatum1() function. Thanks to this new interface function, 0003 puts abbreviation logic into puttuple(). 0004-Move-freeing-memory-away-from-writetup-v1.patch Assuming that SortTuple.tuple is always just a single chunk of memory, we can put memory counting logic away from Tuplesortstate.writetup(). This makes Tuplesortstate.getdatum1() easier to implement without knowledge of tuplesort.c guts. 0005-Reorganize-data-structures-v1.patch This commit splits the "public" part of Tuplesortstate into TuplesortOps, which is intended to be exposed outside tuplesort.c. 0006-Split-tuplesortops.c-v1.patch This patch finally splits tuplesortops.c from tuplesort.c. tuplesort.c leaves which generic routines for tuple sort, while tuplesortops.c provides the implementation for particular tuple formats. ------ Regards, Alexander Korotkov
Attachment
I've bumped into this case in RUM extension. The need to build it with tuplesort changes in different PG versions led me to reluctantly including different tuplesort.c versions into the extension code. So I totally support the intention of this patch and I'm planning to invest some time to review it.
--
Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.
1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c
I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.
I still have one doubt about the thing: the compatibility with previous PG versions requires me to support code paths that I already added into RUM extension. I won't be able to drop it from extension for quite long time in the future. It could be avoided if we backpatch this, which seems doubtful to me provided the volume of code changes.
If we just change this thing since say v16 this will only help to extensions that doesn't support earlier PG versions. I still consider the change beneficial but wonder do you have some view on how should it be managed in existing extensions to benefit them?
Hi!
I've reviewed the patchset and noticed some minor issues:
- extra semicolon in macro (lead to warnings)
- comparison of var isWorker should be done in different way
Here is an upgraded version of the patchset.
Overall, I consider this patchset useful. Any opinions?
--
Best regards,
Maxim Orlov.
Attachment
Hi, Pavel! Thank you for your feedback. On Thu, Jun 23, 2022 at 2:26 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >> Some PostgreSQL extensions need to sort their pieces of data. Then it >> worth to re-use our tuplesort. But despite our tuplesort having >> extensibility, it's hidden inside tuplesort.c. There are at least a >> couple of examples of how extensions deal with that. >> >> 1. RUM table access method: https://github.com/postgrespro/rum >> RUM repository contains a copy of tuplesort.c for each major >> PostgreSQL release. A reliable solution, but this is not how things >> are intended to work, right? >> 2. OrioleDB table access method: https://github.com/orioledb/orioledb >> OrioleDB runs on patches PostgreSQL. It contains a patch, which just >> exposes all the guts of tuplesort.c to the tuplesort.h >> https://github.com/orioledb/postgres/commit/d42755f52c >> >> I think we need a proper way to let extension re-use our core >> tuplesort facility. The attached patchset is intended to do this the >> right way. Patches don't revise all the comments and lack code >> beautification. The intention behind publishing this revision is to >> verify the direction and get some feedback for further work. > > > I still have one doubt about the thing: the compatibility with previous PG versions requires me to support code paths thatI already added into RUM extension. I won't be able to drop it from extension for quite long time in the future. It couldbe avoided if we backpatch this, which seems doubtful to me provided the volume of code changes. > > If we just change this thing since say v16 this will only help to extensions that doesn't support earlier PG versions.I still consider the change beneficial but wonder do you have some view on how should it be managed in existingextensions to benefit them? I don't think there is a way to help extensions with earlier PG versions. This is a significant patchset, which shouldn't be a subject for backpatch. The existing extensions will benefit by simplification of maintenance for PG 16+ releases. I think that's all we can do. ------ Regards, Alexander Korotkov
Hi, Maxim! On Thu, Jun 23, 2022 at 3:12 PM Maxim Orlov <orlovmg@gmail.com> wrote: > I've reviewed the patchset and noticed some minor issues: > - extra semicolon in macro (lead to warnings) > - comparison of var isWorker should be done in different way > > Here is an upgraded version of the patchset. Thank you for fixing this. > Overall, I consider this patchset useful. Any opinions? Thank you. ------ Regards, Alexander Korotkov
Hi!
Overall patch looks good let's mark it as ready for committer, shall we?
Best regards,
Maxim Orlov.
On Thu, 23 Jun 2022 at 14:12, Maxim Orlov <orlovmg@gmail.com> wrote: > > Hi! > > I've reviewed the patchset and noticed some minor issues: > - extra semicolon in macro (lead to warnings) > - comparison of var isWorker should be done in different way > > Here is an upgraded version of the patchset. > > Overall, I consider this patchset useful. Any opinions? All of the patches are currently missing descriptive commit messages, which I think is critical for getting this committed. As for per-patch comments: 0001: This patch removes copytup, but it is not quite clear why - please describe the reasoning in the commit message. 0002: getdatum1 needs comments on what it does. From the name, it would return the datum1 from a sorttuple, but that's not what it does; a better name would be putdatum1 or populatedatum1. 0003: in the various tuplesort_put*tuple[values] functions, the datum1 field is manually extracted. Shouldn't we use the getdatum1 functions from 0002 instead? We could use either them directly to allow inlining, or use the indirection through tuplesortstate. 0004: Needs a commit message, but otherwise seems fine. 0005: > +struct TuplesortOps This struct has no comment on what it is. Something like "Public interface of tuplesort operators, containing data directly accessable to users of tuplesort" should suffice, but feel free to update the wording. > + void *arg; > +}; This field could use a comment on what it is used for, and how to use it. > +struct Tuplesortstate > +{ > + TuplesortOps ops; This field needs a comment too. 0006: Needs a commit message, but otherwise seems fine. Kind regards, Matthias van de Meent
Hi, I think this needs to be evaluated for performance... I agree with the nearby comment that the commits need a bit of justification at least to review them. On 2022-06-23 15:12:27 +0300, Maxim Orlov wrote: > From 03b78cdade3b86a0e97723721fa1d0bd64d0c7df Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Tue, 21 Jun 2022 13:28:27 +0300 > Subject: [PATCH v2 1/6] Remove Tuplesortstate.copytup Yea. I was recently complaining about the pointlessness of copytup. > From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Tue, 21 Jun 2022 14:03:13 +0300 > Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method Hm. This adds a bunch of indirect function calls were there previously weren't. > From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Tue, 21 Jun 2022 14:13:56 +0300 > Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common() There's definitely a lot of redundancy removed... But the list of branches in puttuple_common() grew. Perhaps we instead can add a few flags to putttuple_common() that determine whether abbreviation should happen, so that we only do the work necessary for the "type" of sort? > From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Wed, 22 Jun 2022 00:14:51 +0300 > Subject: [PATCH v2 4/6] Move freeing memory away from writetup() Seems to do more than just moving freeing around? > @@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull) > static void > puttuple_common(Tuplesortstate *state, SortTuple *tuple) > { > + MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext); > + > Assert(!LEADER(state)); > > + if (tuple->tuple != NULL) > + USEMEM(state, GetMemoryChunkSpace(tuple->tuple)); > + Adding even more branches into common code... > From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Wed, 22 Jun 2022 18:11:26 +0300 > Subject: [PATCH v2 5/6] Reorganize data structures Hard to know what this is trying to achieve. > -struct Tuplesortstate > +struct TuplesortOps > { > - TupSortStatus status; /* enumerated value as shown above */ > - int nKeys; /* number of columns in sort key */ > - int sortopt; /* Bitmask of flags used to setup sort */ > - bool bounded; /* did caller specify a maximum number of > - * tuples to return? */ > - bool boundUsed; /* true if we made use of a bounded heap */ > - int bound; /* if bounded, the maximum number of tuples */ > - bool tuples; /* Can SortTuple.tuple ever be set? */ > - int64 availMem; /* remaining memory available, in bytes */ > - int64 allowedMem; /* total memory allowed, in bytes */ > - int maxTapes; /* max number of input tapes to merge in each > - * pass */ > - int64 maxSpace; /* maximum amount of space occupied among sort > - * of groups, either in-memory or on-disk */ > - bool isMaxSpaceDisk; /* true when maxSpace is value for on-disk > - * space, false when it's value for in-memory > - * space */ > - TupSortStatus maxSpaceStatus; /* sort status when maxSpace was reached */ > MemoryContext maincontext; /* memory context for tuple sort metadata that > * persists across multiple batches */ > MemoryContext sortcontext; /* memory context holding most sort data */ > MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */ > - LogicalTapeSet *tapeset; /* logtape.c object for tapes in a temp file */ > > /* > * These function pointers decouple the routines that must know what kind To me it seems odd to have memory contexts and similar things in a datastructure calls *Ops. > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001 > From: Alexander Korotkov <akorotkov@postgresql.org> > Date: Wed, 22 Jun 2022 21:48:05 +0300 > Subject: [PATCH v2 6/6] Split tuplesortops.c I strongly suspect this will cause a slowdown. There was potential for cross-function optimization that's now removed. Greetings, Andres Freund
.Hi! On Wed, Jul 6, 2022 at 6:01 PM Andres Freund <andres@anarazel.de> wrote: > I think this needs to be evaluated for performance... Surely, this needs. > I agree with the nearby comment that the commits need a bit of justification > at least to review them. Will do this. > > From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001 > > From: Alexander Korotkov <akorotkov@postgresql.org> > > Date: Tue, 21 Jun 2022 14:03:13 +0300 > > Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method > > Hm. This adds a bunch of indirect function calls were there previously > weren't. Yep. I think it worth changing this function to deal with many SortTuple's at once. > > From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001 > > From: Alexander Korotkov <akorotkov@postgresql.org> > > Date: Tue, 21 Jun 2022 14:13:56 +0300 > > Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common() > > There's definitely a lot of redundancy removed... But the list of branches in > puttuple_common() grew. Perhaps we instead can add a few flags to > putttuple_common() that determine whether abbreviation should happen, so that > we only do the work necessary for the "type" of sort? Good point, will refactor that. > > From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001 > > From: Alexander Korotkov <akorotkov@postgresql.org> > > Date: Wed, 22 Jun 2022 00:14:51 +0300 > > Subject: [PATCH v2 4/6] Move freeing memory away from writetup() > > Seems to do more than just moving freeing around? Yes, it also move memory accounting from tuplesort_put*() to puttuple_common(). Will revise this. > > @@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull) > > static void > > puttuple_common(Tuplesortstate *state, SortTuple *tuple) > > { > > + MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext); > > + > > Assert(!LEADER(state)); > > > > + if (tuple->tuple != NULL) > > + USEMEM(state, GetMemoryChunkSpace(tuple->tuple)); > > + > > Adding even more branches into common code... I will see how to reduce branching here. > > From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001 > > From: Alexander Korotkov <akorotkov@postgresql.org> > > Date: Wed, 22 Jun 2022 18:11:26 +0300 > > Subject: [PATCH v2 5/6] Reorganize data structures > > Hard to know what this is trying to achieve. Split the public interface part out of Tuplesortstate. > > -struct Tuplesortstate > > +struct TuplesortOps > > { > > - TupSortStatus status; /* enumerated value as shown above */ > > - int nKeys; /* number of columns in sort key */ > > - int sortopt; /* Bitmask of flags used to setup sort */ > > - bool bounded; /* did caller specify a maximum number of > > - * tuples to return? */ > > - bool boundUsed; /* true if we made use of a bounded heap */ > > - int bound; /* if bounded, the maximum number of tuples */ > > - bool tuples; /* Can SortTuple.tuple ever be set? */ > > - int64 availMem; /* remaining memory available, in bytes */ > > - int64 allowedMem; /* total memory allowed, in bytes */ > > - int maxTapes; /* max number of input tapes to merge in each > > - * pass */ > > - int64 maxSpace; /* maximum amount of space occupied among sort > > - * of groups, either in-memory or on-disk */ > > - bool isMaxSpaceDisk; /* true when maxSpace is value for on-disk > > - * space, false when it's value for in-memory > > - * space */ > > - TupSortStatus maxSpaceStatus; /* sort status when maxSpace was reached */ > > MemoryContext maincontext; /* memory context for tuple sort metadata that > > * persists across multiple batches */ > > MemoryContext sortcontext; /* memory context holding most sort data */ > > MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */ > > - LogicalTapeSet *tapeset; /* logtape.c object for tapes in a temp file */ > > > > /* > > * These function pointers decouple the routines that must know what kind > > To me it seems odd to have memory contexts and similar things in a > datastructure calls *Ops. Yep, it worth renaming TuplesortOps into TuplesortPublic or something. > > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001 > > From: Alexander Korotkov <akorotkov@postgresql.org> > > Date: Wed, 22 Jun 2022 21:48:05 +0300 > > Subject: [PATCH v2 6/6] Split tuplesortops.c > > I strongly suspect this will cause a slowdown. There was potential for > cross-function optimization that's now removed. I wonder how can cross-function optimizations bypass function pointers. Is it possible? ------ Regards, Alexander Korotkov
On Wed, Jul 6, 2022 at 8:45 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001 > > > From: Alexander Korotkov <akorotkov@postgresql.org> > > > Date: Wed, 22 Jun 2022 21:48:05 +0300 > > > Subject: [PATCH v2 6/6] Split tuplesortops.c > > > > I strongly suspect this will cause a slowdown. There was potential for > > cross-function optimization that's now removed. > > I wonder how can cross-function optimizations bypass function > pointers. Is it possible? Oh, this is not just functions called by pointer. This is also puttuple_common() etc. OK, this needs to be checked. ------ Regards, Alexander Korotkov
Hi, Matthias! The revised patchset is attached. On Wed, Jul 6, 2022 at 5:41 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > All of the patches are currently missing descriptive commit messages, > which I think is critical for getting this committed. As for per-patch > comments: > > 0001: This patch removes copytup, but it is not quite clear why - > please describe the reasoning in the commit message. Because spit of logic between Tuplesortstate.copytup() function and tuplesort_put*() functions is unclear. It doesn't look like we need an abstraction here, while all the work could be done in tuplesort_put*(). > 0002: getdatum1 needs comments on what it does. From the name, it > would return the datum1 from a sorttuple, but that's not what it does; > a better name would be putdatum1 or populatedatum1. > > 0003: in the various tuplesort_put*tuple[values] functions, the datum1 > field is manually extracted. Shouldn't we use the getdatum1 functions > from 0002 instead? We could use either them directly to allow > inlining, or use the indirection through tuplesortstate. getdatum1() was a bad name. Actually it restores original datum1 during rollback of abbreviations. I've replaced it with removeabbrev(), which seems name to me and process many SortTuple's during one call. > 0004: Needs a commit message, but otherwise seems fine. Commit message is added. > 0005: > > +struct TuplesortOps > > This struct has no comment on what it is. Something like "Public > interface of tuplesort operators, containing data directly accessable > to users of tuplesort" should suffice, but feel free to update the > wording. > > > + void *arg; > > +}; > > This field could use a comment on what it is used for, and how to use it. > > > +struct Tuplesortstate > > +{ > > + TuplesortOps ops; > > This field needs a comment too. > > 0006: Needs a commit message, but otherwise seems fine. TuplesortOps was renamed to TuplesortPublic. Comments and commit messages are added. There are some places, which potentially could cause a slowdown. I'm going to make some experiments with that. ------ Regards, Alexander Korotkov
Attachment
- 0001-Remove-Tuplesortstate.copytup-function-v2.patch
- 0002-Add-new-Tuplesortstate.removeabbrev-function-v2.patch
- 0003-Put-abbreviation-logic-into-puttuple_common-v2.patch
- 0005-Split-TuplesortPublic-from-Tuplesortstate-v2.patch
- 0004-Move-memory-management-away-from-writetup-and-tup-v2.patch
- 0006-Split-tuplesortvariants.c-from-tuplesort.c-v2.patch
On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> There are some places, which potentially could cause a slowdown. I'm
> going to make some experiments with that.
I haven't looked at the patches, so I don't know of a specific place to look for a slowdown, but I thought it might help to perform the same query tests as my most recent test for evaluating qsort variants (some description in [1]), and here is the spreadsheet. Overall, the differences look like noise. A few cases with unabbreviatable text look a bit faster with the patch. I'm not sure if that's a real difference, but in any case I don't see a slowdown anywhere.
[1] https://www.postgresql.org/message-id/CAFBsxsHeTACMP1JVQ%2Bm35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ%40mail.gmail.com
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
Hi, John! On Thu, Jul 21, 2022 at 6:44 AM John Naylor <john.naylor@enterprisedb.com> wrote: > On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > There are some places, which potentially could cause a slowdown. I'm > > going to make some experiments with that. > > I haven't looked at the patches, so I don't know of a specific place to look for a slowdown, but I thought it might helpto perform the same query tests as my most recent test for evaluating qsort variants (some description in [1]), and hereis the spreadsheet. Overall, the differences look like noise. A few cases with unabbreviatable text look a bit fasterwith the patch. I'm not sure if that's a real difference, but in any case I don't see a slowdown anywhere. > > [1] https://www.postgresql.org/message-id/CAFBsxsHeTACMP1JVQ%2Bm35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ%40mail.gmail.com Great, thank you very much for the feedback! ------ Regards, Alexander Korotkov
I've looked through the updated patch. Overall it looks good enough.
Some minor things:
- PARALLEL_SORT macro is based on coordinate struct instead of state struct. In some calls(i.e. from _bt_spools_heapscan) coordinate could appear to be NULL, which can be a segfault on items dereference inside the macro.
- state->worker and coordinate->isWorker a little bit differ in semantics i.e.:
..............................................worker............... leader
state -> worker........................ >=0.....................-1
coordinate ->isWorker............. 1..........................0
- in tuplesort_begin_index_btree I suppose it should be base->nKeys instead of state->nKeys
- Cfbot reports gcc warnings due to mixed code and declarations. So I used this to beautify code in tuplesortvariants.c a little. (This is added as a separate patch 0007)
All these things are corrected/done in a new version 3 of a patchset (PFA). For me, the patchset seems like a long-needed thing to support PostgreSQL extensibility. Overall corrections in v3 are minor, so I'd like to mark the patch as RfC if there are no objections.
Best regards,
Pavel Borisov
Pavel Borisov
Supabase.
Attachment
- v3-0001-Remove-Tuplesortstate.copytup-function.patch
- v3-0007-Tuplesort-code-beautification.patch
- v3-0006-Split-tuplesortvariants.c-from-tuplesort.c.patch
- v3-0004-Move-memory-management-away-from-writetup-and-tup.patch
- v3-0005-Split-TuplesortPublic-from-Tuplesortstate.patch
- v3-0003-Put-abbreviation-logic-into-puttuple_common.patch
- v3-0002-Add-new-Tuplesortstate.removeabbrev-function.patch
Hi, Pavel! Thank you for your review and corrections. On Fri, Jul 22, 2022 at 6:57 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I've looked through the updated patch. Overall it looks good enough. > > Some minor things: > > - PARALLEL_SORT macro is based on coordinate struct instead of state struct. In some calls(i.e. from _bt_spools_heapscan) coordinate could appear to be NULL, which can be a segfault on items dereference inside the macro. > > - state->worker and coordinate->isWorker a little bit differ in semantics i.e.: > ..............................................worker............... leader > state -> worker........................ >=0.....................-1 > coordinate ->isWorker............. 1..........................0 > > - in tuplesort_begin_index_btree I suppose it should be base->nKeys instead of state->nKeys Perfect, thank you! > - Cfbot reports gcc warnings due to mixed code and declarations. So I used this to beautify code in tuplesortvariants.ca little. (This is added as a separate patch 0007) It appears that warnings were caused by the extra semicolon in TuplesortstateGetPublic() macro. I've removed that semicolon, and I don't think we need a beautification patch. Also, please note that there is no point to add indentation, which doesn't survive pgindent. > All these things are corrected/done in a new version 3 of a patchset (PFA). For me, the patchset seems like a long-neededthing to support PostgreSQL extensibility. Overall corrections in v3 are minor, so I'd like to mark the patchas RfC if there are no objections. Thank you. I've also revised the comments in the top of tuplesort.c and tuplesortvariants.c. The revised patchset is attached. Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run tests to check if the patchset causes a performance regression. The script and results are present in the "tuplesort_patch_test.zip" archive. The final comparison is given in the result/final_table.txt. In short, they repeat each test 10 times and there is no difference exceeding the random variation. ------ Regards, Alexander Korotkov
Attachment
- 0001-Remove-Tuplesortstate.copytup-function-v4.patch
- 0004-Move-memory-management-away-from-writetup-and-tup-v4.patch
- 0002-Add-new-Tuplesortstate.removeabbrev-function-v4.patch
- 0005-Split-TuplesortPublic-from-Tuplesortstate-v4.patch
- 0003-Put-abbreviation-logic-into-puttuple_common-v4.patch
- 0006-Split-tuplesortvariants.c-from-tuplesort.c-v4.patch
- tuplesort_patch_test.zip
On Sun, Jul 24, 2022 at 3:24 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run > tests to check if the patchset causes a performance regression. The > script and results are present in the "tuplesort_patch_test.zip" > archive. The final comparison is given in the result/final_table.txt. > In short, they repeat each test 10 times and there is no difference > exceeding the random variation. I see the last revision passed cfbot without warnings. I've added the meta information to commit messages. Also, I've re-run through the thread and it seems all the comments are addressed. I'm going to push this if there are no objections. ------ Regards, Alexander Korotkov
Attachment
- 0004-Move-memory-management-away-from-writetup-and-tup-v5.patch
- 0002-Add-new-Tuplesortstate.removeabbrev-function-v5.patch
- 0005-Split-TuplesortPublic-from-Tuplesortstate-v5.patch
- 0001-Remove-Tuplesortstate.copytup-function-v5.patch
- 0003-Put-abbreviation-logic-into-puttuple_common-v5.patch
- 0006-Split-tuplesortvariants.c-from-tuplesort.c-v5.patch
Note that 0001+0002 (without the others) incurs warnings: $ time { make -j4 clean; make -j4; } >/dev/null tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable] tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable] tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable] tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable] (I wondered in the past if cfbot should try to test for clean builds of subsets of patchsets, and it came up recently with the JSON patches.) Also, this comment has some bad indentation: * Set state to be consistent with never trying abbreviation. -- Justin
Hi Justin! On Mon, Jul 25, 2022 at 2:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > Note that 0001+0002 (without the others) incurs warnings: > > $ time { make -j4 clean; make -j4; } >/dev/null > tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable] > tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable] > tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable] > tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable] > > (I wondered in the past if cfbot should try to test for clean builds of subsets > of patchsets, and it came up recently with the JSON patches.) > > Also, this comment has some bad indentation: > > * Set state to be consistent with never trying abbreviation. Thank you for caching this. Fixed in the revision attached. Testing subsets of patchsets in cfbot looks like a good idea to me. However, I'm not sure if we always require subsets to be consistent. ------ Regards, Alexander Korotkov
Attachment
- 0004-Move-memory-management-away-from-writetup-and-tup-v6.patch
- 0001-Remove-Tuplesortstate.copytup-function-v6.patch
- 0002-Add-new-Tuplesortstate.removeabbrev-function-v6.patch
- 0003-Put-abbreviation-logic-into-puttuple_common-v6.patch
- 0005-Split-TuplesortPublic-from-Tuplesortstate-v6.patch
- 0006-Split-tuplesortvariants.c-from-tuplesort.c-v6.patch
Thank you for caching this. Fixed in the revision attached.
Testing subsets of patchsets in cfbot looks like a good idea to me.
However, I'm not sure if we always require subsets to be consistent.
Hi, hackers!
I've looked through a new v6 of a patchset and find it ok. When applied 0001+0002 only I don't see warnings anymore. Build and tests are successful and Cfbot also looks good. I've marked the patch as RfC.
Best regards,
Pavel Borisov
Pavel Borisov
On Mon, Jul 25, 2022 at 4:01 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: >> Thank you for caching this. Fixed in the revision attached. >> >> Testing subsets of patchsets in cfbot looks like a good idea to me. >> However, I'm not sure if we always require subsets to be consistent. > > > Hi, hackers! > > I've looked through a new v6 of a patchset and find it ok. When applied 0001+0002 only I don't see warnings anymore. Buildand tests are successful and Cfbot also looks good. I've marked the patch as RfC. Thank you, pushed! ------ Regards, Alexander Korotkov