Custom tuplesorts for extensions - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Custom tuplesorts for extensions
Date
Msg-id CAPpHfdvjix0Ahx-H3Jp1M2R+_74P-zKnGGygx4OWr=bUQ8BNdw@mail.gmail.com
Responses Re: Custom tuplesorts for extensions  (Pavel Borisov <pashkin.elfe@gmail.com>)
Re: Custom tuplesorts for extensions  (Pavel Borisov <pashkin.elfe@gmail.com>)
List pgsql-hackers
Hackers,

Some PostgreSQL extensions need to sort their own pieces of data, and
for that it is worth re-using our tuplesort. But although our tuplesort
is extensible, that extensibility is hidden inside tuplesort.c. There
are at least a couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on a patched PostgreSQL. One of its patches simply
exposes all the guts of tuplesort.c in tuplesort.h:
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extensions re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. The patches don't yet revise all the comments and still
lack code beautification. The intention behind publishing this
revision is to verify the direction and get some feedback for further
work.

0001-Remove-Tuplesortstate.copytup-v1.patch
It's unclear to me how we split functionality between the
Tuplesortstate.copytup() function and the tuplesort_put*() functions.
For instance, copytup_index() and copytup_datum() just raise an error,
while tuplesort_putindextuplevalues() and tuplesort_putdatum() do the
actual work. The patch removes Tuplesortstate.copytup() altogether,
moving its functionality into the tuplesort_put*() functions.

0002-Tuplesortstate.getdatum1-method-v1.patch
0003-Put-abbreviation-logic-into-puttuple_common-v1.patch
The tuplesort_put*() functions contain a common part that deals with
abbreviation. 0002 extracts the logic of getting the value of
SortTuple.datum1 into the Tuplesortstate.getdatum1() function. Thanks
to this new interface function, 0003 moves the abbreviation logic into
puttuple_common().
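
To illustrate the idea behind 0002 and 0003, here is a minimal,
self-contained sketch. It is not the code from the patches: the types
ToySortState and SortTuple, the fixed-size array, and the functions
toy_puttuple_common(), cstring_getdatum1() and cstring_abbrev() are
hypothetical stand-ins. It only shows how a per-format getdatum1
callback lets the abbreviation step live in one common put path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; NOT the PostgreSQL definitions. */
typedef uintptr_t Datum;

typedef struct SortTuple
{
    void   *tuple;      /* the full tuple (owned by the caller here) */
    Datum   datum1;     /* leading sort key, possibly abbreviated */
} SortTuple;

#define TOY_MAX_TUPLES 16

typedef struct ToySortState ToySortState;
struct ToySortState
{
    /* format-specific: fetch the leading key of stup->tuple */
    Datum     (*getdatum1) (ToySortState *state, SortTuple *stup);
    /* optional: turn a full key into an abbreviated one, or NULL */
    Datum     (*abbrev_converter) (Datum original);
    SortTuple   tuples[TOY_MAX_TUPLES];
    int         ntuples;
};

/*
 * Common path shared by every put function: store the tuple, ask the
 * format-specific callback for datum1, then abbreviate it if enabled.
 */
static void
toy_puttuple_common(ToySortState *state, void *tuple)
{
    SortTuple  *stup;

    assert(state->ntuples < TOY_MAX_TUPLES);
    stup = &state->tuples[state->ntuples++];
    stup->tuple = tuple;
    stup->datum1 = state->getdatum1(state, stup);
    if (state->abbrev_converter != NULL)
        stup->datum1 = state->abbrev_converter(stup->datum1);
}

/* Format-specific part for C strings: the key is the string itself. */
static Datum
cstring_getdatum1(ToySortState *state, SortTuple *stup)
{
    (void) state;
    return (Datum) stup->tuple;
}

/*
 * Pack the first sizeof(Datum) bytes of the string, most significant
 * byte first and zero-padded, so that unsigned integer comparison
 * agrees with bytewise comparison of that prefix.
 */
static Datum
cstring_abbrev(Datum original)
{
    const unsigned char *s = (const unsigned char *) original;
    Datum       packed = 0;
    size_t      i;

    for (i = 0; i < sizeof(Datum); i++)
    {
        unsigned char c = *s;

        if (c != '\0')
            s++;        /* stay on the terminator once it is reached */
        packed = (packed << 8) | c;
    }
    return packed;
}
```

In this toy, a caller fills in getdatum1 and abbrev_converter once and
then calls toy_puttuple_common() for each tuple; no put function needs
its own copy of the abbreviation code.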

0004-Move-freeing-memory-away-from-writetup-v1.patch
Assuming that SortTuple.tuple is always just a single chunk of memory,
we can move the memory accounting logic out of
Tuplesortstate.writetup(). This makes Tuplesortstate.writetup() easier
to implement without knowledge of tuplesort.c guts.

0005-Reorganize-data-structures-v1.patch
This commit splits the "public" part of Tuplesortstate into
TuplesortOps, which is intended to be exposed outside tuplesort.c.

0006-Split-tuplesortops.c-v1.patch
This patch finally splits tuplesortops.c from tuplesort.c. tuplesort.c
keeps the generic routines for tuple sorting, while tuplesortops.c
provides the implementations for particular tuple formats.
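
The split described in 0005 and 0006 can be pictured with the rough
sketch below. It does not reproduce the actual TuplesortOps layout:
ToySortOps, toy_sort_perform(), PointTuple and point_comparetup() are
hypothetical names, and an insertion sort stands in for the real
sorting code. The point is only that the generic routine sees nothing
but a table of callbacks, while the tuple format lives elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/*
 * "Public" part: callbacks a tuple format has to provide.
 * Toy stand-in for illustration; NOT the real TuplesortOps.
 */
typedef struct ToySortOps
{
    /* compare two tuples; return <0, 0 or >0 */
    int       (*comparetup) (const void *a, const void *b, void *arg);
    /* opaque per-format state handed back to the callbacks */
    void       *arg;
} ToySortOps;

/*
 * Generic part: knows nothing about the tuple format.  A simple
 * insertion sort is used here only to keep the sketch short.
 */
static void
toy_sort_perform(const ToySortOps *ops, void **tuples, size_t ntuples)
{
    size_t      i;

    assert(ops->comparetup != NULL);
    for (i = 1; i < ntuples; i++)
    {
        void       *cur = tuples[i];
        size_t      j = i;

        while (j > 0 && ops->comparetup(tuples[j - 1], cur, ops->arg) > 0)
        {
            tuples[j] = tuples[j - 1];
            j--;
        }
        tuples[j] = cur;
    }
}

/* Format-specific part, as an extension might supply it. */
typedef struct PointTuple
{
    int         x;
    int         y;
} PointTuple;

/* arg points to an int: 0 means sort by x, non-zero means sort by y */
static int
point_comparetup(const void *a, const void *b, void *arg)
{
    const PointTuple *pa = (const PointTuple *) a;
    const PointTuple *pb = (const PointTuple *) b;
    int         by_y = *(const int *) arg;
    int         ka = by_y ? pa->y : pa->x;
    int         kb = by_y ? pb->y : pb->x;

    return (ka > kb) - (ka < kb);
}
```

An extension in this toy model would fill a ToySortOps with its own
comparetup and arg and pass it to toy_sort_perform(), without touching
the generic routine.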

------
Regards,
Alexander Korotkov
