Thread: allow sorted builds for btree_gist

allow sorted builds for btree_gist

From
Tomas Vondra
Date:
Hi,

I've been looking at GiST to see if there could be a good way to do
parallel builds - and there might be, if the opclass supports sorted
builds, because then we could parallelize the sort.

But then I noticed we support this mode only for point_ops, because
that's the only opclass with sortsupport procedure. Which mostly makes
sense, because types like geometries, polygons, ... do not have a well
defined order.

Still, we have btree_gist, and I don't think there's much reason to not
support sorted builds for those opclasses, where the gist opclass is
defined on top of btree ordering semantics.

So this patch does that, and the difference (compared to builds with
buiffering=on) can easily be an order of magnitude - at least that's
what my tests show:


test=# create index on test_int8 using gist (a) with (buffering = on);
CREATE INDEX
Time: 578799.450 ms (09:38.799)

test=# create index on test_int8 using gist (a) with (buffering = auto);
CREATE INDEX
Time: 53022.593 ms (00:53.023)


test=# create index on test_uuid using gist (a) with (buffering = on);
CREATE INDEX
Time: 39322.799 ms (00:39.323)

test=# create index on test_uuid using gist (a) with (buffering = auto);
CREATE INDEX
Time: 6466.341 ms (00:06.466)


The WIP patch adds enables this for data types with a usable sortsupport
procedure, which excludes time, timetz, cash, interval, bit, vbit, bool,
enum and macaddr8. I assume time, timetz and macaddr8 could be added,
it's just that I didn't find any existing sortsupport procedure. Same
for cash, but IIRC it's mostly deprecated.

Of course, people probably don't use btree_gist with a single column,
because they could just use btree. It's useful for multi-column GiST
indexes, with data types requiring GiST. And if the other opclasses also
allow sorted builds, this patch makes that possible. Of course, most
"proper GiST opclasses" don't support that, but why not - it's easy.

FWIW this is also why sorted builds likely are not a very practical way
to do parallel builds for GiST - it would help only with a small part of
cases, I think.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: allow sorted builds for btree_gist

From
Paul A Jungwirth
Date:
On Fri, May 17, 2024 at 12:41 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I've been looking at GiST to see if there could be a good way to do
> parallel builds - and there might be, if the opclass supports sorted
> builds, because then we could parallelize the sort.
>
> But then I noticed we support this mode only for point_ops, because
> that's the only opclass with sortsupport procedure. Which mostly makes
> sense, because types like geometries, polygons, ... do not have a well
> defined order.

Oh, I'm excited about this for temporal tables. It seems to me that
ranges and multiranges should have a well-defined order (assuming
their base types do), since you can do dictionary-style ordering
(compare the first value, then the next, then the next, etc.). Is
there any reason not to support those? No reason not to commit these
btree_gist functions first though. If you aren't interested in adding
GiST sortsupport for ranges & multiranges I'm willing to do it myself;
just let me know.

Do note that the 1.7 -> 1.8 changes have been reverted though (as part
of my temporal work), and it looks like your patch is a bit messed up
from that. You'll want to take 1.8 for yourself, and also your 1.9
upgrade script is trying to add the reverted stratnum functions.

Yours,
Paul



Re: allow sorted builds for btree_gist

From
"Andrey M. Borodin"
Date:

> On 18 May 2024, at 00:41, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> if the opclass supports sorted
> builds, because then we could parallelize the sort.

Hi Tomas!

Yup, I'd also be glad to see this feature. PostGIS folks are using their geometry (sortsupport was developed for this)
withobject id (this disables sort build). 

It was committed once [0], but then reverted, vardata opclasses were implemented wrong. Now it's on CF[1], Bernd is
activelyresponding in the thread, but currently patch lacks tests. 

Thanks for raising this!


Best regards, Andrey Borodin.

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9f984ba6d23dc
[1] https://commitfest.postgresql.org/48/3686/


Re: allow sorted builds for btree_gist

From
Tomas Vondra
Date:

On 5/18/24 08:51, Andrey M. Borodin wrote:
> 
> 
>> On 18 May 2024, at 00:41, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>> 
>> if the opclass supports sorted builds, because then we could
>> parallelize the sort.
> 
> Hi Tomas!
> 
> Yup, I'd also be glad to see this feature. PostGIS folks are using
> their geometry (sortsupport was developed for this) with object id
> (this disables sort build).
> 
> It was committed once [0], but then reverted, vardata opclasses were
> implemented wrong. Now it's on CF[1], Bernd is actively responding in
> the thread, but currently patch lacks tests.
> 
> Thanks for raising this!
> 

Oh, damn! I didn't notice the CF already has a patch doing this, and
that it was committed/reverted in 2021. I was completely oblivious to
that. Apologies.

Let's continue working on that patch/thread, I'll take a look in the
next CF.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: allow sorted builds for btree_gist

From
"Andrey M. Borodin"
Date:

> On 18 May 2024, at 15:22, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Let's continue working on that patch/thread, I'll take a look in the
> next CF.

Cool! I'd be happy to review the patch before CF when Bernd or Christoph will address current issues.


Best regards, Andrey Borodin.


Re: allow sorted builds for btree_gist

From
Tomas Vondra
Date:
On 5/18/24 02:00, Paul A Jungwirth wrote:
> On Fri, May 17, 2024 at 12:41 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I've been looking at GiST to see if there could be a good way to do
>> parallel builds - and there might be, if the opclass supports sorted
>> builds, because then we could parallelize the sort.
>>
>> But then I noticed we support this mode only for point_ops, because
>> that's the only opclass with sortsupport procedure. Which mostly makes
>> sense, because types like geometries, polygons, ... do not have a well
>> defined order.
> 
> Oh, I'm excited about this for temporal tables. It seems to me that
> ranges and multiranges should have a well-defined order (assuming
> their base types do), since you can do dictionary-style ordering
> (compare the first value, then the next, then the next, etc.). Is
> there any reason not to support those? No reason not to commit these
> btree_gist functions first though. If you aren't interested in adding
> GiST sortsupport for ranges & multiranges I'm willing to do it myself;
> just let me know.
> 

I believe that's pretty much what the existing patch [1] linked by
Andrey (and apparently running for a number of CFs) does.

[1] https://commitfest.postgresql.org/48/3686/

> Do note that the 1.7 -> 1.8 changes have been reverted though (as part
> of my temporal work), and it looks like your patch is a bit messed up
> from that. You'll want to take 1.8 for yourself, and also your 1.9
> upgrade script is trying to add the reverted stratnum functions.
> 

Yeah, I happened to branch from a slightly older master, not noticing
this is affected by the revert.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: allow sorted builds for btree_gist

From
Michał Kłeczek
Date:

> On 17 May 2024, at 21:41, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I've been looking at GiST to see if there could be a good way to do
> parallel builds - and there might be, if the opclass supports sorted
> builds, because then we could parallelize the sort.
>
> But then I noticed we support this mode only for point_ops, because
> that's the only opclass with sortsupport procedure. Which mostly makes
> sense, because types like geometries, polygons, ... do not have a well
> defined order.
>
> Still, we have btree_gist, and I don't think there's much reason to not
> support sorted builds for those opclasses, where the gist opclass is
> defined on top of btree ordering semantics.
>


I wonder if it was possible to add sort support to pg_trgm. Speeding up index build for multicolumn indexes supporting
textsearch would be great. 

—
Michal




Re: allow sorted builds for btree_gist

From
Bernd Helmle
Date:
Hi,

Am Samstag, dem 18.05.2024 um 12:22 +0200 schrieb Tomas Vondra:
> > It was committed once [0], but then reverted, vardata opclasses
> > were
> > implemented wrong. Now it's on CF[1], Bernd is actively responding
> > in
> > the thread, but currently patch lacks tests.
> >
> > Thanks for raising this!
> >
>
> Oh, damn! I didn't notice the CF already has a patch doing this, and
> that it was committed/reverted in 2021. I was completely oblivious to
> that. Apologies.
>
> Let's continue working on that patch/thread, I'll take a look in the
> next CF.

Sorry for the delay, i was on vacation (we had some public holidays
here in germany) and i am currently involved in other time consuming
projects, so i didn't follow my mails very close lately.
If my time permits i'd like to add the remaining missing things to the
patch, i am looking forward for your review, though!

Thanks for bringing this up again.

    Bernd