Thread: allow sorted builds for btree_gist
Hi, I've been looking at GiST to see if there could be a good way to do parallel builds - and there might be, if the opclass supports sorted builds, because then we could parallelize the sort. But then I noticed we support this mode only for point_ops, because that's the only opclass with sortsupport procedure. Which mostly makes sense, because types like geometries, polygons, ... do not have a well defined order. Still, we have btree_gist, and I don't think there's much reason to not support sorted builds for those opclasses, where the gist opclass is defined on top of btree ordering semantics. So this patch does that, and the difference (compared to builds with buiffering=on) can easily be an order of magnitude - at least that's what my tests show: test=# create index on test_int8 using gist (a) with (buffering = on); CREATE INDEX Time: 578799.450 ms (09:38.799) test=# create index on test_int8 using gist (a) with (buffering = auto); CREATE INDEX Time: 53022.593 ms (00:53.023) test=# create index on test_uuid using gist (a) with (buffering = on); CREATE INDEX Time: 39322.799 ms (00:39.323) test=# create index on test_uuid using gist (a) with (buffering = auto); CREATE INDEX Time: 6466.341 ms (00:06.466) The WIP patch adds enables this for data types with a usable sortsupport procedure, which excludes time, timetz, cash, interval, bit, vbit, bool, enum and macaddr8. I assume time, timetz and macaddr8 could be added, it's just that I didn't find any existing sortsupport procedure. Same for cash, but IIRC it's mostly deprecated. Of course, people probably don't use btree_gist with a single column, because they could just use btree. It's useful for multi-column GiST indexes, with data types requiring GiST. And if the other opclasses also allow sorted builds, this patch makes that possible. Of course, most "proper GiST opclasses" don't support that, but why not - it's easy. FWIW this is also why sorted builds likely are not a very practical way to do parallel builds for GiST - it would help only with a small part of cases, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, May 17, 2024 at 12:41 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I've been looking at GiST to see if there could be a good way to do > parallel builds - and there might be, if the opclass supports sorted > builds, because then we could parallelize the sort. > > But then I noticed we support this mode only for point_ops, because > that's the only opclass with sortsupport procedure. Which mostly makes > sense, because types like geometries, polygons, ... do not have a well > defined order. Oh, I'm excited about this for temporal tables. It seems to me that ranges and multiranges should have a well-defined order (assuming their base types do), since you can do dictionary-style ordering (compare the first value, then the next, then the next, etc.). Is there any reason not to support those? No reason not to commit these btree_gist functions first though. If you aren't interested in adding GiST sortsupport for ranges & multiranges I'm willing to do it myself; just let me know. Do note that the 1.7 -> 1.8 changes have been reverted though (as part of my temporal work), and it looks like your patch is a bit messed up from that. You'll want to take 1.8 for yourself, and also your 1.9 upgrade script is trying to add the reverted stratnum functions. Yours, Paul
> On 18 May 2024, at 00:41, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > if the opclass supports sorted > builds, because then we could parallelize the sort. Hi Tomas! Yup, I'd also be glad to see this feature. PostGIS folks are using their geometry (sortsupport was developed for this) withobject id (this disables sort build). It was committed once [0], but then reverted, vardata opclasses were implemented wrong. Now it's on CF[1], Bernd is activelyresponding in the thread, but currently patch lacks tests. Thanks for raising this! Best regards, Andrey Borodin. [0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9f984ba6d23dc [1] https://commitfest.postgresql.org/48/3686/
On 5/18/24 08:51, Andrey M. Borodin wrote: > > >> On 18 May 2024, at 00:41, Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >> >> if the opclass supports sorted builds, because then we could >> parallelize the sort. > > Hi Tomas! > > Yup, I'd also be glad to see this feature. PostGIS folks are using > their geometry (sortsupport was developed for this) with object id > (this disables sort build). > > It was committed once [0], but then reverted, vardata opclasses were > implemented wrong. Now it's on CF[1], Bernd is actively responding in > the thread, but currently patch lacks tests. > > Thanks for raising this! > Oh, damn! I didn't notice the CF already has a patch doing this, and that it was committed/reverted in 2021. I was completely oblivious to that. Apologies. Let's continue working on that patch/thread, I'll take a look in the next CF. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On 18 May 2024, at 15:22, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Let's continue working on that patch/thread, I'll take a look in the > next CF. Cool! I'd be happy to review the patch before CF when Bernd or Christoph will address current issues. Best regards, Andrey Borodin.
On 5/18/24 02:00, Paul A Jungwirth wrote: > On Fri, May 17, 2024 at 12:41 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> I've been looking at GiST to see if there could be a good way to do >> parallel builds - and there might be, if the opclass supports sorted >> builds, because then we could parallelize the sort. >> >> But then I noticed we support this mode only for point_ops, because >> that's the only opclass with sortsupport procedure. Which mostly makes >> sense, because types like geometries, polygons, ... do not have a well >> defined order. > > Oh, I'm excited about this for temporal tables. It seems to me that > ranges and multiranges should have a well-defined order (assuming > their base types do), since you can do dictionary-style ordering > (compare the first value, then the next, then the next, etc.). Is > there any reason not to support those? No reason not to commit these > btree_gist functions first though. If you aren't interested in adding > GiST sortsupport for ranges & multiranges I'm willing to do it myself; > just let me know. > I believe that's pretty much what the existing patch [1] linked by Andrey (and apparently running for a number of CFs) does. [1] https://commitfest.postgresql.org/48/3686/ > Do note that the 1.7 -> 1.8 changes have been reverted though (as part > of my temporal work), and it looks like your patch is a bit messed up > from that. You'll want to take 1.8 for yourself, and also your 1.9 > upgrade script is trying to add the reverted stratnum functions. > Yeah, I happened to branch from a slightly older master, not noticing this is affected by the revert. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On 17 May 2024, at 21:41, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > I've been looking at GiST to see if there could be a good way to do > parallel builds - and there might be, if the opclass supports sorted > builds, because then we could parallelize the sort. > > But then I noticed we support this mode only for point_ops, because > that's the only opclass with sortsupport procedure. Which mostly makes > sense, because types like geometries, polygons, ... do not have a well > defined order. > > Still, we have btree_gist, and I don't think there's much reason to not > support sorted builds for those opclasses, where the gist opclass is > defined on top of btree ordering semantics. > I wonder if it was possible to add sort support to pg_trgm. Speeding up index build for multicolumn indexes supporting textsearch would be great. — Michal
Hi, Am Samstag, dem 18.05.2024 um 12:22 +0200 schrieb Tomas Vondra: > > It was committed once [0], but then reverted, vardata opclasses > > were > > implemented wrong. Now it's on CF[1], Bernd is actively responding > > in > > the thread, but currently patch lacks tests. > > > > Thanks for raising this! > > > > Oh, damn! I didn't notice the CF already has a patch doing this, and > that it was committed/reverted in 2021. I was completely oblivious to > that. Apologies. > > Let's continue working on that patch/thread, I'll take a look in the > next CF. Sorry for the delay, i was on vacation (we had some public holidays here in germany) and i am currently involved in other time consuming projects, so i didn't follow my mails very close lately. If my time permits i'd like to add the remaining missing things to the patch, i am looking forward for your review, though! Thanks for bringing this up again. Bernd