Re: Yet another fast GiST build - Mailing list pgsql-hackers

From Darafei "Komяpa" Praliaskouski
Subject Re: Yet another fast GiST build
Date
Msg-id CAC8Q8t+0sHUWZTpk23QnW5UFAn1kuNq+YhM6encWxZXWz32e4Q@mail.gmail.com
Whole thread Raw
In response to Re: Yet another fast GiST build  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Responses Re: Yet another fast GiST build  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi,


On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


> 9 сент. 2020 г., в 00:05, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> I've been reviewing the patch today. The biggest changes I've made have been in restructuring the code in gistbuild.c for readability, but there are a bunch of smaller changes throughout. Attached is what I've got so far, squashed into one patch.
Thanks!

> I'm continuing to review it, but a couple of questions so far:
>
> In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive == false'. That seems fishy, surely we need to index recently-dead tuples, too. The normal index build path isn't skipping them either.
That's an oversight.
>
> How does the 'sortsupport' routine interact with 'compress'/'decompress'? Which representation is passed to the comparator routine: the original value from the table, the compressed representation, or the decompressed representation? Do the comparetup_index_btree() and readtup_index() routines agree with that?

Currently we pass compressed values, which seems not very good.
But there was a request from PostGIS maintainers to pass values before decompression.
Darafei, please, correct me if I'm wrong. Also can you please provide link on PostGIS B-tree sorting functions?

We were expecting to reuse btree opclass for this thing. This way btree_gist extension will become a lot thinner. :)

Core routine for current sorting implementation is Hilbert curve, which is based on 2D center of a box - and used for abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893 

All the btree functions are wrappers around gserialized_cmp which just adds a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313

Base representation for index compressed datatype is GIDX, which is also a box. We can make it work on top of it instead of the original representation.
There is no such thing as "decompressed representation" unfortunately as compression is lossy.

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Improvements in Copy From
Next
From: Amit Kapila
Date:
Subject: Re: Inconsistency in determining the timestamp of the db statfile.