Thread: A case for GIST supporting ORDER BY

A case for GIST supporting ORDER BY

From
Michał Kłeczek
Date:
Hi,

Some time ago I’ve provided some details with the issues we face when trying to use GIST and partitioning at the same time in the postgresql-general mailing list:
We decided to go with the solution to partition our table by:

RANGE (‘2100-01-01' <-> operation_date).

While it (somewhat) solves partition pruning issues described above there is another problem:
It is impossible to create a unique constraint on the partitioned table.

So now we cannot use INSERT … ON CONFLICT (…) DO UPDATE



My question to hackers:
Would it be feasible to implement ORDER BY column GIST index (only) scan for types with total order and sensible greatest and least values?

Thanks,
Michal
Attachment

DRAFT GIST support for ORDER BY

From
Michał Kłeczek
Date:
Hi All,

Attached is a first attempt to implement GIST index (only) scans for ORDER BY column clauses.

The idea is that it order by column for some datatypes is a special case of ordering by distance:

ORDER BY a == ORDER BY a <-> MIN_VALUE
and
ORDER BY a DESC == ORDER BY a <-> MAX_VALUE

This allows implementing GIST ordered scans for btree_gist datatypes.

This in turn makes using GIST with partitioning feasible (I have described issues with such usage in my previous e-mails - see below).

The solution is not ideal as it requires registering “<“ and “>” operators as ordering operators in opfamily
(which in turn makes it possible to issue somewhat meaningless “ORDER BY a < ‘constant’)

The problem is though that right now handling of ORDER BY column clauses is tightly coupled to BTree.
It would be good to refactor the code so that semantics of ORDER BY column could be more flexible.

It would be great if someone could take a look at it.

Thanks,
Michal 

On 24 Oct 2023, at 13:22, Michał Kłeczek <michal@kleczek.org> wrote:

Hi,

Some time ago I’ve provided some details with the issues we face when trying to use GIST and partitioning at the same time in the postgresql-general mailing list:
We decided to go with the solution to partition our table by:

RANGE (‘2100-01-01' <-> operation_date).

While it (somewhat) solves partition pruning issues described above there is another problem:
It is impossible to create a unique constraint on the partitioned table.

So now we cannot use INSERT … ON CONFLICT (…) DO UPDATE



My question to hackers:
Would it be feasible to implement ORDER BY column GIST index (only) scan for types with total order and sensible greatest and least values?

Thanks,
Michal

Attachment

Re: DRAFT GIST support for ORDER BY

From
Matthias van de Meent
Date:
On Mon, 30 Oct 2023 at 09:04, Michał Kłeczek <michal@kleczek.org> wrote:
>
> Hi All,
>
> Attached is a first attempt to implement GIST index (only) scans for ORDER BY column clauses.

Cool!

> The solution is not ideal as it requires registering “<“ and “>” operators as ordering operators in opfamily
> (which in turn makes it possible to issue somewhat meaningless “ORDER BY a < ‘constant’)

I don't quite understand why we need to register new "<" and ">"
operators. Can't we update the current ones?

> The problem is though that right now handling of ORDER BY column clauses is tightly coupled to BTree.
> It would be good to refactor the code so that semantics of ORDER BY column could be more flexible.

The existence of a BTREE operator class for the type is the indicator
that (and how) the type can be ordered - that is where PostgreSQL gets
its methods for ordering most types. Although I agree that it's a
quirk, I don't mind it that much as an indicator of how a type is
ordered.
I do agree, though, that operator classes by themselves should be able
to say "hey, we support full ordered retrieval as well". Right now,
that seems to be limited to btrees, but indeed a GiST index with
btree_gist columns should be able to support the same.

> It would be great if someone could take a look at it.

I've not looked in detail at the patch, but here's some comments:

> --- a/contrib/btree_gist/btree_gist--1.6--1.7.sql
> +++ b/contrib/btree_gist/btree_gist--1.6--1.7.sql

You seem to be modifying an existing migration of a released version
of the btree_bist extension. I suggest you instead add a migration
from 1.7 to a new version 1.8, and update the control file's default
installed version.

> ORDER BY a == ORDER BY a <-> MIN_VALUE
> and
> ORDER BY a DESC == ORDER BY a <-> MAX_VALUE
>
> This allows implementing GIST ordered scans for btree_gist datatypes.
>
> This in turn makes using GIST with partitioning feasible (I have described issues with such usage in my previous
e-mails- see below). 

Did you take into account that GiST's internal distance function uses
floating point, and is thus only an approximation for values that
require more than 2^54 significant bits in their distance function?
For example, GiST wouldn't be guaranteed to yield correct ordering of
int8/bigint when you use `my_column <-> UINT64_MAX` because as far as
the floating point math is concerned, 0 is about as far away from
INT64_MAX as (say) 20 and -21.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: DRAFT GIST support for ORDER BY

From
Michał Kłeczek
Date:

> On 30 Oct 2023, at 13:31, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
>> The solution is not ideal as it requires registering “<“ and “>” operators as ordering operators in opfamily
>> (which in turn makes it possible to issue somewhat meaningless “ORDER BY a < ‘constant’)
>
> I don't quite understand why we need to register new "<" and ">"
> operators. Can't we update the current ones?

I wasn’t precise: what is needed is adding pg_amop entries with amoppurpose = ‘o’ for existing “<" and “>" operators.

>
>> The problem is though that right now handling of ORDER BY column clauses is tightly coupled to BTree.
>> It would be good to refactor the code so that semantics of ORDER BY column could be more flexible.
>
> The existence of a BTREE operator class for the type is the indicator
> that (and how) the type can be ordered - that is where PostgreSQL gets
> its methods for ordering most types. Although I agree that it's a
> quirk, I don't mind it that much as an indicator of how a type is
> ordered.
> I do agree, though, that operator classes by themselves should be able
> to say "hey, we support full ordered retrieval as well". Right now,
> that seems to be limited to btrees, but indeed a GiST index with
> btree_gist columns should be able to support the same.

Right now opfamily and strategy are set in PathKey before creating index scan paths.

The patch actually copies existing code from create_indexscan_plan
that finds an operator OID for (pk_opfamily, pk_strategy).
The operator is supposed to be binary with specific operand types.

To create a path:
1) do the operator OID lookup as above
2) look for sortfamily of pg_amop entry for (operator did, index opfamily)
If the sort family is the same as pk_opfamily we can create a path.

The side effect is that it is possible to “ORDER BY column < ‘constant’” as we have more ordering operators in pg_amop.

Ideally we could look up _unary_ operator in pg_amop instead - that would make sense we are actually measuring some
“absolutedistance”. 
But this would require more changes - createplan.c would need to decide when to lookup unary and when - binary
operator.


>> It would be great if someone could take a look at it.
>
> I've not looked in detail at the patch, but here's some comments:
>
>> --- a/contrib/btree_gist/btree_gist--1.6--1.7.sql
>> +++ b/contrib/btree_gist/btree_gist--1.6--1.7.sql
>
> You seem to be modifying an existing migration of a released version
> of the btree_bist extension. I suggest you instead add a migration
> from 1.7 to a new version 1.8, and update the control file's default
> installed version.

Thanks. I didn’t know how to register a new migration so did it that way.
Will try to fix that.

>
>> ORDER BY a == ORDER BY a <-> MIN_VALUE
>> and
>> ORDER BY a DESC == ORDER BY a <-> MAX_VALUE
>>
>> This allows implementing GIST ordered scans for btree_gist datatypes.
>>
>> This in turn makes using GIST with partitioning feasible (I have described issues with such usage in my previous
e-mails- see below). 
>
> Did you take into account that GiST's internal distance function uses
> floating point, and is thus only an approximation for values that
> require more than 2^54 significant bits in their distance function?
> For example, GiST wouldn't be guaranteed to yield correct ordering of
> int8/bigint when you use `my_column <-> UINT64_MAX` because as far as
> the floating point math is concerned, 0 is about as far away from
> INT64_MAX as (say) 20 and -21.

Hmm… Good point but it means ORDER BY <-> is broken for these types then?
The patch assumes it works correctly and just uses it for ordered scans.

—
Michal


Re: DRAFT GIST support for ORDER BY

From
Matthias van de Meent
Date:
On Mon, 30 Oct 2023 at 14:39, Michał Kłeczek <michal@kleczek.org> wrote:
>> On 30 Oct 2023, at 13:31, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>>> The problem is though that right now handling of ORDER BY column clauses is tightly coupled to BTree.
>>> It would be good to refactor the code so that semantics of ORDER BY column could be more flexible.
>>
>> The existence of a BTREE operator class for the type is the indicator
>> that (and how) the type can be ordered - that is where PostgreSQL gets
>> its methods for ordering most types. Although I agree that it's a
>> quirk, I don't mind it that much as an indicator of how a type is
>> ordered.
>> I do agree, though, that operator classes by themselves should be able
>> to say "hey, we support full ordered retrieval as well". Right now,
>> that seems to be limited to btrees, but indeed a GiST index with
>> btree_gist columns should be able to support the same.
>
> Right now opfamily and strategy are set in PathKey before creating index scan paths.
>
> The patch actually copies existing code from create_indexscan_plan
> that finds an operator OID for (pk_opfamily, pk_strategy).
> The operator is supposed to be binary with specific operand types.
>
> To create a path:
> 1) do the operator OID lookup as above
> 2) look for sortfamily of pg_amop entry for (operator did, index opfamily)
> If the sort family is the same as pk_opfamily we can create a path.
>
> The side effect is that it is possible to “ORDER BY column < ‘constant’” as we have more ordering operators in
pg_amop.
>
> Ideally we could look up _unary_ operator in pg_amop instead - that would make sense we are actually measuring some
“absolutedistance”. 
> But this would require more changes - createplan.c would need to decide when to lookup unary and when - binary
operator.

After researching this a bit more, I'm confused: If I register an opclass

CREATE OPERATOR CLASS gist_mytype_btree
DEFUALT FOR mytype USING gist
AS
    OPERATOR 1 < (mytype, mytype) FOR ORDER BY mytype_ops, -- operator
<(mytype, mytype) returns bool
    ...
    OPERATOR 15 <-> (mytype, mytype) FOR ORDER BY mytype_ops. --
operator <->(mytype, mytype) returns mytype
    ...

Then which order of values does the system expect the index to return
tuples in when either of these operators is applied?
Is that
  ORDER BY (index_column opr constant); but bool isn't the type
supported by the FOR ORDER BY opclass, or
  ORDER BY (index_column); but this makes no sense for distance operators.

After looking at get_relation_info() in optimizer/util/plancat.c, I
guess the difference is the difference between amhandler->amcanorder
vs amhandler->amcanorderbyop? But still it's not quite clear what the
implication for this is. Does it mean an index AM can either provide
natural ordering, or operator ordering, but not both?

>>> ORDER BY a == ORDER BY a <-> MIN_VALUE
>>> and
>>> ORDER BY a DESC == ORDER BY a <-> MAX_VALUE
>>>
>>> This allows implementing GIST ordered scans for btree_gist datatypes.
>>>
>>> This in turn makes using GIST with partitioning feasible (I have described issues with such usage in my previous
e-mails- see below). 
>>
>> Did you take into account that GiST's internal distance function uses
>> floating point, and is thus only an approximation for values that
>> require more than 2^54 significant bits in their distance function?
>> For example, GiST wouldn't be guaranteed to yield correct ordering of
>> int8/bigint when you use `my_column <-> UINT64_MAX` because as far as
>> the floating point math is concerned, 0 is about as far away from
>> INT64_MAX as (say) 20 and -21.
>
> Hmm… Good point but it means ORDER BY <-> is broken for these types then?
> The patch assumes it works correctly and just uses it for ordered scans.

Huh, I didn't know this before, but apparently values are pushed onto
a reorderqueue/pairingheap if the index scan is marked
xs_recheckorderby (i.e. when the tuple order is not exact), which
would be used in this case.

So it seems like this wouldn't be much of an issue for the patch,
apart from the potential issue where this could use the pairingheap
much more than the usual ordered scan operations, which could result
in larger-than-normal memory usage. E.g. float btree ops wouldn't work
effectively at all because every reasonable value is extremely distant
from its max value.

Kind regards,

Matthias van de Meent