Thread: Custom opclass for column statistics?

Custom opclass for column statistics?

From
Ancoron Luciferis
Date:
Hi,

I've been wondering whether it is possible somehow to have the standard
column statistics to respect a certain operator class?

The reason why I am asking for this is that I have a UUID column with a
unique index at it using a custom operator class which implies a
different sort order than for the default UUID operator class.

This results into planner mistakes when determining whether to use the
index for row selection or not. Too often it falls back into sequential
scan due to this.


Thanx and Cheers,

    Ancoron



Re: Custom opclass for column statistics?

From
Tomas Vondra
Date:
On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>Hi,
>
>I've been wondering whether it is possible somehow to have the standard
>column statistics to respect a certain operator class?
>
>The reason why I am asking for this is that I have a UUID column with a
>unique index at it using a custom operator class which implies a
>different sort order than for the default UUID operator class.
>
>This results into planner mistakes when determining whether to use the
>index for row selection or not. Too often it falls back into sequential
>scan due to this.
>

Can you share an example demonstrating the issue?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Custom opclass for column statistics?

From
Tom Lane
Date:
Ancoron Luciferis <ancoron.luciferis@googlemail.com> writes:
> I've been wondering whether it is possible somehow to have the standard
> column statistics to respect a certain operator class?

In principle, pg_statistic can represent stats for a non-default opclass.
Teaching ANALYZE to collect such stats when appropriate, and then teaching
the planner to use them when appropriate, is left as an exercise for the
reader.

I think the "when appropriate" bit is actually the hardest part of that.
Possibly, if you were satisfied with a relatively manual approach,
you could proceed by using CREATE STATISTICS to declare interest in
keeping standard stats for a non-default sort order.  Not sure what to
do if you want it to be automatic, because I don't think people would
hold still for having ANALYZE collect stats for any random non-default
opclass automatically.  Maybe a new opclass property?

            regards, tom lane



Re: Custom opclass for column statistics?

From
Ancoron Luciferis
Date:
On 06/07/2019 15:38, Tomas Vondra wrote:
> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>> Hi,
>>
>> I've been wondering whether it is possible somehow to have the standard
>> column statistics to respect a certain operator class?
>>
>> The reason why I am asking for this is that I have a UUID column with a
>> unique index at it using a custom operator class which implies a
>> different sort order than for the default UUID operator class.
>>
>> This results into planner mistakes when determining whether to use the
>> index for row selection or not. Too often it falls back into sequential
>> scan due to this.
>>
> 
> Can you share an example demonstrating the issue?
> 
> 
> regards
> 

Yes, I have an opclass as follows:

CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
    USING btree AS
        OPERATOR        1       <*,
        OPERATOR        1       <~ (uuid, timestamp with time zone),
        OPERATOR        2       <=*,
        OPERATOR        2       <=~ (uuid, timestamp with time zone),
        OPERATOR        3       =,
        OPERATOR        3       =~ (uuid, timestamp with time zone),
        OPERATOR        4       >=*,
        OPERATOR        4       >=~ (uuid, timestamp with time zone),
        OPERATOR        5       >*,
        OPERATOR        5       >~ (uuid, timestamp with time zone),
        FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
        FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
with time zone),
        FUNCTION        2       uuid_timestamp_sortsupport(internal)
;

...and e.g. operator "<*" is defined as:

CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
RETURNS bool
AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
LANGUAGE C
IMMUTABLE
LEAKPROOF
STRICT
PARALLEL SAFE;

COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';

CREATE OPERATOR <* (
    LEFTARG = uuid,
    RIGHTARG = uuid,
    PROCEDURE = uuid_timestamp_lt,
    COMMUTATOR = '>*',
    NEGATOR = '>=*',
    RESTRICT = scalarltsel,
    JOIN = scalarltjoinsel
);


The function "uuid_timestamp_lt" is basically defined as follows:
1. if not version 1 UUID fallback to standard uuid compare
2. extract timestamp values and compare
3. if equal timestamps fallback to standard uuid compare

...so that a chronological order is established.


The test table is created as follows:

CREATE TABLE uuid_v1_ext (id uuid);
CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);


The values for "histogram_bounds" of the test table look like this (due
to the default sort order for standard type UUID):

00003789-97bf-11e9-b6bb-e03f49f7f733
008b88f8-6deb-11e9-901a-e03f4947f477
010a8b22-586a-11e9-8258-e03f49ce78f3
...
6f682e68-978d-11e9-901a-e03f4947f477
6ff412ee-926f-11e9-901a-e03f4947f477
7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
70ffaeca-4645-11e9-adf9-e03f494677fb
...
fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
ff779ce8-9e52-11e9-8258-e03f49ce78f3
ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf

...and I think that's where the planner gets the decision for a query
such as:

DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';

...which then get's executed as sequential scan instead of an index scan.

I was also thinking about changing the selectivity function used by the
custom operator, but I didn't find any hints how to implement that
without duplicating a lot of internal code.


Cheers,

    Ancoron




Re: Custom opclass for column statistics?

From
Tomas Vondra
Date:
On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote:
>On 06/07/2019 15:38, Tomas Vondra wrote:
>> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>>> Hi,
>>>
>>> I've been wondering whether it is possible somehow to have the standard
>>> column statistics to respect a certain operator class?
>>>
>>> The reason why I am asking for this is that I have a UUID column with a
>>> unique index at it using a custom operator class which implies a
>>> different sort order than for the default UUID operator class.
>>>
>>> This results into planner mistakes when determining whether to use the
>>> index for row selection or not. Too often it falls back into sequential
>>> scan due to this.
>>>
>>
>> Can you share an example demonstrating the issue?
>>
>>
>> regards
>>
>
>Yes, I have an opclass as follows:
>
>CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
>    USING btree AS
>        OPERATOR        1       <*,
>        OPERATOR        1       <~ (uuid, timestamp with time zone),
>        OPERATOR        2       <=*,
>        OPERATOR        2       <=~ (uuid, timestamp with time zone),
>        OPERATOR        3       =,
>        OPERATOR        3       =~ (uuid, timestamp with time zone),
>        OPERATOR        4       >=*,
>        OPERATOR        4       >=~ (uuid, timestamp with time zone),
>        OPERATOR        5       >*,
>        OPERATOR        5       >~ (uuid, timestamp with time zone),
>        FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
>        FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
>with time zone),
>        FUNCTION        2       uuid_timestamp_sortsupport(internal)
>;
>
>...and e.g. operator "<*" is defined as:
>
>CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
>RETURNS bool
>AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
>LANGUAGE C
>IMMUTABLE
>LEAKPROOF
>STRICT
>PARALLEL SAFE;
>
>COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';
>
>CREATE OPERATOR <* (
>    LEFTARG = uuid,
>    RIGHTARG = uuid,
>    PROCEDURE = uuid_timestamp_lt,
>    COMMUTATOR = '>*',
>    NEGATOR = '>=*',
>    RESTRICT = scalarltsel,
>    JOIN = scalarltjoinsel
>);
>
>
>The function "uuid_timestamp_lt" is basically defined as follows:
>1. if not version 1 UUID fallback to standard uuid compare
>2. extract timestamp values and compare
>3. if equal timestamps fallback to standard uuid compare
>
>...so that a chronological order is established.
>
>
>The test table is created as follows:
>
>CREATE TABLE uuid_v1_ext (id uuid);
>CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);
>
>
>The values for "histogram_bounds" of the test table look like this (due
>to the default sort order for standard type UUID):
>
>00003789-97bf-11e9-b6bb-e03f49f7f733
>008b88f8-6deb-11e9-901a-e03f4947f477
>010a8b22-586a-11e9-8258-e03f49ce78f3
>...
>6f682e68-978d-11e9-901a-e03f4947f477
>6ff412ee-926f-11e9-901a-e03f4947f477
>7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
>70ffaeca-4645-11e9-adf9-e03f494677fb
>...
>fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
>ff779ce8-9e52-11e9-8258-e03f49ce78f3
>ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf
>
>...and I think that's where the planner gets the decision for a query
>such as:
>
>DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';
>
>...which then get's executed as sequential scan instead of an index scan.
>
>I was also thinking about changing the selectivity function used by the
>custom operator, but I didn't find any hints how to implement that
>without duplicating a lot of internal code.
>

Not sure, I'm not very familiar with this code, so I'd have to play with
it and try things. But that's hard when I don't have any code. Would it
be possible to share a small self-contained test case?

I wonder what does uuid_timestamp_cmp do? I suppose it first compares by
a timestamp extracted from the UUID, right?

It'd be interesting to see

(a) statistics for the column from pg_stats, both for the table and
index (which should have been built using the custom opclass, I think).

(b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the
default one (that can't use the timestamp condition, but it should be
possible to generate smallers/largest uuid for a timestamp).

BTW which PostgreSQL version is this?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Custom opclass for column statistics?

From
Ancoron Luciferis
Date:
On 06/07/2019 17:57, Tomas Vondra wrote:
> On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote:
>> On 06/07/2019 15:38, Tomas Vondra wrote:
>>> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>>>> Hi,
>>>>
>>>> I've been wondering whether it is possible somehow to have the standard
>>>> column statistics to respect a certain operator class?
>>>>
>>>> The reason why I am asking for this is that I have a UUID column with a
>>>> unique index at it using a custom operator class which implies a
>>>> different sort order than for the default UUID operator class.
>>>>
>>>> This results into planner mistakes when determining whether to use the
>>>> index for row selection or not. Too often it falls back into sequential
>>>> scan due to this.
>>>>
>>>
>>> Can you share an example demonstrating the issue?
>>>
>>>
>>> regards
>>>
>>
>> Yes, I have an opclass as follows:
>>
>> CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
>>    USING btree AS
>>        OPERATOR        1       <*,
>>        OPERATOR        1       <~ (uuid, timestamp with time zone),
>>        OPERATOR        2       <=*,
>>        OPERATOR        2       <=~ (uuid, timestamp with time zone),
>>        OPERATOR        3       =,
>>        OPERATOR        3       =~ (uuid, timestamp with time zone),
>>        OPERATOR        4       >=*,
>>        OPERATOR        4       >=~ (uuid, timestamp with time zone),
>>        OPERATOR        5       >*,
>>        OPERATOR        5       >~ (uuid, timestamp with time zone),
>>        FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
>>        FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
>> with time zone),
>>        FUNCTION        2       uuid_timestamp_sortsupport(internal)
>> ;
>>
>> ...and e.g. operator "<*" is defined as:
>>
>> CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
>> RETURNS bool
>> AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
>> LANGUAGE C
>> IMMUTABLE
>> LEAKPROOF
>> STRICT
>> PARALLEL SAFE;
>>
>> COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';
>>
>> CREATE OPERATOR <* (
>>    LEFTARG = uuid,
>>    RIGHTARG = uuid,
>>    PROCEDURE = uuid_timestamp_lt,
>>    COMMUTATOR = '>*',
>>    NEGATOR = '>=*',
>>    RESTRICT = scalarltsel,
>>    JOIN = scalarltjoinsel
>> );
>>
>>
>> The function "uuid_timestamp_lt" is basically defined as follows:
>> 1. if not version 1 UUID fallback to standard uuid compare
>> 2. extract timestamp values and compare
>> 3. if equal timestamps fallback to standard uuid compare
>>
>> ...so that a chronological order is established.
>>
>>
>> The test table is created as follows:
>>
>> CREATE TABLE uuid_v1_ext (id uuid);
>> CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id
>> uuid_timestamp_ops);
>>
>>
>> The values for "histogram_bounds" of the test table look like this (due
>> to the default sort order for standard type UUID):
>>
>> 00003789-97bf-11e9-b6bb-e03f49f7f733
>> 008b88f8-6deb-11e9-901a-e03f4947f477
>> 010a8b22-586a-11e9-8258-e03f49ce78f3
>> ...
>> 6f682e68-978d-11e9-901a-e03f4947f477
>> 6ff412ee-926f-11e9-901a-e03f4947f477
>> 7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
>> 70ffaeca-4645-11e9-adf9-e03f494677fb
>> ...
>> fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
>> ff779ce8-9e52-11e9-8258-e03f49ce78f3
>> ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf
>>
>> ...and I think that's where the planner gets the decision for a query
>> such as:
>>
>> DELETE FROM uuid_v1_ext WHERE id <*
>> '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';
>>
>> ...which then get's executed as sequential scan instead of an index scan.
>>
>> I was also thinking about changing the selectivity function used by the
>> custom operator, but I didn't find any hints how to implement that
>> without duplicating a lot of internal code.
>>
> 
> Not sure, I'm not very familiar with this code, so I'd have to play with
> it and try things. But that's hard when I don't have any code. Would it
> be possible to share a small self-contained test case?

My test uses historically generated UUID's and is currently not
automated (generates ~120M UUID's over a period of ~4 months). The tool
I am using to generate the UUID's is a Java tool, so not very automation
friendly (using Maven, which downloads a lot at first build time).

The resulting data is ~4 GiB in size, but here is a guide I use for my
manual testing:

https://gist.github.com/ancoron/b08ac4b1ceafda2a38ff12030c011385

Please note that my testing involves 4 SSD's (to avoid too much mixed
I/O but not for functionality):
1.) system + WAL
2.) generated UUID files (for reading)
3.) table data (tablespace "fast")
4.) index data (tablespace "faster")


> 
> I wonder what does uuid_timestamp_cmp do? I suppose it first compares by
> a timestamp extracted from the UUID, right?

exactly:

https://github.com/ancoron/pg-uuid-ext/blob/master/uuid_ext.c#L234

> 
> It'd be interesting to see
> 
> (a) statistics for the column from pg_stats, both for the table and
> index (which should have been built using the custom opclass, I think).

For the table column:

schemaname             | public
tablename              | uuid_v1_ext
attname                | id
inherited              | f
null_frac              | 0
avg_width              | 16
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {00003789-97bf-11e9-b6bb-e03f49f7f733, ...
correlation            | 0.00448091
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |


There is no statistic for the index (or I don't know how to retrieve it).

> 
> (b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the
> default one (that can't use the timestamp condition, but it should be
> possible to generate smallers/largest uuid for a timestamp).

After some further testing it seems that I am comparing apples with
bananas to a certain extend as I also have another index using a uuid
timestamp extraction function and then using a timestamp for selecting
the entries to delete.

For uuid 4bdf6f81-56ad-11e9-8258-e03f49ce78f3, the query then looks as
follows:

DELETE FROM uuid_v1_timestamp WHERE uuid_v1_timestamp(id) < '2019-04-04
07:43:11.3776';

...resulting in a plan like:

 Delete on uuid_v1_timestamp  (cost=0.57..779651.49 rows=30077367 width=6)
   ->  Index Scan using idx_uuid_v1_timestamp on uuid_v1_timestamp
(cost=0.57..779651.49 rows=30077367 width=6)
         Index Cond: (uuid_v1_timestamp(id) < '2019-04-04
07:43:11.3776+00'::timestamp with time zone)

So, as my opclass basically does the same internally, I was expecting
the same behavior but it turned out that the statistics are just off a
bit. When changing the timestamp to a lower value (resulting in less
rows selected), the planner correctly uses an index scan, e.g.:

 Delete on uuid_v1_ext  (cost=0.57..767334.76 rows=1829994 width=6)
   ->  Index Scan using idx_uuid_v1_ext on uuid_v1_ext
(cost=0.57..767334.76 rows=1829994 width=6)
         Index Cond: (id <* '4e91eb0a-4e91-11e9-83fd-e03f49ef76f7'::uuid)

...despite the fact that the bare uuid value is pretty high (the
timestamp value is 2019-03-25 00:00:28.846626+00).

What I can see is that the row estimates are really off here:

 Seq Scan on uuid_v1_ext  (cost=0.00..2184455.80 rows=46969870 width=16)
(actual time=0.029..9372.892 rows=30000000 loops=1)
   Filter: (id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3'::uuid)
   Rows Removed by Filter: 92000000
 Planning Time: 0.152 ms
 Execution Time: 10139.718 ms

vs.

 Index Only Scan using idx_uuid_v1_ext on uuid_v1_ext
(cost=0.57..767334.76 rows=1829994 width=16) (actual
time=0.042..3255.211 rows=19709001 loops=1)
   Index Cond: (id <* '4e91eb0a-4e91-11e9-83fd-e03f49ef76f7'::uuid)
   Heap Fetches: 19709001
 Planning Time: 0.182 ms
 Execution Time: 3763.380 ms

^^ last case off by a factor of 10?

However, I think the lower rows range for switching to an index scan may
also be influenced by the index width (16 byte uuid vs. 8 byte
timestamp). Or am I wrong here?

> 
> BTW which PostgreSQL version is this?

I am testing mainly on 11.4. I did do any testing on 12 beta so far.

> 
> regards
> 




Re: Custom opclass for column statistics?

From
Ancoron Luciferis
Date:
On 06/07/2019 15:58, Tom Lane wrote:
> Ancoron Luciferis <ancoron.luciferis@googlemail.com> writes:
>> I've been wondering whether it is possible somehow to have the standard
>> column statistics to respect a certain operator class?
> 
> In principle, pg_statistic can represent stats for a non-default opclass.
> Teaching ANALYZE to collect such stats when appropriate, and then teaching
> the planner to use them when appropriate, is left as an exercise for the
> reader.

Hehe, now that you are saying it, I realize what I was actually asking
for with this... ;)

> 
> I think the "when appropriate" bit is actually the hardest part of that.
> Possibly, if you were satisfied with a relatively manual approach,
> you could proceed by using CREATE STATISTICS to declare interest in
> keeping standard stats for a non-default sort order.  Not sure what to
> do if you want it to be automatic, because I don't think people would
> hold still for having ANALYZE collect stats for any random non-default
> opclass automatically.  Maybe a new opclass property?

I totally agree with the complications around all that.

Now I think if I want better statistics and better plans for my new
time-sorted index, I will need a new data type for which I can set the
opclass as default, which also would provide users the guarantee that
they'll get what they expect.

Thanx and cheers,

    Ancoron