Re: Custom opclass for column statistics? - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Custom opclass for column statistics?
Date
Msg-id 20190706155753.4yf5i4lezx42x5os@development
Whole thread Raw
In response to Re: Custom opclass for column statistics?  (Ancoron Luciferis <ancoron.luciferis@googlemail.com>)
Responses Re: Custom opclass for column statistics?
List pgsql-performance
On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote:
>On 06/07/2019 15:38, Tomas Vondra wrote:
>> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>>> Hi,
>>>
>>> I've been wondering whether it is possible somehow to have the standard
>>> column statistics to respect a certain operator class?
>>>
>>> The reason why I am asking for this is that I have a UUID column with a
>>> unique index at it using a custom operator class which implies a
>>> different sort order than for the default UUID operator class.
>>>
>>> This results into planner mistakes when determining whether to use the
>>> index for row selection or not. Too often it falls back into sequential
>>> scan due to this.
>>>
>>
>> Can you share an example demonstrating the issue?
>>
>>
>> regards
>>
>
>Yes, I have an opclass as follows:
>
>CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
>    USING btree AS
>        OPERATOR        1       <*,
>        OPERATOR        1       <~ (uuid, timestamp with time zone),
>        OPERATOR        2       <=*,
>        OPERATOR        2       <=~ (uuid, timestamp with time zone),
>        OPERATOR        3       =,
>        OPERATOR        3       =~ (uuid, timestamp with time zone),
>        OPERATOR        4       >=*,
>        OPERATOR        4       >=~ (uuid, timestamp with time zone),
>        OPERATOR        5       >*,
>        OPERATOR        5       >~ (uuid, timestamp with time zone),
>        FUNCTION        1       uuid_timestamp_cmp(uuid, uuid),
>        FUNCTION        1       uuid_timestamp_only_cmp(uuid, timestamp
>with time zone),
>        FUNCTION        2       uuid_timestamp_sortsupport(internal)
>;
>
>...and e.g. operator "<*" is defined as:
>
>CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
>RETURNS bool
>AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
>LANGUAGE C
>IMMUTABLE
>LEAKPROOF
>STRICT
>PARALLEL SAFE;
>
>COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';
>
>CREATE OPERATOR <* (
>    LEFTARG = uuid,
>    RIGHTARG = uuid,
>    PROCEDURE = uuid_timestamp_lt,
>    COMMUTATOR = '>*',
>    NEGATOR = '>=*',
>    RESTRICT = scalarltsel,
>    JOIN = scalarltjoinsel
>);
>
>
>The function "uuid_timestamp_lt" is basically defined as follows:
>1. if not version 1 UUID fallback to standard uuid compare
>2. extract timestamp values and compare
>3. if equal timestamps fallback to standard uuid compare
>
>...so that a chronological order is established.
>
>
>The test table is created as follows:
>
>CREATE TABLE uuid_v1_ext (id uuid);
>CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);
>
>
>The values for "histogram_bounds" of the test table look like this (due
>to the default sort order for standard type UUID):
>
>00003789-97bf-11e9-b6bb-e03f49f7f733
>008b88f8-6deb-11e9-901a-e03f4947f477
>010a8b22-586a-11e9-8258-e03f49ce78f3
>...
>6f682e68-978d-11e9-901a-e03f4947f477
>6ff412ee-926f-11e9-901a-e03f4947f477
>7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
>70ffaeca-4645-11e9-adf9-e03f494677fb
>...
>fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
>ff779ce8-9e52-11e9-8258-e03f49ce78f3
>ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf
>
>...and I think that's where the planner gets the decision for a query
>such as:
>
>DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';
>
>...which then get's executed as sequential scan instead of an index scan.
>
>I was also thinking about changing the selectivity function used by the
>custom operator, but I didn't find any hints how to implement that
>without duplicating a lot of internal code.
>

Not sure, I'm not very familiar with this code, so I'd have to play with
it and try things. But that's hard when I don't have any code. Would it
be possible to share a small self-contained test case?

I wonder what does uuid_timestamp_cmp do? I suppose it first compares by
a timestamp extracted from the UUID, right?

It'd be interesting to see

(a) statistics for the column from pg_stats, both for the table and
index (which should have been built using the custom opclass, I think).

(b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the
default one (that can't use the timestamp condition, but it should be
possible to generate smallers/largest uuid for a timestamp).

BTW which PostgreSQL version is this?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-performance by date:

Previous
From: Ancoron Luciferis
Date:
Subject: Re: Custom opclass for column statistics?
Next
From: Ancoron Luciferis
Date:
Subject: Re: Custom opclass for column statistics?