Re: Custom opclass for column statistics? - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Custom opclass for column statistics? |
Date | |
Msg-id | 20190706155753.4yf5i4lezx42x5os@development Whole thread Raw |
In response to | Re: Custom opclass for column statistics? (Ancoron Luciferis <ancoron.luciferis@googlemail.com>) |
Responses |
Re: Custom opclass for column statistics?
|
List | pgsql-performance |
On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote: >On 06/07/2019 15:38, Tomas Vondra wrote: >> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote: >>> Hi, >>> >>> I've been wondering whether it is possible somehow to have the standard >>> column statistics to respect a certain operator class? >>> >>> The reason why I am asking for this is that I have a UUID column with a >>> unique index at it using a custom operator class which implies a >>> different sort order than for the default UUID operator class. >>> >>> This results into planner mistakes when determining whether to use the >>> index for row selection or not. Too often it falls back into sequential >>> scan due to this. >>> >> >> Can you share an example demonstrating the issue? >> >> >> regards >> > >Yes, I have an opclass as follows: > >CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid > USING btree AS > OPERATOR 1 <*, > OPERATOR 1 <~ (uuid, timestamp with time zone), > OPERATOR 2 <=*, > OPERATOR 2 <=~ (uuid, timestamp with time zone), > OPERATOR 3 =, > OPERATOR 3 =~ (uuid, timestamp with time zone), > OPERATOR 4 >=*, > OPERATOR 4 >=~ (uuid, timestamp with time zone), > OPERATOR 5 >*, > OPERATOR 5 >~ (uuid, timestamp with time zone), > FUNCTION 1 uuid_timestamp_cmp(uuid, uuid), > FUNCTION 1 uuid_timestamp_only_cmp(uuid, timestamp >with time zone), > FUNCTION 2 uuid_timestamp_sortsupport(internal) >; > >...and e.g. operator "<*" is defined as: > >CREATE FUNCTION uuid_timestamp_lt(uuid, uuid) >RETURNS bool >AS 'MODULE_PATHNAME', 'uuid_timestamp_lt' >LANGUAGE C >IMMUTABLE >LEAKPROOF >STRICT >PARALLEL SAFE; > >COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than'; > >CREATE OPERATOR <* ( > LEFTARG = uuid, > RIGHTARG = uuid, > PROCEDURE = uuid_timestamp_lt, > COMMUTATOR = '>*', > NEGATOR = '>=*', > RESTRICT = scalarltsel, > JOIN = scalarltjoinsel >); > > >The function "uuid_timestamp_lt" is basically defined as follows: >1. if not version 1 UUID fallback to standard uuid compare >2. extract timestamp values and compare >3. if equal timestamps fallback to standard uuid compare > >...so that a chronological order is established. > > >The test table is created as follows: > >CREATE TABLE uuid_v1_ext (id uuid); >CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops); > > >The values for "histogram_bounds" of the test table look like this (due >to the default sort order for standard type UUID): > >00003789-97bf-11e9-b6bb-e03f49f7f733 >008b88f8-6deb-11e9-901a-e03f4947f477 >010a8b22-586a-11e9-8258-e03f49ce78f3 >... >6f682e68-978d-11e9-901a-e03f4947f477 >6ff412ee-926f-11e9-901a-e03f4947f477 >7079ffe2-642f-11e9-b0cc-e03f49e7fd3b >70ffaeca-4645-11e9-adf9-e03f494677fb >... >fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b >ff779ce8-9e52-11e9-8258-e03f49ce78f3 >ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf > >...and I think that's where the planner gets the decision for a query >such as: > >DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3'; > >...which then get's executed as sequential scan instead of an index scan. > >I was also thinking about changing the selectivity function used by the >custom operator, but I didn't find any hints how to implement that >without duplicating a lot of internal code. > Not sure, I'm not very familiar with this code, so I'd have to play with it and try things. But that's hard when I don't have any code. Would it be possible to share a small self-contained test case? I wonder what does uuid_timestamp_cmp do? I suppose it first compares by a timestamp extracted from the UUID, right? It'd be interesting to see (a) statistics for the column from pg_stats, both for the table and index (which should have been built using the custom opclass, I think). (b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the default one (that can't use the timestamp condition, but it should be possible to generate smallers/largest uuid for a timestamp). BTW which PostgreSQL version is this? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-performance by date: