Re: Patch: Global Unique Index - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Patch: Global Unique Index
Date
Msg-id CANbhV-Eo4D2Njpa=YiBr7BxOMDi5B54wenjgoamRKjzj-9gBGQ@mail.gmail.com
Whole thread Raw
In response to Patch: Global Unique Index  (Cary Huang <cary.huang@highgo.ca>)
Responses Re: Patch: Global Unique Index
Re: Patch: Global Unique Index
List pgsql-hackers
On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.huang@highgo.ca> wrote:
>
> Patch: Global Unique Index

Let me start by expressing severe doubt on the usefulness of such a
feature, but also salute your efforts to contribute.

> In other words, a global unique index and a regular partitioned index are essentially the same in terms of their
storagestructure except that one can do cross-partition uniqueness check, the other cannot. 

This is the only workable architecture, since it allows DETACH to be
feasible, which is essential.

You don't seem to mention that this would require a uniqueness check
on each partition. Is that correct? This would result in O(N) cost of
uniqueness checks, severely limiting load speed. I notice you don't
offer any benchmarks on load speed or the overhead associated with
this, which is not good if you want to be taken seriously, but at
least it is recoverable.

(It might be necessary to specify some partitions as READ ONLY, to
allow us to record their min/max values for the indexed cols, allowing
us to do this more quickly.)

> - Supported Features -
> 1. Global unique index is supported only on btree index type

Why? Surely any index type that supports uniqueness is good.

> - Not-supported Features -
> 1. Global uniqueness check with Sub partition tables is not yet supported as we do not have immediate use case and it
mayinvolve majoy change in current implementation 

Hmm, sounds like a problem. Arranging the calls recursively should work.

> - Create a global unique index -
> To create a regular unique index on a partitioned table, Postgres has to perform heap scan and sorting on every child
partition.Uniqueness check happens during the sorting phase and will raise an error if multiple tuples with the same
indexkey are sorted together. To achieve global uniqueness check, we make Postgres perform the sorting after all of the
childpartitions have been scanned instead of on the "sort per partition" fashion. In otherwords, the sorting only
happensonce at the very end and it sorts the tuples coming from all the partitions and therefore can ensure global
uniqueness.

My feeling is that performance on this will suck so badly that we must
warn people away from it, and tell people if they want this, create
the index at the start and let it load.

Hopefully CREATE INDEX CONCURRENTLY still works.

Let's see some benchmarks on this also please.

You'll need to think about progress reporting early because correctly
reporting the progress and expected run times are likely critical for
usability.

> Example:
>
> > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
> > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
> > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
> > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;
> > INSERT INTO gidx_part values(5, 5, 'test');
> > INSERT INTO gidx_part values(15, 5, 'test');
> ERROR:  duplicate key value violates unique constraint "gidx_part1_b_idx"
> DETAIL:  Key (b)=(5) already exists.

Well done.

> - DETACH -
> Since we retain the same partitioned structure, detaching a partition with global unique index is straightforward.
UponDETACH, Postgres will change its relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their inheritance
relationshipas usual. 

It's the only way that works

> - Optimizer, query planning and vacuum -
> Since no major modification is done on global unique index's structure and storage, it works in the same way as a
regularpartitioned index. No major change is required to be done on optimizer, planner and vacuum process as they
shouldwork in the same way as regular index. 

Agreed


Making a prototype is a great first step.

The next step is to understand the good and the bad aspects of it, so
you can see what else needs to be done. You need to be honest and real
about the fact that this may not actually be desirable in practice, or
in a restricted use case.

That means performance analysis of create, load, attach, detach,
INSERT, SELECT, UPD/DEL and anything else that might be affected,
together with algorithmic analysis of what happens for larger N and
larger tables.

Expect many versions; take provisions for many days.

Best of luck

--
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Catalog_xmin is not advanced when a logical slot is lost
Next
From: Naeem Akhter
Date:
Subject: Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row