Re: Patch: Global Unique Index - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Patch: Global Unique Index |
Date | |
Msg-id | CANbhV-Eo4D2Njpa=YiBr7BxOMDi5B54wenjgoamRKjzj-9gBGQ@mail.gmail.com Whole thread Raw |
In response to | Patch: Global Unique Index (Cary Huang <cary.huang@highgo.ca>) |
Responses |
Re: Patch: Global Unique Index
Re: Patch: Global Unique Index |
List | pgsql-hackers |
On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.huang@highgo.ca> wrote: > > Patch: Global Unique Index Let me start by expressing severe doubt on the usefulness of such a feature, but also salute your efforts to contribute. > In other words, a global unique index and a regular partitioned index are essentially the same in terms of their storagestructure except that one can do cross-partition uniqueness check, the other cannot. This is the only workable architecture, since it allows DETACH to be feasible, which is essential. You don't seem to mention that this would require a uniqueness check on each partition. Is that correct? This would result in O(N) cost of uniqueness checks, severely limiting load speed. I notice you don't offer any benchmarks on load speed or the overhead associated with this, which is not good if you want to be taken seriously, but at least it is recoverable. (It might be necessary to specify some partitions as READ ONLY, to allow us to record their min/max values for the indexed cols, allowing us to do this more quickly.) > - Supported Features - > 1. Global unique index is supported only on btree index type Why? Surely any index type that supports uniqueness is good. > - Not-supported Features - > 1. Global uniqueness check with Sub partition tables is not yet supported as we do not have immediate use case and it mayinvolve majoy change in current implementation Hmm, sounds like a problem. Arranging the calls recursively should work. > - Create a global unique index - > To create a regular unique index on a partitioned table, Postgres has to perform heap scan and sorting on every child partition.Uniqueness check happens during the sorting phase and will raise an error if multiple tuples with the same indexkey are sorted together. To achieve global uniqueness check, we make Postgres perform the sorting after all of the childpartitions have been scanned instead of on the "sort per partition" fashion. In otherwords, the sorting only happensonce at the very end and it sorts the tuples coming from all the partitions and therefore can ensure global uniqueness. My feeling is that performance on this will suck so badly that we must warn people away from it, and tell people if they want this, create the index at the start and let it load. Hopefully CREATE INDEX CONCURRENTLY still works. Let's see some benchmarks on this also please. You'll need to think about progress reporting early because correctly reporting the progress and expected run times are likely critical for usability. > Example: > > > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a); > > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10); > > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20); > > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL; > > INSERT INTO gidx_part values(5, 5, 'test'); > > INSERT INTO gidx_part values(15, 5, 'test'); > ERROR: duplicate key value violates unique constraint "gidx_part1_b_idx" > DETAIL: Key (b)=(5) already exists. Well done. > - DETACH - > Since we retain the same partitioned structure, detaching a partition with global unique index is straightforward. UponDETACH, Postgres will change its relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their inheritance relationshipas usual. It's the only way that works > - Optimizer, query planning and vacuum - > Since no major modification is done on global unique index's structure and storage, it works in the same way as a regularpartitioned index. No major change is required to be done on optimizer, planner and vacuum process as they shouldwork in the same way as regular index. Agreed Making a prototype is a great first step. The next step is to understand the good and the bad aspects of it, so you can see what else needs to be done. You need to be honest and real about the fact that this may not actually be desirable in practice, or in a restricted use case. That means performance analysis of create, load, attach, detach, INSERT, SELECT, UPD/DEL and anything else that might be affected, together with algorithmic analysis of what happens for larger N and larger tables. Expect many versions; take provisions for many days. Best of luck -- Simon Riggs http://www.EnterpriseDB.com/
pgsql-hackers by date: