Re: Patch: Global Unique Index - Mailing list pgsql-hackers
From | Cary Huang |
---|---|
Subject | Re: Patch: Global Unique Index |
Date | |
Msg-id | 184a69cad00.d1c5dca03015829.3476870657827543418@highgo.ca Whole thread Raw |
In response to | Re: Patch: Global Unique Index (Simon Riggs <simon.riggs@enterprisedb.com>) |
List | pgsql-hackers |
Hi Simon
Thank you so much for sharing these valuable comments and concerns to our work. We understand there is a lot of TODOs left to be done to move forward with this in a serious matter. Your comments have been very helpful and we are very grateful.
> You don't seem to mention that this would require a uniqueness check
> on each partition. Is that correct? This would result in O(N) cost of
> uniqueness checks, severely limiting load speed. I notice you don't
> offer any benchmarks on load speed or the overhead associated with
> this, which is not good if you want to be taken seriously, but at
> least it is recoverable.
Yes, during INSERT and UPDATE, the uniqueness check happens on every partition including the current one. This introduces extra look-up costs and will limit the speed significantly especially when there is a large number of partitions. This is one drawback of global unique index that needs to be optimized / improved.
In fact, all other operations such as CREATE and ATTACH that involve global uniqueness check will have certain degree of performance loss as well. See benchmark figures below.
> (It might be necessary to specify some partitions as READ ONLY, to
> allow us to record their min/max values for the indexed cols, allowing
> us to do this more quickly.)
Thank you so much for this great suggestion, If there were an indication that some partitions have become READ ONLY, record the min/max values of their global unique indexed columns to these partitions, then we might be able to skip these partitions for uniqueness checking if the value is out of the range (min/max)? Did we understand it correctly? Could you help elaborate more?
>> 1. Global unique index is supported only on btree index type
>
> Why? Surely any index type that supports uniqueness is good.
Yes, we can definitely have the same support for other index types that support UNIQUE.
>> - Not-supported Features -
>> 1. Global uniqueness check with Sub partition tables is not yet supported as we do not have immediate use case and it may involve major change in current implementation
>
> Hmm, sounds like a problem. Arranging the calls recursively should work.
Yes, it is a matter of rearranging the recursive calls to correctly find out all "leaves" partitions to be considered for global uniqueness check. So far, only the partitions in the first layer is considered.
> My feeling is that performance on this will suck so badly that we must
> warn people away from it, and tell people if they want this, create
> the index at the start and let it load.
Yes, to support global unique index, extra logic needs to be run to ensure uniqueness and especially during INSERT and ATTACH where it needs to look up all involved partitions. We have a benchmark figures attached below.
This is also the reason that "global" syntax is required so people know they really want to have this feature. To help users better understand the potential performance drawbacks, should we add a warning in the documentation?
> Hopefully CREATE INDEX CONCURRENTLY still works.
Yes, we verified this global unique index approach on Postgres 14.5 with a community CREATE INDEX CONCURRENTLY patch on partitioned table.
> Let's see some benchmarks on this also please.
Here is a simple 'timing' comparison between regular and global unique index on a partitioned table having 6 partitions.
global unique index:
-> 156,285ms to insert 6 million records (1 million on each partition)
-> 6,592ms to delete all 6 million records
-> 3,957ms to create global unique index with 6 million records pre-inserted
-> 3,650ms to attach a new partition with 1 million records pre-inserted
-> 17ms to detach a partition with 1 million records in it
regular unique index:
-> 26,007ms to insert 6 million records (1 million on each partition)
-> 7,238ms to delete all 6 million records
-> 2,933ms to create regular unique index with 6 million records pre-inserted
-> 628ms to attach a new partition with 1 million records pre-inserted
-> 17ms to detach a partition with 1 million records in it
These are the commands I use to get the numbers (switch create unique index clause between global and regular):
-> \timing on
-> create table test(a int, b int, c text) partition by range (a);
-> create table test1 partition of test for values from (MINVALUE) to (1000000);
-> create table test2 partition of test for values from (1000000) to (2000000);
-> create table test3 partition of test for values from (2000000) to (3000000);
-> create table test4 partition of test for values from (3000000) to (4000000);
-> create table test5 partition of test for values from (4000000) to (5000000);
-> create table test6 partition of test for values from (5000000) to (6000000);
-> create unique index myindex on test(b) global;
-> insert into test values(generate_series(0,5999999), generate_series(0,5999999), 'test'); /* record timing */
-> delete from test; /* record timing */
-> drop index myindex;
-> insert into test values(generate_series(0,5999999), generate_series(0,5999999), 'test');
-> create unique index myindex on test(b) global; /* record timing */
-> create table test7 (a int, b int, c text);
-> insert into test7 values(generate_series(6000000, 6999999), generate_series(6000000, 6999999), 'test');
-> alter table test attach partition test7 for values from (6000000) TO (7000000); /* record timing */
-> alter table test detach partition test7; /* record timing */
As you can see, insert operation suffers the most performance drawbacks. In fact, it takes roughly 6 times as much time to complete the insertion, which matches the number of partitions in the test.
The Attach operation also takes roughly 6 times as much time to complete, because it has to performs uniqueness check on all 6 existing partitions to determine global uniqueness. Detach in both case takes the same time to complete.
Create global unique index takes 35% longer to build.
We also ran some tests for random SELECT and UPDATE using non-partition key with pgbench to compare the performance among 3 conditions: no index, regular unique index (with partition-key involved), and global unique index:
Test 1: scale=100, 10 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 3.827886
-> regular unique index: tps = 14.713099
-> global unique index: tps = 23791.314238
UPDATE mixed with SELECT:
-> No partitioned index: tps = 1.926013
-> regular unique index: tps = 7.087059
-> global unique index: tps = 2253.098335
Test 2: scale=1,000, 100 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 0.110029
-> regular unique index: tps = 0.268199
-> global unique index: tps = 2334.811682
UPDATE mixed with SELECT:
-> No partitioned index: tps = 0.115329
-> regular unique index: tps = 0.197052
-> global unique index: tps = 541.488621
Test 3: scale=10,000, 1,000 partitions, 1 million tuples/partition
SELECT:
-> No partitioned index: tps = 0.011047
-> regular unique index: tps = 0.036812
-> global unique index: tps = 147.189456
UPDATE mixed with SELECT:
-> No partitioned index: tps = 0.008209
-> regular unique index: tps = 0.054367
-> global unique index: tps = 57.740432
thank you very much and we hope this information could help clarify some concerns about this approach.
David and Cary
============================
HighGo Software Canada
---- On Mon, 21 Nov 2022 05:33:30 -0700 Simon Riggs wrote ---
> On Thu, 17 Nov 2022 at 22:01, Cary Huang cary.huang@highgo.ca> wrote:
> >
> > Patch: Global Unique Index
>
> Let me start by expressing severe doubt on the usefulness of such a
> feature, but also salute your efforts to contribute.
>
> > In other words, a global unique index and a regular partitioned index are essentially the same in terms of their storage structure except that one can do cross-partition uniqueness check, the other cannot.
>
> This is the only workable architecture, since it allows DETACH to be
> feasible, which is essential.
>
> You don't seem to mention that this would require a uniqueness check
> on each partition. Is that correct? This would result in O(N) cost of
> uniqueness checks, severely limiting load speed. I notice you don't
> offer any benchmarks on load speed or the overhead associated with
> this, which is not good if you want to be taken seriously, but at
> least it is recoverable.
>
> (It might be necessary to specify some partitions as READ ONLY, to
> allow us to record their min/max values for the indexed cols, allowing
> us to do this more quickly.)
>
> > - Supported Features -
> > 1. Global unique index is supported only on btree index type
>
> Why? Surely any index type that supports uniqueness is good.
>
> > - Not-supported Features -
> > 1. Global uniqueness check with Sub partition tables is not yet supported as we do not have immediate use case and it may involve majoy change in current implementation
>
> Hmm, sounds like a problem. Arranging the calls recursively should work.
>
> > - Create a global unique index -
> > To create a regular unique index on a partitioned table, Postgres has to perform heap scan and sorting on every child partition. Uniqueness check happens during the sorting phase and will raise an error if multiple tuples with the same index key are sorted together. To achieve global uniqueness check, we make Postgres perform the sorting after all of the child partitions have been scanned instead of on the "sort per partition" fashion. In otherwords, the sorting only happens once at the very end and it sorts the tuples coming from all the partitions and therefore can ensure global uniqueness.
>
> My feeling is that performance on this will suck so badly that we must
> warn people away from it, and tell people if they want this, create
> the index at the start and let it load.
>
> Hopefully CREATE INDEX CONCURRENTLY still works.
>
> Let's see some benchmarks on this also please.
>
> You'll need to think about progress reporting early because correctly
> reporting the progress and expected run times are likely critical for
> usability.
>
> > Example:
> >
> > > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
> > > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
> > > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
> > > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;
> > > INSERT INTO gidx_part values(5, 5, 'test');
> > > INSERT INTO gidx_part values(15, 5, 'test');
> > ERROR: duplicate key value violates unique constraint "gidx_part1_b_idx"
> > DETAIL: Key (b)=(5) already exists.
>
> Well done.
>
> > - DETACH -
> > Since we retain the same partitioned structure, detaching a partition with global unique index is straightforward. Upon DETACH, Postgres will change its relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their inheritance relationship as usual.
>
> It's the only way that works
>
> > - Optimizer, query planning and vacuum -
> > Since no major modification is done on global unique index's structure and storage, it works in the same way as a regular partitioned index. No major change is required to be done on optimizer, planner and vacuum process as they should work in the same way as regular index.
>
> Agreed
>
>
> Making a prototype is a great first step.
>
> The next step is to understand the good and the bad aspects of it, so
> you can see what else needs to be done. You need to be honest and real
> about the fact that this may not actually be desirable in practice, or
> in a restricted use case.
>
> That means performance analysis of create, load, attach, detach,
> INSERT, SELECT, UPD/DEL and anything else that might be affected,
> together with algorithmic analysis of what happens for larger N and
> larger tables.
>
> Expect many versions; take provisions for many days.
>
> Best of luck
>
> --
> Simon Riggs http://www.EnterpriseDB.com/
>
>
>
pgsql-hackers by date: