Re: Partitioning, Identity and Uniqueness (given pg 16 changes) - Mailing list pgsql-general
From | Ketan Popat |
---|---|
Subject | Re: Partitioning, Identity and Uniqueness (given pg 16 changes) |
Date | |
Msg-id | CAKx+mvZf9xsr1EtdtRm=wiY5vpb3tZMyw9yyemiQw9_=4UvpDA@mail.gmail.com Whole thread Raw |
In response to | Re: Partitioning, Identity and Uniqueness (given pg 16 changes) (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-general |
1) Is my attempt to improve performance of insert AND (as the number of sources goes up, as it has) querying which is invariably by id but it is easy to include "and src = x" in the majority of usage (the other common case is to query across all sources, asking for a set is unusual) a reasonable approach?2) It would be nice to be able to specify the id as pk on the table being partitioned (as it was in the non-partitioned definition of the table) once to document and enforce that the partitions simply inherit the id pk. This would seem only to need the "partition by" validation to allow a column not mentioned in partition by clause to be defined as pk or unique if and only if the pk/unique column is an identity column. Not a big deal but is this practical/valid?
One of the potential solutions could be to include src into your partitioning key as follows.
create table tpart (id int generated always as identity not null, src int, data varchar, primary key (id, src)) partition by hash(src) ;
create table tpart_p001 partition of ttest_part for values with (modulus 4, remainder 0);
...
create table tpart_p032 partition of ttest_part for values with (modulus 32, remainder 31);
App side changes required:
- Ensure all queries come with src as a filter, otherwise it could backfire performance due to partition pruning overhead.
Here is how it could help:
- "generated always as identity" for id would ensure unique id and src can be used for hash partitioning.
- insert performance may improve if you are seeing resource contention related wait events during insert.
- it would greatly help with the access pattern querying time series for single src
Thanks,
Ketan
On Sun, Feb 18, 2024 at 8:23 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 19 Feb 2024 at 16:32, Darryl Green <darryl.green@gmail.com> wrote:
> 2) It would be nice to be able to specify the id as pk on the table being partitioned (as it was in the non-partitioned definition of the table) once to document and enforce that the partitions simply inherit the id pk. This would seem only to need the "partition by" validation to allow a column not mentioned in partition by clause to be defined as pk or unique if and only if the pk/unique column is an identity column. Not a big deal but is this practical/valid?
Unfortunately, it's not as easy as you think. It's not a matter of
dropping the check that requires all PRIMARY KEY columns are present
in the PARTITION BY clause. For this to work a *single* index (i.e.
non-partitioned index) would have to index all partitions. Otherwise,
how would Postgres ensure that the value being inserted doesn't exist
in some other partition?
The problem with a single index is that it kinda defeats the purpose
of partitioning, i.e., "my table is large and I want to split it up".
Operations such as DETACH PARTITION would have to become more than
just a metadata operation when you consider having to trawl through
the index and remove all records belonging to a single partition.
It may be possible to still have it work by doing a speculative record
in the index for the target table then go and check all of the other
indexes before marking the speculative entry as valid. I think it
would be very tricky to make it work well, however. Imagine how
expensive ATTACH PARTITION would be! There are probably other race
conditions I've not thought about too. Likely, we'd get more
complaints about this being a terrible feature than we do due to the
fact that it's unsupported.
David
pgsql-general by date: