Thread: Primary keys and composite unique keys(basic question)
Hello,
We have UUIDs in our tables which are primary keys. But in some cases
we also identify a composite unique key apart from the primary key.
My assumption is that there should be a unique key index created by us using the composite key. And when we fetch using this composite key instead of the primary key we have a performance boost due to the index.
Is this right ? Are there more details I should read to understand this better ? Please point. If I should use the query planner to look at the statistics I will.
.
Thanks,
Mohan
Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> writes: > We have UUIDs in our tables which are primary keys. But in > some cases > we also identify a composite unique key apart from the primary key. > My assumption is that there should be a unique key index created by us > using the composite key. And when we fetch using this composite key instead > of the primary key we have a performance boost due to the index. You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key is often an antipattern. The UUIDs are quasi-random, meaning there's no locality of reference in the primary key index, resulting in inefficiency in searches and insertions. If the composite key you mention has some actual relationship to your application's usage patterns, it could be winning as a result of better locality of access to that index. regards, tom lane
I will cover the UUIDs first. They are indispensable to us.
1. The data is distributed over regions So we need the row to be unique.
2. This distributed data is sent to services as events. That is the application architecture.
But we don't search using UUIDs always. Only when data from another distributed service
is received we need them and in such cases we have to join using them.
But for local data we can identify another composite unique key. Does PostgreSql
create a unique index for us ? What about a FK that references this composite
unique key ? Does it create a FK index ?
Thank you.
On Wed, Mar 31, 2021 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> writes:
> We have UUIDs in our tables which are primary keys. But in
> some cases
> we also identify a composite unique key apart from the primary key.
> My assumption is that there should be a unique key index created by us
> using the composite key. And when we fetch using this composite key instead
> of the primary key we have a performance boost due to the index.
You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key
is often an antipattern. The UUIDs are quasi-random, meaning there's
no locality of reference in the primary key index, resulting in
inefficiency in searches and insertions. If the composite key you
mention has some actual relationship to your application's usage
patterns, it could be winning as a result of better locality of
access to that index.
regards, tom lane
Etiquette on these lists is to reply in line or below the relevant portion, not top-post with full quoting like default gmail behavior.
On Wed, Mar 31, 2021 at 9:18 AM Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> wrote:
But we don't search using UUIDs always. Only when data from another distributed serviceis received we need them and in such cases we have to join using them.
I haven't used them so I don't recall exactly, but I believe there is a type of UUID generation which has some leading correlation to time which would help with reducing the random I/O issue that Tom Lane mentioned. A quick search of the archive may lead you to that, or someone else may chime in with the name I expect.
But for local data we can identify another composite unique key. Does PostgreSqlcreate a unique index for us ? What about a FK that references this compositeunique key ? Does it create a FK index ?
It is up to you to create whichever fkeys and indexes you require.
On Wed, Mar 31, 2021 at 3:36 AM Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> wrote: > > Hello, > We have UUIDs in our tables which are primary keys. But in some cases > we also identify a composite unique key apart from the primary key. > > My assumption is that there should be a unique key index created by us using the composite key. And when we fetch usingthis composite key instead of the primary key we have a performance boost due to the index. > > Is this right ? Are there more details I should read to understand this better ? Please point. If I should use the queryplanner to look at the statistics I will. This is one of the great debates in computer science and it is not settled. There are various tradeoffs around using a composite key derived from the data (aka natural key) vs generated identifiers. It's a complex topic with many facets: performance, organization, validation, and correctness are all relevant considerations. I would never use UUIDS for keys though. merlin
On 4/1/21 8:28 PM, Merlin Moncure wrote: > > This is one of the great debates in computer science and it is not > settled. There are various tradeoffs around using a composite key > derived from the data (aka natural key) vs generated identifiers. It's > a complex topic with many facets: performance, organization, > validation, and correctness are all relevant considerations. I would > never use UUIDS for keys though. > > merlin > > And, pray tell, for what exactly would you use universally unique identifiers.
I used uuid4 for customer ids because i needed to interface with payment providers. Is that wrong? All other places except transaction ids, i have used serial ints
On Fri 2 Apr, 2021, 8:56 AM Rob Sargent, <robjsargent@gmail.com> wrote:
On 4/1/21 8:28 PM, Merlin Moncure wrote:
>
> This is one of the great debates in computer science and it is not
> settled. There are various tradeoffs around using a composite key
> derived from the data (aka natural key) vs generated identifiers. It's
> a complex topic with many facets: performance, organization,
> validation, and correctness are all relevant considerations. I would
> never use UUIDS for keys though.
>
> merlin
>
>
And, pray tell, for what exactly would you use universally unique
identifiers.
On Fri, Apr 2, 2021 at 8:57 AM Hemil Ruparel <hemilruparel2002@gmail.com> wrote:
I used uuid4 for customer ids because i needed to interface with payment providers. Is that wrong? All other places except transaction ids, i have used serial intsOn Fri 2 Apr, 2021, 8:56 AM Rob Sargent, <robjsargent@gmail.com> wrote:On 4/1/21 8:28 PM, Merlin Moncure wrote:
>
> This is one of the great debates in computer science and it is not
> settled. There are various tradeoffs around using a composite key
> derived from the data (aka natural key) vs generated identifiers. It's
> a complex topic with many facets: performance, organization,
> validation, and correctness are all relevant considerations. I would
> never use UUIDS for keys though.
>
> merlin
>
>
And, pray tell, for what exactly would you use universally unique
identifiers.
ULID perform better than UUID with btree indexes. The first part is based on a timestamp, so if you don't mind exposing this information, they are a better choice. There is an implementation for postgres here: https://github.com/geckoboard/pgulid
Maybe there are others.
On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote: > I would never use UUIDS for keys though. That makes me curious for your reasons. I see the following disadvantages: - A UUID requires twice as much storage space as a bigint. - B-tree indexes are space optimized for inserting at the rightmost leaf page, but UUIDs are random. - UUIDs are more expensive to generate. On the other hand, many processes trying to insert into the same index page might lead to contention. Is there anything I have missed? Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On Thu, Apr 1, 2021 at 10:26 PM Rob Sargent <robjsargent@gmail.com> wrote: > > On 4/1/21 8:28 PM, Merlin Moncure wrote: > > > > This is one of the great debates in computer science and it is not > > settled. There are various tradeoffs around using a composite key > > derived from the data (aka natural key) vs generated identifiers. It's > > a complex topic with many facets: performance, organization, > > validation, and correctness are all relevant considerations. I would > > never use UUIDS for keys though. > > > > merlin > > > > > And, pray tell, for what exactly would you use universally unique > identifiers. I don't disagree that UUID are an ok choice in that scenario. I'll tell you what though, that scenario comes up fairly rarely. However, there are a couple of alternatives if you're curious. *) Generate ids from a generator service. This pattern is fairly common. It has some downsides (slower, more complicated inserts mainly) but works well in other ways. You can mitigate the performance downsides by allocated identifiers in blocks. *) Use sequences, but with a sequence id added as a composite or maksed into the integer. This works pretty well in practice. merlin
On Fri, Apr 2, 2021 at 3:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote: > > I would never use UUIDS for keys though. > > That makes me curious for your reasons. > > I see the following disadvantages: > > - A UUID requires twice as much storage space as a bigint. > > - B-tree indexes are space optimized for inserting at the > rightmost leaf page, but UUIDs are random. > > - UUIDs are more expensive to generate. > > On the other hand, many processes trying to insert into > the same index page might lead to contention. > > Is there anything I have missed? It's a small thing, but UUIDs are absolutely not memorizable by humans; they have zero semantic value. Sequential numeric identifiers are generally easier to transpose and the value gives some clues to its age (of course, in security contexts this can be a downside). Performance-wise, UUIDS are absolutely horrible for data at scale as Tom rightly points out. Everything is randomized, just awful. There are some alternate implementations of UUID that mitigate this but I've never seen them used in the wild in actual code. merlin
I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
Performance-wise, UUIDS are absolutely horrible for data at scale as
Tom rightly points out. Everything is randomized, just awful. There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.
On Mon, Apr 5, 2021 at 9:37 PM Rob Sargent <robjsargent@gmail.com> wrote: > > It's a small thing, but UUIDs are absolutely not memorizable by > humans; they have zero semantic value. Sequential numeric identifiers > are generally easier to transpose and the value gives some clues to > its age (of course, in security contexts this can be a downside). > > I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happenedto perfectly match the wrong thing. > > Performance-wise, UUIDS are absolutely horrible for data at scale as > Tom rightly points out. Everything is randomized, just awful. There > are some alternate implementations of UUID that mitigate this but I've > never seen them used in the wild in actual code. > > > That b-tree’s have been optimized to handle serial ints might be a considered a reaction to that popular (and distasteful)choice. Perhaps there should be a ’non-optimized’ option. It's not just the BTree, but the heap as well. For large tables, you are pretty much guaranteed to read a page for each record you want to load via the key regardless of the pattern of access. It's incredibly wasteful regardless of the speed of the underlying storage fabric. Very few developers actually understand this. If computers were infinitely fast this wouldn't matter, but they aren't :-). merlin
On 4/5/21 9:37 PM, Rob Sargent wrote:
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:
No the problem is “start from one”. User has item/I’d 10875 in hand and types in 10785 which of course in a sequence supplied ID steam is perfectly valid and wrong. Really hard to track down. On 4/5/21 9:37 PM, Rob Sargent wrote:I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
--
Angular momentum makes the world go 'round.
On 4/7/21 11:35 AM, Rob Sargent wrote:
That's my point. Adding a check digit (turning 10875 into 108753) would have caught that, since 107853 does not match 107854 (which is 10785 with a check digit added).
On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:No the problem is “start from one”. User has item/I’d 10875 in hand and types in 10785 which of course in a sequence supplied ID steam is perfectly valid and wrong. Really hard to track down. On 4/5/21 9:37 PM, Rob Sargent wrote:I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
That's my point. Adding a check digit (turning 10875 into 108753) would have caught that, since 107853 does not match 107854 (which is 10785 with a check digit added).
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 4/7/21 11:59 AM, Ron wrote:
On 4/7/21 11:35 AM, Rob Sargent wrote:Well you forget that 108753 is also a number in the series from 1 to maxint. Maybe you're on to something though: a checksum dispensing sequence!On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:No the problem is “start from one”. User has item/I’d 10875 in hand and types in 10785 which of course in a sequence supplied ID steam is perfectly valid and wrong. Really hard to track down. On 4/5/21 9:37 PM, Rob Sargent wrote:I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
That's my point. Adding a check digit (turning 10875 into 108753) would have caught that, since 107853 does not match 107854 (which is 10785 with a check digit added).
On 4/7/21 1:16 PM, Rob Sargent wrote:
Call a function which reads the next value from the sequence, appends the check digit and returns that number. We were doing that 25 years ago.
On 4/7/21 11:59 AM, Ron wrote:On 4/7/21 11:35 AM, Rob Sargent wrote:Well you forget that 108753 is also a number in the series from 1 to maxint. Maybe you're on to something though: a checksum dispensing sequence!On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:No the problem is “start from one”. User has item/I’d 10875 in hand and types in 10785 which of course in a sequence supplied ID steam is perfectly valid and wrong. Really hard to track down. On 4/5/21 9:37 PM, Rob Sargent wrote:I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).
People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.
That's my point. Adding a check digit (turning 10875 into 108753) would have caught that, since 107853 does not match 107854 (which is 10785 with a check digit added).
Call a function which reads the next value from the sequence, appends the check digit and returns that number. We were doing that 25 years ago.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 2021-04-07 10:35:55 -0600, Rob Sargent wrote: > On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote: > > On 4/5/21 9:37 PM, Rob Sargent wrote: > > It's a small thing, but UUIDs are absolutely not memorizable by > humans; they have zero semantic value. Sequential numeric > identifiers > are generally easier to transpose and the value gives some clues to > its age (of course, in security contexts this can be a downside). > > > I take the above as a definite plus. Spent too much of my life > correcting others’ use of “remembered” id’s that just happened to > perfectly match the wrong thing. > > > People seem to have stopped appending check digits to identifiers about 20 > years ago, and I'm not sure why. > > > No the problem is “start from one”. User has item/I’d 10875 in hand and types > in 10785 which of course in a sequence supplied ID steam is perfectly valid and > wrong. Really hard to track down. What I've often done when writing software for an Oracle RDBMS is to use a global sequence instead of one sequence per table (Oracle didn't have a SERIAL type, so you had to use an explicit sequence and write trigger anyway). That caught wrong joins (an id value used in one table is never used in another table) as well as most typos (since ids in a single table were not dense). With PostgreSQL I've stopped doing this since the SERIAL type makes it much more convenient to have a separate sequence per table. But of course that means that almost any table will have a row with id 10785 and one with 10875. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"