Thread: primary key hash index

primary key hash index

From

Rick Otten

Date:

02 January 2018, 17:02:50

After reading this article about keys in relational databases, highlighted on hacker news this morning:

https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html

I keep pondering the performance chart, regarding uuid insert, shown towards the bottom of the article. I believe he was doing that test with PostgreSQL.

My understanding is that the performance is degrading because he has a btree primary key index. Is it possible to try a hash index or some other index type for a uuid primary key that would mitigate the performance issue he is recording?

After all, I can't think of any use case where I query for a "range" of uuid values. They are always exact matches. So a hash index would possibly be a really good fit.

I have many tables, several with more than 1 billion rows, that use uuid's as the primary key. Many of those uuid's are generated off system, so I can't play around with the uuid generation algorithm like he was doing.

I'm hoping to move to PG 10 any day now, and can migrate the data with updated index definitions if it will actually help performance in any way. (I'm always looking for ways to tweak the performance for the better any chance I get.)

Re: primary key hash index

From

Magnus Hagander

Date:

02 January 2018, 17:09:50

On Tue, Jan 2, 2018 at 3:02 PM, Rick Otten <rottenwindfish@gmail.com> wrote:

After reading this article about keys in relational databases, highlighted on hacker news this morning:
https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html

I keep pondering the performance chart, regarding uuid insert, shown towards the bottom of the article. I believe he was doing that test with PostgreSQL.

My understanding is that the performance is degrading because he has a btree primary key index. Is it possible to try a hash index or some other index type for a uuid primary key that would mitigate the performance issue he is recording?

After all, I can't think of any use case where I query for a "range" of uuid values. They are always exact matches. So a hash index would possibly be a really good fit.

I have many tables, several with more than 1 billion rows, that use uuid's as the primary key. Many of those uuid's are generated off system, so I can't play around with the uuid generation algorithm like he was doing.

I'm hoping to move to PG 10 any day now, and can migrate the data with updated index definitions if it will actually help performance in any way. (I'm always looking for ways to tweak the performance for the better any chance I get.)

Hash indexes unfortunately don't support UNIQUE indexes. At least not yet. So while you can use them for regular indexing, they cannot be used as a PRIMARY KEY.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: primary key hash index

From

Jeff Janes

Date:

04 January 2018, 23:01:11

On Tue, Jan 2, 2018 at 6:02 AM, Rick Otten <rottenwindfish@gmail.com> wrote:

After reading this article about keys in relational databases, highlighted on hacker news this morning:
https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html

I keep pondering the performance chart, regarding uuid insert, shown towards the bottom of the article. I believe he was doing that test with PostgreSQL.

My understanding is that the performance is degrading because he has a btree primary key index. Is it possible to try a hash index or some other index type for a uuid primary key that would mitigate the performance issue he is recording?

Hash indexes do not yet support primary keys, but you could always test it with just an plain index, since you already know the keys are unique via the way they are constructed. But I wouldn't expect any real improvement. Hash indexes still trigger FPW and still dirty massive numbers of pages in a random fashion (even worse than btree does as far as randomness goes but since the hash is more compact maybe more of the pages will be re-dirtied and so save on FPW or separate writes). I was surprised that turning off FPW was so effective for him, that suggests that maybe his checkpoints are too close together, which I guess means max_wal_size is too low.

Cheers,

Jeff