Re: UUID v1 optimizations... - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: UUID v1 optimizations... |
Date | |
Msg-id | 20635.1558796270@sss.pgh.pa.us Whole thread Raw |
In response to | UUID v1 optimizations... (Ancoron Luciferis <ancoron.luciferis@googlemail.com>) |
Responses |
Re: UUID v1 optimizations...
|
List | pgsql-performance |
Ancoron Luciferis <ancoron.luciferis@googlemail.com> writes: > So I investigated the PostgreSQL code to see how it is handling UUID's > with respect to storage, sorting, aso. but all I could find was that it > basically falls back to the 16-byte. Yup, they're just blobs to us. > After struggling to find a way to optimize things inside the database, I > reverted to introduce a hack into the application by not shuffling the > timestamp bytes for the UUID's, which makes it look quite serial in > terms of byte order. > So, my question now is: Would it make sense for you to handle these > time-based UUID's differently internally? Specifically un-shuffling the > timestamp before they are going to storage? No, because (1) UUID layout is standardized; (2) such a change would break on-disk compatibility for existing databases; (3) particularly for the case of application-generated UUIDs, we do not have enough information to know that this would actually do anything useful; (4) it in fact *wouldn't* do anything useful, because we'd still have to sort UUIDs in the same order as today, meaning that btree index behavior would remain the same as before. Plus UUID comparison would get a lot more complicated and slower than it is now. (5) even if we ignored all that and did it anyway, it would only help for version-1 UUIDs. The index performance issue would still remain for version-4 (random) UUIDs, which are if anything more common than v1. FWIW, I don't know what tool you're using to get those "bloat" numbers, but just because somebody calls it bloat doesn't mean that it is. The normal, steady-state load factor for a btree index is generally understood to be about 2/3rds, and that looks to be about what you're getting for the regular-UUID-format index. The fact that the serially-loaded index has nearly no dead space is because we hack the page split logic to make that happen --- but that is a hack, and it's not without downsides. It should *not* be taken to be an indication of what you can expect for any other insertion pattern. The insertion-speed aspect is a real problem, but the core of that problem is that use of any sort of standard-format UUID turns applications that might have had considerable locality of reference into applications that have none. If you can manage to keep your whole index in RAM that would not hurt too much, but as soon as it doesn't fit you have a problem. When your app has more or less predictable reference patterns it's best to design a unique key that matches that, instead of expecting that essentially-random keys will work well. Or in short, hacking up the way your app generates UUIDs is exactly the right way to proceed here. regards, tom lane
pgsql-performance by date: