B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | B-Tree support function number 3 (strxfrm() optimization) |
Date | |
Msg-id | CAM3SWZSP7TnJAw2mS2sWOk1_ufu8bz_5MYw6d_T=Wbcf4gSz8g@mail.gmail.com Whole thread Raw |
Responses |
Re: B-Tree support function number 3 (strxfrm() optimization)
|
List | pgsql-hackers |
I have thought a little further about my proposal to have inner B-Tree pages store strxfrm() blobs [1]; my earlier conclusion was that this could not be trusted when equality was indicated [2] due to the "Hungarian problem" [3]. As I noted earlier, that actually doesn't really matter, because we may have missed what's really so useful about strxfrm() blobs: We *can* trust a non-zero result as a proxy for what we would have gotten with a proper bttextcmp(), even if a strcmp() only looks at the first byte of a blob for each of two text strings being compared. Here is what the strxfrm() blob looks like for some common English words when put through glibc's strxfrm() with the en_US.UTF-8 collation (I wrote a wrapper function to do this and output bytea a couple of years ago): [local]/postgres=# select strxfrm_test('Yellow'); strxfrm_test --------------------------------------------\x241017171a220109090909090901100909090909 (1 row) [local]/postgres=# select strxfrm_test('Red'); strxfrm_test --------------------------\x1d100f0109090901100909 (1 row) [local]/postgres=# select strxfrm_test('Orange'); strxfrm_test --------------------------------------------\x1a1d0c1912100109090909090901100909090909 (1 row) [local]/postgres=# select strxfrm_test('Green'); strxfrm_test --------------------------------------\x121d101019010909090909011009090909 (1 row) [local]/postgres=# select strxfrm_test('White'); strxfrm_test --------------------------------------\x2213141f10010909090909011009090909 (1 row) [local]/postgres=# select strxfrm_test('Black'); strxfrm_test --------------------------------------\x0d170c0e16010909090909011009090909 (1 row) Obviously, all of these blobs, while much larger than the original string still differ in their first byte. It's almost as if they're intended to be truncated. The API I envisage is a new support function 3 that operator class authors may optionally provide. Within the support function a text varlena argument is passed, or whatever the type that relates to the opclass happens to be. It returns a Datum to the core system. That datum is always what is actually passed to their sort support routine (B-Tree support function number 2) iff a support function 3 is provided (you must provide number 2 if you provide number 3, but not vice-versa). In respect of support-function-3-providing opclasses, the core system is entitled to take the sort support return value as a proxy for what a proper support function 1 call would indicate iff the sort support routine does not return 0 in respect of two support-function-3 blobs (typically strxfrm() blobs truncated at 8 bytes for convenient storage as pass-by-value Datums). Otherwise, a proper call to support function 1, with fully-formed text arguments is required. I see at least two compelling things we can do with these blobs: 1. Store them as pseudo-columns of inner B-Tree IndexTuples (not leaf pages). Naturally, inner pages are very heterogeneous, so only having 8 bytes is very probably an excellent trade-off there. Typically 1-3% of B-Tree pages are inner pages, so the bloat risk seems acceptable. 2. When building a SortTuple array within TupleSort, we can store far more of these truncated blobs in memory than we can proper strings. if SortTuple.datum1 (the first column to sort on among tuples being sorted, which is currently stored in memory as an optimization) was just an 8 byte truncated strxfrm() blob, and not a pointer to a text string in memory, that would be pretty great for performance for several reasons. So just as with B-Tree inner pages, for SortTuples there can be a pseudo leading key - we need only compare additional sort keys/heap_getattr() when we need a tie-breaker, when those 8 bytes aren't enough to reach a firm conclusion. It doesn't just stop with strxfrm() blobs, though. Why couldn't we create blobs that can be used as reliable proxies for numerics, that are just integers? Sure, you need to reserve a bit to indicate an inability to represent the original value, and possibly work out other details like that, but the only negative thing you can say about that, or indeed applying these techniques to any operator class is that it might not help in certain worst cases (mostly contrived cases). Still, the overhead of doing that bit of extra work is surely quite low anyway - at worst, a extra few instructions wasted per comparison - making these techniques likely quite profitable for the vast majority of real-world applications. Does anyone else want to pick this idea up and run with it? I don't think I'll have time for it. [1] http://www.postgresql.org/message-id/CAM3SWZTcXrdDZSpA11qZXiyo4_jtxwjaNdZpnY54yjzq7d64=A@mail.gmail.com [2] http://www.postgresql.org/message-id/CAM3SWZS7wewrBmRGCi9_yCX49Ug6UgqN2xNGJG3Zq5v8LbDU4g@mail.gmail.com [3] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad -- Peter Geoghegan
pgsql-hackers by date: