Re: Index Tuple Compression Approach? - Mailing list pgsql-hackers

From Chris Browne
Subject Re: Index Tuple Compression Approach?
Date
Msg-id 60r6m4pnty.fsf@dba2.int.libertyrms.com
Whole thread Raw
In response to Index Tuple Compression Approach?  (Chris Browne <cbbrowne@acm.org>)
List pgsql-hackers
qnex42@gmail.com ("Dawid Kuroczko") writes:
> On 8/14/07, Chris Browne <cbbrowne@acm.org> wrote:
>> I recently had a chat with someone who was pretty intimate with Adabas
>> for a number of years who's in the process of figuring things out
>> about PostgreSQL.  We poked at bits of the respective implementations,
>> seeing some similarities and differences.  He pointed out one aspect
>> of index handling that could (in principle) be an interesting
>> optimization.
>>
>> Evidently, in Adabas, index leaf nodes were not simply tuples, but
>> lists where the index value would not be repeated.
>>
>> In PostgreSQL, if you have the index value 'abc', and there are 10
>> tuples with that value, then you'll have a page full of tuples of the
>> following form:
>>
>> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>>
>> Now, the Adabas approach was rather different.  It would only have the
>> index value once, and then have the list of tuple pointers:
>>
>> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
>>
>> This could allow a fair bit of compression, for cases where the index
>> value is not unique.
>
> Interesting.  Some time ago I've played a little with quite a big table
> which constained path (file path) as a primary key.  It did have sense
> to have a strucure (SELECTs were mostly ORDER BY path WHERE path >
> '/foo' LIMIT n).
> The actual index was only a little bit smaller than the table (there were
> maybe 4 or 5 columns there).
>
> Some time ago I've had an idea that it might be possible to compress
> th index size, even if it is a unique index.  Take the path example.
> My idea would be to to split indexed value to 8-byte chunks.
> For example: /var/lib/postgresql/8.2/main would be split into:
>   "/var/lib"
>   "/postgre"
>   "sql/8.2" -- these would be insertered into a tree as a "scaffold",
> and only vacuum should remove them..
>   "main" -- this would be a leaf node.  It could be repeated in non-unique
> indexes.
>
> [/etc/pas] -- scaffold-node
>  |-"swd"    -- leaf node
> [/etc/sha]
>  |-"dow"
> [/var/lib]   -- a problematic mixed scaffold/leaf node.
>  [/postgre]
>   |-"sql"
>   |-"sql/8.2"
>   [sql/8.2/]
>    |-"main"
>    |-"foobar"
>
> The scaffold nodes would be there to guarantee that there is some
> place to attach leafs to.  They should not be removed by DELETE
> (unless we are sure no other node depends on them).

Note that there is a well-understood name for this; this is assortedly
known as a "Radix tree" or a "Patricia trie".
 <http://en.wikipedia.org/wiki/Radix_tree>

As you observe, the tree/trie edges consist not of individual
characters, but rather of sequences of characters.

> Advantages?  The repeated values (as "/var/lib/postgresql/8.2")
> are not repeated -- they are embedded into tree, as a "scaffold",
> actual nodes that are significant (files, not directories, in my
> example) are put as actual leafs.

Certainly as compared to a traditional trie, this representation leads
to there being a whole lot less nodes and a whole lot less branching.

The Radix/Patricia tree compresses things two ways:
- As you observe, there can be fewer, larger componets
- By combining common prefixes together, this makes cases of  strings that are highly patterned much, much, much
cheaper,as the  tree branches at (and only at) the places where they tend to  differ.
 

It could conceivably make it workable to have indexes on big, highly
repeating things such as blobs of XML.  (Although it *doesn't* get you
the ability to search on substrings, which is probably what you'd also
want...)

> I guess it would reduce large indexes size and at the same time it
> could remove limitation that B-tree index cannot index values larger
> than 1/3 of the database page.  8-byte chunks was given as an
> example here, perhaps larger value would be better.
>
> (Of course redesigning schema to put directories separate from files
> woul be useful, but it would not help with ORDER BY .. LIMIT queries
> -- they would have to be JOIN-ed and re-sorted in memory I'm
> afraid).

I'll gleefully ignore the nature of the example, as it's kind of
beside the point.  The point is to try to compress what's in the
column.  If it's being abused somewhat, and has more crud in it, that
gives a potential for a *bigger* win.
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/spiritual.html
"Documentation wants to be obsolete." 
-- Bruce R. Lewis


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: XID wraparound and busy databases
Next
From: Robert Treat
Date:
Subject: Re: XID wraparound and busy databases