Home > mailing lists

Re: GIN improvements part 1: additional information - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: GIN improvements part 1: additional information
Date	December 18, 2013 11:45:25
Msg-id	CAPpHfdvmRhW_rSCo90Hcu+0xBUVvxmtgW0arKRQDbXcDdNANgw@mail.gmail.com Whole thread Raw
In response to	Re: GIN improvements part 1: additional information (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: GIN improvements part 1: additional information
List	pgsql-hackers

Tree view

On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:

On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

When values are packed into small groups, we have to either insert
inefficiently encoded value or re-encode whole right part of values.

It would probably be simplest to store newly inserted items uncompressed,
in a separate area in the page. For example, grow the list of uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.

I hacked together an implementation of a variant of Simple9, to see how it performs. Insertions are handled per the above scheme.

In a limited pg_trgm test case I've been using a lot for this, this reduces the index size about 20%, compared to varbyte encoding. It might be possible to squeeze it a bit more, I handcrafted the "selectors" in the encoding algorithm to suite our needs, but I don't actually have a good idea of how to choose them optimally. Also, the encoding can encode 0 values, but we never need to do that, so you could take advantage of that to pack items tighter.

Compression and decompression speed seems to be about the same.

Patch attached if you want to play with it. WAL replay is still broken, and there are probably bugs.

Good idea. But:
1) We'll still need item indexes in the end of page for fast scan.

Sure.

2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.

I'm not sure I agree with that. For all the cases where you don't care about additional information - which covers all existing users for example - reducing disk size is pretty important. How are you planning to store the additional information, and how does using another encoding gets in the way of that?

I was planned to store additional information datums between varbyte-encoded tids. I was expected it would be hard to do with PFOR. However, I don't see significant problems in your implementation of Simple9 encoding. I'm going to dig deeper in your version of patch.

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 18 December 2013, 11:42:22
Subject: Re: Example query causing param_info to be set in plain rel path

From: Heikki Linnakangas
Date: 18 December 2013, 11:47:10
Subject: Re: [PATCH] SQL assertions prototype

Re: GIN improvements part 1: additional information - Mailing list pgsql-hackers

Previous

Next