Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 00a201cdb5e2$2f6d9700$8e48c500$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
> On 27.10.2012 14:27, Amit Kapila wrote:
> > On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> >> In my previous review, I said:
> >>
> >>     Given [not relying on the executor to know which columns changed],
> >> why not
> >
> > For this patch I am interested to go with delta encoding approach
> based on
> > column boundaries.
> >
> > However I shall try to do it separately and if it gives positive
> results
> > then I will share with hackers.
> > I will try with VCDiff once or let me know if you have any other
> algorithm
> > in mind.
> One idea is to use the LZ format in the WAL record, but use your
> memcmp() code to construct it. I believe the slow part in LZ compression
> is in trying to locate matches in the "history", so if you just replace
> that with your code that's aware of the column boundaries and uses
> simple memcmp() to detect what parts changed, you could create LZ
> compressed output just as quickly as the custom encoded format. It would
> leave the door open for making the encoding smarter or to do actual
> compression in the future, without changing the format and the code to
> decode it.

This is a good idea. I shall try it.
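
To make sure I am understanding the idea correctly, something like the rough
sketch below is what I have in mind: walk the columns, compare the old and new
values with memcmp(), and emit the result directly in an LZ-style control-bit
format. All the names and the 2-byte tag layout here are just placeholders for
illustration, not the real pg_lzcompress.c code.

#include <stdint.h>
#include <string.h>

/* Hypothetical per-column view of the old and new tuple versions. */
typedef struct
{
    const uint8_t *old_val;   /* column bytes in the old (history) tuple */
    const uint8_t *new_val;   /* column bytes in the new tuple */
    int            len;       /* assume the column length did not change */
    int            old_off;   /* offset of the column in the old tuple */
} ColumnDiff;

/* Encoder state: output cursor plus the control byte being filled. */
typedef struct
{
    uint8_t *out;
    uint8_t *ctrl;
    int      nbits;           /* control bits used so far (0..7) */
} LzOut;

static void
lz_put_ctrl(LzOut *o, int bit)
{
    if (o->nbits == 0)
    {
        o->ctrl = o->out++;   /* reserve the next control byte */
        *o->ctrl = 0;
    }
    if (bit)
        *o->ctrl |= (uint8_t) (1 << o->nbits);
    o->nbits = (o->nbits + 1) % 8;
}

static void
lz_encode_column(LzOut *o, const ColumnDiff *col)
{
    if (col->len >= 3 && col->len <= 18 &&
        memcmp(col->old_val, col->new_val, col->len) == 0)
    {
        /*
         * Unchanged column: one tag with a 12-bit offset and 4-bit length.
         * Longer columns would need a bigger tag, skipped in this sketch.
         */
        lz_put_ctrl(o, 1);
        *o->out++ = (uint8_t) (((col->old_off >> 4) & 0xF0) | (col->len - 3));
        *o->out++ = (uint8_t) (col->old_off & 0xFF);
    }
    else
    {
        /* changed column: literal bytes, one control bit per byte */
        for (int i = 0; i < col->len; i++)
        {
            lz_put_ctrl(o, 0);
            *o->out++ = col->new_val[i];
        }
    }
}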

In the existing algorithm, storing new data that is not present in the history
needs 1 control bit per byte (i.e. 1 control byte for every 8 bytes of new
data), which can increase the size of the compressed output compared to our
delta encoding approach.
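
To illustrate the cost, a simplified decode loop for such a pglz-style format
could look like the sketch below (only an illustration with an assumed tag
layout, not the actual pg_lzcompress.c code); every literal byte consumes one
control bit:

#include <stdint.h>

static void
lz_decode_sketch(const uint8_t *src, int srclen, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 1 control byte per 8 items */

        for (int bit = 0; bit < 8 && src < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* history reference: 12-bit offset, 4-bit length (+3) */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                while (len-- > 0)               /* byte-wise copy handles overlap */
                {
                    *dp = *(dp - off);
                    dp++;
                }
            }
            else
            {
                /* literal: every single new byte costs one control bit */
                *dp++ = *src++;
            }
        }
    }
}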

Shall we modify the LZ algorithm a little bit, so that it can work better for
our case?

Approach-1
---------------
Is it possible to increase the control data from 1 bit to 2 bits per item
[0 - new data byte, 1 - pick from history based on OFFSET-LENGTH,
2 - length followed by new data]?
The new code value (2) is meant to handle new field data as a continuous
stream of bytes, instead of treating every byte as separate new data.
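
A decode loop for this could look roughly like the sketch below; the 2-bit
codes are packed four per control byte, and the tag and length-byte layouts
are only assumptions for illustration:

#include <stdint.h>
#include <string.h>

static void
decode_approach1(const uint8_t *src, int srclen,
                 const uint8_t *hist, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 4 items per control byte */

        for (int i = 0; i < 4 && src < srcend; i++)
        {
            int code = (ctrl >> (2 * i)) & 0x03;

            if (code == 1)
            {
                /* pick from history (old tuple) based on OFFSET-LENGTH */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                memcpy(dp, hist + off, len);
                dp += len;
            }
            else if (code == 2)
            {
                /* length + new data: one length byte covers the whole run */
                int len = *src++ + 1;

                memcpy(dp, src, len);
                dp += len;
                src += len;
            }
            else
            {
                /* code 0: a single new byte */
                *dp++ = *src++;
            }
        }
    }
}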

Approach-2
---------------
Use only one bit of control data per item [0 - length followed by new data,
1 - pick from history based on OFFSET-LENGTH].
The modified bit value (0) is meant to handle new field data as a continuous
stream of bytes, instead of treating every byte as separate new data.
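
The corresponding decode loop would be even simpler; again only a sketch with
an assumed layout:

#include <stdint.h>
#include <string.h>

static void
decode_approach2(const uint8_t *src, int srclen,
                 const uint8_t *hist, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 8 items per control byte */

        for (int bit = 0; bit < 8 && src < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* pick from history (old tuple) based on OFFSET-LENGTH */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                memcpy(dp, hist + off, len);
                dp += len;
            }
            else
            {
                /* length + new data: one length byte covers the whole run */
                int len = *src++ + 1;

                memcpy(dp, src, len);
                dp += len;
                src += len;
            }
        }
    }
}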


With Regards,
Amit Kapila.



