Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 00a201cdb5e2$2f6d9700$8e48c500$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
> On 27.10.2012 14:27, Amit Kapila wrote:
> > On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> >> In my previous review, I said:
> >>
> >>     Given [not relying on the executor to know which columns changed],
> >> why not
> >
> > For this patch I am interested to go with delta encoding approach
> based on
> > column boundaries.
> >
> > However I shall try to do it separately and if it gives positive
> results
> > then I will share with hackers.
> > I will try with VCDiff once or let me know if you have any other
> algorithm
> > in mind.
> One idea is to use the LZ format in the WAL record, but use your
> memcmp() code to construct it. I believe the slow part in LZ compression
> is in trying to locate matches in the "history", so if you just replace
> that with your code that's aware of the column boundaries and uses
> simple memcmp() to detect what parts changed, you could create LZ
> compressed output just as quickly as the custom encoded format. It would
> leave the door open for making the encoding smarter or to do actual
> compression in the future, without changing the format and the code to
> decode it.

This is a good idea. I shall try it.
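
To make sure I am understanding the idea correctly, something like the rough
sketch below is what I have in mind: walk the columns, compare the old and new
values with memcmp(), and emit the result directly in an LZ-style control-bit
format. All the names and the 2-byte tag layout here are just placeholders for
illustration, not the real pg_lzcompress.c code.

#include <stdint.h>
#include <string.h>

/* Hypothetical per-column view of the old and new tuple versions. */
typedef struct
{
    const uint8_t *old_val;   /* column bytes in the old (history) tuple */
    const uint8_t *new_val;   /* column bytes in the new tuple */
    int            len;       /* assume the column length did not change */
    int            old_off;   /* offset of the column in the old tuple */
} ColumnDiff;

/* Encoder state: output cursor plus the control byte being filled. */
typedef struct
{
    uint8_t *out;
    uint8_t *ctrl;
    int      nbits;           /* control bits used so far (0..7) */
} LzOut;

static void
lz_put_ctrl(LzOut *o, int bit)
{
    if (o->nbits == 0)
    {
        o->ctrl = o->out++;   /* reserve the next control byte */
        *o->ctrl = 0;
    }
    if (bit)
        *o->ctrl |= (uint8_t) (1 << o->nbits);
    o->nbits = (o->nbits + 1) % 8;
}

static void
lz_encode_column(LzOut *o, const ColumnDiff *col)
{
    if (col->len >= 3 && col->len <= 18 &&
        memcmp(col->old_val, col->new_val, col->len) == 0)
    {
        /*
         * Unchanged column: one tag with a 12-bit offset and 4-bit length.
         * Longer columns would need a bigger tag, skipped in this sketch.
         */
        lz_put_ctrl(o, 1);
        *o->out++ = (uint8_t) (((col->old_off >> 4) & 0xF0) | (col->len - 3));
        *o->out++ = (uint8_t) (col->old_off & 0xFF);
    }
    else
    {
        /* changed column: literal bytes, one control bit per byte */
        for (int i = 0; i < col->len; i++)
        {
            lz_put_ctrl(o, 0);
            *o->out++ = col->new_val[i];
        }
    }
}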

In the existing algorithm, storing new data that is not present in the history
needs 1 control bit per byte (i.e. 1 control byte for every 8 bytes of new
data), which can increase the size of the compressed output compared to our
delta encoding approach.
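
To illustrate the cost, a simplified decode loop for such a pglz-style format
could look like the sketch below (only an illustration with an assumed tag
layout, not the actual pg_lzcompress.c code); every literal byte consumes one
control bit:

#include <stdint.h>

static void
lz_decode_sketch(const uint8_t *src, int srclen, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 1 control byte per 8 items */

        for (int bit = 0; bit < 8 && src < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* history reference: 12-bit offset, 4-bit length (+3) */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                while (len-- > 0)               /* byte-wise copy handles overlap */
                {
                    *dp = *(dp - off);
                    dp++;
                }
            }
            else
            {
                /* literal: every single new byte costs one control bit */
                *dp++ = *src++;
            }
        }
    }
}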

Shall we modify the LZ algorithm a little bit, so that it can work better for
our case?

Approach-1
---------------
Is it possible to increase the control data from 1 bit to 2 bits per item
[0 - new data byte, 1 - pick from history based on OFFSET-LENGTH,
2 - length followed by new data]?
The new code value (2) is meant to handle new field data as a continuous
stream of bytes, instead of treating every byte as separate new data.
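
A decode loop for this could look roughly like the sketch below; the 2-bit
codes are packed four per control byte, and the tag and length-byte layouts
are only assumptions for illustration:

#include <stdint.h>
#include <string.h>

static void
decode_approach1(const uint8_t *src, int srclen,
                 const uint8_t *hist, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 4 items per control byte */

        for (int i = 0; i < 4 && src < srcend; i++)
        {
            int code = (ctrl >> (2 * i)) & 0x03;

            if (code == 1)
            {
                /* pick from history (old tuple) based on OFFSET-LENGTH */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                memcpy(dp, hist + off, len);
                dp += len;
            }
            else if (code == 2)
            {
                /* length + new data: one length byte covers the whole run */
                int len = *src++ + 1;

                memcpy(dp, src, len);
                dp += len;
                src += len;
            }
            else
            {
                /* code 0: a single new byte */
                *dp++ = *src++;
            }
        }
    }
}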

Approach-2
---------------
Use only one bit of control data per item [0 - length followed by new data,
1 - pick from history based on OFFSET-LENGTH].
The modified bit value (0) is meant to handle new field data as a continuous
stream of bytes, instead of treating every byte as separate new data.
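
The corresponding decode loop would be even simpler; again only a sketch with
an assumed layout:

#include <stdint.h>
#include <string.h>

static void
decode_approach2(const uint8_t *src, int srclen,
                 const uint8_t *hist, uint8_t *dst)
{
    const uint8_t *srcend = src + srclen;
    uint8_t *dp = dst;

    while (src < srcend)
    {
        uint8_t ctrl = *src++;                  /* 8 items per control byte */

        for (int bit = 0; bit < 8 && src < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* pick from history (old tuple) based on OFFSET-LENGTH */
                int off = ((src[0] & 0xF0) << 4) | src[1];
                int len = (src[0] & 0x0F) + 3;

                src += 2;
                memcpy(dp, hist + off, len);
                dp += len;
            }
            else
            {
                /* length + new data: one length byte covers the whole run */
                int len = *src++ + 1;

                memcpy(dp, src, len);
                dp += len;
                src += len;
            }
        }
    }
}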


With Regards,
Amit Kapila.



