Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 00a301cdb5e3$21ae7f20$650b7d60$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Saturday, October 27, 2012 11:34 PM Noah Misch wrote:
> On Sat, Oct 27, 2012 at 04:57:46PM +0530, Amit Kapila wrote:
> > On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> > > Could you elaborate on your reason for continuing to treat TOAST as a
> > > special case?  As best I recall, the only reason to do so before was the
> > > fact that TOAST can change the physical representation of a column even
> > > when executor did not change its logical content.  Since you're no longer
> > > relying on the executor's opinion of what changed, a TOASTed value is not
> > > special.
> >
> > I thought for initial version of patch, without this change, patch will
> > have less impact and less test.
> 
> Not that I'm aware.  If you still think so, please explain.
> 
> > For this patch I am interested to go with delta encoding approach based on
> > column boundaries.
> 
> Fair enough.
> 
> > > If you conclude that finding sub-column similarity is not worthwhile, at
> > > least teach your algorithm to aggregate runs of changing or unchanging
> > > columns into fewer delta instructions.  If a table contains twenty
> > > unchanging bool columns, you currently use at least 80 bytes to encode
> > > that fact.  By treating the run of columns as a unit for delta encoding
> > > purposes, you could encode it in 23 bytes.
> >
> > Do you mean to say handle for non-continuous unchanged columns?
> 
> My statement above was a mess.
> 
> > I believe for continuous unchanged columns its already handled until there
> > are any alignment changes. Example
> >
> > create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
> >                  f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
> >                  f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
> >                  f20 bool, f21 bool);
> >
> > insert into tbl values(10,
> > 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> > 't');
> >
> > update tbl set f1 = 20;
> >
> > The delta algorithm for the above operation reduced the size of the tuple
> > from 24 bytes to 12 bytes.
> >
> > 4 bytes - IGN command and LEN
> > 4 bytes - ADD command and LEN
> > 4 bytes - Data block
> 
> I now see that this case is already handled.  Sorry for the noise.
> Incidentally, I tried this variant:
> 
> create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
>                  f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
>                  f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
>                  f20 bool, f21 bool, f22 int, f23 int);
> insert into tbl values(1,
> 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> 't', 2, 3);
> update tbl set f1 = 2, f22 = 4, f23 = 6;
> 
> It yielded an erroneous delta: IGN 4, ADD 4, COPY 24, IGN 4, ADD 4, COPY 28,
> IGN 4, ADD 4.  (The delta happens to be longer than the data and goes unused).

I think the new algorithm, based on your inputs, will handle this case in a
much better way.
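
To make the run-aggregation idea concrete, here is a minimal sketch in Python.
It is not the patch's code. It assumes that COPY n copies the next n bytes of
the old tuple, IGN n skips n bytes of the old tuple, ADD n appends n new bytes,
that every command header costs 4 bytes (as in the 12-byte example above), and
that a trailing run of unchanged columns needs no command. The names
encode_delta, apply_delta and delta_size are hypothetical.

```python
import struct


def encode_delta(old: bytes, new: bytes, col_lengths):
    """Encode new relative to old, merging adjacent columns into runs.

    Returns a list of (command, length, data) tuples. Assumes both tuples
    have the same layout, described by col_lengths (bytes per column).
    """
    if not (len(old) == len(new) == sum(col_lengths)):
        raise ValueError("tuples must match the column layout")

    # Classify each column as changed/unchanged and merge equal neighbours.
    runs = []  # each entry is [changed, run_length]
    off = 0
    for n in col_lengths:
        changed = old[off:off + n] != new[off:off + n]
        if runs and runs[-1][0] == changed:
            runs[-1][1] += n
        else:
            runs.append([changed, n])
        off += n

    # Assumption: a trailing unchanged run is copied implicitly.
    if runs and not runs[-1][0]:
        runs.pop()

    ops = []
    off = 0
    for changed, n in runs:
        if changed:
            ops.append(("IGN", n, b""))               # skip the old bytes
            ops.append(("ADD", n, new[off:off + n]))  # supply the new bytes
        else:
            ops.append(("COPY", n, b""))              # reuse the old bytes
        off += n
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the new tuple from the old tuple and the delta."""
    out = bytearray()
    pos = 0
    for cmd, n, data in ops:
        if cmd == "COPY":
            out += old[pos:pos + n]
            pos += n
        elif cmd == "IGN":
            pos += n
        else:  # ADD
            out += data
    out += old[pos:]  # implicit copy of the unchanged tail
    return bytes(out)


def delta_size(ops) -> int:
    """Encoded size: 4 bytes of command+length per op, plus ADD payload."""
    return sum(4 + len(data) for _, _, data in ops)


# Noah's variant: f1 int, f2..f21 bool, f22 int, f23 int.
old = struct.pack("<i", 1) + b"\x01" * 20 + struct.pack("<ii", 2, 3)
new = struct.pack("<i", 2) + b"\x01" * 20 + struct.pack("<ii", 4, 6)
ops = encode_delta(old, new, [4] + [1] * 20 + [4, 4])
```

Under these assumptions the first example (update of f1 only) costs 12 bytes,
matching the figures quoted above. For Noah's variant the sketch emits
IGN 4, ADD 4, COPY 20, IGN 8, ADD 8, which comes to 32 bytes and so is still
no smaller than the 32 bytes of tuple data.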
I am planning to try two approaches:
1. Try LZ compression in the manner suggested by Heikki; if it works, it
could be the simpler option.
2. Devise a new algorithm based on your suggestions, with reference to the
LZ and VCDIFF algorithms.
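
As a rough illustration of the first approach, the sketch below compresses the
new tuple with the old tuple supplied as a preset dictionary, so that unchanged
bytes become back-references. This is only one possible reading: the details of
Heikki's suggestion are not given in this message, and Python's zlib stands in
here for PostgreSQL's own LZ code. The function names are hypothetical.

```python
import zlib


def compress_against(old: bytes, new: bytes) -> bytes:
    """Compress new, letting the compressor refer back to bytes of old."""
    c = zlib.compressobj(level=9, zdict=old)
    return c.compress(new) + c.flush()


def decompress_against(old: bytes, delta: bytes) -> bytes:
    """Recover new from the delta; needs the same old tuple as dictionary."""
    d = zlib.decompressobj(zdict=old)
    return d.decompress(delta) + d.flush()


# A 256-byte "tuple" with little internal redundancy; only the first four
# bytes change in the update.
old = bytes(range(256))
new = b"\xff\xfe\xfd\xfc" + old[4:]

delta = compress_against(old, new)
assert decompress_against(old, delta) == new
```

The appeal is that no column-boundary bookkeeping is needed; the cost is that
the match search runs over the whole tuple on every update.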

With Regards,
Amit Kapila.





