Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 20121027180401.GA1870@tornado.leadboat.com
In response to Re: Performance Improvement by reducing WAL for Update Operation (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Sat, Oct 27, 2012 at 04:57:46PM +0530, Amit Kapila wrote:
> On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
> > Could you elaborate on your reason for continuing to treat TOAST as a special
> > case?  As best I recall, the only reason to do so before was the fact that
> > TOAST can change the physical representation of a column even when executor
> > did not change its logical content.  Since you're no longer relying on the
> > executor's opinion of what changed, a TOASTed value is not special.
> 
> I thought for initial version of patch, without this change, patch will have
> less impact and less test.

Not that I'm aware.  If you still think so, please explain.

> For this patch I am interested to go with delta encoding approach based on
> column boundaries.

Fair enough.

> > If you conclude that finding sub-column similarity is not worthwhile, at least
> > teach your algorithm to aggregate runs of changing or unchanging columns into
> > fewer delta instructions.  If a table contains twenty unchanging bool columns,
> > you currently use at least 80 bytes to encode that fact.  By treating the run
> > of columns as a unit for delta encoding purposes, you could encode it in 23
> > bytes.
> 
> Do you mean to say handle for non-continuous unchanged columns?

My statement above was a mess.

> I believe for continuous unchanged columns its already handled until there
> are any alignment changes. Example
> 
> create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
>                  f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
>                  f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
>                  f20 bool, f21 bool);
>
> insert into tbl values(10,
> 't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
> 't');
> 
> update tbl set f1 = 20; 
> 
> The delta algorithm for the above operation reduced the size of the tuple
> from 24 bytes to 12 bytes. 
> 
> 4 bytes - IGN command and LEN 
> 4 bytes - ADD command and LEN 
> 4 bytes - Data block 

I now see that this case is already handled.  Sorry for the noise.
Incidentally, I tried this variant:

create table tbl(f1 int, f2 bool, f3 bool, f4 bool, f5 bool, f6 bool, f7 bool,
                 f8 bool, f9 bool, f10 bool, f11 bool, f12 bool, f13 bool,
                 f14 bool, f15 bool, f16 bool, f17 bool, f18 bool, f19 bool,
                 f20 bool, f21 bool, f22 int, f23 int);

insert into tbl values(1,
't','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t','t',
't', 2, 3);

update tbl set f1 = 2, f22 = 4, f23 = 6;

It yielded an erroneous delta: IGN 4, ADD 4, COPY 24, IGN 4, ADD 4, COPY 28,
IGN 4, ADD 4.  (The delta happens to be longer than the data and goes unused.)
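
To make the problem concrete, here is a rough sketch of one possible reading
of that delta.  Everything in it is an assumption for illustration and not the
patch's actual format: IGN n skips n bytes of the old tuple, COPY n copies the
next n bytes of the old tuple, ADD appends new bytes, any old bytes left over
after the last instruction are copied implicitly, and each instruction costs 4
bytes for "command and LEN".  The helpers pack_row, apply_delta and
encoded_size are hypothetical, and only the data area of the tuple is modelled
(no header, no null bitmap).

```python
import struct

HEADER_BYTES = 4  # assumed cost of one "command and LEN" pair


def int4(value):
    """Pack one 4-byte integer (little-endian, as an assumption)."""
    return struct.pack("<i", value)


def pack_row(f1, bools, f22, f23):
    """Pack only the data area: int4, twenty 1-byte bools, int4, int4."""
    return struct.pack("<i20?ii", f1, *bools, f22, f23)


def apply_delta(old, delta):
    """Rebuild the new tuple from the old tuple and a delta (assumed semantics)."""
    out = bytearray()
    pos = 0  # read position in the old tuple
    for op, arg in delta:
        if op == "ADD":
            out += arg  # arg holds the new bytes
        elif op in ("IGN", "COPY"):
            if pos + arg > len(old):
                raise ValueError(f"{op} {arg} at offset {pos} runs past the old tuple")
            if op == "COPY":
                out += old[pos:pos + arg]
            pos += arg
        else:
            raise ValueError(f"unknown instruction {op!r}")
    out += old[pos:]  # implicit copy of whatever is left (assumption)
    return bytes(out)


def encoded_size(delta):
    """Size of the delta: one header per instruction plus the ADD payloads."""
    return sum(HEADER_BYTES + (len(arg) if op == "ADD" else 0) for op, arg in delta)


bools = [True] * 20
old = pack_row(1, bools, 2, 3)
new = pack_row(2, bools, 4, 6)

# The delta as reported above, with the ADD payloads assumed to be the new values.
reported = [("IGN", 4), ("ADD", int4(2)), ("COPY", 24),
            ("IGN", 4), ("ADD", int4(4)), ("COPY", 28),
            ("IGN", 4), ("ADD", int4(6))]

# What a column-boundary delta might look like under the same assumptions.
expected = [("IGN", 4), ("ADD", int4(2)), ("COPY", 20),
            ("IGN", 4), ("ADD", int4(4)),
            ("IGN", 4), ("ADD", int4(6))]

assert apply_delta(old, expected) == new
print(len(old), encoded_size(reported))
```

Under these assumptions, COPY 24 reads four bytes past the twenty bools and
COPY 28 runs off the end of the 32-byte data area, so apply_delta(old,
reported) raises ValueError.  The COPY lengths look like offsets rather than
run lengths, though that is only a guess.  The same model puts the reported
delta at 44 bytes against 32 bytes of data, which would fit with it going
unused.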

nm


