Re: [WIP] Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit kapila
Subject Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 6C0B27F7206C9E4CA54AE035729E9C382852C382@szxeml509-mbs
Whole thread Raw
List pgsql-hackers

From: Heikki Linnakangas [mailto:heikki.linnakangas@enterprisedb.com]
Sent: Saturday, August 04, 2012 1:33 AM
On 03.08.2012 14:46, Amit kapila wrote:

>> Currently the change is done only for fixed length columns for simple tables and the tuple should not contain >>NULLS.
>
>> This is a Proof of concept, the design and implementation needs to be
>>changed based on final design required for handling other scenario's
>


> The performance will need to be re-verified after you fix these
> limitations. Those limitations need to be fixed before this can be applied.

> It would be nice to use some well-known binary delta algorithm for this,
> rather than invent our own. OTOH, we have more knowledge of the
> attribute boundaries, so a custom algorithm might work better. In any
> case, I'd like to see the code to do the delta encoding/decoding to be
> put into separate functions, outside of heapam.c. It would be good for
> readability, and we might want to reuse this in other places too.


I have modified the patch to handle varible length fields and NULLS as well. I have used custom algorithm based on
attribute boundaries.
Currently it handles updates across pages, but as a concern is raised by Robert and Simon about handle it only if the update is on same page, I can do it in the next version of patch if that is required.

Design of Update operation:
---------------------------------------------
1. Check for tuple whether the optimization can be applicable or not. For following it is applicable
     a. Tuple should not be toasted tuple
     b.  Before update triggers should return NEW
     c.   Length of modified tuple should be more than 128 bytes.
     d.  When the pull page write is enabled. Currently this is checked inside heap_update, however it can be pulled
          to ExecUpdate()
2. Identify the modified columns from the target entry.
3. Based on the modified column list, frame the wal record in the below format.
        WAL update header + Tuple header(no change from previous format) +
        COPY + offset until copy required from old tuple.
         IGN  + length needs to be ignored from the old tuple.
         PAD  + length needs to padded with zero in new tuple because of alignment.
        ADD  + length of data + data which is modified.        
        ....
        ....

Recovery Design
------------------------------
Frame the new tuple from old tuple and WAL record during recovery:

1. For the COPY command, copy the specified length from old tuple.
   Once the old tuple data copied, then increase the offset by the
   copied length.
2. For the IGN command, ignore the specified length in the old tuple.
3. For the PAD command, fill with zeros of the specified length in the new tuple.
4. For the ADD command, copy the corresponding length of data from WAL record to the new tuple.
5. Repeat this procedure until the WAL record reaches the end.
6. If any remaining left out old tuple data will be copied at last.


Test results:

1. The pgbench test run for 10min.
2. The test reult is for modified pgbench (such that total row size is 1800 and updated columns are of length 300) tpc-b testcase.
     The result and modified pgbench code is attached with mail.
3. The performance improvement shown in the m/c I have tested is quite good (more than 100% for sync commit = off).
 
Comments and further suggestions?
 
With Regards,
Amit Kapila.



Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: 9.2RC1 wraps this Thursday ...
Next
From: Qi Huang
Date:
Subject: Re: [PATCH]Tablesample Submission