Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id CAA4eK1LOR9OYiOuDwg0SfgrQ0_Zw1q5MckB6THyUUK=2oY4rKA@mail.gmail.com
Whole thread Raw
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Performance Improvement by reducing WAL for Update Operation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Feb 6, 2014 at 5:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Considering above change as correct, I have tried to see the worst
> case overhead for this patch by having new tuple such that after
> 25% or so of suffix/prefix match, there is a small change in tuple
> and kept rest of tuple same as old tuple and it shows overhead
> for this patch as well.
>
> Updated test script is attached.
>
> Unpatched
>              testname             | wal_generated |     duration
> ----------------------------------+---------------+------------------
>  ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
>  ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
>  ten long fields, 8 bytes changed |     350500000 | 5.92329406738281
> (3 rows)
>
>
>
> wal-update-prefix-suffix-encode-1.patch
>
>              testname             | wal_generated |     duration
> ----------------------------------+---------------+------------------
>  ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
>  ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
>  ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
> (3 rows)
>
> One minor point, can we avoid having prefix tag if prefixlen is 0.
>
> + /* output prefix as a tag */
> + pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);

I think generating out tag for suffix/prefix has one bug i.e it doesn't
consider the max length of 273 bytes (PGLZ_MAX_MATCH ) which
is mandatory for LZ format.

One more point about this patch is that in function pgrb_delta_encode(),
is it mandatory to return at end in below check:

if (result_size > result_max)
return false;

I mean to say as before starting for copying literal bytes we have
already ensured that the compressed data is greater than >25%, so
may be we can avoid this check. I have tried to take the data by
removing this check and found that it reduces overhead and improves
WAL reduction as well. The data is as below (compare this with data
in above mail for unpatched version data):

            testname             | wal_generated |     duration
----------------------------------+---------------+------------------ten long fields, 8 bytes changed |     300705552 |
6.51416897773743tenlong fields, 8 bytes changed |     300703816 | 6.85267090797424ten long fields, 8 bytes changed |
300701840 | 7.15832996368408
 
(3 rows)

If we want to go with this approach, then I think apart from above
points there is no major change required (may be some comments,
function names etc. can be improved).

> But if we really just want to do prefix/suffix compression, this is a
> crappy and expensive way to do it.  We needn't force everything
> through the pglz tag format just because we elide a common prefix or
> suffix.

Here are you bothered about below code where the patch is
doing byte-by-byte copy after prefix/suffix match?

/* output bytes between prefix and suffix as literals */
dp = &source[prefixlen];
dend = &source[slen - suffixlen];
while (dp < dend)
{
pgrb_out_literal(ctrlp, ctrlb, ctrl, bp, *dp);
dp++; /* Do not do this ++ in the line above! */
}

I think if we want to change LZ format, it will be bit more work and
verification for decoding has to be done much more strenuously.

Note - During performance test, I have focussed mainly on worst case,
because we already know that this idea is good for best and average cases.
However if we decide that this is better and good to proceed, I can take
the data for other cases as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Marios Vodas
Date:
Subject: GiST, getting the record in table that the leaf entry points to
Next
From: Tom Lane
Date:
Subject: Re: jsonb and nested hstore