Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date 2014-02-15 14:51:09
Msg-id 20140215145109.GC19470@alap3.anarazel.de
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi,


Some quick review comments:

On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> +    /*
> +     * EWT can be generated for all new tuple versions created by Update
> +     * operation. Currently we do it when both the old and new tuple versions
> +     * are on same page, because during recovery if the page containing old
> +     * tuple is corrupt, it should not cascade that corruption to other pages.
> +     * Under the general assumption that for long runs most updates tend to
> +     * create new tuple version on same page, there should not be significant
> +     * impact on WAL reduction or performance.
> +     *
> +     * We should not generate EWT when we need to backup the whole block in
> +     * WAL as in that case there is no saving by reduced WAL size.
> +     */
> +
> +    if (RelationIsEnabledForWalCompression(reln) &&
> +        (oldbuf == newbuf) &&
> +        !XLogCheckBufferNeedsBackup(newbuf))
> +    {
> +        uint32        enclen;

You should note that the check for RelationIsEnabledForWalCompression()
here is racy and that that's ok because the worst that can happen is
that a delta is generated uselessly.
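Something like this, say (wording up to you, the condition copied from
your patch):

    /*
     * Note that the check against the reloption is racy: it can change
     * concurrently.  That's harmless, the worst that can happen is that
     * we generate a delta we didn't strictly need to.
     */
    if (RelationIsEnabledForWalCompression(reln) &&
        (oldbuf == newbuf) &&
        !XLogCheckBufferNeedsBackup(newbuf))
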
>      xlrec.target.node = reln->rd_node;
>      xlrec.target.tid = oldtup->t_self;
>      xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
>      xlrec.newtid = newtup->t_self;
>      if (new_all_visible_cleared)
>          xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> +    if (compressed)
> +        xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and be
conditional on !need_tuple_data.
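
I.e. roughly (untested, and just one way to read this; need_tuple_data
being what log_heap_update() already computes for logical decoding):

    if (compressed)
    {
        /* logical decoding needs the full tuple, so we shouldn't get here */
        Assert(!need_tuple_data);
        xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;
        xlrec.flags &= ~XLOG_HEAP_CONTAINS_NEW_TUPLE;
    }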



>  /*
> + * Determine whether the buffer referenced has to be backed up. Since we don't
> + * yet have the insert lock, fullPageWrites and forcePageWrites could change
> + * later, but will not cause any problem because this function is used only to
> + * identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

Should note very, very boldly that this can only be used in contexts
where a race is acceptable.
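
E.g.:

    /*
     * !!! Without holding the WAL insert lock the answer is only ever a
     * hint: fullPageWrites/forcePageWrites and the redo pointer can
     * change under us.  Callers may only use this where acting on a
     * stale answer is acceptable, as in the EWT check in
     * log_heap_update().
     */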

> diff --git a/src/backend/utils/adt/pg_rbcompress.c b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 0000000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + *        This is a delta encoding scheme specific to PostgreSQL and designed
> + *        to compress similar tuples. It can be used as it is or extended for
> + *        other purposes in PostgreSQL if required.
> + *
> + *        Currently, this just checks for a common prefix and/or suffix, but
> + *        the output format is similar to the LZ format used in pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

This needs significantly more explanations about the algorithm and the
reasoning behind it.
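
Something along these lines in the file header, at the very least
(adjust to what the encoder actually ends up doing):

    /*
     * We compare the new tuple against its predecessor ("history") and
     * encode a shared prefix and/or suffix as backward references into
     * the history, emitting the differing middle bytes as literals.
     * The output reuses the control-byte/tag format of pg_lzcompress.c
     * so the existing decode logic can be shared.  We only look for
     * prefix/suffix matches because that's cheap (no hash table) and
     * catches the typical UPDATE touching a few contiguous columns.
     */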


> +static const PGRB_Strategy strategy_default_data = {
> +    32,                            /* Data chunks less than 32 bytes are not
> +                                 * compressed */
> +    INT_MAX,                    /* No upper limit on what we'll try to
> +                                 * compress */
> +    35,                            /* Require 25% compression rate, or not worth
> +                                 * it */
> +};

There's a mismatch between the comment (25%) and the code (35) for the
compression rate.

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + *        Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +    if ((__ctrl & 0xff) == 0)                                                \
> +    {                                                                        \
> +        *(__ctrlp) = __ctrlb;                                                \
> +        __ctrlp = (__buf)++;                                                \
> +        __ctrlb = 0;                                                        \
> +        __ctrl = 1;                                                            \
> +    }                                                                        \
> +} while (0)
> +

Double underscore identifiers are reserved for the compiler and OS.

> +/* ----------
> + * pgrb_out_literal -
> + *
> + *        Outputs a literal byte to the destination buffer including the
> + *        appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> +    pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf);                                \
> +    *(_buf)++ = (unsigned char)(_byte);                                        \
> +    _ctrl <<= 1;                                                            \
> +} while (0)
> +
> +
> +/* ----------
> + * pgrb_out_tag -
> + *
> + *        Outputs a backward reference tag of 2-4 bytes (depending on
> + *        offset and length) to the destination buffer including the
> + *        appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_tag(_ctrlp,_ctrlb,_ctrl,_buf,_len,_off) \
> +do { \
> +    pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf);                                \
> +    _ctrlb |= _ctrl;                                                        \
> +    _ctrl <<= 1;                                                            \
> +    if (_len > 17)                                                            \
> +    {                                                                        \
> +        (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | 0x0f);        \
> +        (_buf)[1] = (unsigned char)(((_off) & 0xff));                        \
> +        (_buf)[2] = (unsigned char)((_len) - 18);                            \
> +        (_buf) += 3;                                                        \
> +    } else {                                                                \
> +        (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | ((_len) - 3)); \
> +        (_buf)[1] = (unsigned char)((_off) & 0xff);                            \
> +        (_buf) += 2;                                                        \
> +    }                                                                        \
> +} while (0)
> +

What's the reason to use macros here? Just use inline functions when
dealing with file-local stuff.
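
E.g. (untested, same logic as the macro):

    static inline void
    pgrb_out_ctrl(unsigned char **ctrlp, unsigned char *ctrlb,
                  unsigned char *ctrl, unsigned char **buf)
    {
        /* flush the finished control byte and start a new one */
        if ((*ctrl & 0xff) == 0)
        {
            **ctrlp = *ctrlb;
            *ctrlp = (*buf)++;
            *ctrlb = 0;
            *ctrl = 1;
        }
    }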

> +/* ----------
> + * pgrb_delta_encode - find common prefix/suffix between inputs and encode.
> + *
> + *    source is the input data to be compressed
> + *    slen is the length of source data
> + *  history is the data which is used as reference for compression
> + *    hlen is the length of history data
> + *    The encoded result is written to dest, and its length is returned in
> + *    finallen.
> + *    The return value is TRUE if compression succeeded,
> + *    FALSE if not; in the latter case the contents of dest
> + *    are undefined.
> + *    ----------
> + */
> +bool
> +pgrb_delta_encode(const char *source, int32 slen,
> +                  const char *history, int32 hlen,
> +                  char *dest, uint32 *finallen,
> +                  const PGRB_Strategy *strategy)
> +{
> +    unsigned char *bp = ((unsigned char *) dest);
> +    unsigned char *bstart = bp;
> +    const char *dp = source;
> +    const char *dend = source + slen;
> +    const char *hp = history;
> +    unsigned char ctrl_dummy = 0;
> +    unsigned char *ctrlp = &ctrl_dummy;
> +    unsigned char ctrlb = 0;
> +    unsigned char ctrl = 0;
> +    int32        result_size;
> +    int32        result_max;
> +    int32        need_rate;
> +    int            prefixlen;
> +    int            suffixlen;
> +
> +    /*
> +     * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> +     * delta encode as this is the maximum size of history offset.
> +     * XXX: still true?
> +     */

Why didn't you define a maximum tuple size in the strategy definition
above then?
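
I.e. instead of INT_MAX something like:

    static const PGRB_Strategy strategy_default_data = {
        32,                        /* min_input_size */
        PGRB_HISTORY_SIZE - 1,     /* max_input_size, limited by what
                                    * the history offset can address */
        25,                        /* min_comp_rate (or whatever rate
                                    * is actually intended, see above) */
    };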

> +    if (hlen >= PGRB_HISTORY_SIZE || hlen < PGRB_MIN_MATCH)
> +        return false;
> +
> +    /*
> +     * Our fallback strategy is the default.
> +     */
> +    if (strategy == NULL)
> +        strategy = PGRB_strategy_default;
>
> +    /*
> +     * If the strategy forbids compression (at all or if source chunk size out
> +     * of range), fail.
> +     */
> +    if (slen < strategy->min_input_size ||
> +        slen > strategy->max_input_size)
> +        return false;
> +
> +    need_rate = strategy->min_comp_rate;
> +    if (need_rate < 0)
> +        need_rate = 0;
> +    else if (need_rate > 99)
> +        need_rate = 99;

Is there really a need for all this stuff here? This is so specific to
the use case that I have significant doubts that all the pglz
boilerplate makes much sense.
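
If log_heap_update() stays the only caller, the strategy machinery
could arguably collapse into a couple of constants (hypothetical
names):

    #define PGRB_MIN_INPUT_SIZE        32
    #define PGRB_MIN_COMP_RATE         25

    if (slen < PGRB_MIN_INPUT_SIZE || slen >= PGRB_HISTORY_SIZE)
        return false;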

> +    /*
> +     * Compute the maximum result size allowed by the strategy, namely the
> +     * input size minus the minimum wanted compression rate.  This had better
> +     * be <= slen, else we might overrun the provided output buffer.
> +     */
> +    /*if (slen > (INT_MAX / 100))
> +    {
> +        /* Approximate to avoid overflow */
> +        /*result_max = (slen / 100) * (100 - need_rate);
> +    }
> +    else
> +    {
> +        result_max = (slen * (100 - need_rate)) / 100;
> +    }*/

err?

> +--
> +-- Test to update continuos and non continuos columns
> +--

*continuous

I have to admit, I have serious doubts about this approach. I have a
very hard time believing this won't cause performance regressions in
many common cases... More importantly, I don't think doing the
compression on this level is that interesting. I know Heikki argued for
it, but I think extending the bitmap that's computed for HOT to cover
all columns and doing this on a column level sounds much more sensible
to me.

Greetings,

Andres Freund


