Thread: pg_lzcompress patch for 8.3, 8.2 branch

pg_lzcompress patch for 8.3, 8.2 branch

From
Zdenek Kotala
Date:
I attached backported pg_lzcompress patch which is already in head for version
8.2 and 8.3.

Version 8.1 and prior contains more changes in decompress code and they does not
contain any check. Shell I backported it as well or it will be better to keep it
untouched?


        Zdenek
*** src/backend/utils/adt/pg_lzcompress.c    2006/10/05 23:33:33    1.23
--- src/backend/utils/adt/pg_lzcompress.c    2008/03/08 01:09:36    1.31
*************** pglz_compress(const char *source, int32
*** 631,656 ****
  void
  pglz_decompress(const PGLZ_Header *source, char *dest)
  {
!     const unsigned char *dp;
!     const unsigned char *dend;
!     unsigned char *bp;
!     unsigned char ctrl;
!     int32        ctrlc;
!     int32        len;
!     int32        off;
!     int32        destsize;
!
!     dp = ((const unsigned char *) source) + sizeof(PGLZ_Header);
!     dend = ((const unsigned char *) source) + VARATT_SIZE(source);
!     bp = (unsigned char *) dest;

!     while (dp < dend)
      {
          /*
!          * Read one control byte and process the next 8 items.
           */
!         ctrl = *dp++;
!         for (ctrlc = 0; ctrlc < 8 && dp < dend; ctrlc++)
          {
              if (ctrl & 1)
              {
--- 641,666 ----
  void
  pglz_decompress(const PGLZ_Header *source, char *dest)
  {
!     const unsigned char *sp;
!     const unsigned char *srcend;
!     unsigned char *dp;
!     unsigned char *destend;
!
!     sp = ((const unsigned char *) source) + sizeof(PGLZ_Header);
!     srcend = ((const unsigned char *) source) + VARATT_SIZE(source);
!     dp = (unsigned char *) dest;
!     destend = dp + source->rawsize;

!     while (sp < srcend && dp < destend)
      {
          /*
!          * Read one control byte and process the next 8 items (or as many
!          * as remain in the compressed input).
           */
!         unsigned char ctrl = *sp++;
!         int        ctrlc;
!
!         for (ctrlc = 0; ctrlc < 8 && sp < srcend; ctrlc++)
          {
              if (ctrl & 1)
              {
*************** pglz_decompress(const PGLZ_Header *sourc
*** 661,671 ****
                   * coded as 18, another extension tag byte tells how much
                   * longer the match really was (0-255).
                   */
!                 len = (dp[0] & 0x0f) + 3;
!                 off = ((dp[0] & 0xf0) << 4) | dp[1];
!                 dp += 2;
                  if (len == 18)
!                     len += *dp++;

                  /*
                   * Now we copy the bytes specified by the tag from OUTPUT to
--- 671,697 ----
                   * coded as 18, another extension tag byte tells how much
                   * longer the match really was (0-255).
                   */
!                 int32        len;
!                 int32        off;
!
!                 len = (sp[0] & 0x0f) + 3;
!                 off = ((sp[0] & 0xf0) << 4) | sp[1];
!                 sp += 2;
                  if (len == 18)
!                     len += *sp++;
!
!                 /*
!                  * Check for output buffer overrun, to ensure we don't
!                  * clobber memory in case of corrupt input.  Note: we must
!                  * advance dp here to ensure the error is detected below
!                  * the loop.  We don't simply put the elog inside the loop
!                  * since that will probably interfere with optimization.
!                  */
!                 if (dp + len > destend)
!                 {
!                     dp += len;
!                     break;
!                 }

                  /*
                   * Now we copy the bytes specified by the tag from OUTPUT to
*************** pglz_decompress(const PGLZ_Header *sourc
*** 675,682 ****
                   */
                  while (len--)
                  {
!                     *bp = bp[-off];
!                     bp++;
                  }
              }
              else
--- 701,708 ----
                   */
                  while (len--)
                  {
!                     *dp = dp[-off];
!                     dp++;
                  }
              }
              else
*************** pglz_decompress(const PGLZ_Header *sourc
*** 685,691 ****
                   * An unset control bit means LITERAL BYTE. So we just copy
                   * one from INPUT to OUTPUT.
                   */
!                 *bp++ = *dp++;
              }

              /*
--- 711,720 ----
                   * An unset control bit means LITERAL BYTE. So we just copy
                   * one from INPUT to OUTPUT.
                   */
!                 if (dp >= destend)    /* check for buffer overrun */
!                     break;            /* do not clobber memory */
!
!                 *dp++ = *sp++;
              }

              /*
*************** pglz_decompress(const PGLZ_Header *sourc
*** 696,709 ****
      }

      /*
!      * Check we decompressed the right amount, else die.  This is a FATAL
!      * condition if we tromped on more memory than expected (we assume we
!      * have not tromped on shared memory, though, so need not PANIC).
!      */
!     destsize = (char *) bp - dest;
!     if (destsize != source->rawsize)
!         elog(destsize > source->rawsize ? FATAL : ERROR,
!              "compressed data is corrupt");

      /*
       * That's it.
--- 725,734 ----
      }

      /*
!      * Check we decompressed the right amount.
!      */
!     if (dp != destend || sp != srcend)
!         elog(ERROR, "compressed data is corrupt");

      /*
       * That's it.
*** src/backend/utils/adt/pg_lzcompress.c    2008/01/01 19:45:52    1.29
--- src/backend/utils/adt/pg_lzcompress.c    2008/03/08 01:09:36    1.31
*************** pglz_compress(const char *source, int32
*** 631,656 ****
  void
  pglz_decompress(const PGLZ_Header *source, char *dest)
  {
!     const unsigned char *dp;
!     const unsigned char *dend;
!     unsigned char *bp;
!     unsigned char ctrl;
!     int32        ctrlc;
!     int32        len;
!     int32        off;
!     int32        destsize;
!
!     dp = ((const unsigned char *) source) + sizeof(PGLZ_Header);
!     dend = ((const unsigned char *) source) + VARSIZE(source);
!     bp = (unsigned char *) dest;

!     while (dp < dend)
      {
          /*
!          * Read one control byte and process the next 8 items.
           */
!         ctrl = *dp++;
!         for (ctrlc = 0; ctrlc < 8 && dp < dend; ctrlc++)
          {
              if (ctrl & 1)
              {
--- 641,666 ----
  void
  pglz_decompress(const PGLZ_Header *source, char *dest)
  {
!     const unsigned char *sp;
!     const unsigned char *srcend;
!     unsigned char *dp;
!     unsigned char *destend;
!
!     sp = ((const unsigned char *) source) + sizeof(PGLZ_Header);
!     srcend = ((const unsigned char *) source) + VARSIZE(source);
!     dp = (unsigned char *) dest;
!     destend = dp + source->rawsize;

!     while (sp < srcend && dp < destend)
      {
          /*
!          * Read one control byte and process the next 8 items (or as many
!          * as remain in the compressed input).
           */
!         unsigned char ctrl = *sp++;
!         int        ctrlc;
!
!         for (ctrlc = 0; ctrlc < 8 && sp < srcend; ctrlc++)
          {
              if (ctrl & 1)
              {
*************** pglz_decompress(const PGLZ_Header *sourc
*** 661,671 ****
                   * coded as 18, another extension tag byte tells how much
                   * longer the match really was (0-255).
                   */
!                 len = (dp[0] & 0x0f) + 3;
!                 off = ((dp[0] & 0xf0) << 4) | dp[1];
!                 dp += 2;
                  if (len == 18)
!                     len += *dp++;

                  /*
                   * Now we copy the bytes specified by the tag from OUTPUT to
--- 671,697 ----
                   * coded as 18, another extension tag byte tells how much
                   * longer the match really was (0-255).
                   */
!                 int32        len;
!                 int32        off;
!
!                 len = (sp[0] & 0x0f) + 3;
!                 off = ((sp[0] & 0xf0) << 4) | sp[1];
!                 sp += 2;
                  if (len == 18)
!                     len += *sp++;
!
!                 /*
!                  * Check for output buffer overrun, to ensure we don't
!                  * clobber memory in case of corrupt input.  Note: we must
!                  * advance dp here to ensure the error is detected below
!                  * the loop.  We don't simply put the elog inside the loop
!                  * since that will probably interfere with optimization.
!                  */
!                 if (dp + len > destend)
!                 {
!                     dp += len;
!                     break;
!                 }

                  /*
                   * Now we copy the bytes specified by the tag from OUTPUT to
*************** pglz_decompress(const PGLZ_Header *sourc
*** 675,682 ****
                   */
                  while (len--)
                  {
!                     *bp = bp[-off];
!                     bp++;
                  }
              }
              else
--- 701,708 ----
                   */
                  while (len--)
                  {
!                     *dp = dp[-off];
!                     dp++;
                  }
              }
              else
*************** pglz_decompress(const PGLZ_Header *sourc
*** 685,691 ****
                   * An unset control bit means LITERAL BYTE. So we just copy
                   * one from INPUT to OUTPUT.
                   */
!                 *bp++ = *dp++;
              }

              /*
--- 711,720 ----
                   * An unset control bit means LITERAL BYTE. So we just copy
                   * one from INPUT to OUTPUT.
                   */
!                 if (dp >= destend)    /* check for buffer overrun */
!                     break;            /* do not clobber memory */
!
!                 *dp++ = *sp++;
              }

              /*
*************** pglz_decompress(const PGLZ_Header *sourc
*** 696,709 ****
      }

      /*
!      * Check we decompressed the right amount, else die.  This is a FATAL
!      * condition if we tromped on more memory than expected (we assume we have
!      * not tromped on shared memory, though, so need not PANIC).
!      */
!     destsize = (char *) bp - dest;
!     if (destsize != source->rawsize)
!         elog(destsize > source->rawsize ? FATAL : ERROR,
!              "compressed data is corrupt");

      /*
       * That's it.
--- 725,734 ----
      }

      /*
!      * Check we decompressed the right amount.
!      */
!     if (dp != destend || sp != srcend)
!         elog(ERROR, "compressed data is corrupt");

      /*
       * That's it.

Re: pg_lzcompress patch for 8.3, 8.2 branch

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I attached backported pg_lzcompress patch which is already in head for version
> 8.2 and 8.3.

> Version 8.1 and prior contains more changes in decompress code and they does not
> contain any check. Shell I backported it as well or it will be better to keep it
> untouched?

AFAICS the only nontrivial patch in pg_lzcompress.c between 7.4 and 8.2
is my cleanup patch here:
http://archives.postgresql.org/pgsql-committers/2006-10/msg00076.php
That's been in the tree long enough that I wouldn't have any hesitation
about back-porting it, so that all the supported versions would have
the same lzcompress code.  About the only reason I can see not to do it
is that conceivably some third-party code somewhere might be calling
pglz_compress directly; in which case an API change in a minor release
would be a problem for them.

On the other hand, I remain unconvinced that this problem is severe
enough to justify much backporting work.  AFAIK we've only seen one
occurence of a problem to date.

Thoughts?

            regards, tom lane

Re: pg_lzcompress patch for 8.3, 8.2 branch

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> I attached backported pg_lzcompress patch which is already in head for version
>> 8.2 and 8.3.
>
>> Version 8.1 and prior contains more changes in decompress code and they does not
>> contain any check. Shell I backported it as well or it will be better to keep it
>> untouched?
>
> AFAICS the only nontrivial patch in pg_lzcompress.c between 7.4 and 8.2
> is my cleanup patch here:
> http://archives.postgresql.org/pgsql-committers/2006-10/msg00076.php
> That's been in the tree long enough that I wouldn't have any hesitation
> about back-porting it, so that all the supported versions would have
> the same lzcompress code.

OK I will backport your and my patch together back to the 7.4.

> About the only reason I can see not to do it
> is that conceivably some third-party code somewhere might be calling
> pglz_compress directly; in which case an API change in a minor release
> would be a problem for them.

Hmm, It brings me idea that we should have some stable API definition for C
function. For example API for datatype  helps to upgrade user datatypes
without any changes (no recompilation). The same model uses e.g. Solaris for
driver. Binary drivers are compatible and drivers from S8 should work on S10.

> On the other hand, I remain unconvinced that this problem is severe
> enough to justify much backporting work.  AFAIK we've only seen one
> occurence of a problem to date.

I know about two occurrence. One was reported on -bug
(http://archives.postgresql.org/pgsql-bugs/2008-04/msg00206.php)
and second was reported from our customer.

The main problem is that you are not able to fix corrupted data. When database
raise error on damaged data you are able to catch this error and remove affected
row. If database crashes than only gdb guru is able to mine ctid of affected row
or any other row specification from core dump.

        Zdenek


Re: pg_lzcompress patch for 8.3, 8.2 branch

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Tom Lane napsal(a):
>> On the other hand, I remain unconvinced that this problem is severe
>> enough to justify much backporting work.  AFAIK we've only seen one
>> occurence of a problem to date.

> I know about two occurrence. One was reported on -bug
> (http://archives.postgresql.org/pgsql-bugs/2008-04/msg00206.php)
> and second was reported from our customer.

I'm still not impressed.  Bear in mind that the patch you are so eager
to backport has received *zero* field testing, which means there's a
non-negligible risk that there's something wrong with it.  Add on the
non-negligible risk of messing up something associated with back-porting
the earlier patch, and consider that back-branch minor releases go out
with no field testing to speak of (there's the build farm but that's
about it).  You have to seriously question whether the risk is worth
what is surely an extremely marginal stability improvement.

On the whole I think we've done enough here.

            regards, tom lane

Re: pg_lzcompress patch for 8.3, 8.2 branch

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> Tom Lane napsal(a):
>>> On the other hand, I remain unconvinced that this problem is severe
>>> enough to justify much backporting work.  AFAIK we've only seen one
>>> occurence of a problem to date.
>
>> I know about two occurrence. One was reported on -bug
>> (http://archives.postgresql.org/pgsql-bugs/2008-04/msg00206.php)
>> and second was reported from our customer.
>
> I'm still not impressed.  Bear in mind that the patch you are so eager
> to backport has received *zero* field testing, which means there's a
> non-negligible risk that there's something wrong with it.

Our customers uses the patch (version 8.2) on 2TB heavy loaded table which
contains text field with average size  ~10kB. He have used it for two months
without any problem.  I think it is good field testing. It helped him to fix
corrupted data problems without any random crash or downtime.

> Add on the
> non-negligible risk of messing up something associated with back-porting
> the earlier patch, and consider that back-branch minor releases go out
> with no field testing to speak of (there's the build farm but that's
> about it).  You have to seriously question whether the risk is worth
> what is surely an extremely marginal stability improvement.

I don't need it to backport to 8.1 and older. Yeah, It was my eager activity.
I'm happy with 8.3 and 8.2 backport.

        thanks Zdenek