Thread: Patch for PGunescapeBytea

Patch for PGunescapeBytea

From
Ben Lamb
Date:
Hi,

I found the libpq function PGunescapeBytea a little slow. It was taking a
minute and a half to decode a 500Kb on a fairly fast machine. I think the
culprit is sscanf.

I attach a patch that replaces the function with one used to perform the same
task in pyPgSQL (a Python interface to PostgreSQL). This code was written by
Billy Allie, author of pyPgSQL. I've changed a few variable names to match
those in the original code and removed a bit of Pythonness.

Billy has kindly looked at the code and points out that it is slightly
stricter than the original implementation and if it encounters an invalid
bytea such as '\12C' it drops the unescape '\' and outputs '12C'.

The code is licensed by the author under a BSD license.

I've performed limited testing of the function by putting JPEGs into
PostgreSQL, extracting them using them using the new function and diffing
against the original files.

The new function is significantly faster on my machine with the JPEGs being
decoded in less than a second. I attach a modified libpq example program that
I used for my testing.

Regards,

Ben Lamb.

The patch modifies fe-exec.c in /src/interfaces/libpq from PostgreSQL 7.3.2.

Attachment

Re: Patch for PGunescapeBytea

From
"Magnus Naeslund(f)"
Date:
Ben Lamb <pgsql-patches@zurgy.org> wrote:
>> Hi,
>>
>> I found the libpq function PGunescapeBytea a little slow. It was
>> taking a minute and a half to decode a 500Kb on a fairly fast
>> machine. I think the culprit is sscanf.
[snip]

I think these lines:

buffer = realloc(buffer, buflen);
---
if (buffer == NULL)
    return NULL;
---

are wrong. Shouldn't one do:
---
tmpbuf=realloc(buf,...);
if (!tmpbuf)
    free(buf), return 0;
---

to avoid a memory leak?

... just checking ;)

Magnus


Re: Patch for PGunescapeBytea

From
Ben Lamb
Date:
On Monday 05 May 2003 5:45 pm, Magnus Naeslund(f) wrote:
> I think these lines:
>
> buffer = realloc(buffer, buflen);
> ---
> if (buffer == NULL)
>     return NULL;
> ---
>
> are wrong. Shouldn't one do:
> ---
> tmpbuf=realloc(buf,...);
> if (!tmpbuf)
>     free(buf), return 0;
> ---

Thanks for pointing this out, do I need to send an updated patch to the list?

Ben.


Re: Patch for PGunescapeBytea

From
Ben Lamb
Date:
Here is a new patch for src/interfaces/libpq/fe-exec.c that incorporates the
change suggested by Magnus. The patch significantly improves the speed of
PGunescapeBytea().

Ben.


> [snip]
>
> I think these lines:
>
> buffer = realloc(buffer, buflen);
> ---
> if (buffer == NULL)
>     return NULL;
> ---
>
> are wrong. Shouldn't one do:
> ---
> tmpbuf=realloc(buf,...);
> if (!tmpbuf)
>     free(buf), return 0;
> ---
>
> to avoid a memory leak?
>
> ... just checking ;)
>
> Magnus


Attachment

Re: Patch for PGunescapeBytea

From
Bruce Momjian
Date:
(I will apply the newer version.)

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Ben Lamb wrote:
> Hi,
>
> I found the libpq function PGunescapeBytea a little slow. It was taking a
> minute and a half to decode a 500Kb on a fairly fast machine. I think the
> culprit is sscanf.
>
> I attach a patch that replaces the function with one used to perform the same
> task in pyPgSQL (a Python interface to PostgreSQL). This code was written by
> Billy Allie, author of pyPgSQL. I've changed a few variable names to match
> those in the original code and removed a bit of Pythonness.
>
> Billy has kindly looked at the code and points out that it is slightly
> stricter than the original implementation and if it encounters an invalid
> bytea such as '\12C' it drops the unescape '\' and outputs '12C'.
>
> The code is licensed by the author under a BSD license.
>
> I've performed limited testing of the function by putting JPEGs into
> PostgreSQL, extracting them using them using the new function and diffing
> against the original files.
>
> The new function is significantly faster on my machine with the JPEGs being
> decoded in less than a second. I attach a modified libpq example program that
> I used for my testing.
>
> Regards,
>
> Ben Lamb.
>
> The patch modifies fe-exec.c in /src/interfaces/libpq from PostgreSQL 7.3.2.

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for PGunescapeBytea

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Ben Lamb wrote:
> Here is a new patch for src/interfaces/libpq/fe-exec.c that incorporates the
> change suggested by Magnus. The patch significantly improves the speed of
> PGunescapeBytea().
>
> Ben.
>
>
> > [snip]
> >
> > I think these lines:
> >
> > buffer = realloc(buffer, buflen);
> > ---
> > if (buffer == NULL)
> >     return NULL;
> > ---
> >
> > are wrong. Shouldn't one do:
> > ---
> > tmpbuf=realloc(buf,...);
> > if (!tmpbuf)
> >     free(buf), return 0;
> > ---
> >
> > to avoid a memory leak?
> >
> > ... just checking ;)
> >
> > Magnus
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/interfaces/libpq/fe-exec.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.137
diff -c -c -r1.137 fe-exec.c
*** src/interfaces/libpq/fe-exec.c    8 Jun 2003 17:43:00 -0000    1.137
--- src/interfaces/libpq/fe-exec.c    12 Jun 2003 01:16:16 -0000
***************
*** 1690,1695 ****
--- 1690,1697 ----
      return result;
  }

+ #define VAL(CH) ((CH) - '0')
+
  /*
   *        PQunescapeBytea - converts the null terminated string representation
   *        of a bytea, strtext, into binary, filling a buffer. It returns a
***************
*** 1697,1795 ****
   *        buffer in retbuflen. The pointer may subsequently be used as an
   *        argument to the function free(3). It is the reverse of PQescapeBytea.
   *
!  *        The following transformations are reversed:
!  *        '\0' == ASCII  0 == \000
!  *        '\'' == ASCII 39 == \'
!  *        '\\' == ASCII 92 == \\
   *
-  *        States:
-  *        0    normal        0->1->2->3->4
-  *        1    \               1->5
-  *        2    \0               1->6
-  *        3    \00
-  *        4    \000
-  *        5    \'
-  *        6    \\
   */
  unsigned char *
  PQunescapeBytea(const unsigned char *strtext, size_t *retbuflen)
  {
!     size_t        buflen;
!     unsigned char *buffer,
!                *bp;
!     const unsigned char *sp;
!     unsigned int state = 0;

!     if (strtext == NULL)
          return NULL;
!     buflen = strlen(strtext);    /* will shrink, also we discover if
!                                  * strtext */
!     buffer = (unsigned char *) malloc(buflen);    /* isn't NULL terminated */
      if (buffer == NULL)
          return NULL;
!     for (bp = buffer, sp = strtext; *sp != '\0'; bp++, sp++)
      {
!         switch (state)
          {
!             case 0:
!                 if (*sp == '\\')
!                     state = 1;
!                 *bp = *sp;
!                 break;
!             case 1:
!                 if (*sp == '\'')    /* state=5 */
!                 {                /* replace \' with 39 */
!                     bp--;
!                     *bp = '\'';
!                     buflen--;
!                     state = 0;
!                 }
!                 else if (*sp == '\\')    /* state=6 */
!                 {                /* replace \\ with 92 */
!                     bp--;
!                     *bp = '\\';
!                     buflen--;
!                     state = 0;
!                 }
                  else
                  {
!                     if (isdigit(*sp))
!                         state = 2;
!                     else
!                         state = 0;
!                     *bp = *sp;
                  }
                  break;
-             case 2:
-                 if (isdigit(*sp))
-                     state = 3;
-                 else
-                     state = 0;
-                 *bp = *sp;
-                 break;
-             case 3:
-                 if (isdigit(*sp))        /* state=4 */
-                 {
-                     int            v;

!                     bp -= 3;
!                     sscanf(sp - 2, "%03o", &v);
!                     *bp = v;
!                     buflen -= 3;
!                     state = 0;
!                 }
!                 else
!                 {
!                     *bp = *sp;
!                     state = 0;
!                 }
!                 break;
          }
      }
!     buffer = realloc(buffer, buflen);
!     if (buffer == NULL)
!         return NULL;

      *retbuflen = buflen;
!     return buffer;
  }
--- 1699,1762 ----
   *        buffer in retbuflen. The pointer may subsequently be used as an
   *        argument to the function free(3). It is the reverse of PQescapeBytea.
   *
!  *        The following transformations are made:
!  *        \'   == ASCII 39 == '
!  *        \\   == ASCII 92 == \
!  *        \ooo == a byte whose value = ooo (ooo is an octal number)
!  *        \x   == x (x is any character not matched by the above transformations)
   *
   */
  unsigned char *
  PQunescapeBytea(const unsigned char *strtext, size_t *retbuflen)
  {
!     size_t strtextlen, buflen;
!     unsigned char *buffer, *tmpbuf;
!     int i, j, byte;

!     if (strtext == NULL) {
          return NULL;
!     }
!
!     strtextlen = strlen(strtext);    /* will shrink, also we discover if
!                                      * strtext isn't NULL terminated */
!     buffer = (unsigned char *)malloc(strtextlen);
      if (buffer == NULL)
          return NULL;
!
!     for (i = j = buflen = 0; i < strtextlen;)
      {
!         switch (strtext[i])
          {
!             case '\\':
!                 i++;
!                 if (strtext[i] == '\\')
!                     buffer[j++] = strtext[i++];
                  else
                  {
!                     if ((isdigit(strtext[i]))   &&
!                         (isdigit(strtext[i+1])) &&
!                         (isdigit(strtext[i+2])))
!                     {
!                         byte = VAL(strtext[i++]);
!                         byte = (byte << 3) + VAL(strtext[i++]);
!                         buffer[j++] = (byte << 3) + VAL(strtext[i++]);
!                     }
                  }
                  break;

!             default:
!                 buffer[j++] = strtext[i++];
          }
      }
!     buflen = j; /* buflen is the length of the unquoted data */
!     tmpbuf = realloc(buffer, buflen);
!
!     if (!tmpbuf)
!     {
!         free(buffer);
!         return 0;
!     }

      *retbuflen = buflen;
!     return tmpbuf;
  }