Re: wchareq improvement - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: wchareq improvement
Date
Msg-id 200505252258.j4PMwlk29971@candle.pha.pa.us
In response to wchareq improvement  (a_ogawa <a_ogawa@hi-ho.ne.jp>)
Responses Re: wchareq improvement
List pgsql-patches
Patch applied with adjustment --- the second part of your patch, which
skips comparing the first byte in the loop, seemed unnecessary.  It seemed
likely to cause a CPU stall, so just doing the full loop seemed faster.

Did you test whether the second part of your patch actually caused a speedup?

---------------------------------------------------------------------------

a_ogawa wrote:
>
> I forgot to attach the patch, so I am posting again.
> In SQL that uses the 'like' operator, wchareq is used to compare characters.
>
> At the head of wchareq, the lengths of the (multibyte) characters are
> compared using pg_mblen. Therefore, pg_mblen is executed many times, and it
> becomes a bottleneck.
>
> This patch adds a shortcut that reduces how often pg_mblen is executed.
>
> test.sql:
> select count(*) from accounts
> where aid like '%1';
> ... (repeated 10 times)
>
> test command:
> $ psql -f test.sql
>
> result of original code (compile option "-O2 -pg"):
> -----------------------------------------------------------------------
> Each sample counts as 0.01 seconds.
>  %  cumulative   self            self   total
> time  seconds   seconds    calls s/call s/call name
>  7.82     0.32     0.32 17566930   0.00   0.00 pg_euc_mblen
>  7.09     0.61     0.29 17566930   0.00   0.00 pg_mblen
>  6.60     0.88     0.27  1000000   0.00   0.00 MBMatchText
>  5.38     1.10     0.22  1000000   0.00   0.00 HeapTupleSatisfiesSnapshot
>  5.13     1.31     0.21   999990   0.00   0.00 ExecMakeFunctionResultNoSets
>  4.89     1.51     0.20 17566930   0.00   0.00 pg_eucjp_mblen
>
> result of patched code (compile option "-O2 -pg"):
> ------------------------------------------------------------
> Each sample counts as 0.01 seconds.
>  %  cumulative  self             self   total
> time  seconds  seconds     calls s/call s/call name
>  8.56     0.32    0.32   1000000   0.00   0.00 MBMatchText
>  7.75     0.61    0.29   1000000   0.00   0.00 HeapTupleSatisfiesSnapshot
>  6.42     0.85    0.24   1000000   0.00   0.00 slot_deform_tuple
>  5.88     1.07    0.22   8789050   0.00   0.00 pg_euc_mblen
>  5.88     1.29    0.22   1000012   0.00   0.00 heapgettup
>  5.61     1.50    0.21    999990   0.00   0.00 ExecMakeFunctionResultNoSets
>
> execution time (compile option "-O2"):
>  original code: 4.795 sec
>  patched code:  4.496 sec
>
> regards,
>
> --- Atsushi Ogawa

[ Attachment, skipping... ]


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/adt/like.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/like.c,v
retrieving revision 1.59
diff -c -c -r1.59 like.c
*** src/backend/utils/adt/like.c    31 Dec 2004 22:01:22 -0000    1.59
--- src/backend/utils/adt/like.c    25 May 2005 22:24:46 -0000
***************
*** 50,61 ****
  static int
  wchareq(unsigned char *p1, unsigned char *p2)
  {
!     int            l;

!     l = pg_mblen(p1);
!     if (pg_mblen(p2) != l)
          return (0);
!     while (l--)
      {
          if (*p1++ != *p2++)
              return (0);
--- 50,67 ----
  static int
  wchareq(unsigned char *p1, unsigned char *p2)
  {
!     int            p1_len;

!     /* Optimization:  quickly compare the first byte. */
!     if(*p1 != *p2)
          return (0);
!
!     p1_len = pg_mblen(p1);
!     if (pg_mblen(p2) != p1_len)
!         return (0);
!
!     /* They are the same length */
!     while (p1_len--)
      {
          if (*p1++ != *p2++)
              return (0);
*** ./src/backend/utils/adt/like.c.orig    Tue Apr 12 17:58:22 2005
--- ./src/backend/utils/adt/like.c    Tue Apr 12 18:45:07 2005
***************
*** 52,60 ****
--- 52,75 ----
  {
      int            l;

+     /*
+      * short cut. When first byte of p1 and p2 is different, these
+      * characters will not match.
+      */
+     if(*p1 != *p2)
+         return (0);
+
      l = pg_mblen(p1);
      if (pg_mblen(p2) != l)
          return (0);
+
+     /*
+      * Skip first byte of p1 and p2. These are already checked at
+      * top of this function.
+      */
+     l--;
+     p1++;
+     p2++;

      while (l--)
      {
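
For readers comparing the two diffs above, here is a minimal standalone sketch of both variants side by side: the version as applied (first-byte test, then the loop over every byte) and the version as submitted (first-byte test, then the loop starting from the second byte). It is an illustration only, not the code in like.c. The names demo_mblen, wchareq_applied and wchareq_submitted are hypothetical, demo_mblen is a simplified UTF-8 lead-byte stand-in for pg_mblen (which depends on the database encoding), and the final "return 1" is assumed because the hunks above do not show the end of the function.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of a character,
 * judged from a UTF-8 lead byte.  Assumption for this sketch only.
 */
static int
demo_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/* As applied: quick first-byte test, then compare every byte in the loop. */
static int
wchareq_applied(const unsigned char *p1, const unsigned char *p2)
{
    int         p1_len;

    /* Optimization: quickly compare the first byte. */
    if (*p1 != *p2)
        return 0;

    p1_len = demo_mblen(p1);
    if (demo_mblen(p2) != p1_len)
        return 0;

    /* They are the same length */
    while (p1_len--)
    {
        if (*p1++ != *p2++)
            return 0;
    }
    return 1;                   /* assumed: not shown in the hunks above */
}

/* As submitted: same test, but the loop skips the already-checked byte. */
static int
wchareq_submitted(const unsigned char *p1, const unsigned char *p2)
{
    int         l;

    if (*p1 != *p2)
        return 0;

    l = demo_mblen(p1);
    if (demo_mblen(p2) != l)
        return 0;

    /* Skip the first byte; it was compared at the top. */
    l--;
    p1++;
    p2++;

    while (l--)
    {
        if (*p1++ != *p2++)
            return 0;
    }
    return 1;                   /* assumed: not shown in the hunks above */
}
```

Both variants return the same result for any pair of characters; they differ only in whether the first byte is compared a second time inside the loop. In either variant, a mismatch on the first byte returns before any length lookup is made, which is what the shortcut is meant to avoid.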
