Thread: Improvement of search for a binary operator

Improvement of search for a binary operator

From
Atsushi Ogawa
Date:
Here is a gprof result of the pgbench.

  %   cumulative   self             self    total
 time   seconds   seconds    calls  s/call  s/call  name
  4.77      4.04     4.04    14000    0.00    0.00  OpernameGetCandidates
  4.70      8.02     3.98  2007008    0.00    0.00
HeapTupleSatisfiesSnapshot
  4.50     11.83     3.81    14001    0.00    0.00  base_yyparse
  3.78     15.03     3.20  2838312    0.00    0.00  AllocSetAlloc
  3.73     18.18     3.15    40001    0.00    0.00  heapgetpage

The OpernameGetCandidates called from oper. The function of oper is
search for a binary operator. It does the following processing:

(1)Create candidates of operator that matches operator name and
operator kind by OpernameGetCandidates.
(2)Find an operator that matches exactly ltypeId and rtypeId from
the candidates of operator by binary_oper_exact.
(3)If not found, find an operator from the candidates of operator by
oper_select_candidate. The oper_select_candidate accepts coerce type
and resolve the conflict.

I think that we can search the system catalog cache instead of
retrieval from the candidates of operator in the binary_oper_exact,
and reverse the order of executing (1) and (2) for performance
improvement. Because for a general operator such as '=', the number
of the candidates of operator exceeds 50. And in common cases the
typeIds matches exactly.

I tested following commands:

pgbench -i
pgbench -t 4000 (5 times)

results (tps):
             1       2       3       4       5     average
-----------------------------------------------------------
 original: 214.33  184.60  192.52  158.62  170.04  184.02
 patched : 223.86  198.72  207.48  165.70  179.67  195.09

regards,

---
Atsushi Ogawa

*** ./src/backend/catalog/namespace.c.orig    Tue Apr 25 23:16:57 2006
--- ./src/backend/catalog/namespace.c    Wed Apr 26 00:14:33 2006
***************
*** 681,686 ****
--- 681,744 ----
      return visible;
  }

+ /*
+  * OpernameGet
+  *        Given a possibly-qualified operator name, left and right type IDs,
+  *        find an operator in the current search path.
+  */
+ Oid
+ OpernameGet(List *names, Oid ltypeId, Oid rtypeId)
+ {
+     char       *schemaname;
+     char       *opername;
+     Oid            namespaceId;
+     HeapTuple    tup = NULL;
+
+     /* deconstruct the name list */
+     DeconstructQualifiedName(names, &schemaname, &opername);
+
+     if (schemaname)
+     {
+         /* use exact schema given */
+         namespaceId = LookupExplicitNamespace(schemaname);
+
+         tup = SearchSysCache(OPERNAMENSP,
+                              CStringGetDatum(opername),
+                              ObjectIdGetDatum(ltypeId),
+                              ObjectIdGetDatum(rtypeId),
+                              ObjectIdGetDatum(namespaceId));
+     }
+     else
+     {
+         /* we need namespace search */
+         ListCell   *nsp;
+
+         recomputeNamespacePath();
+
+         foreach(nsp, namespaceSearchPath)
+         {
+             namespaceId = lfirst_oid(nsp);
+
+             tup = SearchSysCache(OPERNAMENSP,
+                                  CStringGetDatum(opername),
+                                  ObjectIdGetDatum(ltypeId),
+                                  ObjectIdGetDatum(rtypeId),
+                                  ObjectIdGetDatum(namespaceId));
+             if (HeapTupleIsValid(tup))
+                 break;
+         }
+     }
+
+     if (HeapTupleIsValid(tup))
+     {
+         Oid operOid = HeapTupleGetOid(tup);
+         ReleaseSysCache(tup);
+         return operOid;
+     }
+
+     return InvalidOid;
+ }
+

  /*
   * OpernameGetCandidates
*** ./src/backend/parser/parse_oper.c.orig    Tue Apr 25 22:12:52 2006
--- ./src/backend/parser/parse_oper.c    Wed Apr 26 00:19:04 2006
***************
*** 29,36 ****
  #include "utils/typcache.h"


! static Oid binary_oper_exact(Oid arg1, Oid arg2,
!                   FuncCandidateList candidates);
  static FuncDetailCode oper_select_candidate(int nargs,
                        Oid *input_typeids,
                        FuncCandidateList candidates,
--- 29,35 ----
  #include "utils/typcache.h"


! static Oid binary_oper_exact(List *opname, Oid arg1, Oid arg2);
  static FuncDetailCode oper_select_candidate(int nargs,
                        Oid *input_typeids,
                        FuncCandidateList candidates,
***************
*** 387,397 ****
   * be reduced to its base type to find an "exact" match.
   */
  static Oid
! binary_oper_exact(Oid arg1, Oid arg2,
!                   FuncCandidateList candidates)
  {
      FuncCandidateList cand;
      bool        was_unknown = false;

      /* Unspecified type for one of the arguments? then use the other */
      if ((arg1 == UNKNOWNOID) && (arg2 != InvalidOid))
--- 386,396 ----
   * be reduced to its base type to find an "exact" match.
   */
  static Oid
! binary_oper_exact(List *opname, Oid arg1, Oid arg2)
  {
      FuncCandidateList cand;
      bool        was_unknown = false;
+     Oid            operOid;

      /* Unspecified type for one of the arguments? then use the other */
      if ((arg1 == UNKNOWNOID) && (arg2 != InvalidOid))
***************
*** 405,415 ****
          was_unknown = true;
      }

!     for (cand = candidates; cand != NULL; cand = cand->next)
!     {
!         if (arg1 == cand->args[0] && arg2 == cand->args[1])
!             return cand->oid;
!     }

      if (was_unknown)
      {
--- 404,412 ----
          was_unknown = true;
      }

!     operOid = OpernameGet(opname, arg1, arg2);
!     if (OidIsValid(operOid))
!         return operOid;

      if (was_unknown)
      {
***************
*** 418,428 ****

          if (basetype != arg1)
          {
!             for (cand = candidates; cand != NULL; cand = cand->next)
!             {
!                 if (basetype == cand->args[0] && basetype == cand->args[1])
!                     return cand->oid;
!             }
          }
      }

--- 415,423 ----

          if (basetype != arg1)
          {
!             operOid = OpernameGet(opname, basetype, basetype);
!             if (OidIsValid(operOid))
!                 return operOid;
          }
      }

***************
*** 509,530 ****
      FuncDetailCode fdresult = FUNCDETAIL_NOTFOUND;
      HeapTuple    tup = NULL;

!     /* Get binary operators of given name */
!     clist = OpernameGetCandidates(opname, 'b');

!     /* No operators found? Then fail... */
!     if (clist != NULL)
      {
          /*
!          * Check for an "exact" match.
           */
-         operOid = binary_oper_exact(ltypeId, rtypeId, clist);
-         if (!OidIsValid(operOid))
-         {
-             /*
-              * Otherwise, search for the most suitable candidate.
-              */

              /*
               * Unspecified type for one of the arguments? then use the other
               * (XXX this is probably dead code?)
--- 504,526 ----
      FuncDetailCode fdresult = FUNCDETAIL_NOTFOUND;
      HeapTuple    tup = NULL;

!     /*
!      * Check for an "exact" match.
!      */
!     operOid = binary_oper_exact(opname, ltypeId, rtypeId);

!     if (!OidIsValid(operOid))
      {
          /*
!          * Otherwise, search for the most suitable candidate.
           */

+         /* Get binary operators of given name */
+         clist = OpernameGetCandidates(opname, 'b');
+
+         /* No operators found? Then fail... */
+         if (clist != NULL)
+         {
              /*
               * Unspecified type for one of the arguments? then use the other
               * (XXX this is probably dead code?)
***************
*** 537,548 ****
              inputOids[1] = rtypeId;
              fdresult = oper_select_candidate(2, inputOids, clist, &operOid);
          }
-         if (OidIsValid(operOid))
-             tup = SearchSysCache(OPEROID,
-                                  ObjectIdGetDatum(operOid),
-                                  0, 0, 0);
      }

      if (!HeapTupleIsValid(tup) && !noError)
          op_error(pstate, opname, 'b', ltypeId, rtypeId, fdresult, location);

--- 533,545 ----
              inputOids[1] = rtypeId;
              fdresult = oper_select_candidate(2, inputOids, clist, &operOid);
          }
      }

+     if (OidIsValid(operOid))
+         tup = SearchSysCache(OPEROID,
+                              ObjectIdGetDatum(operOid),
+                              0, 0, 0);
+
      if (!HeapTupleIsValid(tup) && !noError)
          op_error(pstate, opname, 'b', ltypeId, rtypeId, fdresult, location);


Re: Improvement of search for a binary operator

From
Tom Lane
Date:
Atsushi Ogawa <a_ogawa@hi-ho.ne.jp> writes:
> The OpernameGetCandidates called from oper. The function of oper is
> search for a binary operator. It does the following processing:

> (1)Create candidates of operator that matches operator name and
> operator kind by OpernameGetCandidates.
> (2)Find an operator that matches exactly ltypeId and rtypeId from
> the candidates of operator by binary_oper_exact.
> (3)If not found, find an operator from the candidates of operator by
> oper_select_candidate. The oper_select_candidate accepts coerce type
> and resolve the conflict.

> I think that we can search the system catalog cache instead of
> retrieval from the candidates of operator in the binary_oper_exact,
> and reverse the order of executing (1) and (2) for performance
> improvement.

AFAICT, this will make things a bit faster if there is an exact match,
and quite a bit slower if there is not (especially if the search path
is long).  I've known for awhile that OpernameGetCandidates is a
bottleneck, but I don't want a solution that optimizes some cases at the
price of making others worse.  pgbench is not a good model of the real
world for such tradeoffs.

Whatever solution we find, it should be applied to the unary operator
paths as well.  It's not appropriate to introduce gratuitous differences
between the binary and unary operator paths.  (Again, pgbench is a poor
model of the real world ... I don't think it even uses any unary
operators.)

            regards, tom lane

Re: Improvement of search for a binary operator

From
Tom Lane
Date:
Atsushi Ogawa <a_ogawa@hi-ho.ne.jp> writes:
> I think that we can search the system catalog cache instead of
> retrieval from the candidates of operator in the binary_oper_exact,
> and reverse the order of executing (1) and (2) for performance
> improvement.

I've been thinking more about how this might be improved.  In a test
involving the same trivial query repeated many times ("select * from tab
where foo = 'constant'") I see oprofile results like this:

samples  %        symbol name
95234     5.5458  AllocSetAlloc
89622     5.2190  hash_search
86038     5.0103  OpernameGetCandidates
76747     4.4692  SearchCatCache
64042     3.7294  base_yyparse
39473     2.2987  LWLockAcquire
32195     1.8748  base_yylex
25024     1.4572  nocachegetattr
21913     1.2761  MemoryContextAllocZeroAligned
21268     1.2385  _bt_compare
19665     1.1452  fmgr_info_cxt_security
18430     1.0732  LWLockRelease
17312     1.0081  expression_tree_walker
...
7787      0.4535  SearchCatCacheList
...

The problem with the patch you proposed is that when there *isn't* an
exact match, it costs N extra SearchCatCache probes, where N is the
length of the search_path.  We need a solution that limits the damage
in that case.

The solution I'm considering is to add an additional namespace.c routine
defined like
    Oid OpnameGetOprid(List *opname, Oid oprleft, Oid oprright)
that returns the first exact match in the search path, or 0 if none.
(parse_oper.c would try this before OpernameGetCandidates.)  The secret
is that it should use a SearchCatCacheList lookup instead of repeated
SearchCatCache probles.  If the list lookup is successful, it will
normally produce a very short result (typically only one member) and so
finding the first visible member if any is cheap.

With this approach the penalty for failure is just one extra
SearchCatCacheList lookup (which has cost comparable to SearchCatCache,
I believe, once the cache is set up) instead of N SearchCatCache
lookups.  In the example above this should add at most about half a
percent of total system runtime.  The potential of making a large dent
in OpernameGetCandidates' five percent makes that look like a good trade.

If this works out we might want to think about converting some of the
other namespace.c routines to use similar tactics.  I'm not sure I've
ever seen them high in a profile though.

            regards, tom lane

Re: Improvement of search for a binary operator

From
Tom Lane
Date:
I wrote:
> The solution I'm considering is to add an additional namespace.c routine
> defined like
>     Oid OpnameGetOprid(List *opname, Oid oprleft, Oid oprright)

I coded this up, and it seems to be a win just on code cleanliness
grounds, because there are several places that want to do a search for
an operator with exact input types given; this routine fits their
needs exactly while OpernameGetCandidates was a poor fit.  It seems to
behave as advertised in terms of getting rid of OpernameGetCandidates
overhead in the exact-match case --- this only makes for a few percent
overall improvement in the test case I'm looking at, but that's about
what's expected.

I made a non-exact-match test case by changing char(N) to varchar(N)
in the table definitions, and what I see is

samples  %        symbol name
189998   11.9853  SearchCatCache
68888     4.3455  AllocSetAlloc
66793     4.2134  hash_search
49284     3.1089  OpernameGetCandidates
46472     2.9315  base_yyparse
28268     1.7832  LWLockAcquire
27487     1.7339  DirectFunctionCall1
27071     1.7077  DLMoveToFront
26375     1.6638  nocachegetattr
24956     1.5743  SearchSysCache
21461     1.3538  base_yylex
20176     1.2727  heap_getsysattr
19466     1.2279  FunctionCall2
18148     1.1448  CatalogCacheComputeHashValue
17608     1.1107  _bt_compare
16606     1.0475  MemoryContextAllocZeroAligned
...
6870      0.4334  SearchCatCacheList

This case is significantly slower than the other one anyway, and the
reason is evidently a lot more SearchCatCache calls.  AFAICT those
can all be blamed on the getBaseType() calls that are sprinkled through
parse_func.c and parse_oper.c.  I thought those would probably come back
to haunt us from a performance standpoint someday :-(

Anyway, I can't measure any performance difference in the
non-exact-match case.  Maybe if we got rid of the getBaseType calls
it'd be worth checking again, but for now this seems like a win.
I'll go ahead and commit it.

            regards, tom lane