Thread: postgresql-7.2.3 contrib/tsearch

From: Colin M Strickland

Hi,

Recently, I've been playing around with contrib/tsearch for the first
time, as I have some columns with large text fields that need fast
substring searches. The tsearch functions seem to provide the sort of
functionality I'm looking for and work really well for my (moderately
large) dataset. However, I noticed a peculiarity in how queries that
combine stopwords with operators are parsed.

Certain combinations of stopwords and operators cause the rewriter to
incorrectly deduce that all of the query terms are stopwords and bail
out. English examples of this would be

 txtid ## 'monkey|the|breakfast'

or

 txtid ## 'monkey&breakfast&!the'

both of which fail with the error
ERROR:  Your query contained only stopword(s), ignored
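
To make the behaviour I would expect concrete, here is a minimal
standalone sketch in C. It is not the tsearch code and does not mirror
the patch: QNode, term(), op(), prune() and render() are hypothetical
names invented purely for this illustration, and the sketch ignores
everything else the real rewriter does. The assumption is simply that a
stopword operand should be dropped and its sibling should take the
operator's place, so that the query collapses to nothing only when
every term is a stopword.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical query node for this sketch; NOT tsearch's NODE type. */
typedef struct QNode {
    char op;              /* '&', '|' or '!' for operators, 0 for a term */
    const char *word;     /* term text (leaves only) */
    int is_stop;          /* nonzero if the term is a stopword (leaves only) */
    struct QNode *left;   /* left operand, or the operand of '!' */
    struct QNode *right;  /* right operand (binary operators only) */
} QNode;

QNode *op(char o, QNode *l, QNode *r)
{
    QNode *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->op = o;
    n->left = l;
    n->right = r;
    return n;
}

QNode *term(const char *word, int is_stop)
{
    QNode *n = op(0, NULL, NULL);
    n->word = word;
    n->is_stop = is_stop;
    return n;
}

/* Drop stopword leaves. Returns NULL only if the whole subtree was stopwords. */
QNode *prune(QNode *n)
{
    assert(n != NULL);
    if (n->op == 0) {                  /* leaf term */
        if (n->is_stop) {
            free(n);
            return NULL;
        }
        return n;
    }
    if (n->op == '!') {                /* negation of nothing is nothing */
        n->left = prune(n->left);
        if (n->left == NULL) {
            free(n);
            return NULL;
        }
        return n;
    }
    QNode *l = prune(n->left);         /* binary '&' or '|' */
    QNode *r = prune(n->right);
    if (l == NULL || r == NULL) {
        free(n);                       /* operator has lost an operand */
        return l != NULL ? l : r;      /* survivor, or NULL if both are gone */
    }
    n->left = l;
    n->right = r;
    return n;
}

/* Append a text form of the tree to out, which the caller must size generously. */
void render(const QNode *n, char *out)
{
    if (n->op == 0) {
        strcat(out, n->word);
    } else if (n->op == '!') {
        strcat(out, "!");
        render(n->left, out);
    } else {
        strcat(out, "(");
        render(n->left, out);
        strncat(out, &n->op, 1);
        render(n->right, out);
        strcat(out, ")");
    }
}
```

With "the" flagged as a stopword, pruning the tree for
'monkey|the|breakfast' leaves (monkey|breakfast), and pruning
'monkey&breakfast&!the' leaves (monkey&breakfast). Only a query made up
entirely of stopwords prunes to NULL, which is the one case where I
would expect the "only stopword(s)" error.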

Attached is my patch that attempts to fix this situation. It is
written against contrib/tsearch/rewrite.c in the postgresql-7.2.3
stable release and touches only that file. The tsearch code doesn't
look to have changed much in later CVS, so it should apply cleanly
there.

I don't know my way around the postgresql internals too well and only
really focused on this operator logic, so apologies in advance for
wasting anyone's time if I've misunderstood this problem.

*** ./contrib/tsearch/rewrite.c.orig    Tue Nov 12 15:50:16 2002
--- ./contrib/tsearch/rewrite.c    Tue Nov 12 15:57:51 2002
***************
*** 210,258 ****
              return NULL;
          }
      }
!     else if (node->valnode->val == (int4) '|')
      {
          NODE       *res = node;

          node->left = clean_fakeval_intree(node->left, &lresult);
          node->right = clean_fakeval_intree(node->right, &rresult);
!         if (lresult == V_TRUE || rresult == V_TRUE)
          {
              freetree(node);
              *result = V_TRUE;
              return NULL;
          }
!         else if (lresult == V_FALSE && rresult == V_FALSE)
          {
              freetree(node);
              *result = V_FALSE;
              return NULL;
          }
!         else if (lresult == V_FALSE)
!         {
!             res = node->right;
!             pfree(node);
!         }
!         else if (rresult == V_FALSE)
!         {
!             res = node->left;
!             pfree(node);
!         }
!         return res;
!     }
!     else
!     {
!         NODE       *res = node;
!
!         node->left = clean_fakeval_intree(node->left, &lresult);
!         node->right = clean_fakeval_intree(node->right, &rresult);
!         if (lresult == V_FALSE || rresult == V_FALSE)
          {
              freetree(node);
!             *result = V_FALSE;
              return NULL;
          }
!         else if (lresult == V_TRUE && rresult == V_TRUE)
          {
              freetree(node);
              *result = V_TRUE;
--- 210,240 ----
              return NULL;
          }
      }
!     else
      {
          NODE       *res = node;

          node->left = clean_fakeval_intree(node->left, &lresult);
          node->right = clean_fakeval_intree(node->right, &rresult);
!         if (lresult == V_TRUE && rresult == V_TRUE)
          {
              freetree(node);
              *result = V_TRUE;
              return NULL;
          }
!         else  if (lresult == V_FALSE && rresult == V_FALSE)
          {
              freetree(node);
              *result = V_FALSE;
              return NULL;
          }
!         else if (lresult == V_TRUE && rresult == V_FALSE)
          {
              freetree(node);
!             *result = V_TRUE;
              return NULL;
          }
!         else  if (lresult == V_FALSE && rresult == V_TRUE)
          {
              freetree(node);
              *result = V_TRUE;
***************
*** 267,272 ****
--- 249,264 ----
          {
              res = node->left;
              pfree(node);
+         }
+         else if (lresult == V_FALSE)
+         {
+             res = node->right;
+             pfree(node);
+         }
+         else if (rresult == V_FALSE)
+         {
+             res = node->left;
+             pfree(node);
          }
          return res;
      }


--
Regards,
Colin M Strickland -- "Tape My Beatworm!"