Re: tsearch refactorings - Mailing list pgsql-patches

From Teodor Sigaev
Subject Re: tsearch refactorings
Date
Msg-id 46DED16A.9000505@sigaev.ru
In response to tsearch refactorings  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: tsearch refactorings  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches
Heikki, I see some strange changes in your patch, not related to tsearch at all:
contrib/pageinspect/pageinspect.sql.in
contrib/pageinspect/rawpage.c

> The usage of the QueryItem struct was very confusing. It was used for
> both operators and operands. For operators, "val" was a single character
> casted to a int4, marking the operator type. For operands, val was the
> CRC-32 of the value. Other fields were used only either for operands or
> for operators. The biggest change in the patch is that I broke the
> QueryItem struct into QueryOperator and QueryOperand. Type was really
...
 > - Removed ParseQueryNode struct used internally by makepol and friends.
 > push*-functions now construct QueryItems directly.

Unused bytes in QueryItem must be set to zero; that's a common requirement
for types in pgsql. After allocating space for the tsquery in parse_tsquery you copy
just sizeof(QueryOperator) bytes and leave sizeof(QueryItem) -
sizeof(QueryOperator) bytes untouched. QueryOperand is the biggest component of the
QueryItem union. I haven't checked other places.
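
A minimal sketch of the zeroing described above. The struct layouts and the
helper names (store_operator, tail_is_zero, demo_zeroed_tail) are simplified
assumptions for illustration, not the definitions from the patch or from the
PostgreSQL tree. It shows clearing the whole union slot before copying the
smaller operator member, so the trailing bytes never hold stale data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for illustration; NOT the real PostgreSQL structs. */
typedef struct { int8_t type; int8_t oper; uint32_t left; } QueryOperator;
typedef struct { int8_t type; uint8_t weight; int32_t valcrc;
                 uint32_t length; uint32_t distance; } QueryOperand;
typedef union { int8_t type; QueryOperator qoperator; QueryOperand qoperand; } QueryItem;

/* Hypothetical helper: store an operator without leaving stale bytes behind. */
static void
store_operator(QueryItem *dst, const QueryOperator *src)
{
    assert(sizeof(QueryOperator) <= sizeof(QueryItem));
    memset(dst, 0, sizeof(QueryItem));          /* clear the whole union slot */
    memcpy(dst, src, sizeof(QueryOperator));    /* then copy the smaller member */
}

/* Returns 1 if every byte past the operator part of the slot is zero. */
static int
tail_is_zero(const QueryItem *item)
{
    const unsigned char *p = (const unsigned char *) item;
    size_t i;

    for (i = sizeof(QueryOperator); i < sizeof(QueryItem); i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Fills a slot with 0xAA to mimic uninitialized memory, then stores into it. */
static int
demo_zeroed_tail(void)
{
    QueryOperator op;
    QueryItem  *slot = malloc(sizeof(QueryItem));
    int         ok;

    if (slot == NULL)
        return 0;
    memset(&op, 0, sizeof(op));     /* also clears padding inside the operator */
    op.type = 2;
    op.oper = '&';
    op.left = 3;
    memset(slot, 0xAA, sizeof(QueryItem));      /* pretend the allocator returned garbage */
    store_operator(slot, &op);
    ok = tail_is_zero(slot) && slot->qoperator.left == 3;
    free(slot);
    return ok;
}
```

Without the memset in store_operator, the bytes past sizeof(QueryOperator)
would keep the 0xAA filler, and two logically equal queries could differ
byte-for-byte.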



> that? And parse_query always produces trees that are in prefix notation,
> so the left-field is really redundant, but using tsqueryrecv, you could
> inject queries that are not in prefix notation; is there anything in the
> code that depends on that?
It's used by TS_execute for optimization. With pure postfix notation you
have to go through every node. For example:
FALSE FALSE & FALSE &
You have to go to the end of the query to produce the correct result.
In fact, TSQuery is in prefix notation with a pointer to the other operand or, in
other words, just a flat view of the tree where the right operand of an operation is
always placed immediately after the operation.
That notation allows evaluating only one of the operands when possible:
& FALSE & FALSE FALSE
1   2   3   4     5      --Nodes
After evaluating the second node you can return FALSE for the whole expression
without evaluating nodes 3-5. For the query
& TRUE & FALSE FALSE
nodes 1, 2, 3 and 4 have to be evaluated. In most cases checking a QI_VAL node is
much more expensive than checking a QI_OPR node.
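
The short-circuit walk can be sketched roughly as follows. This is an
illustration under stated assumptions, not the TS_execute code: the Item
struct, the visited counter and the two sample arrays are hypothetical, only
the & operator is handled, and "left" is taken to be the offset from an
operator to its left operand while the right operand sits at item + 1.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { QI_VAL, QI_OPR } ItemType;

/* Simplified stand-in for a query item; NOT the real PostgreSQL layout. */
typedef struct
{
    ItemType    type;
    int         value;      /* QI_VAL: result a (costly) operand check would give */
    int         left;       /* QI_OPR: offset from this item to its left operand */
} Item;

static int  visited;        /* number of nodes touched by the last walk */

/* Evaluate an AND-only query stored in prefix order. */
static int
eval(const Item *item)
{
    visited++;
    if (item->type == QI_VAL)
        return item->value;
    if (!eval(item + 1))                /* right operand directly follows */
        return 0;                       /* short-circuit: skip the left subtree */
    return eval(item + item->left);     /* jump to the left operand */
}

/* Returns how many nodes had to be visited to decide the query. */
static int
count_visited(const Item *query)
{
    visited = 0;
    (void) eval(query);
    return visited;
}

/* & FALSE & FALSE FALSE */
static const Item query_a[] = {
    {QI_OPR, 0, 2}, {QI_VAL, 0, 0}, {QI_OPR, 0, 2}, {QI_VAL, 0, 0}, {QI_VAL, 0, 0}
};

/* & TRUE & FALSE FALSE */
static const Item query_b[] = {
    {QI_OPR, 0, 2}, {QI_VAL, 1, 0}, {QI_OPR, 0, 2}, {QI_VAL, 0, 0}, {QI_VAL, 0, 0}
};
```

With these arrays, count_visited(query_a) touches only nodes 1 and 2, while
count_visited(query_b) touches nodes 1 through 4; node 5 is skipped in both
cases.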



>
> - There's many internal intermediate representations of a query:
> TSQuery, a QTNode-tree, NODE-tree (in tsquery_cleanup.c), prefix
> notation stack of QueryItems (in parser), infix-tree. Could we remove
> some of these?
I have no strong objections. QTNode and NODE are tree-like structures, but
TSQuery is a prefix notation for storage in flat memory. NODE is used only
to clean up stop-word placeholders, so it's a binary tree, while QTNode represents
an n-ary tree (with any number of children).

Thank you for your interest in tsearch. After rechecking the problem pointed out
above, I'll commit your patch.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
