Thread: building tsquery directly in memory (avoid makepol)

building tsquery directly in memory (avoid makepol)

From
Ivan Sergio Borgonovo
Date:
I know the structure of a whole tsquery in advance: it has already
been reduced and its lexemes have already been computed.
I'd like to write it directly in memory without having to go
through pushValue/makepol.

However, I'm not quite sure what the in-memory layout of a tsquery
is, and I still haven't been able to find the macros that could
help me [1].

Before doing it by trial and error, could somebody give me an
example?
I'm not quite sure I'm interpreting the comments in the
documentation correctly.

This is how I'd write
X:AB | YY:C | ZZZ:D

TSQuery
  vl_len_  (total # of bytes of the whole following structure:
            QueryItems*size + total lexeme length)
  size     (# of QueryItems in the query)
  QueryItem
    type     QI_OPR
    oper     OP_OR
    left     -> distance from QueryItem X:AB
  QueryItem
    type     QI_OPR
    oper     OP_OR
    left     -> distance from QueryItem ZZZ:D
  QueryItem (X)
    type     QI_VAL
    weight   1100
    valcrc   ???
    length   1
    distance
  QueryItem (YY)
    type     QI_VAL
    weight   0010
    valcrc   ???
    length   2
    distance
  QueryItem (ZZZ)
    type     QI_VAL
    weight   0001
    valcrc   ???
    length   3
    distance
  X
  YY
  ZZZ

[1] the equivalent of POSTDATALEN and WEP_GETWEIGHT, macros to compute
the size of the various parts of a TSQuery, etc.
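For the size question in [1], here is a minimal standalone sketch of the arithmetic. It assumes the layout the in-core parser appears to build: a varlena header plus an int32 item count, then `size` fixed-size QueryItems, then the operand strings, each followed by a NUL. In the server, ts_type.h's HDRSIZETQ, COMPUTESIZE, GETQUERY and GETOPERAND macros cover this, and the length stored in vl_len_ includes the header itself. The DEMO_* names and the 12-byte item size are hypothetical stand-ins so the code compiles without server headers. Real code should use sizeof(QueryItem) and the real macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for ts_type.h macros; sizes are assumptions. */
#define DEMO_HDRSIZETQ  ((size_t) (sizeof(int32_t) + sizeof(int32_t)))  /* vl_len_ + size */
#define DEMO_ITEMSIZE   ((size_t) 12)   /* assumed sizeof(QueryItem) */

/* Whole datum: header + nitems QueryItems + operand area (like COMPUTESIZE). */
static size_t
demo_computesize(int nitems, size_t lenofoperand)
{
    return DEMO_HDRSIZETQ + (size_t) nitems * DEMO_ITEMSIZE + lenofoperand;
}

/* Byte offset of the operand area inside the datum (like GETOPERAND). */
static size_t
demo_operand_offset(int nitems)
{
    return DEMO_HDRSIZETQ + (size_t) nitems * DEMO_ITEMSIZE;
}

/*
 * Copy operand strings into the operand area, each followed by a NUL, and
 * record each one's offset ("distance") and length, which is what a QI_VAL
 * item would hold. Returns the bytes used (the lenofoperand argument above).
 * The caller must supply an area large enough for all strings plus NULs.
 */
static size_t
demo_pack_operands(const char **words, int n, char *area,
                   uint32_t *distance, uint32_t *length)
{
    size_t off = 0;

    for (int i = 0; i < n; i++)
    {
        assert(words[i] != NULL);
        size_t len = strlen(words[i]);

        distance[i] = (uint32_t) off;
        length[i] = (uint32_t) len;
        memcpy(area + off, words[i], len + 1);   /* includes the NUL */
        off += len + 1;
    }
    return off;
}
```

For X:AB | YY:C | ZZZ:D this gives 3 operands and 2 OR operators, so 5 items. The operand area is "X\0YY\0ZZZ\0" (9 bytes, distances 0, 2 and 5). Under these assumptions the datum is 8 + 5*12 + 9 = 77 bytes.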

I couldn't find any place in the code where a TSQuery is built in "one
shot" instead of using pushValue.

Another thing I'd like to know: which of these two is preferable during
a scan?
'java:1A,2B '::tsvector @@ to_tsquery('java:A | java:B');
vs.
'java:1A,2B '::tsvector @@ to_tsquery('java:AB');
They look equivalent. Are they?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it



Re: building tsquery directly in memory (avoid makepol)

From
Teodor Sigaev
Date:
> Before doing it by trial and error, could somebody give me an
> example?
> I'm not quite sure I'm interpreting the comments in the
> documentation correctly.
> TSQuery
[skipped]
Right, valcrc is computed in pushValue.

> I couldn't find any place in the code where a TSQuery is built in "one
> shot" instead of using pushValue.
That's because in all those places we may have to parse a rather complex
structure. A simple OR-ed query could be hard-coded as:
pushValue("X");
pushValue("YY");
pushOperator(OP_OR);
pushValue("ZZZ");
pushOperator(OP_OR);

You need to call pushValue/pushOperator in reverse Polish (postfix) order.
Note that you can also use another order:
pushValue("X");
pushValue("YY");
pushValue("ZZZ");
pushOperator(OP_OR);
pushOperator(OP_OR);

So the first example produces ( X | YY ) | ZZZ, and the second one X | ( YY | ZZZ ).
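The two push orders can be checked with a small standalone model. It relies on what the comments in ts_type.h and the parser code suggest. Pushes arrive in postfix order and each one is prepended to a list, so the final item array is in prefix (Polish) order. An operator's right operand is the item immediately after it. Its `left` field is the offset to the left operand, filled in afterwards by a recursive pass (findoprnd in tsquery.c plays that role, if I read it right). DemoItem and the demo_* functions are hypothetical names, not PostgreSQL APIs, and only OR is modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, stripped-down QueryItem: just enough to show the ordering. */
typedef struct
{
    int         is_op;   /* 1 = OR operator, 0 = operand */
    const char *word;    /* operand text, NULL for operators */
    int         left;    /* operators only: offset to the left operand */
} DemoItem;

/* Recursive pass over the prefix-ordered array: the right operand starts
 * at item + 1 and the left operand starts right after the right subtree. */
static void
demo_fill_left(DemoItem *items, int *pos, int n)
{
    int here = (*pos)++;

    assert(here < n);                         /* malformed input otherwise */
    if (!items[here].is_op)
        return;
    demo_fill_left(items, pos, n);            /* walk the right subtree */
    items[here].left = *pos - here;           /* left subtree starts here */
    demo_fill_left(items, pos, n);            /* walk the left subtree */
}

/* pushes[] is the call sequence: "|" = pushOperator(OP_OR), anything else
 * = pushValue. Each push is prepended, so the array is filled from the end. */
static void
demo_build(const char **pushes, int n, DemoItem *items)
{
    int pos = 0;

    for (int i = 0; i < n; i++)
    {
        DemoItem *it = &items[n - 1 - i];

        it->is_op = strcmp(pushes[i], "|") == 0;
        it->word = it->is_op ? NULL : pushes[i];
        it->left = 0;
    }
    demo_fill_left(items, &pos, n);
    assert(pos == n);                         /* every item was reached */
}

/* Append item i to out in fully parenthesized infix form. */
static void
demo_render(const DemoItem *items, int i, char *out)
{
    if (!items[i].is_op)
    {
        strcat(out, items[i].word);
        return;
    }
    strcat(out, "(");
    demo_render(items, i + items[i].left, out);
    strcat(out, " | ");
    demo_render(items, i + 1, out);
    strcat(out, ")");
}
```

With the first sequence the array is [OR, ZZZ, OR, YY, X], and item 0 has left = 2, which renders as ((X | YY) | ZZZ). With the second it is [OR, OR, ZZZ, YY, X], and item 0 has left = 4, pointing at X, which renders as (X | (YY | ZZZ)).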




>
> Another thing I'd like to know: which of these two is preferable during
> a scan?
> 'java:1A,2B '::tsvector @@ to_tsquery('java:A | java:B');
> vs.
> 'java:1A,2B '::tsvector @@ to_tsquery('java:AB');
> They look equivalent. Are they?

Yes, but the second one should be more efficient.
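The following sketch shows why java:AB can be cheaper. It is a minimal model based on the weight encoding the in-core code appears to use. Each tsvector position stores a weight from 0 (D) to 3 (A). A query operand stores a bitmask with bit 1 << weight set for every accepted weight, so java:AB is a single operand with mask 8|4. That one operand is checked once. The OR form needs a second operand evaluation (a second lexeme lookup in the real code) whenever the first one fails. The demo_* names and the evaluation counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Query-side weight letters -> bitmask (A = 1 << 3 ... D = 1 << 0). */
static uint8_t
demo_weight_mask(const char *letters)
{
    uint8_t mask = 0;

    for (; *letters; letters++)
    {
        switch (*letters)
        {
            case 'A': mask |= 1 << 3; break;
            case 'B': mask |= 1 << 2; break;
            case 'C': mask |= 1 << 1; break;
            case 'D': mask |= 1 << 0; break;
            default:  assert(!"unknown weight letter");
        }
    }
    return mask;
}

/* Evaluate one operand against one lexeme's position weights (0..3),
 * counting how many operand evaluations were needed. */
static bool
demo_operand_matches(const uint8_t *pos_weights, int npos, uint8_t mask,
                     int *evaluations)
{
    (*evaluations)++;
    for (int i = 0; i < npos; i++)
        if (mask & (1u << pos_weights[i]))
            return true;
    return false;
}
```

With java:1A,2B (weights {3, 2}), both forms return true. With a vector whose only java position is weighted C, both return false, but the OR form needs two operand evaluations instead of one.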
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: building tsquery directly in memory (avoid makepol)

From
Ivan Sergio Borgonovo
Date:
On Thu, 04 Feb 2010 22:13:02 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> > Before doing it by trial and error, could somebody give
> > me an example?
> > I'm not quite sure I'm interpreting the comments in
> > the documentation correctly.
> > TSQuery
> [skipped]
> Right, valcrc is computed in pushValue.

Anyway, the structure I posted is correct, isn't it?
Are there macros equivalent to POSTDATALEN and WEP_GETWEIGHT, and a
macro to get the memory size of a TSQuery?
I think I've seen macros that could help me determine the size of
a TSQuery... but I haven't noticed anything like POSTDATALEN, which
would come in very handy for traversing a TSQuery.

I was thinking of skipping pushValue and building the TSQuery directly
in memory, since my queries have a very simple structure and are easy
to reduce...
Still, it is not obvious how to know the memory size in advance.
For OR queries it is easy, but for AND queries I'll have to loop over
a tsvector, filter the weights according to a passed parameter, and
count how many times I have to duplicate a lexeme, once per weight.

For example,

tsvector_to_tsquery('pizza:1A,2B risotto:2C,4D barolo:5A,6C', '&', 'ACD');

should be turned into

pizza:A & risotto:C & risotto:D & barolo:A & barolo:C
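The expansion above can be sketched in a few lines. This is a minimal model of the proposed (hypothetical) tsvector_to_tsquery. It works on a simplified entry (a lexeme plus the weight letter of each of its positions) rather than on a real TSVector, and it produces query text rather than a TSQuery datum. DemoEntry and demo_tsvector_to_tsquery are illustrative names. Each distinct weight that appears both in the lexeme's positions and in the filter yields one lexeme:W operand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a tsvector entry: pizza:1A,2B -> {"pizza", "AB"} */
typedef struct
{
    const char *lexeme;
    const char *pos_weights;   /* one weight letter per position */
} DemoEntry;

/*
 * Writes e.g. "pizza:A & risotto:C & ..." into out. Weights are visited in
 * the fixed order A..D, so each lexeme:W appears at most once. Returns the
 * number of operands written, or -1 if out is too small.
 */
static int
demo_tsvector_to_tsquery(const DemoEntry *v, int n, char op,
                         const char *filter, char *out, size_t outsz)
{
    char    sep[4] = {' ', op, ' ', '\0'};
    size_t  used = 0;
    int     nops = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        for (const char *w = "ABCD"; *w; w++)
        {
            if (!strchr(v[i].pos_weights, *w) || !strchr(filter, *w))
                continue;
            int len = snprintf(out + used, outsz - used, "%s%s:%c",
                               nops > 0 ? sep : "", v[i].lexeme, *w);
            if (len < 0 || (size_t) len >= outsz - used)
                return -1;
            used += (size_t) len;
            nops++;
        }
    }
    return nops;
}
```

With op = '|' the same routine would emit risotto:C | risotto:D. Following the answer about java:AB, merging those into a single risotto:CD operand is likely the cheaper form in the OR case.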

I noticed that tsvectorout actually loops over the tsvector to compute
how much memory to allocate for the string buffer, and I was wondering
whether that is really worth it in my case as well.
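A two-pass approach for the AND case can be sketched as a measuring pass that counts what the writing pass will emit, so the datum can be allocated once at its final size. This is a minimal model under stated assumptions. It uses a simplified entry (lexeme plus per-position weight letters), an 8-byte header and 12-byte items (hypothetical stand-ins for HDRSIZETQ and sizeof(QueryItem)), n - 1 binary operators for n operands, and a NUL after every operand string. DemoEntry, DemoSize and demo_measure are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a tsvector entry: pizza:1A,2B -> {"pizza", "AB"} */
typedef struct
{
    const char *lexeme;
    const char *pos_weights;   /* one weight letter per position */
} DemoEntry;

#define DEMO_HDRSIZETQ ((size_t) 8)    /* assumed: varlena header + int32 size */
#define DEMO_ITEMSIZE  ((size_t) 12)   /* assumed sizeof(QueryItem) */

typedef struct
{
    int     noperands;   /* QI_VAL items */
    int     nitems;      /* operands + binary operators */
    size_t  sumlen;      /* operand area, including one NUL per operand */
    size_t  total;       /* bytes to allocate for the whole datum */
} DemoSize;

/* Measuring pass: visits exactly the (lexeme, weight) pairs the writing
 * pass would emit, but only counts them. */
static DemoSize
demo_measure(const DemoEntry *v, int n, const char *filter)
{
    DemoSize s = {0, 0, 0, 0};

    assert(filter != NULL);
    for (int i = 0; i < n; i++)
        for (const char *w = "ABCD"; *w; w++)
            if (strchr(v[i].pos_weights, *w) && strchr(filter, *w))
            {
                s.noperands++;
                s.sumlen += strlen(v[i].lexeme) + 1;
            }
    /* n operands joined by a binary operator need n - 1 operator items */
    s.nitems = s.noperands > 0 ? 2 * s.noperands - 1 : 0;
    s.total = DEMO_HDRSIZETQ + (size_t) s.nitems * DEMO_ITEMSIZE + s.sumlen;
    return s;
}
```

For the pizza/risotto/barolo example with filter 'ACD', this yields 5 operands, 9 items and 36 bytes of operand text, or 152 bytes in total under these assumptions. The cost is one extra walk over the entries, which is likely cheap compared with growing a buffer on the fly.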

Any good recipes in Moscow? ;)

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it