Thread: building tsquery directly in memory (avoid makepol)

building tsquery directly in memory (avoid makepol)

From

Ivan Sergio Borgonovo

Date:

04 February 2010, 14:24:48

I know the structure of a whole tsquery in advance: it has already
been reduced and its lexemes have already been computed.
I'd like to write it directly into memory without having to go
through pushValue/makepol.

However, I'm not quite sure about the layout of a tsquery in
memory, and I still haven't been able to find the macros that could
help me [1].

Before I do it by trial and error, could somebody give me an
example?
I'm not quite sure about my interpretation of the comments in the
documentation.

This is how I'd write
X:AB | YY:C | ZZZ:D

TSQuery
  vl_len_ (total # of bytes of the whole following structure:
           QueryItems*size + total lexeme length)
  size (# of QueryItems in the query)
  QueryItem
    type QI_OPR
    oper OP_OR
    left -> distance from QueryItem X:AB
  QueryItem
    type QI_OPR
    oper OP_OR
    left -> distance from QueryItem ZZZ:D
  QueryItem (X)
    type QI_VAL
    weight 1100
    valcrc ???
    length 1
    distance
  QueryItem (YY)
    type QI_VAL
    weight 0010
    valcrc ???
    length 2
    distance
  QueryItem (ZZZ)
    type QI_VAL
    weight 0001
    valcrc ???
    length 3
    distance
  X
  YY
  ZZZ
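
As a rough sketch of the layout described above, the following C fragment uses simplified stand-in types to show how the total size might be computed. SketchOperand, SketchOperator, SketchItem and the sketch_* helpers are hypothetical names, not PostgreSQL declarations, and the field widths are simplified. It assumes that each lexeme is stored with a trailing NUL byte after the array of items. The real definitions, together with the macros HDRSIZETQ, COMPUTESIZE, GETQUERY and GETOPERAND, live in the server's ts_type.h and should be checked there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for illustration; NOT the PostgreSQL structs. */
typedef struct {
    int8_t   type;      /* "value" item */
    uint8_t  weight;    /* bitmask: A=1000, B=0100, C=0010, D=0001 */
    int32_t  valcrc;    /* CRC of the lexeme (normally set by pushValue) */
    uint32_t length;    /* lexeme length, without the trailing NUL */
    uint32_t distance;  /* offset of the lexeme inside the operand area */
} SketchOperand;

typedef struct {
    int8_t   type;      /* "operator" item */
    int8_t   oper;      /* e.g. OR */
    uint32_t left;      /* distance, in items, to the left operand */
} SketchOperator;

typedef union {
    int8_t         type;
    SketchOperand  operand;
    SketchOperator op;
} SketchItem;

/* Header: varlena length word plus the item count. */
#define SKETCH_HDRSIZE (sizeof(int32_t) + sizeof(int32_t))

/* Bytes of the operand area: every lexeme plus one NUL terminator. */
size_t sketch_operand_bytes(const char *const *lexemes, size_t n)
{
    size_t total = 0;

    assert(lexemes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += strlen(lexemes[i]) + 1;
    return total;
}

/* Total allocation: header, then the item array, then the operand area. */
size_t sketch_compute_size(size_t nitems, size_t operand_bytes)
{
    return SKETCH_HDRSIZE + nitems * sizeof(SketchItem) + operand_bytes;
}
```

For X:AB | YY:C | ZZZ:D there are five items (three operands, two operators), so the allocation would be sketch_compute_size(5, sketch_operand_bytes(lexemes, 3)).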

[1] the equivalents of POSTDATALEN and WEP_GETWEIGHT, macros to compute
the size of the various parts of a TSQuery, etc.

I couldn't see any place in the code where a TSQuery is built in "one
shot" instead of using pushValue.

Another thing I'd like to know: which of these is going to be
preferred during a scan,
'java:1A,2B '::tsvector @@ to_tsquery('java:A | java:B');
vs.
'java:1A,2B '::tsvector @@ to_tsquery('java:AB');
?
They look equivalent. Are they?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Re: building tsquery directly in memory (avoid makepol)

From

Teodor Sigaev

Date:

04 February 2010, 15:13:37

> Before doing it the trial and error way can somebody just make me an
> example?
> I'm not pretty sure about my interpretation of the comments of the
> documentation.
> TSQuery
[skipped]
Right, valcrc is computed in pushValue

> I couldn't see any place in the code where TSQuery is built in "one
> shot" in spite of using pushValue.
That's because in all those places we may have to parse a rather complex
structure. A simple OR-ed query could be hardcoded as
pushValue('X');
pushValue('YY');
pushOperator(OP_OR);
pushValue('ZZZ');
pushOperator(OP_OR);

You need to call pushValue/pushOperator in the order of Polish notation.
Note, you can use another order:
pushValue('X');
pushValue('YY');
pushValue('ZZZ');
pushOperator(OP_OR);
pushOperator(OP_OR);

So, the first example will produce ( X | YY ) | ZZZ, the second one X | ( YY | ZZZ )
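
To see why the two call orders group differently, here is a toy stack model in C. It is only an illustration: SketchStack, sketch_push_value and sketch_push_or are hypothetical names and do not have the signatures of the real pushValue/pushOperator, which operate on a parser state. Each operator pops the two topmost entries and pushes their parenthesised combination.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16
#define MAX_TEXT  128

/* Toy operand stack holding the textual form of each subexpression. */
typedef struct {
    char text[MAX_DEPTH][MAX_TEXT];
    int  depth;
} SketchStack;

/* Push a single lexeme. */
void sketch_push_value(SketchStack *s, const char *lexeme)
{
    int n;

    assert(s->depth < MAX_DEPTH);
    n = snprintf(s->text[s->depth], MAX_TEXT, "%s", lexeme);
    assert(n >= 0 && n < MAX_TEXT);
    (void) n;
    s->depth++;
}

/* Combine the two topmost entries with OR; the top entry is the right side. */
void sketch_push_or(SketchStack *s)
{
    char joined[MAX_TEXT];
    int  n;

    assert(s->depth >= 2);
    n = snprintf(joined, sizeof joined, "( %s | %s )",
                 s->text[s->depth - 2], s->text[s->depth - 1]);
    assert(n >= 0 && n < MAX_TEXT);
    (void) n;
    s->depth--;
    strcpy(s->text[s->depth - 1], joined);
}
```

Pushing X, YY, OR, ZZZ, OR leaves "( ( X | YY ) | ZZZ )" on the stack, while pushing X, YY, ZZZ, OR, OR leaves "( X | ( YY | ZZZ ) )".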




>
> Another thing I'd like to know is: what is going to be preferred
> during a scan between
> 'java:1A,2B '::tsvector @@ to_tsquery('java:A | java:B');
> vs.
> 'java:1A,2B '::tsvector @@ to_tsquery('java:AB')
> ?
> they look equivalent. Are they?

Yes, but the second one should be more efficient.
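
One plausible reading of the equivalence, sketched in C: if the weight of an operand is a bitmask, then java:AB accepts a position weighted A or B with a single operand, whereas java:A | java:B needs two operands and an OR node to reach the same answer. The helper names below are hypothetical, and the bit assignment (A = 1000, B = 0100, C = 0010, D = 0001) simply follows the weights written in the layout earlier in this thread; it is an assumption about the server's representation, not a quotation of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map a weight letter 'A'..'D' to a one-bit mask (A = 8, ..., D = 1). */
uint8_t sketch_weight_bit(char w)
{
    assert(w >= 'A' && w <= 'D');
    return (uint8_t) (1u << ('D' - w));
}

/* Build the mask for a weight string such as "AB". */
uint8_t sketch_weight_mask(const char *weights)
{
    uint8_t mask = 0;

    for (; *weights != '\0'; weights++)
        mask |= sketch_weight_bit(*weights);
    return mask;
}

/*
 * True if at least one position of the lexeme carries a weight accepted
 * by the query mask. position_weights holds one letter per position.
 */
bool sketch_matches(const char *position_weights, uint8_t query_mask)
{
    for (; *position_weights != '\0'; position_weights++)
        if (query_mask & sketch_weight_bit(*position_weights))
            return true;
    return false;
}
```

With positions weighted "AB", sketch_matches("AB", sketch_weight_mask("AB")) gives the same result as OR-ing the two single-weight checks, but it inspects the lexeme only once.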
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: building tsquery directly in memory (avoid makepol)

From

Ivan Sergio Borgonovo

Date:

04 February 2010, 22:13:43

On Thu, 04 Feb 2010 22:13:02 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> > Before doing it the trial and error way can somebody just make
> > me an example?
> > I'm not pretty sure about my interpretation of the comments of
> > the documentation.
> > TSQuery
> [skipped]
> Right, valcrc is computed in pushValue

Anyway, the structure I posted is correct, isn't it?
Are there any macros equivalent to POSTDATALEN and WEP_GETWEIGHT, and
a macro to get the memory size of a TSQuery?
I think I've seen macros that could help me determine the size of
a TSQuery, but I haven't noticed anything like POSTDATALEN, which
would come in very handy for traversing a TSQuery.

I was thinking of skipping pushValue and building the TSQuery directly
in memory, since my queries have a very simple structure and are easy
to reduce.
Still, it is not straightforward to know the memory size in advance.
For OR queries it is easy, but for AND queries I'll have to loop over
a tsvector, filter the weights according to a passed parameter and
see how many times I have to duplicate a lexeme, once for each weight.

e.g.

tsvector_to_tsquery('pizza:1A,2B risotto:2C,4D barolo:5A,6C', '&', 'ACD');

should be turned into

pizza:A & risotto:C & risotto:D & barolo:A & barolo:C

I noticed you actually loop over the tsvector in tsvectorout to
allocate the memory for the string buffer, and I was wondering whether
it is really worth it in my case as well.
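
The counting pass described above could look roughly like this in C. It is a sketch over mock data, not over a real tsvector: SketchEntry, SketchCount and sketch_count are hypothetical names, each entry lists its position weights as a string of letters, and the filter is assumed to contain each weight letter at most once. Iterating over the filter letters (rather than over positions) emits each lexeme/weight pair only once even if several positions share a weight. One NUL byte per emitted lexeme is assumed, as is one operator between each pair of adjacent operands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock tsvector entry: a lexeme and the weight letter of each position. */
typedef struct {
    const char *lexeme;
    const char *weights;   /* e.g. "AB" for positions 1A,2B */
} SketchEntry;

/* What has to be known before the query can be allocated in one shot. */
typedef struct {
    size_t operands;       /* number of lexeme:weight items */
    size_t operators;      /* number of AND/OR items joining them */
    size_t lexeme_bytes;   /* operand area, one NUL per lexeme */
} SketchCount;

SketchCount sketch_count(const SketchEntry *entries, size_t n, const char *filter)
{
    SketchCount c = { 0, 0, 0 };

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* One operand for each filter weight present on this lexeme. */
        for (const char *f = filter; *f != '\0'; f++) {
            if (strchr(entries[i].weights, *f) != NULL) {
                c.operands++;
                c.lexeme_bytes += strlen(entries[i].lexeme) + 1;
            }
        }
    }
    /* A chain of k operands needs k - 1 binary operators. */
    c.operators = (c.operands > 0) ? c.operands - 1 : 0;
    return c;
}
```

For the pizza/risotto/barolo example with the filter 'ACD', this counts five operands and four operators, matching pizza:A & risotto:C & risotto:D & barolo:A & barolo:C.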

Any good recipes in Moscow? ;)

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it