Thread: pgsql: Phrase full text search.

pgsql: Phrase full text search.

From: Teodor Sigaev
Date:
Phrase full text search.

The patch introduces a new text search operator (<-> or <DISTANCE>, where
DISTANCE is an integer) into tsquery. The on-disk and binary in/out formats of
tsquery are backward compatible. It has two side effects:
- it changes the sort order of tsquery, so users who have a btree index over
  tsquery should reindex it
- tsquery output contains fewer parentheses, which makes it more readable
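To make the new operator concrete, here is a minimal C sketch of what `a <N> b` checks against a single document. It assumes the exact-distance rule that later PostgreSQL documentation describes for `<N>` (with `<->` equivalent to `<1>`); the semantics in this first commit may differ. The function `phrase_match` and its array-based interface are hypothetical and are not PostgreSQL's executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of "left <distance> right" for one document: returns true
 * if some position of `right` is exactly `distance` positions after some
 * position of `left`. Both arrays must be sorted ascending, as tsvector
 * positions are. Runs as a two-pointer merge in O(nleft + nright).
 */
static bool phrase_match(const int *left, size_t nleft,
                         const int *right, size_t nright, int distance)
{
    size_t i = 0, j = 0;

    assert(distance >= 0);
    while (i < nleft && j < nright)
    {
        int want = left[i] + distance;   /* position `right` must occupy */

        if (right[j] == want)
            return true;
        if (right[j] < want)
            j++;                         /* right occurrence too early */
        else
            i++;                         /* need a later left occurrence */
    }
    return false;
}
```

For the document "the fat cat", where fat is at position 2 and cat at position 3, `fat <-> cat` corresponds to `phrase_match((int[]){2}, 1, (int[]){3}, 1, 1)`, which is true, while the same call with distance 2 is false.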

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/bb140506df605fab58f48926ee1db1f80bdafb59

Modified Files
--------------
contrib/tsearch2/expected/tsearch2.out  |  56 ++---
doc/src/sgml/datatype.sgml              |   9 +-
doc/src/sgml/func.sgml                  |  39 ++++
doc/src/sgml/textsearch.sgml            | 182 ++++++++++++++-
src/backend/tsearch/to_tsany.c          | 187 +++++++--------
src/backend/tsearch/ts_parse.c          |  15 +-
src/backend/tsearch/ts_selfuncs.c       |   3 +-
src/backend/tsearch/wparser_def.c       |  31 ++-
src/backend/utils/adt/tsginidx.c        |  57 +++--
src/backend/utils/adt/tsgistidx.c       |   4 +-
src/backend/utils/adt/tsquery.c         | 311 +++++++++++++++++++------
src/backend/utils/adt/tsquery_cleanup.c | 362 +++++++++++++++++++++++++++--
src/backend/utils/adt/tsquery_op.c      |  54 ++++-
src/backend/utils/adt/tsquery_util.c    |  11 +-
src/backend/utils/adt/tsrank.c          | 263 ++++++++++++++-------
src/backend/utils/adt/tsvector.c        |   2 +-
src/backend/utils/adt/tsvector_op.c     | 326 +++++++++++++++++++++++---
src/backend/utils/adt/tsvector_parser.c |  10 +-
src/include/catalog/catversion.h        |   2 +-
src/include/catalog/pg_operator.h       |   3 +
src/include/catalog/pg_proc.h           |   7 +
src/include/tsearch/ts_public.h         |  22 +-
src/include/tsearch/ts_type.h           |  30 ++-
src/include/tsearch/ts_utils.h          |  15 +-
src/test/regress/expected/tsdicts.out   |  36 ++-
src/test/regress/expected/tsearch.out   | 395 +++++++++++++++++++++++++++++---
src/test/regress/expected/tstypes.out   | 369 ++++++++++++++++++++++++++++-
src/test/regress/sql/tsdicts.sql        |   3 +
src/test/regress/sql/tsearch.sql        | 101 ++++++++
src/test/regress/sql/tstypes.sql        |  75 +++++-
30 files changed, 2536 insertions(+), 444 deletions(-)


Re: pgsql: Phrase full text search.

From: Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Phrase full text search.

Hasn't this patch broken on-disk compatibility of type tsquery by
renumbering the values of QueryOperator.operator?  I'm looking at
the patch delta in ts_type.h.

            regards, tom lane


Re: pgsql: Phrase full text search.

From: Tom Lane
Date:
I wrote:
> ... I'm looking at the patch delta in ts_type.h.

BTW, while I'm looking at that: comparePos() was a perfectly OK
name for a static function within tsvector.c, but it seems like a
pretty horrid name for a globally exposed linker symbol.  Please
rename it to something less generic.

            regards, tom lane


Re: pgsql: Phrase full text search.

From: Teodor Sigaev
Date:
>> Phrase full text search.
>
> Hasn't this patch broken on-disk compatibility of type tsquery by
> renumbering the values of QueryOperator.operator?  I'm looking at
> the patch delta in ts_type.h.

The distance field is placed exactly in the padding hole between the two uint8_t
fields and the uint32_t field; as far as I know, every platform we support uses
4-byte alignment for the int32 type. Am I wrong? If so, I will move distance to
the end of the struct.
The QueryOperator struct isn't stored to disk directly; it is used inside the
union QueryItem:
sizeof(QueryItem) = 12
sizeof(QueryOperator) = 8
so we can add distance to the end without growing the size of QueryItem.
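This layout argument can be checked at compile time. The C11 sketch below uses hypothetical mirror structs (`OldOperator`, `NewOperator`, `TailOperator`, and a stand-in 12-byte `Operand`); they only imitate the field sizes described in this message and are not the real definitions from ts_type.h. The assertions hold only where `uint32_t` is 4-byte aligned, which is exactly the assumption under discussion.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical mirror of the pre-patch operator node. */
typedef struct {
    int8_t   type;
    int8_t   oper;
    /* two padding bytes here when uint32_t is 4-byte aligned */
    uint32_t left;
} OldOperator;

/* Hypothetical post-patch node: distance occupies the former padding. */
typedef struct {
    int8_t   type;
    int8_t   oper;
    int16_t  distance;
    uint32_t left;
} NewOperator;

/* The fallback mentioned above: distance appended after left. */
typedef struct {
    int8_t   type;
    int8_t   oper;
    uint32_t left;
    int16_t  distance;
} TailOperator;

/* Stand-in for the 12-byte operand member of the union (fields hypothetical). */
typedef struct {
    int8_t   type;
    uint8_t  weight;
    uint8_t  prefix;
    int32_t  valcrc;
    uint32_t length_and_offset;
} Operand;

typedef union {
    int8_t      type;
    Operand     qoperand;
    NewOperator qoperator;
} Item;

typedef union {
    int8_t       type;
    Operand      qoperand;
    TailOperator qoperator;
} TailItem;

/* Compile-time checks; valid only if uint32_t is 4-byte aligned. */
static_assert(sizeof(OldOperator) == 8, "pre-patch node is 8 bytes");
static_assert(sizeof(NewOperator) == sizeof(OldOperator), "distance fits in the hole");
static_assert(offsetof(NewOperator, left) == offsetof(OldOperator, left), "left does not move");
static_assert(sizeof(Item) == sizeof(Operand), "union size is set by the operand");
static_assert(sizeof(TailItem) == sizeof(Operand), "appending distance keeps the union size");
```

The last check corresponds to the fallback: even with `distance` appended, the operator variant is no larger than the 12-byte operand, so the union does not grow.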

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: pgsql: Phrase full text search.

From: Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Hasn't this patch broken on-disk compatibility of type tsquery by
>> renumbering the values of QueryOperator.operator?  I'm looking at
>> the patch delta in ts_type.h.

> The distance field is placed exactly in the padding hole between the two uint8_t
> fields and the uint32_t field; as far as I know, every platform we support uses
> 4-byte alignment for the int32 type. Am I wrong?

No, I'm worried about the fact that you changed the OP_xxx constants.
Won't that cause a pre-existing tsquery operator to be read incorrectly?
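The failure mode described here can be shown with a small C sketch. The pre-patch codes (`OP_NOT` = 1, `OP_AND` = 2, `OP_OR` = 3) are assumed, and the priority-ordered renumbering is purely hypothetical, since this thread does not quote the patch's actual values. A tsquery written by the old server keeps its raw operator byte, so decoding that byte with a different numbering silently changes the operator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strcmp, used when checking decoded names */

/* Assumed pre-patch codes: the values already present in stored tsquery data. */
static const char *const old_names[] = { "invalid", "NOT", "AND", "OR" };

/* Hypothetical renumbering in priority order (OR lowest), with PHRASE added. */
static const char *const new_names[] = { "invalid", "OR", "AND", "NOT", "PHRASE" };

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Interpret a stored operator byte according to a given code table. */
static const char *decode_op(const char *const *names, size_t n, int code)
{
    if (code <= 0 || (size_t) code >= n)
        return "invalid";
    return names[code];
}
```

A NOT stored as byte 1 by the old server comes back as OR: `decode_op(new_names, NELEMS(new_names), 1)` yields "OR", while AND happens to keep code 2 and survives. Keeping the old codes and appending new ones avoids this.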

Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
they were before, which means you need to give up on the assumption that
the numerical values of the OP_xxx constants correspond directly to their
syntactic priority.  But that assumption was never going to survive the
next tsquery expansion anyway.  I'd suggest a static const array mapping
the OP values into their syntactic priorities.
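A minimal C sketch of the suggested approach: the operator codes keep their on-disk values (new operators are appended), and a separate static const array supplies the syntactic priority used when parsing and printing. The names `op_priority`, `OP_PRIORITY` and `needs_parens`, and the particular ordering (`!` tightest, then `<->`, `&`, `|`), are illustrative assumptions rather than the code that was eventually committed.

```c
#include <assert.h>

/* Operator codes as stored on disk: existing values never change and new
 * operators are appended (NOT/AND/OR values assumed to be the pre-patch ones). */
#define OP_NOT     1
#define OP_AND     2
#define OP_OR      3
#define OP_PHRASE  4
#define OP_COUNT   4

/* Syntactic priority per operator code (index = code - 1);
 * a larger number binds more tightly. Illustrative ordering only. */
static const int op_priority[OP_COUNT] = {
    4,  /* OP_NOT */
    2,  /* OP_AND */
    1,  /* OP_OR */
    3,  /* OP_PHRASE */
};

#define OP_PRIORITY(op) (op_priority[(op) - 1])

/* When printing, a child operator needs parentheses if it binds more loosely
 * than its parent. Simplified: ignores associativity at equal priority. */
static int needs_parens(int child_op, int parent_op)
{
    return OP_PRIORITY(child_op) < OP_PRIORITY(parent_op);
}
```

For example, `a | b` as an operand of `&` needs parentheses (`needs_parens(OP_OR, OP_AND)` is nonzero), but `a & b` under `|` does not. Because priorities live in the table, a future operator only needs a new appended code and a new table entry.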

            regards, tom lane


Re: pgsql: Phrase full text search.

From: Teodor Sigaev
Date:
> Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
> they were before, which means you need to give up on the assumption that
> the numerical values of the OP_xxx constants correspond directly to their
> syntactic priority.  But that assumption was never going to survive the
> next tsquery expansion anyway.  I'd suggest a static const array mapping
> the OP values into their syntactic priorities.

Oh, I see. Will fix.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/