Thread: pgsql: Phrase full text search.
Phrase full text search.

Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effects:
- it changes the sort order of tsquery values, so users who have a btree
  index over tsquery should reindex it
- tsquery output contains fewer parentheses, so tsquery becomes more
  readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/bb140506df605fab58f48926ee1db1f80bdafb59

Modified Files
--------------
contrib/tsearch2/expected/tsearch2.out  |  56 ++---
doc/src/sgml/datatype.sgml              |   9 +-
doc/src/sgml/func.sgml                  |  39 ++++
doc/src/sgml/textsearch.sgml            | 182 ++++++++++++++-
src/backend/tsearch/to_tsany.c          | 187 +++++++--------
src/backend/tsearch/ts_parse.c          |  15 +-
src/backend/tsearch/ts_selfuncs.c       |   3 +-
src/backend/tsearch/wparser_def.c       |  31 ++-
src/backend/utils/adt/tsginidx.c        |  57 +++--
src/backend/utils/adt/tsgistidx.c       |   4 +-
src/backend/utils/adt/tsquery.c         | 311 +++++++++++++++++++------
src/backend/utils/adt/tsquery_cleanup.c | 362 +++++++++++++++++++++++++++--
src/backend/utils/adt/tsquery_op.c      |  54 ++++-
src/backend/utils/adt/tsquery_util.c    |  11 +-
src/backend/utils/adt/tsrank.c          | 263 ++++++++++++++-------
src/backend/utils/adt/tsvector.c        |   2 +-
src/backend/utils/adt/tsvector_op.c     | 326 +++++++++++++++++++++++---
src/backend/utils/adt/tsvector_parser.c |  10 +-
src/include/catalog/catversion.h        |   2 +-
src/include/catalog/pg_operator.h       |   3 +
src/include/catalog/pg_proc.h           |   7 +
src/include/tsearch/ts_public.h         |  22 +-
src/include/tsearch/ts_type.h           |  30 ++-
src/include/tsearch/ts_utils.h          |  15 +-
src/test/regress/expected/tsdicts.out   |  36 ++-
src/test/regress/expected/tsearch.out   | 395 +++++++++++++++++++++++++++++---
src/test/regress/expected/tstypes.out   | 369 ++++++++++++++++++++++++++++-
src/test/regress/sql/tsdicts.sql        |   3 +
src/test/regress/sql/tsearch.sql        | 101 ++++++++
src/test/regress/sql/tstypes.sql        |  75 +++++-
30 files changed, 2536 insertions(+), 444 deletions(-)

Teodor Sigaev <teodor@sigaev.ru> writes:
> Phrase full text search.

Hasn't this patch broken on-disk compatibility of type tsquery by
renumbering the values of QueryOperator.operator?  I'm looking at
the patch delta in ts_type.h.

            regards, tom lane

I wrote:
> ... I'm looking at the patch delta in ts_type.h.

BTW, while I'm looking at that: comparePos() was a perfectly OK name for
a static function within tsvector.c, but it seems like a pretty horrid
name for a globally exposed linker symbol.  Please rename it to something
less generic.

            regards, tom lane

>> Phrase full text search.
>
> Hasn't this patch broken on-disk compatibility of type tsquery by
> renumbering the values of QueryOperator.operator?  I'm looking at
> the patch delta in ts_type.h.

The distance field is placed exactly in the hole between the two uint8_t
fields and the uint32_t field; as far as I know, every platform we support
uses 4-byte alignment for the int32 type. Am I wrong? If so, I will move
distance to the end of the struct. The QueryOperator struct isn't stored to
disk directly; it's used in union QueryItem. sizeof(QueryItem) = 12 and
sizeof(QueryOperator) = 8, so we can add distance to the end without growing
the size of QueryItem.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/

Teodor Sigaev <teodor@sigaev.ru> writes:
>> Hasn't this patch broken on-disk compatibility of type tsquery by
>> renumbering the values of QueryOperator.operator?  I'm looking at
>> the patch delta in ts_type.h.

> The distance field is placed exactly in the hole between the two uint8_t
> fields and the uint32_t field; as far as I know, every platform we support
> uses 4-byte alignment for the int32 type. Am I wrong?

No, I'm worried about the fact that you changed the OP_xxx constants.
Won't that cause a pre-existing tsquery operator to be read incorrectly?

Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
they were before, which means you need to give up on the assumption that
the numerical values of the OP_xxx constants correspond directly to their
syntactic priority.  But that assumption was never going to survive the
next tsquery expansion anyway.  I'd suggest a static const array mapping
the OP values into their syntactic priorities.

            regards, tom lane

> Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
> they were before, which means you need to give up on the assumption that
> the numerical values of the OP_xxx constants correspond directly to their
> syntactic priority.  But that assumption was never going to survive the
> next tsquery expansion anyway.  I'd suggest a static const array mapping
> the OP values into their syntactic priorities.

Oh, I see. Will fix.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/