Re: Tsvector editing functions - Mailing list pgsql-hackers

From Stas Kelvich
Subject Re: Tsvector editing functions
Date
Msg-id 66804074-E0FA-45BF-B898-CE9BBEA64F9F@postgrespro.ru
In response to Re: Tsvector editing functions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Tsvector editing functions  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Hi

> On 22 Jan 2016, at 19:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> OK, although I do recommend using more sensible variable names, i.e. why not use 'lexemes' instead of 'lexarr', for example? Similarly for the other functions.


Changed. With the old names I tried to follow the conventions in the surrounding code, but it is probably a good idea to switch to more meaningful names in new code.

>>
>>
>> delete(tsin tsvector, tsv_filter tsvector): Delete lexemes and/or positions of tsv_filter from tsin. When a lexeme in tsv_filter has no positions, the function will delete any occurrence of the same lexeme in tsin. When a tsv_filter lexeme has positions, the function will delete them from the positions of the matching lexeme in tsin. If the resulting position set is empty after such removal, the function will delete that lexeme from the resulting tsvector.
>>
>
> I can't really imagine a situation in which I'd need this, but if you do have a use case for it ... although in the initial paragraph you say "... but if somebody wants to delete for example ..." which suggests you may not have such a use case.
>
> Based on bad experience with extending an API based on vague ideas, I recommend only adding functions for which there is an existing need. It's easy to add a function later, much more difficult to remove it or change the signature.

I tried to create a more or less self-contained API, e.g. to have the ability to negate the effect of concatenation. But I've also asked people around what they think about extending the API, and everybody is convinced that it is better to stick to a smaller API. So let's drop it. At least those functions exist in the mailing list archive in case somebody googles for this kind of behaviour.
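
For anyone who finds this thread later, here is a rough model of the delete(tsin, tsv_filter) semantics described above. It is only a sketch in Python, not the C code from the patch: a tsvector is represented as a dict mapping each lexeme to a set of positions, with an empty set standing for a lexeme stored without positions. The name ts_delete and that representation are assumptions made for illustration.

```python
def ts_delete(tsin, tsv_filter):
    """Illustrative model of the proposed delete(tsin tsvector, tsv_filter tsvector).

    Both arguments are dicts of {lexeme: set_of_positions}; an empty set
    models a lexeme that carries no position information.
    """
    result = {}
    for lexeme, positions in tsin.items():
        if lexeme not in tsv_filter:
            # Lexeme is not mentioned in the filter: keep it unchanged.
            result[lexeme] = set(positions)
            continue

        filter_positions = tsv_filter[lexeme]
        if not filter_positions:
            # Filter lexeme has no positions: delete every occurrence.
            continue

        # Filter lexeme has positions: remove only those positions.
        remaining = set(positions) - set(filter_positions)
        if remaining:
            result[lexeme] = remaining
        # Otherwise the position set became empty and the lexeme is dropped.
        # (Model assumption: a tsin lexeme that had no positions to begin
        # with is dropped here as well.)
    return result


tsin = {"cat": {1, 4}, "dog": {2}, "fish": {3}}
tsv_filter = {"cat": {4}, "dog": {2}, "fish": set()}
print(ts_delete(tsin, tsv_filter))  # only 'cat' at position 1 survives
```

In this example 'cat' loses position 4 but keeps position 1, 'dog' disappears because its only position was removed, and 'fish' disappears because the filter names it without positions.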

>>
>> Also, if we want some level of completeness of the API, and taking into account that the concat() function shifts the positions of its second argument, I thought that it could be useful to also add a function that can shift all positions by a specific value. This helps to undo concatenation: delete one of the concatenated tsvectors and then shift the positions in the resulting tsvector. So I also wrote another small function:
>>
>> shift(tsin tsvector, offset int16): Shift all positions in tsin by the given offset
>
> That seems rather too low-level. Shouldn't it be really built into delete() directly somehow?


I think it is an ambiguous task for delete. But if we are dropping support for delete(tsvector, tsvector), I don't see the point in keeping that function.
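
To record what shift was meant for, here is a small Python sketch of undoing a concatenation. It assumes, as the tsvector concatenation operator is documented to do, that positions of the right-hand vector are offset by the largest position in the left-hand one. The helper names ts_concat and ts_shift and the dict-of-sets representation are hypothetical, and real tsvector limits (positions starting at 1, the upper bound on positions) are not modelled.

```python
def ts_concat(left, right):
    """Model of tsvector concatenation on {lexeme: set_of_positions} dicts.

    Positions of `right` are offset by the largest position found in `left`.
    Returns the combined vector together with the offset that was applied.
    """
    offset = max((p for ps in left.values() for p in ps), default=0)
    result = {lexeme: set(ps) for lexeme, ps in left.items()}
    for lexeme, ps in right.items():
        result.setdefault(lexeme, set()).update(p + offset for p in ps)
    return result, offset


def ts_shift(tsin, offset):
    """Model of the proposed shift(tsin, offset): move every position by offset."""
    return {lexeme: {p + offset for p in ps} for lexeme, ps in tsin.items()}


left = {"fat": {1}, "cat": {2}}
right = {"sat": {1}, "mat": {2}}

combined, offset = ts_concat(left, right)

# Undo the concatenation: drop the lexemes of the left vector, then shift
# the remaining positions back by the offset that concatenation applied.
without_left = {lex: ps for lex, ps in combined.items() if lex not in left}
restored = ts_shift(without_left, -offset)
print(restored == right)
```

The undo step only works this cleanly when the two vectors share no lexemes. With shared lexemes the positions are merged into one set, which is the ambiguity that makes building this into delete difficult.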

>>>
>>> 7) Some of the functions use an indexterm that does not match the function
>>>   name. I see two such cases - to_tsvector and setweight. Is there a
>>>   reason for that?
>>>
>>
>> Because the sgml compiler wants unique indexterms. Both functions that
>> you mentioned use argument overloading and have non-unique names.
>
> As Michael pointed out, that should probably be handled by using <primary> and <secondary> tags.


Done.


> On 19 Jan 2016, at 00:21, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>
> It's a bit funny that you reintroduce the "unrecognized weight: %d"
> (instead of %c) in tsvector_setweight_by_filter.
>


Ah, I was thinking about moving it to a separate diff and messed it up. Fixed, and attaching a diff with the same fix for the old tsvector_setweight.





---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Attachment
