Re: Empty string in lexeme for tsvector - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: Empty string in lexeme for tsvector
Date
Msg-id CAEudQAqpKnCbnVonG6xSXhf9j4D0-RJs7x7At+=ATVxUC=NX-Q@mail.gmail.com
In response to Re: Empty string in lexeme for tsvector  (Jean-Christophe Arnu <jcarnu@gmail.com>)
List pgsql-hackers
On Fri, Sep 24, 2021 at 09:39, Jean-Christophe Arnu <jcarnu@gmail.com> wrote:


On Fri, Sep 24, 2021 at 13:03, Ranier Vilela <ranier.vf@gmail.com> wrote:

Comments are more than welcome!
1. It would be better to add this test-and-error before the tsvector_bsearch call.

+    if (lex_len == 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+                 errmsg("lexeme array may not contain empty strings")));
+

If lex_len is equal to zero, it is better to bail out early.

2. The second test-and-error can use lex_len, just like the first test.
I don't see the point in recalculating the length when lex_len already holds it.

+    if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+        ereport(ERROR,
+                (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+                 errmsg("lexeme array may not contain empty strings")));
+

Hello Ranier,
Thank you for your comments.
Here's a new patch file taking your comments into account.
Thanks.


I was just wondering whether the empty-string check is done in the right place.
As you rightly pointed out, lex_len is calculated later (again, for a good
reason), whereas my code checks for empty strings as early as possible.
To me, that seems to be the right thing to do (prevent further processing of
lexemes as soon as possible), but I might be missing something.
It's always good to avoid unnecessary processing.
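
To illustrate the idea outside of PostgreSQL, here is a minimal standalone
sketch of rejecting an empty key before doing a binary search. It is not the
patch itself: the Lexeme struct, lexeme_cmp, find_lexeme and the two return
codes are hypothetical names made up for this example, and the standard
bsearch() stands in for tsvector_bsearch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes for this sketch only. */
#define LEXEME_NOT_FOUND (-1)
#define LEXEME_EMPTY     (-2)

/* Hypothetical stand-in for a lexeme: a pointer plus an explicit length. */
typedef struct
{
    const char *data;
    size_t      len;
} Lexeme;

/* Order lexemes by their common prefix bytes, then shorter before longer. */
static int
lexeme_cmp(const void *a, const void *b)
{
    const Lexeme *la = a;
    const Lexeme *lb = b;
    size_t        minlen = la->len < lb->len ? la->len : lb->len;
    int           c = memcmp(la->data, lb->data, minlen);

    if (c != 0)
        return c;
    if (la->len == lb->len)
        return 0;
    return la->len < lb->len ? -1 : 1;
}

/*
 * Look up key in a sorted array of lexemes.
 * Returns the index of the match, LEXEME_NOT_FOUND if there is none,
 * or LEXEME_EMPTY if the key has zero length.
 */
static int
find_lexeme(const Lexeme *sorted, size_t n, const char *key, size_t key_len)
{
    Lexeme        k;
    const Lexeme *hit;

    assert(key != NULL);

    /* Test the length first, so an empty key never reaches the search. */
    if (key_len == 0)
        return LEXEME_EMPTY;

    k.data = key;
    k.len = key_len;
    hit = bsearch(&k, sorted, n, sizeof(Lexeme), lexeme_cmp);

    return hit ? (int) (hit - sorted) : LEXEME_NOT_FOUND;
}
```

The length is computed once by the caller and reused for both the empty-key
test and the search, which is the same point made about lex_len above.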

regards,
Ranier Vilela
