Hello Hackers,
This is my second proposal for a patch, so I hope not to make "rookie" mistakes.
This proposal patch is based on a simple use case :
If one creates a table this way
CREATE TABLE tst_table AS (SELECT array_to_tsvector(ARRAY['','abc','def']));
the table content is :
array_to_tsvector
-------------------
'' 'abc' 'def'
(1 row)
First it can be strange to have an empty string for tsvector lexeme but anyway, keep going to the real point.
Once dumped, this table dump contains that empty string that can't be restored.
tsvector_parse (./utils/adt/tsvector_parser.c) raises an error.
Thus it is not possible for data to be restored this way.
There are two ways to consider this : is it alright to have empty strings in lexemes ?
* If so, empty strings should be correctly parsed by tsvector_parser.
* If not, one should prevent empty strings from being stored into tsvectors.
Since "empty strings" seems not to be a valid lexeme, I undertook to change some functions dealing with tsvector to check whether string arguments are empty. This might be the wrong path as I'm not familiar with tsvector usage... (OTOH, I can provide a fix patch for tsvector_parser() if I'm wrong).
This involved changing the way functions like array_to_tsvector(), ts_delete() and setweight() behave. As for NULL values, empty string values are checked and an error is raised for such a value. It appears to me that ERRCODE_ZERO_LENGTH_CHARACTER_STRING (2200F) matched this behaviour but I may be wrong.
Since this patch changes the way functions behave, consider it as a simple try to overcome a strange situation we've noticed on a specific use case.
This included patch manages that checks for empty strings on the pointed out functions. It comes with modified regression tests. Patch applies along head/master and 14_RC1.
Comments are more than welcome!