[ couldn't let go of this ... ]
I wrote:
> It's fairly confusing that this code manages to ignore not-ISOPERATOR
> punctuation. It seems like that gets eaten by gettoken_tsvector()
> and then later we decide there's not really a word there.
Yeah, further investigation shows that such cases effectively act
like stopwords: they are passed back to makepol() as VAL strings,
but then lexize processing rejects them as not words.
> I'm also confused how come the same thing doesn't happen in the
> english tsconfig. Not sure it's worth poking at more, though.
D'oh: "or" is a stopword in the english config. The english case
is still wrong of course, just differently:
regression=# select websearch_to_tsquery('english', 'foo or (baz bar) or (ding dong)');
          websearch_to_tsquery           
-----------------------------------------
 'foo' | 'baz' & 'bar' & 'ding' & 'dong'
(1 row)
regards, tom lane