Hi list,
Since PostgreSQL 9.1, GIN has new cost estimation code. This code
assumes that the only expression type it's going to see is OpExpr.
However, ScalarArrayOpExpr has also been possible in earlier versions.
Estimating col <op> ANY (<array>) queries segfaults in 9.1 if there's
a GIN index on the column.
Case in point:
create table words (word text);
create index on words using gin (to_tsvector('english', word));
explain analyze select * from words where to_tsvector('english', word)
@@ any ('{foo}');
(It seems that RowCompareExpr and NullTest clauses are impossible for
a GIN index -- at least my efforts to find such cases failed)
Attached is an attempted fix for the issue. I split out the code for
the extract call and now run that for each array element, adding
together the average of (partialEntriesInQuals, exactEntriesInQuals,
searchEntriesInQuals) for each array element. After processing all
quals, I multiply the entries by the number of array_scans (which is
the product of all array lengths) to get the total cost.
This required a fair bit of refactoring, but I tried to follow the
patterns for OpExpr pretty strictly -- discounting scans over NULL
elements, returning 0 cost when none of the array elements can match,
accounting for cache effects when there are multiple scans, etc. But
it's also possible that I have no idea what I'm really doing. :)
I also added regression tests for this to tsearch and pg_trgm.
Regards,
Marti