[PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes - Mailing list pgsql-hackers

From Marti Raudsepp
Subject [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Date
Msg-id CABRT9RCN+KTb5eHHkHFmdsioJ6qjGJMubOuMfMHq_esTqqDPqA@mail.gmail.com
Whole thread Raw
Responses Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
List pgsql-hackers
Hi list,

Since PostgreSQL 9.1, GIN has new cost estimation code. This code
assumes that the only expression type it's going to see is OpExpr.
However, ScalarArrayOpExpr has also been possible in earlier versions.
Estimating col <op> ANY (<array>) queries segfaults in 9.1 if there's
a GIN index on the column.

Case in point:
create table words (word text);
create index on words using gin (to_tsvector('english', word));
explain analyze select * from words where to_tsvector('english', word)
@@ any ('{foo}');

(It seems that RowCompareExpr and NullTest clauses are impossible for
a GIN index -- at least my efforts to find such cases failed)

Attached is an attempted fix for the issue. I split out the code for
the extract call and now run that for each array element, adding
together the average of (partialEntriesInQuals, exactEntriesInQuals,
searchEntriesInQuals) for each array element. After processing all
quals, I multiply the entries by the number of array_scans (which is
the product of all array lengths) to get the total cost.

This required a fair bit of refactoring, but I tried to follow the
patterns for OpExpr pretty strictly -- discounting scans over NULL
elements, returning 0 cost when none of the array elements can match,
accounting for cache effects when there are multiple scans, etc. But
it's also possible that I have no idea what I'm really doing. :)

I also added regression tests for this to tsearch and pg_trgm.

Regards,
Marti

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Review: Non-inheritable check constraints
Next
From: "Kevin Grittner"
Date:
Subject: Re: Page Checksums