Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Date
Msg-id 10309.1324357729@sss.pgh.pa.us
Whole thread Raw
In response to [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Marti Raudsepp <marti@juffo.org>)
Responses Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Marti Raudsepp <marti@juffo.org>)
List pgsql-hackers
Marti Raudsepp <marti@juffo.org> writes:
> Since PostgreSQL 9.1, GIN has new cost estimation code. This code
> assumes that the only expression type it's going to see is OpExpr.
> However, ScalarArrayOpExpr has also been possible in earlier versions.
> Estimating col <op> ANY (<array>) queries segfaults in 9.1 if there's
> a GIN index on the column.

Ugh.  I think we subconsciously assumed that ScalarArrayOpExpr couldn't
appear here because GIN doesn't set amsearcharray, but of course that's
wrong.

> (It seems that RowCompareExpr and NullTest clauses are impossible for
> a GIN index -- at least my efforts to find such cases failed)

No, those are safe for the moment --- indxpath.c has a hard-wired
assumption that RowCompareExpr is only usable with btree, and NullTest
is only considered to be indexable if amsearchnulls is set.  Still,
it'd likely be better if this code ignored unrecognized qual expression
types rather than Assert'ing they're not there.  It's not like the cases
it *does* handle are done so perfectly that ignoring an unknown qual
could be said to bollix the estimate unreasonably.

> Attached is an attempted fix for the issue. I split out the code for
> the extract call and now run that for each array element, adding
> together the average of (partialEntriesInQuals, exactEntriesInQuals,
> searchEntriesInQuals) for each array element. After processing all
> quals, I multiply the entries by the number of array_scans (which is
> the product of all array lengths) to get the total cost.

> This required a fair bit of refactoring, but I tried to follow the
> patterns for OpExpr pretty strictly -- discounting scans over NULL
> elements, returning 0 cost when none of the array elements can match,
> accounting for cache effects when there are multiple scans, etc. But
> it's also possible that I have no idea what I'm really doing. :)

Hmm.  I am reminded of how utterly unreadable "diff -u" format is for
anything longer than single-line changes :-( ... but I think I don't
like this refactoring much.  Will take a closer look tomorrow.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: JSON for PG 9.2
Next
From: Nikhil Sontakke
Date:
Subject: Re: Review: Non-inheritable check constraints