ANALYZE versus expression indexes with nondefault opckeytype - Mailing list pgsql-hackers

From Tom Lane
Subject ANALYZE versus expression indexes with nondefault opckeytype
Date
Msg-id 16619.1280548406@sss.pgh.pa.us
Whole thread Raw
Responses Re: ANALYZE versus expression indexes with nondefault opckeytype
List pgsql-hackers
I've been poking at the FTS problem example recently submitted by
Artur Dabrowski.  It contains an index declared like so:

CREATE INDEX idx_keywords_ger ON search_tabUSING gin (to_tsvector('german'::regconfig, keywords));

which can be queried like so

select * from search_tab where to_tsvector('german', keywords) @@ to_tsquery('german', 'whatever');

which is all fine and matches suggestions in our documentation.  But I
was dismayed to discover that ANALYZE wasn't storing any statistics
for this expression index, which rather hobbles the planner's ability
to produce useful estimates for the selectivity of such WHERE clauses.

The reason for that turns out to be this bit in analyze.c:
                       /*                        * Can't analyze if the opclass uses a storage type
  * different from the expression result type. We'd get                        * confused because the type shown in
pg_attributefor                        * the index column doesn't match what we are getting                        *
fromthe expression. Perhaps this can be fixed                        * someday, but for now, punt.
 */                       if (exprType(indexkey) !=                           Irel[ind]->rd_att->attrs[i]->atttypid)
                      continue;
 

Since a tsvector index does have a storage type different from tsvector
(in fact, it's shown as TEXT), this code decides to punt and not analyze
the column.  I think it's past time we did something about this.

After a bit of study of the code, it appears to me that it's not too
difficult to fix: we just have to use the expression's result type
rather than the index column's atttypid in the subsequent processing.
ANALYZE never actually looks at the index column contents (indeed
cannot, since our index AMs provide no API for extracting the contents),
so atttypid isn't really all that relevant to it.  However, this'll
imply an API/ABI break for custom typanalyze functions: we'll need to
store the actual type OID in struct VacAttrStats, and use that rather
than looking to attr->atttypid or any of the other type-dependent
fields of the Form_pg_attribute.  So that seems to make it not practical
to back-patch.

I thought of an ugly hack that would avoid an API/ABI break: since
VacAttrStats.attr is a copy of the pg_attribute data, we could scribble
on it, changing atttypid, attlen, attbyval, etc to match the index
expression result type.  This seems pretty grotty, but it would allow
the fix to be back-patched into existing branches.

Comments?  I'm not sure which way to jump here.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: reducing NUMERIC size for 9.1
Next
From: Tom Lane
Date:
Subject: rbtree code breaks GIN's adherence to maintenance_work_mem