Re: ANALYZE patch for review - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Re: ANALYZE patch for review |
Date | |
Msg-id | 29126.1075072056@sss.pgh.pa.us Whole thread Raw |
In response to | ANALYZE patch for review ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>) |
List | pgsql-patches |
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > Here is a first attempt at a patch to allow a customised ANALYZE > function to be specified for user-defined types, relating to the > following two threads: > http://archives.postgresql.org/pgsql-hackers/2003-10/msg00113.php and > http://archives.postgresql.org/pgsql-hackers/2004-01/msg00236.php This seems to be going in mostly the right direction, but I think you've made a few errors. One is that I think that the VacAttrStats structure should be palloc'd by the type-specific typanalyze function, and not by general code. The reason for this is that a type-specific function could allocate a larger struct of which VacAttrStats is just the first field, leaving room for additional fields that can communicate data to the later type-specific analysis routine (which would know that it is receiving this type and not just generic VacAttrStats). This is already useful for the standard analysis code --- it would want to keep eqopr, eqfunc, and ltopr there, so as to avoid the extra lookups required in your patch. Other datatypes would likely need comparable functionality. This would mean that pretty much the whole of examine_attribute is treated as type-specific code. In consequence there would be a certain amount of duplication of code across different type-specific setup routines, but that does not bother me. Also, I'd suggest that it would be better to define a zero in pg_type.typanalyze as selecting the default analysis routine. This would put the special case in just one place rather than having that knowledge all through pg_type.h plus CreateType. (If you do this, I think the default analysis routine wouldn't even need a pg_proc entry.) > The user-defined function uses a new STATISTICS_KIND_CUSTOM > constant to hold the data as I am not 100% sure how the planner > interprets the > STATISTIC_KIND_* values in the pg_statistic table. My aim is to > be able to store > a 2D histogram so that spatial row count estimates can be used > by the planner > in PostGIS (http://postgis.refractions.net) without having to > maintain the > statistics by manually running a stored procedure. The > pg_statistic.h file doesn't > seem clear to me as to what the values of the various columns > should be when not > dealing with single one-dimensional histograms..... Obviously we'd need to define more STATISTIC_KIND_xxx values to represent any additional kinds of statistics that are getting stored. "STATISTICS_KIND_CUSTOM" is exactly the way *not* to do that, as it would lead people to use the same ID code for different things. What we probably need to do is think of a suitable scheme for reserving STATISTIC_KIND codes for different uses. A really off-the-cuff suggestion is: 1-100: reserved for assignment by the core Postgres project 100-199: reserved for assignment by PostGIS 200-9999: reserved for other globally-known stats kinds 10000-32767: reserved for private site-local use People writing private datatypes would select random values above 10000 and have little chance of conflict. Anybody who wanted to write code for public distribution would want to come to us and get an assignment of space below 10000. > 4) The VacAttrStats structure may need to be moved out to a > different .h file > for access by the programmer (currently it is in > commands/vacuum.h) - maybe a > new analyze.h would be better? vacuum.h seems fine. regards, tom lane
pgsql-patches by date: