Re: ANALYZE patch for review - Mailing list pgsql-patches

From Tom Lane
Subject Re: ANALYZE patch for review
Date
Msg-id 29126.1075072056@sss.pgh.pa.us
Whole thread Raw
In response to ANALYZE patch for review  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
List pgsql-patches
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> Here is a first attempt at a patch to allow a customised ANALYZE
> function to be specified for user-defined types, relating to the
> following two threads:
> http://archives.postgresql.org/pgsql-hackers/2003-10/msg00113.php and
> http://archives.postgresql.org/pgsql-hackers/2004-01/msg00236.php

This seems to be going in mostly the right direction, but I think you've
made a few errors.

One is that I think that the VacAttrStats structure should be palloc'd
by the type-specific typanalyze function, and not by general code.  The
reason for this is that a type-specific function could allocate a larger
struct of which VacAttrStats is just the first field, leaving room for
additional fields that can communicate data to the later type-specific
analysis routine (which would know that it is receiving this type and
not just generic VacAttrStats).  This is already useful for the standard
analysis code --- it would want to keep eqopr, eqfunc, and ltopr there,
so as to avoid the extra lookups required in your patch.  Other
datatypes would likely need comparable functionality.

This would mean that pretty much the whole of examine_attribute is
treated as type-specific code.  In consequence there would be a certain
amount of duplication of code across different type-specific setup
routines, but that does not bother me.

Also, I'd suggest that it would be better to define a zero in
pg_type.typanalyze as selecting the default analysis routine.
This would put the special case in just one place rather than
having that knowledge all through pg_type.h plus CreateType.
(If you do this, I think the default analysis routine wouldn't
even need a pg_proc entry.)

>     The user-defined function uses a new STATISTICS_KIND_CUSTOM
>     constant to hold the data as I am not 100% sure how the planner
> interprets the
>     STATISTIC_KIND_* values in the pg_statistic table. My aim is to
> be able to store
>     a 2D histogram so that spatial row count estimates can be used
> by the planner
>     in PostGIS (http://postgis.refractions.net) without having to
> maintain the
>     statistics by manually running a stored procedure. The
> pg_statistic.h file doesn't
>     seem clear to me as to what the values of the various columns
> should be when not
>     dealing with single one-dimensional histograms.....

Obviously we'd need to define more STATISTIC_KIND_xxx values to
represent any additional kinds of statistics that are getting stored.
"STATISTICS_KIND_CUSTOM" is exactly the way *not* to do that, as it
would lead people to use the same ID code for different things.
What we probably need to do is think of a suitable scheme for reserving
STATISTIC_KIND codes for different uses.  A really off-the-cuff
suggestion is:

    1-100:    reserved for assignment by the core Postgres project
    100-199: reserved for assignment by PostGIS
    200-9999: reserved for other globally-known stats kinds
    10000-32767: reserved for private site-local use

People writing private datatypes would select random values above 10000
and have little chance of conflict.  Anybody who wanted to write code
for public distribution would want to come to us and get an assignment
of space below 10000.

>     4) The VacAttrStats structure may need to be moved out to a
> different .h file
>     for access by the programmer (currently it is in
> commands/vacuum.h) - maybe a
>     new analyze.h would be better?

vacuum.h seems fine.

            regards, tom lane

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: next draft of list rewrite
Next
From: Neil Conway
Date:
Subject: Re: next draft of list rewrite