Re: ANALYZE patch for review - Mailing list pgsql-patches

From Mark Cave-Ayland
Subject Re: ANALYZE patch for review
Date
Msg-id 8F4A22E017460A458DB7BBAB65CA6AE502654A@openmanage
Whole thread Raw
In response to ANALYZE patch for review  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
Responses Re: ANALYZE patch for review  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 25 January 2004 23:08
> To: Mark Cave-Ayland
> Cc: pgsql-patches@postgresql.org
> Subject: Re: [PATCHES] ANALYZE patch for review
>
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Here is a first attempt at a patch to allow a customised ANALYZE
> > function to be specified for user-defined types, relating to the
> > following two threads:
> >
> http://archives.postgresql.org/pgsql-hackers/2003-10/msg00113.php and
> > http://archives.postgresql.org/pgsql-hackers/2004-01/msg00236.php
>
> This seems to be going in mostly the right direction, but I
> think you've made a few errors.
>
> One is that I think that the VacAttrStats structure should be
> palloc'd by the type-specific typanalyze function, and not by
> general code.  The reason for this is that a type-specific
> function could allocate a larger struct of which VacAttrStats
> is just the first field, leaving room for additional fields
> that can communicate data to the later type-specific analysis
> routine (which would know that it is receiving this type and
> not just generic VacAttrStats).  This is already useful for
> the standard analysis code --- it would want to keep eqopr,
> eqfunc, and ltopr there, so as to avoid the extra lookups
> required in your patch.  Other datatypes would likely need
> comparable functionality.
>
> This would mean that pretty much the whole of
> examine_attribute is treated as type-specific code.  In
> consequence there would be a certain amount of duplication of
> code across different type-specific setup routines, but that
> does not bother me.

My thinking behind this was that examine_attribute decides whether a
column of a given type is analyzable so it made sense to provide the
user with the type and column information. The one thing I do want to
ensure is that the user function doesn't have to do things like check if
the column is dropped or if the stats target is zero - that should be
done by Postgres IMHO. I do like the idea of having a custom structure
passed between functions as like you say it makes things a lot more
flexible.... Will see what I can do about it.

> Also, I'd suggest that it would be better to define a zero in
> pg_type.typanalyze as selecting the default analysis routine.
> This would put the special case in just one place rather than
> having that knowledge all through pg_type.h plus CreateType.
> (If you do this, I think the default analysis routine
> wouldn't even need a pg_proc entry.)

I'll have a look at this too. The thing I really want to avoid is having
to create lots of 'special case' code depending upon whether the default
analysis routine is used or an external function. At the moment, the
logic just calls the typanalyze entry using OidFunctionCall1 regardless
which seems quite a neat solution.

> >     The user-defined function uses a new STATISTICS_KIND_CUSTOM
> >     constant to hold the data as I am not 100% sure how the planner
> > interprets the
> >     STATISTIC_KIND_* values in the pg_statistic table. My
> aim is to be
> > able to store
> >     a 2D histogram so that spatial row count estimates can
> be used by the
> > planner
> >     in PostGIS (http://postgis.refractions.net) without having to
> > maintain the
> >     statistics by manually running a stored procedure. The
> pg_statistic.h
> > file doesn't
> >     seem clear to me as to what the values of the various
> columns should
> > be when not
> >     dealing with single one-dimensional histograms.....
>
> Obviously we'd need to define more STATISTIC_KIND_xxx values
> to represent any additional kinds of statistics that are
> getting stored. "STATISTICS_KIND_CUSTOM" is exactly the way
> *not* to do that, as it would lead people to use the same ID
> code for different things. What we probably need to do is
> think of a suitable scheme for reserving STATISTIC_KIND codes
> for different uses.  A really off-the-cuff suggestion is:
>
>     1-100:    reserved for assignment by the core Postgres project
>     100-199: reserved for assignment by PostGIS
>     200-9999: reserved for other globally-known stats kinds
>     10000-32767: reserved for private site-local use
>
> People writing private datatypes would select random values
> above 10000 and have little chance of conflict.  Anybody who
> wanted to write code for public distribution would want to
> come to us and get an assignment of space below 10000.

Would it really be necessary to need to do this? My understanding would
be that using STATISTICS_KIND_CUSTOM merely marks the pg_statistic entry
as being valid, i.e. it is valid data but the row sta* contents can only
be interpreted by the specified type and operator combination, and
should not attempt to be interpreted by any of the standard estimation
functions.

Also in a selectivity estimator function would it be reasonable to call
get_attstatsslot(i.e. is it a relatively stable API for external as well
as internal use) since this function does a lot of the work that would
be required by an estimation function?

> >     4) The VacAttrStats structure may need to be moved out
> to a different
> > .h file
> >     for access by the programmer (currently it is in
> > commands/vacuum.h) - maybe a
> >     new analyze.h would be better?
>
> vacuum.h seems fine.

OK, I'll leave it where it is.


Many thanks,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.



pgsql-patches by date:

Previous
From: Patrick Samson
Date:
Subject: Re: pltcl - "Cache lookup for attribute" error - version 2
Next
From: Neil Conway
Date:
Subject: Re: improve join_collapse_limit docs