Re: ANALYZE patch for review - Mailing list pgsql-patches

From Tom Lane
Subject Re: ANALYZE patch for review
Date
Msg-id 22605.1075390271@sss.pgh.pa.us
Whole thread Raw
In response to Re: ANALYZE patch for review  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
List pgsql-patches
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> Is what you're saying that a custom function would do something like
> this:

> typedef struct
> {
>     VacAttrStats *v;
>     int    mydata1;
>     int    mydata2;
>     int    mydata3;
> } mystruct;

> myanalyzeinfo = palloc0(sizeof(mystruct));
> ..
> ..
> return myanalyzeinfo;

Exactly, except that the return would look like

return (VacAttrStruct *) myanalyzeinfo;

(or at least it would if you weren't converting to Datum anyway).  The
main analyze code would think it had a plain pointer to VacAttrStats,
but the data analysis subroutine associated with this setup routine
would know it could cast the pointer back to (mystruct *).

> VacAttrStats *vacattrptr;
> vacattrptr = (VacAttrStats *)myanalyzeinfo;
> vacattrptr->minrows = 2;

Seems like the hard way; you can just access myanalyzeinfo->v.minrows
when you're working with a pointer declared as mystruct *.

> is there a way of arranging
> the structures to ensure that the compiler throws a warning if
> VacAttrStats is not the first element of the structure?

Not that I know of, but we use techniques like this in very many places
in the PG code and have not had troubles of that sort.  I'm not too
concerned about that.

> I'm beginning to think that perhaps we're looking at this in the wrong
> way, and that a more elegant version of what you're suggesting could be
> implemented using a major/minor method of identifying a statistics type.

If you suppose that the "major" field is the upper bits of the
statistics ID value, then this is just a slightly different way of
thinking about the range-based allocation method I suggested before.
However, the range-based method can adapt to allocating different
amounts of identifier space to different owners, whereas a major/minor
approach can't easily do that since you've defined it to be 2^N minor
IDs for each major code.

> I suppose this also means that there might need to be a mechanism where
> a custom analysis function would want to add more than one row to the
> pg_statistic table (ie it collects more than one set of statistics per
> column);

As long as the custom analysis function is tied to a datatype, I don't
see the need for that.  It already has the ability to define up to four
different "kinds" of stats for its column, and if someone makes a case
for wanting more, I'd be inclined to widen pg_statistic rather than get
into multiple rows per column.  (To point out just one problem, such an
arrangement would break the unique index scheme presently used.)  I'm
not sure yet about whether we need special hooks for keeping stats on
functional indexes, but if we do they'd surely not be looked up by
datatype, or at least not datatype alone.

            regards, tom lane

pgsql-patches by date:

Previous
From: "Mark Cave-Ayland"
Date:
Subject: Re: ANALYZE patch for review
Next
From: Brian Moore
Date:
Subject: Re: support for printing/exporting xml