Re: ANALYZE patch for review - Mailing list pgsql-patches

From Tom Lane
Subject Re: ANALYZE patch for review
Date
Msg-id 23674.1075223783@sss.pgh.pa.us
In response to Re: ANALYZE patch for review  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
List pgsql-patches
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
>> This would mean that pretty much the whole of
>> examine_attribute is treated as type-specific code.  In
>> consequence there would be a certain amount of duplication of
>> code across different type-specific setup routines, but that
>> does not bother me.

> My thinking behind this was that examine_attribute decides whether a
> column of a given type is analyzable so it made sense to provide the
> user with the type and column information. The one thing I do want to
> ensure is that the user function doesn't have to do things like check if
> the column is dropped or if the stats target is zero - that should be
> done by Postgres IMHO.

[ shrug ... ]  It's only a couple lines either way, and you could argue
that the present behavior for attstattarget = 0 should be subject to
customization anyway.  But if you prefer to pass in the pg_attribute and
pg_type rows already fetched, I won't object.

> I'll have a look at this too. The thing I really want to avoid is having
> to create lots of 'special case' code depending upon whether the default
> analysis routine is used or an external function. At the moment, the
> logic just calls the typanalyze entry using OidFunctionCall1 regardless
> which seems quite a neat solution.

My point was that there is going to be a 'special case' to provide the
default somewhere --- if not here, then in the CREATE TYPE code.  It
seems to me that localizing knowledge of what the default behavior is
wouldn't be a bad idea.

>> What we probably need to do is
>> think of a suitable scheme for reserving STATISTIC_KIND codes
>> for different uses.

> Would it really be necessary to do this? My understanding is that
> using STATISTICS_KIND_CUSTOM merely marks the pg_statistic entry as
> valid, i.e. it is valid data but the row's sta* contents can only be
> interpreted by the specified type and operator combination, and
> should not be interpreted by any of the standard estimation
> functions.

I don't much like that.  It amounts to assuming that you will always be
able to guess from context what a STATISTICS_KIND_CUSTOM entry actually
is.  I think that's shortsighted.  As soon as there are a few different
custom estimation functions out there, they will be tripping over each
others' data.  You may be thinking that only one kind of custom data
would actually exist for any one datatype, but I'd turn that around:
what of custom analysis functions that are supposed to work for a range
of datatypes?  It'd be very unsafe for such functions to all use the
same STATISTICS_KIND_CUSTOM label.  The operator selectivity functions
that need to use the data couldn't be sure what they were looking at.

> Also, in a selectivity estimator function, would it be reasonable to
> call get_attstatsslot (i.e. is it a relatively stable API for
> external as well as internal use)?

Yes, it's as stable as any other part of Postgres ... I don't see any
reason that an external selectivity estimator should be coded any
differently from the built-in ones.

            regards, tom lane
