Thread: get_relation_stats_hook()

get_relation_stats_hook()

From
Simon Riggs
Date:
Currently we have a plugin capability for get_relation_info_hook(), but
no corresponding capability for statistics info.

So, all calls to SearchSysCache would be replaced with a call to
get_relation_info_hook(), if present.

Any objections, thoughts?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: get_relation_stats_hook()

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> Currently we have a plugin capability for get_relation_info_hook(), but
> no corresponding capability for statistics info.
> 
> So, all calls to SearchSysCache would be replaced with a call to
> get_relation_info_hook(), if present.

I assume you meant get_relation_stats_hook in the last paragraph, but I
can't understand what you mean with SearchSysCache (surely that's not
it.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: get_relation_stats_hook()

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Currently we have a plugin capability for get_relation_info_hook(), but
> no corresponding capability for statistics info.

> So, all calls to SearchSysCache would be replaced with a call to
> get_relation_info_hook(), if present.

Surely you didn't mean ALL calls.  Please be more specific about what
you're proposing.
        regards, tom lane


Re: get_relation_stats_hook()

From
Simon Riggs
Date:
On Thu, 2008-06-26 at 11:18 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Currently we have a plugin capability for get_relation_info_hook(), but
> > no corresponding capability for statistics info.
> 
> > So, all calls to SearchSysCache would be replaced with a call to
> > get_relation_info_hook(), if present.
> 
> Surely you didn't mean ALL calls.  Please be more specific about what
> you're proposing.

The statistics relation STATRELATT is accessed in a few places in the
planner. Since it is in the syscache it is accessed directly from there.
I would like to add hooks so that stats data can come from somewhere
else other than the syscache for tables, just as we can already do with
get_relation_stats_hook(). These new changes would complete the existing
feature to ensure it is fully usabled in the way originally intended.

In selfunc.c: There are 3 calls to SearchSysCache(STATRELATT,...).

In lsyscache.c: There is 1 call to SearchSysCache(STATRELATT...) in
get_attavgwidth() and 2 calls to SysCacheGetAttr(STATRELATT...) in
get_attstatsslot().

Calls to SearchSysCache(STATRELATT...) would be replaced by a call to an
external module, if present, with a function pointer to
get_relation_stats_hook(). This returns a tuple with stats info about
that relation.

Calls to SysCacheGetAttr(STATRELATT...) would be replaced by a call to
an external module, if present with a function pointer to
get_attribute_stats_hook(). This returns a stats slot.

The call in ANALYZE would not be touched.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: get_relation_stats_hook()

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
>> Surely you didn't mean ALL calls.  Please be more specific about what
>> you're proposing.

> The statistics relation STATRELATT is accessed in a few places in the
> planner. Since it is in the syscache it is accessed directly from there.
> I would like to add hooks so that stats data can come from somewhere
> else other than the syscache for tables, just as we can already do with
> get_relation_stats_hook().

Well, defining the hooks as replacing STATRELATT lookups seems the wrong
level of abstraction.  What if you want to insert stats that can't be
represented by the current pg_statistic definition?  Even if there's no
functional limitation, cons'ing up a dummy pg_statistic tuple seems the
hard way to do it --- we didn't define get_relation_info_hook as working
by supplying made-up catalog rows.  Furthermore, many of the interesting
cases that someone might want to hook into are code paths that don't
even try to look up a pg_statistic row because they know there won't be
one, such as two out of the three cases in examine_variable().

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.
        regards, tom lane


Re: get_relation_stats_hook()

From
Simon Riggs
Date:
On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> >> Surely you didn't mean ALL calls.  Please be more specific about what
> >> you're proposing.
> 
> > The statistics relation STATRELATT is accessed in a few places in the
> > planner. Since it is in the syscache it is accessed directly from there.
> > I would like to add hooks so that stats data can come from somewhere
> > else other than the syscache for tables, just as we can already do with
> > get_relation_stats_hook().
> 
> Well, defining the hooks as replacing STATRELATT lookups seems the wrong
> level of abstraction.  What if you want to insert stats that can't be
> represented by the current pg_statistic definition?  Even if there's no
> functional limitation, cons'ing up a dummy pg_statistic tuple seems the
> hard way to do it --- we didn't define get_relation_info_hook as working
> by supplying made-up catalog rows.  Furthermore, many of the interesting
> cases that someone might want to hook into are code paths that don't
> even try to look up a pg_statistic row because they know there won't be
> one, such as two out of the three cases in examine_variable().

The reason for doing it this way is I'm interested in using stats
literally copied from other servers. So the pg_statistic tuples will be
available for us directly. I'm building a tool to allow people to export
their production environment to a test system, so that SQL developers
can experiment with query tuning and optimizer developers can recreate
problems.

> I think you need to move up a level, and perhaps refactor some of the
> existing code to make it easier to inject made-up stats.

Both sound like good ideas. I wasn't really after ultimate flexibility,
but perhaps I should be.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: get_relation_stats_hook()

From
Gregory Stark
Date:
Hm, I assume we want to be able to turn on and off plugins in a running
session? I think the "free_using_plugin" flag:


!             if (get_relation_stats_hook)
!                 vardata->statsTuple = (*get_relation_stats_hook) 
!                                                 (rte->relid, 
!                                                  var->varattno);
! 
!             if (vardata->statsTuple)
!                 vardata->free_using_plugin = true;                
!             else
!                 vardata->statsTuple = SearchSysCache(STATRELATT,

is insufficient to handle this. vardata->free_using_plugin could be true but
by the time the variable is released the plugin pointer could have been
cleared. Or worse, set to a different plugin.

The easiest way to fix this seems like also the best way, instead of storing a
boolean store the pointer to the release function.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: get_relation_stats_hook()

From
Simon Riggs
Date:
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

> The easiest way to fix this seems like also the best way, instead of storing a
> boolean store the pointer to the release function.

OK, I like that better anyhow.

Hadn't thought about turning plugin off, but I can see the requirement
now your raise it.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: get_relation_stats_hook()

From
Simon Riggs
Date:
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

> The easiest way 

Did you have further review comments? If so, I'll wait for those before
making further mods. Thanks for ones so far.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: get_relation_stats_hook()

From
Simon Riggs
Date:
On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

> The easiest way to fix this seems like also the best way, instead of storing a
> boolean store the pointer to the release function.

Implemented as suggested. v5 enclosed.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment