Thread: Traffic jams in fn_extra

Traffic jams in fn_extra

From
Paul Ramsey
Date:
As an extension with a lot of CPU load, we (postgis) tend to use flinfo->fn_extra a lot, for caching things that are
intensiveto calculate at the start of a query and reuse throughout subsequent functions calls. 
 

- coordinate projection objects
- indexes of the edges of large geometries
- other kinds of indexes of the edges of large geometries :)
- schemas of lidar pointcloud collections

As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions
tryingto stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to
holdthe different kinds of stuff we might want to store, a GenericCacheCollection
 

https://github.com/postgis/postgis/blob/svn-trunk/libpgcommon/lwgeom_cache.c#L46-L48

As (by now) a connoisseur of fn_extra caching, I've noticed while reading bits of PgSQL code that there are far more
placesthat stuff state into fn_extra than I ever knew, and that they do so without any substantial concern that other
statemight already be there. (Well, that's not true, they usually check for NULL and they give up if fn_extra is
alreadyin use.) The range types, I was surprised to find doing some performance caching in fn_extra. The set-returning
functionmacros use it to hold state. And many others I'm sure.
 

Would it be good/wise to add another, better managed, slot to flinfo, one that isn't just void* but is a hash? (Or, has
somemanagement macros to handle it and make it a hash* if it's null, whatever API makes sense) so that multiple bits of
codecan cache state over function calls without banging into one another?
 

flinfo->fn_extra_hash perhaps?

If this sounds OK, I'd be honored to try and make it my first submission to PgSQL.

P. 

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net





Re: Traffic jams in fn_extra

From
Tom Lane
Date:
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions
tryingto stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to
holdthe different kinds of stuff we might want to store, a GenericCacheCollection
 

TBH, I fail to understand what you're on about here.  Any one function
owns the value of fn_extra in calls to it, and is solely responsible for
its usage; furthermore, there's no way for any other code to mangle that
pointer unless the owner explicitly makes it available.  So where is
the problem?  And if there is a problem, how does adding another field
of exactly the same kind make it better?
        regards, tom lane



Re: Traffic jams in fn_extra

From
Paul Ramsey
Date:
On Tue, Nov 19, 2013 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Paul Ramsey <pramsey@cleverelephant.ca> writes:
>> As we've added different kinds of caching, in our own project, we've banged up against problems of multiple
functionstrying to stuff information into the same pointer, and ended up putting an extra container of our own into
fn_extra,to hold the different kinds of stuff we might want to store, a GenericCacheCollection 
>
> TBH, I fail to understand what you're on about here.  Any one function
> owns the value of fn_extra in calls to it, and is solely responsible for
> its usage; furthermore, there's no way for any other code to mangle that
> pointer unless the owner explicitly makes it available.  So where is
> the problem?  And if there is a problem, how does adding another field
> of exactly the same kind make it better?

Right, sorry, I'm reasoning overly aggressively from the specific to
the general. The specific problems have been

- two lines of geometry caching code, either of which can be called
within a single function, depending on the inputs, which mostly didn't
result in conflicts, except when they did, which was eventually
rectified by layering a GenericCacheCollection into the function
- a cached lidar schema object which would have been very useful to
have around in an SRF, but couldn't because the SRF needed the
fn_extra slot

The first case is an application-specific problem, and since we've
coded around it, the only advantage to a pgsql-specific fix would be
to allow others who also need to cache several independent things to
not reinvent that wheel.

The second case is one of the instances where the pgsql code itself is
getting in the way and cannot be worked around at the application
level. My solution was just not to do caching for that function.

So, that's what I perceive the problem to be. Now that you point it
out to me, yes, it's pretty small bore stuff.

On the solution, I wasn't suggesting another void* slot, but rather a
slot that holds a hash table, so that an arbitrary number of things
can be stuffed in. Overkill, really, since in 99.9% of times only one
thing would be in there, and in the other 0.1% of times two things. In
our own GenericCacheCollection, we just statically allocate 16 slots.

P.



Re: Traffic jams in fn_extra

From
Simon Riggs
Date:
On 19 November 2013 23:08, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> On the solution, I wasn't suggesting another void* slot, but rather a
> slot that holds a hash table, so that an arbitrary number of things
> can be stuffed in. Overkill, really, since in 99.9% of times only one
> thing would be in there, and in the other 0.1% of times two things. In
> our own GenericCacheCollection, we just statically allocate 16 slots.

Why do you need to do this dance with fn_extra?

It's possible to allocate a hash table in a Transaction-lifetime
memory context on first call into a function then cache things there.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Traffic jams in fn_extra

From
Paul Ramsey
Date:
Hi Simon,

We do the dance because it’s how we always have and don’t know any other way, any better way. :) The usual explanation.
Isthere any place you can point to that demonstrates your technique?   

Thanks!

P

--
Paul Ramsey
http://cleverelephant.ca/
http://postgis.net/


On Sunday, November 24, 2013 at 8:21 AM, Simon Riggs wrote:

> On 19 November 2013 23:08, Paul Ramsey <pramsey@cleverelephant.ca (mailto:pramsey@cleverelephant.ca)> wrote:
>
> > On the solution, I wasn't suggesting another void* slot, but rather a
> > slot that holds a hash table, so that an arbitrary number of things
> > can be stuffed in. Overkill, really, since in 99.9% of times only one
> > thing would be in there, and in the other 0.1% of times two things. In
> > our own GenericCacheCollection, we just statically allocate 16 slots.
>
>
>
> Why do you need to do this dance with fn_extra?
>
> It's possible to allocate a hash table in a Transaction-lifetime
> memory context on first call into a function then cache things there.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services






Re: Traffic jams in fn_extra

From
Simon Riggs
Date:
On 24 November 2013 16:02, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> We do the dance because it’s how we always have and don’t know any other way, any better way. :) The usual
explanation.Is there any place you can point to that demonstrates your technique? 

src/backend/utils/mmgr/README

You can create memory contexts as children of other contexts, so for
example you might create "PostGIS Cache Context" as a sub-context of
TopTransactionContext. So it can be created dynamically as needed and
will automatically go away at end of xact.

Or you could use CurTransactionContext if you want to do things at
subtransaction level.

This is all used very heavily within Postgres itself, including the
various caches in different parts of the code.

Obviously, if you start cacheing too much then people will claim that
PostGIS is leaking memory, so it depends how far you go. But then you
might alleviate that with a postgis.session_cache parameter to
acknowledge and allow control.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Traffic jams in fn_extra

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 24 November 2013 16:02, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>> We do the dance because it�s how we always have and don�t know any other way, any better way. :) The usual
explanation.Is there any place you can point to that demonstrates your technique?
 

> src/backend/utils/mmgr/README

> You can create memory contexts as children of other contexts, so for
> example you might create "PostGIS Cache Context" as a sub-context of
> TopTransactionContext. So it can be created dynamically as needed and
> will automatically go away at end of xact.

The real question of course is whether transaction-level caching is
appropriate for what they're storing.  If they want only statement-level
caching then using fn_extra is often the right thing.

Also note that having the cache go away is the easy part.  The hard part
is knowing whether you've created it yet in the current transaction, and
finding it if you have.  The usual method is to keep a static variable
pointing to it, and plugging into the transaction cleanup callback
mechanism with a routine that'll reset the pointer to NULL at transaction
end.  For examples, look for callers of RegisterXactCallback().
        regards, tom lane



Re: Traffic jams in fn_extra

From
Paul Ramsey
Date:
On Sunday, November 24, 2013 at 4:42 PM, Tom Lane wrote:
> The real question of course is whether transaction-level caching is
> appropriate for what they're storing. If they want only statement-level
> caching then using fn_extra is often the right thing.

I'm not certain it is... we get some great effects out of the statement level stuff, and it really works great except
forthose cases where somebody else is already taking the slot (SRF_*)
 
> Also note that having the cache go away is the easy part. The hard part
> is knowing whether you've created it yet in the current transaction, and
> finding it if you have. The usual method is to keep a static variable
> pointing to it, and plugging into the transaction cleanup callback
> mechanism with a routine that'll reset the pointer to NULL at transaction
> end. For examples, look for callers of RegisterXactCallback().
> 

I'm glad you said that, because I felt too stoopid to ask :) my previous spelunking through memory contexts showed that
it'seasy to get the memory, hard to find it again. Which is why fn_extra is so appealing, it's just sitting there, and
theparent context goes away at the end of the query, so it's wonderfully simple to use... if there's not already
someonein it.
 

Thanks for the advice, it could be very helpful for my pointcloud work, since having access to a cache object during
SRF_*stuff could improve performance quite a bit, so the complexity trade-off of using the transactional context could
bewell worth it.
 

P.





Re: Traffic jams in fn_extra

From
Greg Stark
Date:
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sun, Nov 24, 2013 at 4:21 PM, Simon Riggs
<spandir="ltr"><<a href="mailto:simon@2ndquadrant.com" target="_blank">simon@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
id=":3tm"style="overflow:hidden">Why do you need to do this dance with fn_extra?<br /><br /> It's possible to allocate
ahash table in a Transaction-lifetime<br /> memory context on first call into a function then cache things
there.</div></blockquote></div><br/></div><div class="gmail_extra">fn_extra gives a handle per function call site.<br
/><br/>It sounds to me like the complexity is coming not from having many Postgres functions but from having lots of
infrastructurebacking up those functions. So if many of their Postgres functions call a C function to do various things
andthe C function wants to cache something somewhere related to the object they've been passed then the natural thing
todo is have the Postgres function pass fn_extra down to the C function but if they have many such C functions it gets
abit tricky.<br /><br /><br /></div><div class="gmail_extra">But you could declare fn_extra to be a hash table yourself
sinceit's your Postgres function that gets to decide how fn_extra is going to be used. You could then pass that hash
tabledown to the various C functions to cache state. However that might still be a bit odd. If you call the same C
functiontwice from the same Postgres function it'll get the same hash table for both calls. fn_extra is per Postgres
functioncall site.<br /><br clear="all" /><br />-- <br />greg<br /></div></div>