Any advice about function caching? - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject Any advice about function caching?
Date
Msg-id 9EB50F1A91413F4FA63019487FCD251D23931B@WEBBASEDDC.webbasedltd.local
Whole thread Raw
Responses Re: Any advice about function caching?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi everyone,

I'm currently in the middle of trying to implement a cache for a PostGIS
server-side function that uses an external library with an expensive startup
cost, and was hoping to find some advice on the best way to implement it.

The function currently takes 2 parameters - a customised C geometry datatype
and a destination SRID which is the primary key to the library parameters in
the spatial_ref_sys table. Currently for each invocation of the function it
is necessary to lookup the parameters for both the source and destinations
SRIDs in spatial_ref_sys, create a handle for each SRID in the external
library, perform the calculation, and then return the result.

Obviously this process is quite expensive when calling the function on large
tables of data. I've currently implemented a basic cache for both the
spatial_ref_sys lookups and the external library functions which gives a
dramatic performance improvement, dropping my test query from over 2 minutes
to 6s(!), but the part I'm experiencing difficulty with is working out when
the start and end of a query occurs.

My current code works by using MemoryContextCreate() to create a child
context to MessageContext and using the Init()/Delete() functions to
initialise and destroy a cache in the local backend. However, this doesn't
really work particularly well when using cursors and prepared queries since
it appears what I should be doing is using a cache per portal rather than a
cache per backend. The other complication is that the external library
handles cannot be free()d directly but instead a customised free function
must be called.

So my questions are:

1) What is the best way of keeping track of whether the cache has been
initalised for a given portal? Looking in contrib/dblink.c it looks as if
the best way to go would be to create a hash table per backend based upon
the portal name in ActivePortal, attach the child context to PortalContext,
and use the Delete() function to call the customised free function and
remove the hash table entry. However, which memory context would be the best
one to use in order to store the hash table entries?

2) Is there a better way of doing this?

3) Would anyone consider a patch to the source tree to make implementing
something like this easier in future?


Many thanks,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk
http://www.infomapper.com
http://www.swtc.co.uk

This email and any attachments are confidential to the intended recipient
and may also be privileged. If you are not the intended recipient please
delete it from your system and notify the sender. You should not copy it or
use it for any purpose nor disclose or distribute its contents to any other
person.




pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Interval aggregate regression failure (expected seems
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Crash during elog.c...