Thread: Rethinking sinval callback hook API
Currently, we have two types of callbacks that can be registered to get control when an invalidation message is received: syscache callbacks and relcache callbacks. It strikes me that we might be better advised to unify these into a single type of callback that gets a SharedInvalidationMessage struct pointer (we could pass NULL to signify a cache reset event). That would avoid having to add another registration list every time we decide that there's a reason for callbacks to see another type of inval message. There are a couple of reasonably near-term reasons why we might want to do this: 1. Robert was speculating the other day about wanting to be able to snoop the inval traffic. Right now, callbacks can only snoop a fairly small subset of it. 2. In conjunction with what I'm doing with plancache, it struck me that it might be nice to subdivide relcache inval messages into two types, one indicating a schema (DDL) change and one that just indicates that statistics changed; this would allow us to avoid redoing parse analysis and rewrite for a cached query as a consequence of autovacuum stats updates. With the current scheme, plancache.c would need to register two different kinds of callbacks to handle that, or we'd need some other API change for relcache callbacks so they could distinguish. A small disadvantage of this is that callbacks would have to know about struct SharedInvalidationMessage, which right now isn't tremendously widely known, but that doesn't seem like a fatal objection to me. In practice that struct definition has been pretty stable. Also, right now (9.2 cycle) would be a good time to do this since we already changed the API for syscache callbacks as a result of my cache bug investigations last week. Any third-party code that's hooking into this area is going to need changes for 9.2 anyway. Thoughts? regards, tom lane
On Fri, Aug 19, 2011 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Currently, we have two types of callbacks that can be registered to get > control when an invalidation message is received: syscache callbacks and > relcache callbacks. It strikes me that we might be better advised to > unify these into a single type of callback that gets a > SharedInvalidationMessage struct pointer (we could pass NULL to signify > a cache reset event). That would avoid having to add another > registration list every time we decide that there's a reason for > callbacks to see another type of inval message. There are a couple of > reasonably near-term reasons why we might want to do this: > > 1. Robert was speculating the other day about wanting to be able to > snoop the inval traffic. Right now, callbacks can only snoop a fairly > small subset of it. Is that true? It appears to me that the events that aren't exposed at all are smgr and relmap invalidations, which don't seem terribly important, and presumably not a majority of the traffic. > 2. In conjunction with what I'm doing with plancache, it struck me that > it might be nice to subdivide relcache inval messages into two types, > one indicating a schema (DDL) change and one that just indicates that > statistics changed; this would allow us to avoid redoing parse analysis > and rewrite for a cached query as a consequence of autovacuum stats > updates. With the current scheme, plancache.c would need to register > two different kinds of callbacks to handle that, or we'd need some other > API change for relcache callbacks so they could distinguish. Would this be enough for us to find a noticeable performance improvement? > A small disadvantage of this is that callbacks would have to know about > struct SharedInvalidationMessage, which right now isn't tremendously > widely known, but that doesn't seem like a fatal objection to me. > In practice that struct definition has been pretty stable. I'm not opposed to trying to create a better/more universal API, but I find that a bit grotty. We've already resorted to some ugly bit-space-preserving hacks in that structure, and I'm not sure we won't have more in the future. In particular, exposing the backend_lo/backend_hi thing seems like a recipe for distributed confusion. Would it be too expensive to expose an opaque struct with accessor functions? Or pass the extracted values as separate arguments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Aug 19, 2011 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. Robert was speculating the other day about wanting to be able to >> snoop the inval traffic. �Right now, callbacks can only snoop a fairly >> small subset of it. > Is that true? It appears to me that the events that aren't exposed at > all are smgr and relmap invalidations, which don't seem terribly > important, and presumably not a majority of the traffic. Well, "important" is in the eye of the beholder here --- if you did need to see one of those, you're flat outta luck. It's also the case that you can only snoop one catcache per registration, so there aren't enough slots available in the fixed-size list to watch all the catcache traffic. >> 2. In conjunction with what I'm doing with plancache, it struck me that >> it might be nice to subdivide relcache inval messages into two types, >> one indicating a schema (DDL) change and one that just indicates that >> statistics changed; this would allow us to avoid redoing parse analysis >> and rewrite for a cached query as a consequence of autovacuum stats >> updates. �With the current scheme, plancache.c would need to register >> two different kinds of callbacks to handle that, or we'd need some other >> API change for relcache callbacks so they could distinguish. > Would this be enough for us to find a noticeable performance improvement? Impossible to say without coding it up and trying it ... >> A small disadvantage of this is that callbacks would have to know about >> struct SharedInvalidationMessage, which right now isn't tremendously >> widely known, but that doesn't seem like a fatal objection to me. >> In practice that struct definition has been pretty stable. > I'm not opposed to trying to create a better/more universal API, but I > find that a bit grotty. We've already resorted to some ugly > bit-space-preserving hacks in that structure, and I'm not sure we > won't have more in the future. In particular, exposing the > backend_lo/backend_hi thing seems like a recipe for distributed > confusion. Would it be too expensive to expose an opaque struct with > accessor functions? Or pass the extracted values as separate > arguments? Well, that's exactly the reasoning that led to the current API. But we've not done well at extending that API to handle newly-added sinval message types, and it's also designed firmly around the notion that individual callbacks only want to see a small part of the traffic. If I wanted to write a callback that saw *all* the traffic, the current API is just not supportive of that. Exposing SharedInvalidationMessage could be going too far in the other direction, but I'm thinking we really ought to do something. regards, tom lane
On Sun, Aug 21, 2011 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Aug 19, 2011 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> 1. Robert was speculating the other day about wanting to be able to >>> snoop the inval traffic. Right now, callbacks can only snoop a fairly >>> small subset of it. > >> Is that true? It appears to me that the events that aren't exposed at >> all are smgr and relmap invalidations, which don't seem terribly >> important, and presumably not a majority of the traffic. > > Well, "important" is in the eye of the beholder here --- if you did need > to see one of those, you're flat outta luck. It's also the case that > you can only snoop one catcache per registration, so there aren't enough > slots available in the fixed-size list to watch all the catcache > traffic. Yeah, I'm not opposed to making this more generic; regardless of whether we have an immediate use case for it, it seems like a pretty good idea. I was just surprised that you described the available portion as a "small subset". The one-catcache-per-registration limitation is an interesting point. I doubt that we want to move the "is this the relevant catcache?" test inside all the callbacks, but we might want to have a special value that means "call me back when there's a change that affects ANY catcache"... or even more generally "call me back when there's a change that affects ANY system catalog, regardless of whether there is an associated catcache or not." sepgsql, for example, really wants to be able to get a callback when pg_seclabel or pg_shseclabel is updated, precisely because it wants to maintain its own special-purpose cache on a catalog that on which we DON'T want to add a catcache. > Exposing SharedInvalidationMessage could be going too far in the other > direction, but I'm thinking we really ought to do something. I think the best option might be to expose it as an opaque struct. In other words, the header file available to other backends would have something like: struct SharedInvalidationMessage; typedef struct SharedInvalidationMessage SharedInvalidationMessage; typedef enum { SIM_CATCACHE, SIM_CATALOG, SIM_RELCACHE, SIM_SMGR, SIM_RELMAP } SIMType; SIMType SIMGetType(SharedInvalidationMessage *); Oid SIMGetDatabase(SharedInvalidationMessage *); BackendId SIMGetBackendId(SharedInvalidationMessage *); /* etc. */ That allows us to do things like change the number of bits we use to store the backend ID (e.g. from the current 24 to 32 or 16) without needing to change the callers. In fact, you could probably even add whole new message types and most callers wouldn't need to care, since the typical caller is going to do something like ... if (SIMGetType(&msg) != SIM_SOMETHING) return right off the bat. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company