Thread: global temporary tables
A couple of recent threads got me thinking again about the idea of global temporary tables. There seem to be two principal issues:

1. What is a global temporary table?

2. How could we implement that?

Despite rereading the "idea: global temp tables" thread from April 2009 in some detail, I was not able to get a clear understanding of (1). What I *think* it is supposed to mean is that the table is a permanent object which is "globally" visible - that is, it's part of some non-temp schema like public or $user and its column definitions etc. are visible to all backends - and it's not automatically removed on commit, backend exit, etc. - but the *contents* of the table are temporary and backend-local, so that each new backend initially sees it as empty and can then insert, update, and delete data independently of what any other backend does.

As to (2), my thought is that perhaps we could implement this by instantiating a separate relfilenode for the relation for each backend which accesses it. relfilenode would be 0 in pg_class, as it is for "mapped" relations, but every time a backend touched the rel, we'd allocate a relfilenode and associate the OID of the temp table with it using some kind of backend-local storage - actually similar to what the relmapper code does, except without the complexity of ever actually having to persist the value; and perhaps using a hash table rather than an array, since the number of mapped rels that a backend needs to deal with is rather more limited than the number of temp tables it might want to use.

Thoughts?

...Robert
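[Editor's sketch] The backend-local OID-to-relfilenode mapping proposed above can be illustrated with a toy hash table. All names here are invented for illustration; this is not PostgreSQL code (a real implementation would use the backend's dynahash facility and real OID generation):

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch of a backend-local OID -> relfilenode map, in the spirit of
 * the relmapper but never persisted.  Invented names; not PostgreSQL code. */
typedef uint32_t Oid;

#define TEMPMAP_SLOTS 256              /* must be a power of two */

typedef struct TempRelMapEntry
{
    Oid relid;                         /* temp table's pg_class OID (0 = empty slot) */
    Oid relfilenode;                   /* per-backend physical file assigned to it */
} TempRelMapEntry;

static TempRelMapEntry temp_rel_map[TEMPMAP_SLOTS];
static Oid next_fake_filenode = 100000;    /* stand-in for real OID generation */

/* Return this backend's relfilenode for relid, assigning one on first touch. */
static Oid
temp_rel_map_get(Oid relid)
{
    uint32_t h = (relid * 2654435761u) & (TEMPMAP_SLOTS - 1);

    for (;;)
    {
        TempRelMapEntry *e = &temp_rel_map[h];

        if (e->relid == relid)
            return e->relfilenode;              /* already instantiated */
        if (e->relid == 0)
        {
            e->relid = relid;                   /* first touch: allocate */
            e->relfilenode = next_fake_filenode++;
            return e->relfilenode;
        }
        h = (h + 1) & (TEMPMAP_SLOTS - 1);      /* linear probing */
    }
}
```

The key property is that two backends running the same code independently would map the same pg_class OID to different physical files, while repeated lookups within one backend stay stable.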
Robert Haas <robertmhaas@gmail.com> writes:
> A couple of recent threads got me thinking again about the idea
> of global temporary tables. There seem to be two principal issues:
> 1. What is a global temporary table?
> 2. How could we implement that?
> Despite rereading the "idea: global temp tables" thread from April
> 2009 in some detail, I was not able to get a clear understanding of
> (1).

I believe that the spec's distinction between global and local temp tables has to do with whether they are visible across module boundaries. Since we haven't implemented modules, that distinction is meaningless to us. In the spec, *both* types of temp tables have the property that the definition (schema) of the table is global across all sessions, and only the content of the table is session-local.

This arrangement clearly is useful for some applications, but so is our current definition wherein different sessions can have different schemas for the same temp table name. So eventually it'd be good to support both. But the GLOBAL/LOCAL TEMP TABLE distinction is something entirely different. PG's behavior does not correspond to either of those.

Your idea of using the relmapper layer to instantiate copies of temp tables is an interesting one. It's only a small piece of the puzzle though. In particular, what you described would result in the table having the same OID in all sessions, even though the relfilenodes are different --- and since locking is done on the basis of OID, that's probably *not* what we want. It would be much better for performance if the different sessions' versions of the table were independently lockable.

I also kind of wonder what is supposed to happen if someone DROPs or ALTERs the temp table definition ...

regards, tom lane
On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> A couple of recent threads made got me thinking again about the idea >> of global temporary tables. There seem to be two principal issues: > >> 1. What is a global temporary table? > >> 2. How could we implement that? > >> Despite rereading the "idea: global temp tables" thread from April >> 2009 in some detail, I was not able to get a clear understanding of >> (1). > > I believe that the spec's distinction between global and local temp > tables has to do with whether they are visible across module > boundaries. Since we haven't implemented modules, that distinction > is meaningless to us. In the spec, *both* types of temp tables have > the property that the definition (schema) of the table is global > across all sessions, and only the content of the table is session-local. > > This arrangement clearly is useful for some applications, but so is our > current definition wherein different sessions can have different schemas > for the same temp table name. So eventually it'd be good to support > both. But the GLOBAL/LOCAL TEMP TABLE distinction is something entirely > different. PG's behavior does not correspond to either of those. I don't really care what we call it, although I find the GLOBAL name convenient and descriptive. > Your idea of using the relmapper layer to instantiate copies of temp > tables is an interesting one. It's only a small piece of the puzzle > though. In particular, what you described would result in the table > having the same OID in all sessions, even though the relfilenodes are > different --- amd since locking is done on the basis of OID, that's > probably *not* what we want. It would be much better for performance > if the different sessions' versions of the table were independently > lockable. Well, it depends on what operation we're talking about. 
For operations that involve only the table contents, yeah, we'd like to lock the versions independently. But for this sort of thing: > I also kind of wonder what is supposed to happen if someone DROPs or > ALTERs the temp table definition ... ...not so much. Here you REALLY want a DROP attempt to acquire an AccessExclusiveLock that will conflict with any outstanding AccessShareLocks. Similarly, you're only going to be able to modify the schema for the relation if it's not otherwise in use. ...Robert
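[Editor's sketch] The two-level locking the last two mails argue about can be made concrete with an invented lock-tag representation: DML on a global temp table would lock a (relid, backend) pair, while DDL such as DROP would lock the relid for all backends. This ignores lock modes entirely and is not lmgr code, just an illustration of the conflict shape:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented representation: backend == -1 means "all backends", as a DROP or
 * ALTER would use.  Real lmgr locking also distinguishes lock modes, which
 * this sketch deliberately omits. */
typedef struct TempLockTag
{
    uint32_t relid;
    int      backend;
} TempLockTag;

static bool
temp_locks_conflict(TempLockTag a, TempLockTag b)
{
    if (a.relid != b.relid)
        return false;                      /* different tables never conflict */
    if (a.backend == -1 || b.backend == -1)
        return true;                       /* DDL conflicts with every session */
    return a.backend == b.backend;         /* DML conflicts only within a session */
}
```

So two sessions can scan their own copies concurrently, but a DROP attempt would block behind (and be blocked by) any session's outstanding lock, matching the AccessExclusiveLock-vs-AccessShareLock behavior described above.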
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I also kind of wonder what is supposed to happen if someone DROPs or >> ALTERs the temp table definition ... > ...not so much. Here you REALLY want a DROP attempt to acquire an > AccessExclusiveLock that will conflict with any outstanding > AccessShareLocks. Similarly, you're only going to be able to modify > the schema for the relation if it's not otherwise in use. I think you're presuming the answer to the question. We could also view the desired behavior as being that each session clones the temp table definition at some instant (eg, first use). The approach that you're assuming seems fraught with large downsides: in particular, implementing ALTER TABLE would be a mess. The would-be alterer would need access to the physical copies of all sessions, which throws out not only the assumption that the relmapper entries can be private data, but all of the access optimizations we currently have in the local buffer manager. Not to mention the coding mess of having to repeat the ALTER operation for each of N copies, some of which might disappear while we're trying to do it (or if they don't, we're blocking backends from exiting). I don't even know how you'd do the ALTER over again N times if you only have one set of catalog entries describing the N copies. regards, tom lane
On Fri, Apr 23, 2010 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Apr 23, 2010 at 11:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I also kind of wonder what is supposed to happen if someone DROPs or
>>> ALTERs the temp table definition ...
>
>> ...not so much. Here you REALLY want a DROP attempt to acquire an
>> AccessExclusiveLock that will conflict with any outstanding
>> AccessShareLocks. Similarly, you're only going to be able to modify
>> the schema for the relation if it's not otherwise in use.
>
> I think you're presuming the answer to the question. We could also view
> the desired behavior as being that each session clones the temp table
> definition at some instant (eg, first use). The approach that you're
> assuming seems fraught with large downsides: in particular, implementing
> ALTER TABLE would be a mess. The would-be alterer would need access to
> the physical copies of all sessions, which throws out not only the
> assumption that the relmapper entries can be private data, but all of
> the access optimizations we currently have in the local buffer manager.

I agree, that would be pretty unfortunate, although maybe it's the only way to make it work. It's not what I had in mind. I was thinking that the would-be ALTERER could just take an AccessExclusiveLock, but now that I think about it that doesn't work, since a backend could have the table unlocked between transactions but still have private contents in it. :-(

> Not to mention the coding mess of having to repeat the ALTER operation
> for each of N copies, some of which might disappear while we're trying
> to do it (or if they don't, we're blocking backends from exiting).
> I don't even know how you'd do the ALTER over again N times if you
> only have one set of catalog entries describing the N copies.

Well, if you clone the table, that just pushes the problem around.
When I run ALTER TABLE on one of these thingamabobs, does it modify my clone? The original? Both? If it modifies my clone, how do we modify the original? If it modifies the original, won't I be rather surprised to find my clone unaffected? If it modifies both, how do we avoid complete havoc if the original has since been modified (perhaps incompatibly, perhaps not) by some other backend doing its own ALTER TABLE? ...Robert
2010/4/24 Robert Haas <robertmhaas@gmail.com>:
> A couple of recent threads got me thinking again about the idea
> of global temporary tables. There seem to be two principal issues:
>
> 1. What is a global temporary table?
>
> 2. How could we implement that?
>
> Despite rereading the "idea: global temp tables" thread from April
> 2009 in some detail, I was not able to get a clear understanding of
> (1). What I *think* it is supposed to mean is that the table is a
> permanent object which is "globally" visible - that is, it's part of
> some non-temp schema like public or $user and its column definitions
> etc. are visible to all backends - and it's not automatically removed
> on commit, backend exit, etc. - but the *contents* of the table are
> temporary and backend-local, so that each new backend initially sees
> it as empty and can then insert, update, and delete data independently
> of what any other backend does.
>
> As to (2), my thought is that perhaps we could implement this by
> instantiating a separate relfilenode for the relation for each backend
> which accesses it. relfilenode would be 0 in pg_class, as it is for
> "mapped" relations, but every time a backend touched the rel, we'd
> allocate a relfilenode and associate the OID of the temp table with it
> using some kind of backend-local storage - actually similar to what
> the relmapper code does, except without the complexity of ever
> actually having to persist the value; and perhaps using a hash table
> rather than an array, since the number of mapped rels that a backend
> needs to deal with is rather more limited than the number of temp
> tables it might want to use.

It is a good idea. I am missing some thoughts about statistics and about indexes, though.

Regards

Pavel Stehule

> Thoughts?
>
> ...Robert
>...
> surprised to find my clone unaffected? If it modifies both, how do we
> avoid complete havoc if the original has since been modified (perhaps
> incompatibly, perhaps not) by some other backend doing its own ALTER
> TABLE?

Since this is such a thorny problem, and this is a temporary table, why not just disallow ALTER completely for the first pass?

--
Greg Sabino Mullane greg@turnstep.com
"Greg Sabino Mullane" <greg@turnstep.com> writes: >> surprised to find my clone unaffected? If it modifies both, how do we >> avoid complete havoc if the original has since been modified (perhaps >> incompatibly, perhaps not) by some other backend doing its own ALTER >> TABLE? > Since this is such a thorny problem, and this is a temporary table, why > not just disallow ALTER completely for the first pass? Usually the way we approach these kinds of problems is that we want to see some plausible outline for how they might be fixed before we move forward with the base feature. IOW, I wouldn't object to not having ALTER in the first release, but if we have no idea how to do ALTER at all I'd be too worried that we were painting ourselves into a corner. Or maybe you can make a case that there's no need to allow ALTER at all, ever. But surely DROP needs to be possible, and that seems to already introduce some of the same issues. regards, tom lane
On Sat, Apr 24, 2010 at 12:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Greg Sabino Mullane" <greg@turnstep.com> writes: >>> surprised to find my clone unaffected? If it modifies both, how do we >>> avoid complete havoc if the original has since been modified (perhaps >>> incompatibly, perhaps not) by some other backend doing its own ALTER >>> TABLE? > >> Since this is such a thorny problem, and this is a temporary table, why >> not just disallow ALTER completely for the first pass? > > Usually the way we approach these kinds of problems is that we want > to see some plausible outline for how they might be fixed before we > move forward with the base feature. IOW, I wouldn't object to not > having ALTER in the first release, but if we have no idea how to do > ALTER at all I'd be too worried that we were painting ourselves into > a corner. > > Or maybe you can make a case that there's no need to allow ALTER at > all, ever. But surely DROP needs to be possible, and that seems to > already introduce some of the same issues. I had the same thought as GSM this morning. More specifically, it seems to me that the problematic cases are precisely those in which you might feel an urge to touch somebody else's local buffers, so I think we should disallow, approximately, those ALTER TABLE cases which require a full-table rewrite. I don't see the problem with DROP. Under the proposed design, it's approximately equivalent to dropping a table that someone else has truncated. You just wait for the necessary lock and then do it. At least AIUI, the use case for this feature is that you want to avoid creating "the same" temporary table over and over again. The schema is fixed and doesn't change much, but you're creating it lots and lots of times in lots and lots of different backends, leading to both management and performance difficulties. If you want to be able to change the schema frequently or in a backend-local way, use the existing temporary table feature. 
Now, there is ONE problem with DROP, which is that you might orphan some heaps. Of course, that can also happen due to a backend crash. Currently, autovacuum arranges to drop any orphaned temp tables that have passed the wraparound threshold, but even if we were happy with waiting 2 billion transactions to get things cleaned up, the mechanism can't work here because it relies on being able to examine the pg_class row and determine which backend owns it, and where the storage is located.

We could possibly set things up so that a running backend will notice if a global temporary table for which it's created a private relfilenode gets dropped, and blow away the backing file. But that doesn't protect against crashes, so I think we're going to need some other garbage collection mechanism, either instead of or in addition to asking backends to clean up after themselves. I'm not quite sure what the design of that should look like, though.

...Robert
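[Editor's sketch] One simple shape the garbage-collection check could take: treat a per-backend file as an orphan if the backend PID that created it is no longer running, probed with kill(pid, 0). This is a deliberate simplification (PIDs get reused, so a real mechanism would need more than this), shown only to illustrate the idea, not what PostgreSQL actually does:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Return nonzero if no process with owner_pid exists, meaning the file it
 * left behind can be reclaimed.  EPERM means the process exists but we may
 * not signal it, so we conservatively treat that as "not an orphan". */
static int
backend_file_is_orphaned(pid_t owner_pid)
{
    if (kill(owner_pid, 0) == 0)
        return 0;                   /* process exists: not an orphan */
    return errno == ESRCH;          /* no such process: safe to reclaim */
}
```

A cleanup pass at startup (or from an administrator function) would walk the per-backend storage, apply a check like this to each owning PID, and unlink whatever comes back as orphaned.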
Robert Haas <robertmhaas@gmail.com> writes: > At least AIUI, the use case for this feature is that you want to avoid > creating "the same" temporary table over and over again. The context that I've seen it come up in is that people don't want to clutter their functions with create-it-if-it-doesn't-exist logic, which you have to have given the current behavior of temp tables. Any performance gain from reduced catalog churn would be gravy. Aside from the DROP problem, I think this implementation proposal has one other big shortcoming: what are you going to do about table statistics? In many cases, you really *have* to do an ANALYZE once you've populated a temp table, if you want to get decent plans for it. Where will you put those stats? regards, tom lane
[ forgot to respond to this part ] Robert Haas <robertmhaas@gmail.com> writes: > ... I don't see the problem with DROP. > Under the proposed design, it's approximately equivalent to dropping a > table that someone else has truncated. You just wait for the > necessary lock and then do it. And do *what*? You can remove the catalog entries, but how are you going to make the physical storage of other backends' versions go away? (To say nothing of making them flush their local buffers for it.) If you do remove the catalog entries, won't you be cutting the knees out from under whatever end-of-session cleanup processing might exist in those other backends? The idea of the global table as a template that individual sessions clone working tables from would avoid most of these problems. You rejected it on the grounds that ALTER would be too hard; but if you're blowing off ALTER anyway, that argument seems pretty unimpressive. regards, tom lane
On Sat, Apr 24, 2010 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> At least AIUI, the use case for this feature is that you want to avoid
>> creating "the same" temporary table over and over again.
>
> The context that I've seen it come up in is that people don't want to
> clutter their functions with create-it-if-it-doesn't-exist logic,
> which you have to have given the current behavior of temp tables.
> Any performance gain from reduced catalog churn would be gravy.

I think there's a significant contingent on this mailing list who feel that that gravy would be rather tasty and would like very much to enjoy some of it along with their temporary table tetrazzini.

> Aside from the DROP problem, I think this implementation proposal
> has one other big shortcoming: what are you going to do about
> table statistics? In many cases, you really *have* to do an ANALYZE
> once you've populated a temp table, if you want to get decent plans
> for it. Where will you put those stats?

For a first cut, I had thought about ignoring the problem. Now, that may sound stupid, because now if two different backends have very different distributions of data in the table and both do an ANALYZE, one set of statistics will clobber the other set of statistics. On the flip side, for some usage patterns, it might actually work out to a win. Maybe the data I'm putting in here today is a great deal like the data I put in here yesterday, and planning it with yesterday's statistics doesn't cost enough to be worth a re-ANALYZE. If we don't want to do that, I suppose one option is to create a pg_statistic-like table in the backend's temporary tablespace and put them there; or we could put them into a backend-local hash table.
The current setup of pg_statistic is actually somewhat weak for a number of things we might want to do: for example, it might be interesting to gather statistics for the subset of a table for which a particular partial index is predOK. When such an index is available for a particular query, we could use the statistics for that subset of the table instead of the overall statistics for the table, and get better estimates. Or we could even let the user specify predicates which will cause the table to have a different statistical distribution than the table as a whole, and gather statistics for the subset that matches the predicate. One approach would be to make the starelid column able to reference something other than a relation OID, although I don't think that actually helps with the global temp table problem because if we use the real pg_statistic to store the data then we have to arrange to clean it up. ...Robert
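[Editor's sketch] The "backend-local hash table" option for statistics mentioned above amounts to an override layer: a lookup consults this backend's own ANALYZE results first and falls back to the shared catalog value otherwise. Everything here is invented for illustration (a real version would store whole pg_statistic-style rows, not just a tuple count):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical per-backend statistics override for global temp tables. */
#define LOCAL_STATS_MAX 64

typedef struct LocalStat
{
    Oid    relid;
    double reltuples;       /* this backend's own row-count estimate */
} LocalStat;

static LocalStat local_stats[LOCAL_STATS_MAX];
static int n_local_stats = 0;

/* Record this backend's ANALYZE result, replacing any earlier entry. */
static void
local_analyze(Oid relid, double reltuples)
{
    for (int i = 0; i < n_local_stats; i++)
    {
        if (local_stats[i].relid == relid)
        {
            local_stats[i].reltuples = reltuples;
            return;
        }
    }
    local_stats[n_local_stats].relid = relid;
    local_stats[n_local_stats].reltuples = reltuples;
    n_local_stats++;
}

/* shared_reltuples stands in for the value in the shared catalog row, which
 * under the "ignore the problem" plan is whatever the last ANALYZE (from any
 * backend) left behind. */
static double
get_reltuples(Oid relid, double shared_reltuples)
{
    for (int i = 0; i < n_local_stats; i++)
        if (local_stats[i].relid == relid)
            return local_stats[i].reltuples;   /* this backend's own ANALYZE */
    return shared_reltuples;                   /* fall back to shared value */
}
```

Before a backend runs its own ANALYZE it plans off the shared (possibly clobbered) value; afterwards its plans reflect its own data distribution without disturbing anyone else's.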
> For a first cut, I had thought about ignoring the problem. Now, that
> may sound stupid, because now if two different backends have very
> different distributions of data in the table and both do an ANALYZE,
> one set of statistics will clobber the other set of statistics. On
> the flip side, for some usage patterns, it might actually work out
> to a win. Maybe the data I'm putting in here today is a great deal
> like the data I put in here yesterday, and planning it with
> yesterday's statistics doesn't cost enough to be worth a re-ANALYZE.

Both variants are possible. At first, the statistics can be taken from some "original" (which can be empty); after ANALYZE, the statistics can be individual to the backend.

Regards

Pavel
On Sat, Apr 24, 2010 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ forgot to respond to this part ]
>
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... I don't see the problem with DROP.
>> Under the proposed design, it's approximately equivalent to dropping a
>> table that someone else has truncated. You just wait for the
>> necessary lock and then do it.
>
> And do *what*? You can remove the catalog entries, but how are you
> going to make the physical storage of other backends' versions go away?
> (To say nothing of making them flush their local buffers for it.)
> If you do remove the catalog entries, won't you be cutting the knees
> out from under whatever end-of-session cleanup processing might exist
> in those other backends?

Well, if I knew for sure what the best way was to solve all of these problems, I'd be posting a finished patch rather than a request for comment on a design. It's not obvious to me that there's a terribly thorny problem in the area you're concerned about, but your concern is making me worry that I'm missing something. Why would the end-of-session processing need the catalog entries? It seems like whatever backend-local data structure we're using to record the relfilenode mappings would be sufficient to nuke the backend storage, and what else needs doing?

> The idea of the global table as a template that individual sessions
> clone working tables from would avoid most of these problems. You
> rejected it on the grounds that ALTER would be too hard; but if you're
> blowing off ALTER anyway, that argument seems pretty unimpressive.

I don't think that avoiding the catalog churn is something to be dismissed lightly, but I also think that cloning the table is likely to be significantly more difficult from an implementation point of view. Under the implementation I'm proposing, we don't need much that is fundamentally all that new.
Global temporary tables can be treated like our existing temp tables for purposes of XLog and bufmgr, but they don't get forced into a temporary namespace. The relation mapping infrastructure provides a pretty good start for using a relfilenode that isn't stored in pg_class. I've already gone through the exercise of finding all the places where we check rd_istemp and changing them to use macros instead (RelationNeedsWAL, IsBackendLocal, etc.) and it's not bad.

There's a related project which I think can also leverage much of this same infrastructure: unlogged tables. We've talked about this before, but in short the idea is that an unlogged table behaves like a regular table in all respects except that we never write WAL for it; and we truncate it at shutdown and at startup. Therefore, it doesn't show up on standby servers, and its contents are not persistent across restarts, but performance is improved. It's suitable for things like "the table of currently logged in users", where you don't mind forcing everyone to log in again if the database crashes. (It might even be possible to allow writes to unlogged tables on standby servers, though I'm not feeling that ambitious ATM.)

So:

- local temp tables exist in a temp namespace, use local buffers, and skip WAL
- global temp tables exist in a non-temp namespace, use local buffers, and skip WAL
- unlogged tables exist in a non-temp namespace, use shared buffers, and skip WAL
- normal tables exist in a non-temp namespace, use shared buffers, and write WAL

Thoughts?

...Robert
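[Editor's sketch] The four-way classification at the end of that mail can be written out in the style of the RelationNeedsWAL / IsBackendLocal macros it mentions. The enum and helper names are illustrative, not the actual PostgreSQL definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented names, summarizing the table-kind matrix from the mail above. */
typedef enum RelKindSketch
{
    REL_LOCAL_TEMP,     /* temp namespace,     local buffers,  no WAL */
    REL_GLOBAL_TEMP,    /* non-temp namespace, local buffers,  no WAL */
    REL_UNLOGGED,       /* non-temp namespace, shared buffers, no WAL */
    REL_NORMAL          /* non-temp namespace, shared buffers, WAL   */
} RelKindSketch;

static bool
relation_needs_wal(RelKindSketch k)
{
    return k == REL_NORMAL;         /* only normal tables are WAL-logged */
}

static bool
uses_local_buffers(RelKindSketch k)
{
    return k == REL_LOCAL_TEMP || k == REL_GLOBAL_TEMP;
}

static bool
in_temp_namespace(RelKindSketch k)
{
    return k == REL_LOCAL_TEMP;     /* only local temp tables are hidden away */
}
```

Expressing the checks as predicates like these, rather than as scattered rd_istemp tests, is what lets the same code paths serve all four kinds of table.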
Robert Haas <robertmhaas@gmail.com> writes: > ... Why would the > end-of-session processing need the catalog entries? It seems like > whatever backend-local data structure we're using to record the > relfilenode mappings would be sufficent to nuke the backend storage, > and what else needs doing? Well, if you're intending to have a separate data structure and code path for cleaning up this type of temp table, then maybe you don't need to touch any catalog entries. I'm concerned though about how far the effects will propagate --- things like TRUNCATE, VACUUM FULL, REINDEX will probably have issues with this. Right now they think in terms of writing a new pg_class entry in order to reassociate tables with new relfilenodes. Have you thought much about the previously proposed design, ie keeping catalog entries for temp tables in backend-local temporary catalogs? That would certainly be a lot of work, but I think in the end it might fit in better. This design feels like it's being driven by "hey, we can abuse the relmapper to sort of do what we want", and not by what we really want. regards, tom lane
On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:
> Thoughts?

Only a requirement: that we design this in a way that will allow temp tables to be used during Hot Standby. I make no other comment.

--
Simon Riggs
www.2ndQuadrant.com
On Sat, Apr 24, 2010 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... Why would the >> end-of-session processing need the catalog entries? It seems like >> whatever backend-local data structure we're using to record the >> relfilenode mappings would be sufficent to nuke the backend storage, >> and what else needs doing? > > Well, if you're intending to have a separate data structure and code > path for cleaning up this type of temp table, then maybe you don't need > to touch any catalog entries. I'm concerned though about how far the > effects will propagate --- things like TRUNCATE, VACUUM FULL, REINDEX > will probably have issues with this. Right now they think in terms > of writing a new pg_class entry in order to reassociate tables with > new relfilenodes. > > Have you thought much about the previously proposed design, ie keeping > catalog entries for temp tables in backend-local temporary catalogs? > That would certainly be a lot of work, but I think in the end it might > fit in better. This design feels like it's being driven by "hey, > we can abuse the relmapper to sort of do what we want", and not by > what we really want. Well, yes and no. I think there are definitely some good things that can happen if we can see our way to taking a hammer to pg_class and pg_attribute. If we create, e.g. pg_shared_class and pg_shared_attribute, then we can un-nail the catalogs you just nailed to make the authentication process able to work without selecting a database. We can also enable (without guilt) clustering both those catalogs and the database-specific versions of pg_class, since we no longer have to worry about having multiple copies of the row that can get out of sync with each other. And if we further break off pg_temp_class and pg_temp_attribute, then we can also have our existing flavor of temporary tables without worrying about catalog bloat, which would be great. There may be other applications as well. 
Having said all that, it doesn't actually allow us to implement global temporary tables, because obviously the catalog entries for a global temporary table have to be permanent. Of course, if we didn't have to worry about catalog bloat, the "clone" approach you're proposing would be somewhat more attractive, but I actually think that the synergy is in the other direction: the perfect place to store the catalog entries and statistics for local temporary tables is - in a global temporary table! Note that while a local temporary table can never inherit from a permanent table, it's entirely sensible to let global temporary tables inherit from permanent tables. Different backends will have different views of the overall contents of the parent table, but that's OK, even desirable. ...Robert
On Sat, Apr 24, 2010 at 6:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:
>> Thoughts?
>
> Only a requirement: that we design this in a way that will allow temp
> tables to be used during Hot Standby. I make no other comment.

For so long as local temporary tables put their catalog entries in pg_class, we're not going to be able to use them during Hot Standby. See the email I just sent elsewhere on this thread for a long-term roadmap to getting out of that pickle. At least under the implementation I'm proposing here, making global temporary tables usable would be an easier nut to crack, because the catalog entries are a non-issue. There is one major problem, though: assigning a scratch relfilenode to the temporary table requires generating an OID, which we currently have no way to allow on the standby.

Upthread I also proposed an implementation for unlogged tables (that is, contents don't survive a server bounce) which wouldn't have that problem either, although I haven't fully thought it through and there may very well be other issues.

So in short: I don't think anything we're talking about would make HS use harder, and some of it might make it easier. But probably some additional engineering effort dedicated specifically to solving the problems unique to HS would still be needed.

...Robert
On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> At least AIUI, the use case for this feature is that you want to avoid
>> creating "the same" temporary table over and over again.
>
> The context that I've seen it come up in is that people don't want to
> clutter their functions with create-it-if-it-doesn't-exist logic,
> which you have to have given the current behavior of temp tables.
> Any performance gain from reduced catalog churn would be gravy.
>
> Aside from the DROP problem, I think this implementation proposal
> has one other big shortcoming: what are you going to do about
> table statistics? In many cases, you really *have* to do an ANALYZE
> once you've populated a temp table, if you want to get decent plans
> for it. Where will you put those stats?

One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard.

As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend. On the other hand, some layer (presumably smgr) would need to understand whether a relation was temporary or not. If we do that, cleanup is easy: you can remove any directories that no longer have a running PID. For forensics you probably only want to do that automatically when a backend starts and discovers it already has a directory, though we should also provide an administrator function that will clobber all directories that no longer have backends.

--
Jim C. Nasby, Database Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
On Sat, Apr 24, 2010 at 8:47 PM, Jim Nasby <decibel@decibel.org> wrote: > On Apr 24, 2010, at 12:31 PM, Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> At least AIUI, the use case for this feature is that you want to avoid >>> creating "the same" temporary table over and over again. >> >> The context that I've seen it come up in is that people don't want to >> clutter their functions with create-it-if-it-doesn't-exist logic, >> which you have to have given the current behavior of temp tables. >> Any performance gain from reduced catalog churn would be gravy. >> >> Aside from the DROP problem, I think this implementation proposal >> has one other big shortcoming: what are you going to do about >> table statistics? In many cases, you really *have* to do an ANALYZE >> once you've populated a temp table, if you want to get decent plans >> for it. Where will you put those stats? > > One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard. Yeah, I don't think that would be too horrible. Part of me feels like you'd want to have the ability to store stats for a global temp table in either one of those tables depending on use-case, but I'm also reluctant to invent a lot of new syntax for a very limited use case. > As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend. That would impose a couple of implementation restrictions that don't seem necessary. 
One, it would imply ignoring reltablespace. Two, it would prohibit (or at least complicate) allowing a backend to CLUSTER or REINDEX its own private copy of the rel. > On the other hand, some layer (presumably smgr) would need to understand whether a relation was temporary or not. If we do that, cleanup is easy: you can remove any directories that no longer have a running PID. For forensics you probably only want to do that automatically when a backend starts and discovers it already has a directory, though we should also provide an administrator function that will clobber all directories that no longer have backends. Unfortunately, I don't see much alternative to making smgr know something about the temp-ness of the relation, though I'm hoping to keep the smgr surgery to an absolute minimum. Maybe what we could do is incorporate the backend ID or PID into the file name when the relation is temp. Then we could scan for and nuke such files pretty easily. Otherwise I can't really think how to make it work. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Unfortunately, I don't see much alternative to making smgr know > something about the temp-ness of the relation, though I'm hoping to > keep the smgr surgery to an absolute minimum. Maybe what we could do > is incorporate the backend ID or PID into the file name when the > relation is temp. Then we could scan for and nuke such files pretty > easily. Otherwise I can't really think how to make it work. I think that could be a really good idea, mainly because it makes post-crash cleanup MUCH safer: you can tell with certainty from the filename that it's a leftover temp table. The notion of zapping files just because we don't see them listed in pg_class has always scared the heck out of me. We already know temp-ness at pretty low levels, like bufmgr vs localbuf. Pushing it all the way down to smgr doesn't seem like a leap; in fact I think it would eliminate a separate isTemp parameter in a lot of places. regards, tom lane
On Apr 24, 2010, at 8:14 PM, Robert Haas wrote: >> One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard. > > Yeah, I don't think that would be too horrible. Part of me feels like > you'd want to have the ability to store stats for a global temp table > in either one of those tables depending on use-case, but I'm also > reluctant to invent a lot of new syntax for a very limited use case. Yeah, I'm thinking that's very probably overkill. And if we were going to go to that level, I think it would be far more useful to provide an interface to allow manual control over statistics first, so that you can give the optimizer custom information. >> As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend. > > That would impose a couple of implementation restrictions that don't > seem necessary. One, it would imply ignoring reltablespace. Two, it > would prohibit (or at least complicate) allowing a backend to CLUSTER > or REINDEX its own private copy of the rel. Well, the same structure could be imposed underneath a temp tablespace. I don't think it matters where you ultimately put it, the goal is just to make sure you can definitively tell that a file was a: temporary and b: what PID it belonged to. That allows for safe cleanup. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Unfortunately, I don't see much alternative to making smgr know >> something about the temp-ness of the relation, though I'm hoping to >> keep the smgr surgery to an absolute minimum. Maybe what we could do >> is incorporate the backend ID or PID into the file name when the >> relation is temp. Then we could scan for and nuke such files pretty >> easily. Otherwise I can't really think how to make it work. > > I think that could be a really good idea, mainly because it makes > post-crash cleanup MUCH safer: you can tell with certainty from the > filename that it's a leftover temp table. The notion of zapping files > just because we don't see them listed in pg_class has always scared the > heck out of me. > > We already know temp-ness at pretty low levels, like bufmgr vs localbuf. > Pushing it all the way down to smgr doesn't seem like a leap; in fact > I think it would eliminate a separate isTemp parameter in a lot of places. Eh? I don't see how it's going to do that. ...Robert
On Sat, Apr 24, 2010 at 9:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Unfortunately, I don't see much alternative to making smgr know >>> something about the temp-ness of the relation, though I'm hoping to >>> keep the smgr surgery to an absolute minimum. Maybe what we could do >>> is incorporate the backend ID or PID into the file name when the >>> relation is temp. Then we could scan for and nuke such files pretty >>> easily. Otherwise I can't really think how to make it work. >> >> I think that could be a really good idea, mainly because it makes >> post-crash cleanup MUCH safer: you can tell with certainty from the >> filename that it's a leftover temp table. The notion of zapping files >> just because we don't see them listed in pg_class has always scared the >> heck out of me. >> >> We already know temp-ness at pretty low levels, like bufmgr vs localbuf. >> Pushing it all the way down to smgr doesn't seem like a leap; in fact >> I think it would eliminate a separate isTemp parameter in a lot of places. > > Eh? I don't see how it's going to do that. Oh, maybe I do see. If we pass it to smgropen() and stash it in the SMgrRelation, we don't have to keep supplying it later on, maybe? Will investigate further. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Oh, maybe I do see. If we pass it to smgropen() and stash it in the > SMgrRelation, we don't have to keep supplying it later on, maybe? Right. I'm unsure whether we should push it into the RelFileNode struct itself, but even having it in SMgrRelation ought to cut out a few places where it now has to be passed separately. regards, tom lane
On Sat, Apr 24, 2010 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Oh, maybe I do see. If we pass it to smgropen() and stash it in the >> SMgrRelation, we don't have to keep supplying it later on, maybe? > > Right. I'm unsure whether we should push it into the RelFileNode > struct itself, but even having it in SMgrRelation ought to cut out > a few places where it now has to be passed separately. Pushing it into the RelFileNode has some advantages in terms of being able to get at the information from everywhere, but one thing that makes me think that's probably not a good decision is that we sometimes WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is a waste, because if we're WAL-logging, it's zero. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Pushing it into the RelFileNode has some advantages in terms of being > able to get at the information from everywhere, but one thing that > makes me think that's probably not a good decision is that we somtimes > WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is > a waste, because if we're WAL-logging, it's zero. Yeah. I think we also use RelFileNode as a hash tag in places, and so adding a bool to it would be problematic for a couple of reasons: possibly uninitialized pad bytes, and uselessly incorporating more bytes into the hash calculation. regards, tom lane
On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote: > There is one major problem, though: assigning a > scratch relfilenode to the temporary table requires generating an OID, > which we currently have no way to allow on the standby. Why not have an unlogged counter, which resets at each system start, using the same datatype as an oid? There's no necessity for the relfilenode to be an actual oid, is there? That way we could use it on standbys also. -- Simon Riggs www.2ndQuadrant.com
On Sun, Apr 25, 2010 at 3:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote: > >> There is one major problem, though: assigning a >> scratch relfilenode to the temporary table requires generating an OID, >> which we currently have no way to allow on the standby. > > Why not have an unlogged counter, which resets each system start, using > same datatype as an oid. There's no necessity for the relfilenode to be > an actual oid is there? > > That way we could use it on standbys also. I don't think that quite works, because the standby might assign a relfilenode number for a global temp table and then the master might subsequently assign the same relfilenode number to a regular table. We might be able to make that not matter, but it's far from obvious to me that there are no gotchas there... ...Robert
On Sat, Apr 24, 2010 at 11:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Pushing it into the RelFileNode has some advantages in terms of being >> able to get at the information from everywhere, but one thing that >> makes me think that's probably not a good decision is that we somtimes >> WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is >> a waste, because if we're WAL-logging, it's zero. > > Yeah. I think we also use RelFileNode as a hash tag in places, and > so adding a bool to it would be problematic for a couple of reasons: > possibly uninitialized pad bytes, and uselessly incorporating more bytes > into the hash calculation. Right. I was thinking about the padding issue, too. I took a crack at adding an isTemp argument to smgropen() and removing it everywhere else, but it turns out this isn't as straightforward as it ought to be because nbtsort.c, rewriteheap.c, and tablecmds.c feel entitled to violate the abstraction layer by passing isTemp = true to smgrwrite() and/or smgrextend() even when the real value is false, under the assumption that the only thing isTemp is doing in those functions is controlling fsync behavior. I think we'll have to replace the isTemp argument to those functions with a boolean whose explicit charter is to do what they're using it for. Removing the isTemp argument from smgrtruncate() and smgrdounlink() looks easy, though. WIP patch that takes it that far is attached. [davidfetter: patch description for PWN is "WIP patch to push isTemp down into the smgr layer"] ...Robert
On Sun, 2010-04-25 at 06:50 -0400, Robert Haas wrote: > On Sun, Apr 25, 2010 at 3:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Sat, 2010-04-24 at 19:01 -0400, Robert Haas wrote: > > > >> There is one major problem, though: assigning a > >> scratch relfilenode to the temporary table requires generating an OID, > >> which we currently have no way to allow on the standby. > > > > Why not have an unlogged counter, which resets each system start, using > > same datatype as an oid. There's no necessity for the relfilenode to be > > an actual oid is there? > > > > That way we could use it on standbys also. > > I don't think that quite works, because the standby might assign a > relfilenode number for a global temp table and then the master might > subsequently assign the same relfilenode number to a regular table. > We might be able to make that not matter, but it's far from obvious to > me that there are no gotchas there... That sounds fairly simple to solve. All I am saying is please include "working on the standby" as part of your requirement. -- Simon Riggs www.2ndQuadrant.com
On Sun, Apr 25, 2010 at 8:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I don't think that quite works, because the standby might assign a >> relfilenode number for a global temp table and then the master might >> subsequently assign the same relfilenode number to a regular table. >> We might be able to make that not matter, but it's far from obvious to >> me that there are no gotchas there... > > That sounds fairly simple to solve. > > All I am saying is please include "working on the standby" as part of > your requirement. Well, I think I already basically stated my position on this, but let me try to be more clear. I'm not promising to work on any portion of this project AT ALL or to have it done at any particular time. I am specifically not promising to do the extra work required to make it work with Hot Standby, although I am also not saying that I won't. Nobody has offered to fund any portion of this work, so there are no guarantees, full stop. Even if you could successfully convince a critical mass of people on this list that the work should not be committed without adding support for temp tables in Hot Standby mode, the most likely result of that would be that I would give up and not pursue this at all, rather than that I would agree to do that in addition to solving all the problems already discussed. And I don't think you can even get that far, because I don't think too many people here are going to say that we shouldn't add global temporary tables unless we can also make them work with Hot Standby. In all honesty, I would think that you would be happy about my possibly implementing a flavor of temporary tables that would be substantially more feasible to make work with Hot Standby than the kind we have now, rather than (as you seem to be) complaining that I'm not solving the entire problem. ...Robert
On Apr 24, 2010, at 10:02 PM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Pushing it into the RelFileNode has some advantages in terms of being >> able to get at the information from everywhere, but one thing that >> makes me think that's probably not a good decision is that we somtimes >> WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is >> a waste, because if we're WAL-logging, it's zero. > > Yeah. I think we also use RelFileNode as a hash tag in places, and > so adding a bool to it would be problematic for a couple of reasons: > possibly uninitialized pad bytes, and uselessly incorporating more bytes > into the hash calculation. Do we need to hash it that frequently? Do we insert tons of them into WAL? Worrying about those cases smells like premature optimization, but admittedly I don't have enough knowledge here... -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <decibel@decibel.org> writes: > On Apr 24, 2010, at 10:02 PM, Tom Lane wrote: >> Yeah. I think we also use RelFileNode as a hash tag in places, and >> so adding a bool to it would be problematic for a couple of reasons: >> possibly uninitialized pad bytes, and uselessly incorporating more bytes >> into the hash calculation. > Do we need to hash it that frequently? Every buffer access, for instance. > Do we insert tons of them into WAL? Yes; the vast majority of WAL records contain one. regards, tom lane
Robert, > (1). What I *think* it is supposed to mean is that the table is a > permanent object which is "globally" visible - that is, it's part of > some non-temp schema like public or $user and it's column definitions > etc. are visible to all backends - and it's not automatically removed > on commit, backend exit, etc. - but the *contents* of the table are > temporary and backend-local, so that each new backend initially sees > it as empty and can then insert, update, and delete data independently > of what any other backend does. While closer to the standard, the above definition is a lot less useful than what I believe a lot of people want: a table that is globally visible but has no durability; that is, it does not get WAL-logged or recovered on restart. Certainly this latter definition would be far more useful to support materialized views. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus wrote: > Robert, > >> (1). What I *think* it is supposed to mean is that the table is a >> permanent object which is "globally" visible - that is, it's part of >> some non-temp schema like public or $user and it's column definitions >> etc. are visible to all backends - and it's not automatically removed >> on commit, backend exit, etc. - but the *contents* of the table are >> temporary and backend-local, so that each new backend initially sees >> it as empty and can then insert, update, and delete data independently >> of what any other backend does. > > While closer to the standard, the above definition is a lot less > useful than what I believe a lot of people want, which is a table > which is globally visible, but has no durability; that is, it does not > get WAL-logged or recovered on restart. Certainly this latter > definition would be far more useful to support materialized views. These are not mutually exclusive features. What you're asking for has value, certainly, but it's not a temp table in the standard's terms (which is a feature that also has value, I believe). cheers andrew
On Sun, Apr 25, 2010 at 5:29 PM, Josh Berkus <josh@agliodbs.com> wrote: > Robert, > >> (1). What I *think* it is supposed to mean is that the table is a >> permanent object which is "globally" visible - that is, it's part of >> some non-temp schema like public or $user and it's column definitions >> etc. are visible to all backends - and it's not automatically removed >> on commit, backend exit, etc. - but the *contents* of the table are >> temporary and backend-local, so that each new backend initially sees >> it as empty and can then insert, update, and delete data independently >> of what any other backend does. > > While closer to the standard, the above definition is a lot less useful than > what I believe a lot of people want, which is a table which is globally > visible, but has no durability; that is, it does not get WAL-logged or > recovered on restart. Certainly this latter definition would be far more > useful to support materialized views. I think it's arguable which one is more useful, but I think a good deal of the infrastructure can be made to serve both purposes, as I further expounded upon here. http://archives.postgresql.org/pgsql-hackers/2010-04/msg01123.php ...Robert
On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote: > And I don't > think you can even get that far, because I don't think too many people > here are going to say that we shouldn't add global temporary tables > unless we can also make them work with Hot Standby. The policy round here for some time has been that when we implement things we make them work fully and seamlessly. I don't see why Hot Standby would be singled out any more than any other feature, say Windows support or tablespaces, should be occasionally ignored. People need to get used to the new feature set, just as we had to with HOT, subtransactions, prepared transactions, GiST, etc. That may require a thwack from various people, but the responsibility lies with the new feature implementor, not the person supporting existing code. I fully understand your wish to implement a partial feature with caveats because I have argued that many times myself. But I've come to realise that the best way is to build things so they work cleanly across the board. Other developers can plan projects in the knowledge that they can build directly on firm foundations, not fill in the cracks. In the end this comes down to a choice as developers: do we help each other by doing a full job, or do we leave unexploded bombs for each other through short-termism? Now I understand this better myself, I act differently and accept objections if people think a fuller, more complete design is what is needed. Recent demonstrations of that are available, both objecting and accepting. Don't see this as an extra task, just see it as one of the many aspects that will need to be considered when developing it. If you do that it need not be additional work. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes: > On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote: >> And I don't >> think you can even get that far, because I don't think too many people >> here are going to say that we shouldn't add global temporary tables >> unless we can also make them work with Hot Standby. > The policy round here for some time has been that when we implement > things we make them work fully and seamlessly. I don't see why Hot > Standby would be singled out any more than any other feature, say > Windows support or tablespaces should be occasionally ignored. The current definition of Hot Standby is that it's a *read only* behavior. Not read mostly. What you are proposing is a rather fundamental change in the behavior of HS, and it doesn't seem to me that it should be on the head of anybody else to make it work. IOW: I agree with Robert that this is not an essential part of global temp tables. If it happens to fall out that it works like that, great, but it isn't a requirement. regards, tom lane
On Mon, 2010-04-26 at 15:40 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote: > >> And I don't > >> think you can even get that far, because I don't think too many people > >> here are going to say that we shouldn't add global temporary tables > >> unless we can also make them work with Hot Standby. > > > The policy round here for some time has been that when we implement > > things we make them work fully and seamlessly. I don't see why Hot > > Standby would be singled out any more than any other feature, say > > Windows support or tablespaces should be occasionally ignored. > > The current definition of Hot Standby is that it's a *read only* > behavior. Not read mostly. What you are proposing is a rather > fundamental change in the behavior of HS, and it doesn't seem to me > that it should be on the head of anybody else to make it work. That's a dangerous precedent you just set. -- Simon Riggs www.2ndQuadrant.com
On Mon, Apr 26, 2010 at 3:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sun, 2010-04-25 at 11:35 -0400, Robert Haas wrote: > >> And I don't >> think you can even get that far, because I don't think too many people >> here are going to say that we shouldn't add global temporary tables >> unless we can also make them work with Hot Standby. > > The policy round here for some time has been that when we implement > things we make them work fully and seamlessly. I don't see why Hot > Standby would be singled out any more than any other feature, say > Windows support or tablespaces should be occasionally ignored. > > People need to get used to the new feature set, just as we had to with > HOT, subtransactions, prepared transactions, Gist etc.. That may require > a thwack from various people, but the responsibility lies with the new > feature implementor, not the person supporting existing code. > > I fully understand your wish to implement a partial feature with caveats > because I have argued that many times myself. But I've come to realise > that the best way is to build things so they work cleanly across the > board. Other developers can plan projects in the knowledge that they can > build directly on firm foundations, not fill in the cracks. In the end > this comes down to a choice as developers, do we help each other by > doing a full job, or do we leave unexploded bombs for each other through > short-termism? Now I understand this better myself, I act differently > and accept objections if people think a fuller, more complete design is > what is needed. Recent demonstrations of that available, both objecting > and accepting. > > Don't see this as an extra task, just see it as one of the many aspects > that will need to be considered when developing it. If you do that it > need not be additional work. I think you're looking at this the wrong way. 
If temporary tables have to work with Hot Standby in order for it to be committable, then we should never have committed Hot Standby in the first place because our current flavor of temporary tables doesn't. Was that an oversight on your part, or a recognition that you can't solve every problem in one commit? ...Robert
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2010-04-26 at 15:40 -0400, Tom Lane wrote: >> The current definition of Hot Standby is that it's a *read only* >> behavior. Not read mostly. What you are proposing is a rather >> fundamental change in the behavior of HS, and it doesn't seem to me >> that it should be on the head of anybody else to make it work. > That's a dangerous precedent you just set. [ shrug... ] If you have near-term solutions for all the *other* problems that would be involved (like what XID to put into rows you insert in the temp tables) then I might think that what you're asking Robert to do is reasonable. Personally I think non-read-only HS is entirely pie in the sky, and therefore it's not reasonable to saddle unrelated development tasks with an expectation that they should work with a behavior that probably won't ever happen. regards, tom lane
Tom Lane wrote: > [ forgot to respond to this part ] > > Robert Haas <robertmhaas@gmail.com> writes: > > ... I don't see the problem with DROP. > > Under the proposed design, it's approximately equivalent to dropping a > > table that someone else has truncated. You just wait for the > > necessary lock and then do it. > > And do *what*? You can remove the catalog entries, but how are you > going to make the physical storage of other backends' versions go away? > (To say nothing of making them flush their local buffers for it.) Maybe we could add a sinval message to that effect. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support