Thread: idea: global temp tables
Hello,

I am thinking about global temp tables. One possible approach is to create the global temporary table like a normal table and, at planning time, check for references to it. When a global temporary table is detected, a real temporary table is created and used in the execution plan. It works like this:

CREATE GLOBAL TEMP TABLE foo(a varchar); -- create the global (empty) table foo

SELECT * FROM foo;
a) if a matching temp table for foo already exists in this session, use it;
b) if not, run CREATE TEMP TABLE pg_temp_1.foo(LIKE foo INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
c) transform the original query to SELECT * FROM pg_temp_1.foo;

Ideas? Notes? Objections?

regards
Pavel Stehule
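To make the intended semantics concrete, here is a small sketch of how two sessions would see such a table under Pavel's proposal (illustrative only; this syntax does not work in any released PostgreSQL):

    -- Session 1
    CREATE GLOBAL TEMP TABLE foo(a varchar);  -- definition stored once, in the catalog
    INSERT INTO foo VALUES ('one');           -- first reference materializes pg_temp_1.foo
    SELECT count(*) FROM foo;                 -- returns 1

    -- Session 2, concurrently
    SELECT count(*) FROM foo;                 -- returns 0: this session gets its own empty instance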
Pavel Stehule <pavel.stehule@gmail.com> wrote: > I am thinking about global temp tables. These would have some value to us. In case anyone doesn't know, this is a feature in the SQL standard. You have a permanent definition of the schema, but the table is materialized as a temporary table on reference by any connection. I can't speak to the practicality of the proposed implementation techniques. -Kevin
Kevin Grittner wrote: > Pavel Stehule <pavel.stehule@gmail.com> wrote: > > >> I am thinking about global temp tables. >> > > These would have some value to us. > > In case anyone doesn't know, this is a feature in the SQL standard. > You have a permanent definition of the schema, but the table is > materialized as a temporary table on reference by any connection. > > I can't speak to the practicality of the proposed implementation > techniques. > > > Using a global table to achieve schema-persistent temp tables seems like a horrid hack - what would you do if the table used a type other than a standard built-in type? Or perhaps Pavel doesn't really mean "global" as the term is used in Postgres (c.f. the pg_database table)? cheers andrew
Andrew Dunstan <andrew@dunslane.net> wrote: > Or perhaps Pavel doesn't really mean "global" as the term is used > in Postgres (c.f. the pg_database table)? I'd bet that he doesn't. He's taking terminology from the standard, where it means "not limited to one SQL-client module". It just means it is available as long as you are using the connection. -Kevin
On Apr 27, 2009, at 4:44 PM, Pavel Stehule wrote: > Hello > > I am thinking about global temp tables. One possible solution is > creating global temporary table like normal table and in planner stage > check using this table. When some global temporary table is detected, > then real temporary table is created and used in execution plan. It's > like: > > CREATE GLOBAL TEMP TABLE foo(a varchar); -- create global empty > table foo > SELECT * FROM foo; > a) is relevant temp table for foo, use it > a) when not, then CREATE TEMP TABLE pg_temp_1.foo(LIKE foo INCLUDING > DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES); > b) transform origin query to SELECT * FROM pg_temp_1.foo; > > Ideas? Notes? Objections? When will postgresql offer "global" temporary tables with data which are shared among sessions? Such tables are great for transient data such as web session data where writing to the WAL is a waste. (On DB startup, the tables would simply be empty.) We're currently stuck with the memcached plugin which makes it impossible to use database constructs such as foreign keys against the temporary data. Cheers, M
Hi, Le 27 avr. 09 à 23:32, A.M. a écrit : > When will postgresql offer "global" temporary tables with data which > are shared among sessions? Such tables are great for transient data > such as web session data where writing to the WAL is a waste. (On DB > startup, the tables would simply be empty.) We're currently stuck > with the memcached plugin which makes it impossible to use database > constructs such as foreign keys against the temporary data. If using 8.3 you can SET LOCAL synchronous_commit TO off; for web session management transactions, it'll skip the WAL fsync'ing, which is already a good start. HTH, -- dim
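For anyone wanting to try that, a minimal sketch of the pattern Dimitri describes (the web_sessions table and its columns are hypothetical):

    BEGIN;
    SET LOCAL synchronous_commit TO off;      -- affects only this transaction
    UPDATE web_sessions                       -- hypothetical session-state table
       SET data = 'cart=3 items', last_seen = now()
     WHERE session_id = 'abc123';
    COMMIT;                                   -- returns without waiting for this commit's WAL fsync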
"A.M." <agentm@themactionfaction.com> wrote: > When will postgresql offer "global" temporary tables with data > which are shared among sessions? Well, that would certainly be far different from what the standard calls a temporary table of any flavor. In the standard all temporary tables are restricted to a single connection, and the scope is: GLOBAL: Schema always present. Once materialized, present for as long as the connection exists. CREATED LOCAL: Schema always present. Once materialized, visible only within a particular module. DECLARED LOCAL: No permanent schema. Materialized when declared in a compound statement (standard BEGIN/END; not related to transaction boundaries), and automatically dropped on exit from the compound statement. Current PostgreSQL temporary tables are sort of a hybrid between GLOBAL and DECLARED LOCAL temporary tables from the standard. -Kevin
On Apr 27, 2009, at 5:39 PM, Dimitri Fontaine wrote: > Hi, > > Le 27 avr. 09 à 23:32, A.M. a écrit : >> When will postgresql offer "global" temporary tables with data >> which are shared among sessions? Such tables are great for >> transient data such as web session data where writing to the WAL is >> a waste. (On DB startup, the tables would simply be empty.) We're >> currently stuck with the memcached plugin which makes it impossible >> to use database constructs such as foreign keys against the >> temporary data. > > > If using 8.3 you can SET LOCAL synchronous_commit TO off; for web > session management transactions, it'll skip the WAL fsync'ing, which > is already a good start. That's pretty close, but it's not table specific and wouldn't let us reliably mix transient data changes with real data changes. Cheers, M
"A.M." <agentm@themactionfaction.com> wrote: > On Apr 27, 2009, at 5:39 PM, Dimitri Fontaine wrote: >> Le 27 avr. 09 à 23:32, A.M. a écrit : >>> When will postgresql offer "global" temporary tables with data >>> which are shared among sessions? Such tables are great for >>> transient data such as web session data where writing to the WAL is >>> a waste. (On DB startup, the tables would simply be empty.) We're >>> currently stuck with the memcached plugin which makes it impossible >>> to use database constructs such as foreign keys against the >>> temporary data. >> >> >> If using 8.3 you can SET LOCAL synchronous_commit TO off; for web >> session management transactions, it'll skip the WAL fsync'ing, which >> is already a good start. > > That's pretty close, but it's not table specific and wouldn't let us > to reliably mix transient data changes with real data changes. Yeah, we have a dozen or so tables we use with the pattern you describe; so the feature you describe would also have some value for us. To avoid confusion, we don't refer to these as "temporary tables", but rather as "permanent work tables". Again, I can't comment on practical issues regarding implementation; but it would be a "nice feature" to add some day. The tricky bit would be to figure out how to ensure that it got cleaned up properly, especially if the PostgreSQL went down or client processes wend down before tidying up. -Kevin
On Apr 27, 2009, at 6:01 PM, Kevin Grittner wrote: > "A.M." <agentm@themactionfaction.com> wrote: >> On Apr 27, 2009, at 5:39 PM, Dimitri Fontaine wrote: >>> Le 27 avr. 09 à 23:32, A.M. a écrit : >>>> When will postgresql offer "global" temporary tables with data >>>> which are shared among sessions? Such tables are great for >>>> transient data such as web session data where writing to the WAL is > >>>> a waste. (On DB startup, the tables would simply be empty.) We're > >>>> currently stuck with the memcached plugin which makes it impossible > >>>> to use database constructs such as foreign keys against the >>>> temporary data. >>> >>> >>> If using 8.3 you can SET LOCAL synchronous_commit TO off; for web >>> session management transactions, it'll skip the WAL fsync'ing, which > >>> is already a good start. >> >> That's pretty close, but it's not table specific and wouldn't let us > >> to reliably mix transient data changes with real data changes. > > Yeah, we have a dozen or so tables we use with the pattern you > describe; so the feature you describe would also have some value for > us. To avoid confusion, we don't refer to these as "temporary > tables", but rather as "permanent work tables". Again, I can't > comment on practical issues regarding implementation; but it would be > a "nice feature" to add some day. The tricky bit would be to figure > out how to ensure that it got cleaned up properly, especially if the > PostgreSQL went down or client processes wend down before tidying up. Actually, for our usage, that's the easiest part- truncate all the "permanent work tables" whenever the db starts. That's really the only sane thing to do anyway. That's what I mean by "transient" data- if it's there, that's great, if not, I can re-generate it (cache) or I don't care because, if the database goes down, then the data is useless on restart anyway. Cheers, M
2009/4/27 Andrew Dunstan <andrew@dunslane.net>: > > > Kevin Grittner wrote: >> >> Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> >>> I am thinking about global temp tables. >>> >> >> These would have some value to us. >> In case anyone doesn't know, this is a feature in the SQL standard. You >> have a permanent definition of the schema, but the table is >> materialized as a temporary table on reference by any connection. >> I can't speak to the practicality of the proposed implementation >> techniques. >> >> > > Using a global table to achieve schema-persistent temp tables seems like a > horrid hack - what would you do if the table used a type other than a > standard built-in type? Where is the problem? There is a normal dependency between types and relations. Of course, ALTER TABLE would have to be a little bit different: a) it should only be done when no instance of the table is in use, and b) it should be applied to all of the temporary instances. But this technique needs only minimal changes to PostgreSQL's internal structures. The core of the problem is the structure of the pg_class table, which contains some fields that could be shared between a global table's instances and some that could not. So an implementation needs either a) a significant change to the pg_class table, OR b) some kind of transparent table overloading. One year ago I thought about in-memory tables for this, but that is too different, and now that VACUUM should be effective, I think it is needless. > > Or perhaps Pavel doesn't really mean "global" as the term is used in > Postgres (c.f. the pg_database table)? > No, I meant global tables in the sense of the SQL standard. What you are thinking of are shared tables (in pg terminology). regards Pavel Stehule > cheers > > andrew >
2009/4/28 Kevin Grittner <Kevin.Grittner@wicourts.gov>: > "A.M." <agentm@themactionfaction.com> wrote: >> On Apr 27, 2009, at 5:39 PM, Dimitri Fontaine wrote: >>> Le 27 avr. 09 à 23:32, A.M. a écrit : >>>> When will postgresql offer "global" temporary tables with data >>>> which are shared among sessions? Such tables are great for >>>> transient data such as web session data where writing to the WAL is > >>>> a waste. (On DB startup, the tables would simply be empty.) We're > >>>> currently stuck with the memcached plugin which makes it impossible > >>>> to use database constructs such as foreign keys against the >>>> temporary data. >>> >>> >>> If using 8.3 you can SET LOCAL synchronous_commit TO off; for web >>> session management transactions, it'll skip the WAL fsync'ing, which > >>> is already a good start. >> >> That's pretty close, but it's not table specific and wouldn't let us > >> to reliably mix transient data changes with real data changes. > > Yeah, we have a dozen or so tables we use with the pattern you > describe; so the feature you describe would also have some value for > us. To avoid confusion, we don't refer to these as "temporary > tables", but rather as "permanent work tables". Again, I can't > comment on practical issues regarding implementation; but it would be > a "nice feature" to add some day. The tricky bit would be to figure > out how to ensure that it got cleaned up properly, especially if the > PostgreSQL went down or client processes wend down before tidying up. For me, GLOBAL TEMP TABLES would significantly increase convenience for developers. That is the main reason. regards Pavel Stehule > > -Kevin >
Pavel Stehule wrote: > Hello > > I am thinking about global temp tables. One possible solution is > creating global temporary table like normal table and in planner stage > check using this table. When some global temporary table is detected, > then real temporary table is created and used in execution plan. It's > like: > > CREATE GLOBAL TEMP TABLE foo(a varchar); -- create global empty table foo > SELECT * FROM foo; > a) is relevant temp table for foo, use it > a) when not, then CREATE TEMP TABLE pg_temp_1.foo(LIKE foo INCLUDING > DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES); > b) transform origin query to SELECT * FROM pg_temp_1.foo; > > Ideas? Notes? Objections? Maybe we could make this work by fiddling with a different smgr -- on it, smgr_sync would be a noop, as would smgr_immedsync, and we could kludge something up to truncate relations during recovery. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Maybe we could make this work by fiddling with a different smgr -- on > it, smgr_sync would be a noop, as would smgr_immedsync, and we could > kludge something up to truncate relations during recovery. Interesting thought but I think it falls down on pg_statistic. One comment I've got is that we have already concluded that the spec's GLOBAL/LOCAL TEMP TABLE distinction is not related to cross-session persistence of the table definitions, but rather to module visibility which is a concept we have not got (yet). Ergo, we should not use the phrase "global temp table" for these things. Not sure what to suggest instead. Perhaps call them "session tables" instead of "temp tables"? regards, tom lane
> Perhaps call them "session tables" instead of "temp tables"? > regards, tom lane Or "transient tables"... Maybe we can define when such a table loses its data. But really, there is no need for this feature -- databases are made to hold data, not to lose it. If an application requires a mechanism to store transient session data, it should create its own session objects.
Tom Lane <tgl@sss.pgh.pa.us> wrote: > we have already concluded that the spec's > GLOBAL/LOCAL TEMP TABLE distinction is not related > to cross-session persistence of the table definitions How do you reconcile that conclusion with the following, from ISO/IEC 9075-2:2003 (E), 4.14 Tables: "The definition of a global temporary table or a created local temporary table appears in a schema. In SQL language, the name and the scope of the name of a global temporary table or a created local temporary table are indistinguishable from those of a persistent base table. However, because global temporary table contents are distinct within SQL-sessions, and created local temporary tables are distinct within SQL-client modules within SQL-sessions, the effective <schema name> of the schema in which the global temporary table or the created local temporary table is instantiated is an implementation-dependent <schema name> that may be thought of as having been effectively derived from the <schema name> of the schema in which the global temporary table or created local temporary table is defined and the implementation-dependent SQL-session identifier associated with the SQL-session." There is a distinction between the definition, which "appears in a schema" and for which "the name and the scope ... are indistinguishable from those of a persistent base table", versus the effective schema in which an instance is materialized, which is session and/or module dependent. -Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote: > it could be considered either a global or a local temp table per > spec (without any module support you can't really say which it is). That seems bogus -- without modules it is clearly not LOCAL. What Pavel is requesting exactly matches the spec's definition of a global temporary table, but it does make me uneasy that after accepting the standard syntax, and behaving differently from it (including making no distinction between GLOBAL and LOCAL declarations) we would suddenly go to compliance on GLOBAL declarations but leave LOCAL as is. Maybe too messy to try to improve. -Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote: > GLOBAL/LOCAL TEMP TABLE distinction is not related to cross-session > persistence of the table definitions On a re-read, I think I see your point -- it is the DECLARE LOCAL TEMP TABLE versus CREATE { GLOBAL | LOCAL } TEMP TABLE which determines whether the table definition is persisted. Both forms of CREATE TEMP TABLE should persist the definition if you go by the standard, so you don't want to muddy the waters by complying on one and not the other? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> we have already concluded that the spec's >> GLOBAL/LOCAL TEMP TABLE distinction is not related >> to cross-session persistence of the table definitions > How do you reconcile that conclusion with the following, > from ISO/IEC 9075-2:2003 (E), 4.14 Tables: The point is that what we call "temp tables" are not either global or local temp tables by the spec's definition. If we invent something that behaves as Pavel suggests, then it could be considered either a global or a local temp table per spec (without any module support you can't really say which it is). We're stuck in a terminological problem anyway, but it will get a whole lot worse if we fail to acknowledge that there's more than one property involved here. regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > ... Both forms of CREATE TEMP > TABLE should persist the definition if you go by the standard, so you > don't want to muddy the waters by complying on one and not the other? Right. This goes back to our old principle of trying not to use spec-defined syntax for not-per-spec behavior. We are already behind the eight ball as far as temp tables go, but let's not make it worse by blindly picking some spec-defined syntax without a plan for where we go from here. (I'm assuming that it's reasonably likely that we will want a spec-compatible module feature someday. We'll really have painted ourselves into a corner if we don't think about the issue now.) regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> it could be considered either a global or a local temp table per >> spec (without any module support you can't really say which it is). > That seems bogus -- without modules it is clearly not LOCAL. You could just as easily say it's not GLOBAL. > What > Pavel is requesting exactly matches the spec's definition of a global > temporary table, but it does make me uneasy that after accepting the > standard syntax, and behaving differently from it (including making no > distinction between GLOBAL and LOCAL declarations) we would suddenly > go to compliance on GLOBAL declarations but leave LOCAL as is. Right. What I'm suggesting is that before we mess with this we should have a road map on whether we are going to try to get to spec compliance in this area, and if so how. One thing I just noticed is that the spec does not consider GLOBAL/LOCAL to be optional --- per spec you *must* write one or the other in front of TEMPORARY. So we could adopt the view that omitting this keyword implies our current non-spec behavior (which is far too useful to give up, spec compliance or no) while writing one or the other selects the spec behavior. However, if we're going to do that then we should start throwing warnings for use of the keywords, preferably before the release in which they actually start doing something different. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > One thing I just noticed is that the spec does not consider > GLOBAL/LOCAL to be optional --- per spec you *must* write one or the > other in front of TEMPORARY. Agreed. > So we could adopt the view that omitting this keyword > implies our current non-spec behavior (which is far too useful to > give up, spec compliance or no) while writing one or the other > selects the spec behavior. +1 (+1) > However, if we're going to do that then we should start > throwing warnings for use of the keywords, preferably before the > release in which they actually start doing something different. We might actually want to have a major release which rejects the standard syntax before the release where we implement standard behavior for it. (After, of course, a major release which issues the warning.) When we get to the point of breaking existing code (which is inevitable if we move to compliance here), it's better to break in a clear and predictable way.... Of course, that would mean that implementation would be three releases away (warn, disable syntax, reenable syntax with standard semantics). -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> However, if we're going to do that then we should start >> throwing warnings for use of the keywords, preferably before the >> release in which they actually start doing something different. > We might actually want to have a major release which rejects the > standard syntax before the release where we implement standard > behavior for it. (After, of course, a major release which issues the > warning.) When we get to the point of breaking existing code (which > is inevitable if we move to compliance here), it's better to break in > a clear and predictable way.... I was thinking in terms of throwing a warning in 8.4 and implementing new behavior in 8.5. An extra release only helps if you assume everyone adopts that release at some point. The number of questions we see about multi-version jumps should disabuse people of the notion that everyone does it that way ... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > I was thinking in terms of throwing a warning in 8.4 and > implementing new behavior in 8.5. If we're on that time frame with it, I guess it's not too early to suggest what we would implement in 8.5. I would suggest that it is clear that what Pavel is requesting is fully compliant with the spec's definition of global temporary tables. Since the essence of "global" in this context is that they are not constrained by modules, the absence of module support doesn't seem to create any possible compatibility issue when and if modules are added. I would suggest (with somewhat less confidence) that both created and declared local temporary tables might make sense in the context of whatever procedural languages are supported. In PL/pgSQL, for example, the declaration for a declared local temporary table would be allowed inside the PL's BEGIN block, in the area where local variables are allowed. Such a table would be visible only within the context of the block (meaning we would probably need to munge the name somehow to support recursion or other functions with a duplicate table name). The temporary table would be materialized at the point where it is declared, and dropped at the END of the block. I'm not clear on whether a created local temporary table should retain its contents from one invocation of a function to the next. I'm inclined to think it shouldn't -- that the scope for a materialized instance is the same as a declared local temporary table; the CREATE just ensures a consistent definition wherever used. Or perhaps it's just a bad idea to attempt to use the LOCAL syntax outside of a proper module at all. The GLOBAL option seems clear; LOCAL seems a bit muddy to me. -Kevin
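For comparison, here is roughly what a PL/pgSQL function has to do by hand today; under Kevin's suggestion the explicit CREATE/DROP pair would become an implicit, block-scoped declaration (the staging table is an invented example):

    CREATE FUNCTION dedupe_batch(batch_id int) RETURNS int AS $$
    DECLARE
        n int;
    BEGIN
        -- today: create and drop the work table explicitly
        CREATE TEMP TABLE work_rows (id int PRIMARY KEY);
        INSERT INTO work_rows SELECT DISTINCT id FROM staging WHERE batch = batch_id;
        SELECT count(*) INTO n FROM work_rows;
        DROP TABLE work_rows;   -- a declared local temp table would vanish here automatically
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;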
Pavel Stehule wrote: > But question? > > about MVCC? > Is necessary to use MVCC on pg_statistic and some columns from pg_proc? Right now, we use MVCC everywhere because we have no choice. But there are some things that would be better off not using MVCC; and indeed we use a hack to make things that way in certain, very limited cases (see callers of heap_inplace_update). Years ago I proposed a patch (search for pg_class_nt and/or pg_ntclass) that created a catalog for non-MVCC attributes, with an eye on extending it to more attributes. The use case for those other attributes had fatal flaws so it went nowhere, but perhaps we can give the idea another whirl now. For example I think it would be possible to implement read-only partitions that don't need vacuuming using that idea. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Greg Stark <stark@enterprisedb.com> wrote: > On Wed, Apr 29, 2009 at 7:57 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> But the scaling issues are there already with temp tables. This >> actually makes it better, not worse, because the table can be >> materialized once per session, not once per request. > > Currently you have to issue CREATE TABLE and associated DDL > manually. That makes it clear to the user that they're executing > DDL and should expect it to behave like DDL. > > I don't understand what you mean by a cost once per request. You > only have to create the temporary table on the first request. If you > can't tell which is the first request you only have to test whether > it exists which doesn't incur the consequences that ddl incurs. True. I got myself thinking that without this feature people would be dropping the table after each use, which doesn't have to be the case. > What we're talking about means that when someone issues "SELECT * > FROM cache" they're going to -- unexpected -- be a) turning their > virtual transaction id into a real transaction id b) creating a new > entry in pg_catalog and its indexes c) wal logging the new > pg_catalog entry (including having to fsync at commit time) d) > acquiring an exclusive lock on the new entry. Only if they are making the first reference to the table in that session, and it's only unexpected if they don't know that such a reference to a global temp table can cause the table to materialize. Surely you will grant that someone referencing such a table should know what it is? > There have been posts by people who were bitten by expecting that > they could create temporary work tables for short frequently run > queries who didn't realize that would mean pg_class would have to be > vacuumed every few minutes and that it would slow down every index > lookup for table names. Like I said, I have run into performance problems with temp table creation, especially when write barriers were configured on due to battery failure or OS misconfiguration, and the cost turned out to be almost entirely in the creation of the disk files which support the temp table -- base, toast, indexes, etc. Unless you can fix the big problems, worrying about the stuff we do optimize well will be a drop in the bucket. > I don't see it as friendly to make that the implicit > behaviour for innocent looking dml operations. Perhaps a note in the documentation of global temporary tables could set appropriate expectations? It seems that your whole objection to adding the requested feature hinges on anticipation of particular user expectations. -Kevin
Greg Stark <stark@enterprisedb.com> writes: > I don't understand what you mean by a cost once per request. You only > have to create the temporary table on the first request. If you can't > tell which is the first request you only have to test whether it > exists which doesn't incur the consequences that ddl incurs. This is all based on utterly-unproven assumptions about relative costs. In particular, ISTM an additional network round trip or two associated with testing for/creating a temp table could easily swamp any costs associated with catalog entry creation. Even if it doesn't, creating/deleting a few dozen rows in the system catalogs shouldn't really be something that autovacuum can't deal with. If it were, we'd be hearing a lot more complaints about the *existing* temp table feature being unusable. (And yes, I know it's come up once or twice, but not all that often.) I'm all for eliminating catalog overheads, if we can find a way to do that. I don't think that you get to veto implementation of the feature until we can find a way to optimize it better. The question is not about whether having the optimization would be better than not having it --- it's about whether having the unoptimized feature is better than having no feature at all (which means people have to implement the same behavior by hand, and they'll *still* not get the optimization). regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Any chance that some of these improvements could be applied to temp >> tables created with the PostgreSQL-specific syntax while we're at >> it? > > You mean the Postgres-specific behavior, no? > > Trying to support a table without *any* pre-existing catalog entries > seems even harder than doing it without changing the pre-existing > catalog entries. I can't say that I followed all of Greg's ideas, but it seemed that some of them related to cheaper ways to materialize the body of the temp table, as opposed to updating the system tables. That seemed like it might be orthogonal to the issue of persistent temp table definitions, and perhaps the ideas could help performance of all temp tables, including the PostgreSQL-specific variety. Being out of my depth on the technical issues he was discussing, I can't really do more than pose the question, however.... -Kevin
On Tue, Apr 28, 2009 at 6:18 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I was thinking in terms of throwing a warning in 8.4 and >> implementing new behavior in 8.5. > > If we're on that time frame with it, I guess it's not too early to > suggest what we would implement in 8.5. > > I would suggest that it is clear that what Pavel is requesting is > fully compliant with the spec's definition of global temporary tables. I think there are implicit assumptions that the spec is making about the performance implications of using these temporary tables. It's offering a tool that can reasonably be used in place of views and CTEs in otherwise pure DML. I didn't follow precisely what Pavel was describing but IMHO anything which does any DDL, even implicitly, would make the feature impractical in many cases where it really ought to work. Anything which causes pg_class to bloat or require special vacuum strategies is just not going to scale. The whole point of having the schema declared in advance and then having each procedure execution have access to a private (or non-private) data store following that predefined schema is to avoid having to execute any catalog changes with all the locking and catalog i/o that DDL requires. -- greg
Greg Stark <stark@enterprisedb.com> wrote: > I've been thinking about Alvaro's idea of a separate smgr. If you > had a single pg_class entry for all sessions but the smgr knew to > store the actual data for it in a session-local file, either in a > session-specific tablespace or using the same mechanism the > temporary files use to direct data then the backend would basically > never know it wasn't a regular table. > > It could still use local buffers but it could use the global > relcache, invalidation, locks, etc. I think we would have to take a > session-level access lock as soon as we put any data in our local > store. And each DDL operation would have to be visited to see > whether it needs special behaviour for locally stored tables. I > suspect most of them will only be able to be handled if there are no > active sessions using the table so they'll basically be no-ops > except for the catalog changes. Any chance that some of these improvements could be applied to temp tables created with the PostgreSQL-specific syntax while we're at it? The need for several tables to be created on disk to materialize a single temp table currently causes performance problems in some contexts. I don't think the updates to the system tables have the same magnitude of performance hit as creating these tables, especially if write barriers are on. -Kevin
On Wed, Apr 29, 2009 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > This is all based on utterly-unproven assumptions about relative costs. > In particular, ISTM an additional network round trip or two associated > with testing for/creating a temp table could easily swamp any costs > associated with catalog entry creation. Even if it doesn't, > creating/deleting a few dozen rows in the system catalogs shouldn't > really be something that autovacuum can't deal with. I don't see why it's limited to a few dozen rows. Moderately busy web sites these days count their traffic in hundreds of page views per second. > If it were, > we'd be hearing a lot more complaints about the *existing* temp table > feature being unusable. (And yes, I know it's come up once or twice, > but not all that often.) Well my point is that currently you have to type CREATE TEMPORARY TABLE somewhere which at least gives you a clue that maybe you're doing something significant. -- greg
Greg Stark <stark@enterprisedb.com> wrote: > The whole point of having the schema declared in advance and then > having each procedure execution have access to a private (or > non-private) data store following that predefined schema is to avoid > having to execute any catalog changes with all the locking and > catalog i/o that DDL requires. Global temporary tables are not shared between connections. Look back to Pavel's original post; he's looking for a convenience -- a way to have a temporary table materialized for a connection on reference, "INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES". It is not meant to be better in performance than creating a temporary table explicitly on the fly and adding all these things, it's meant to provide a consistent definition which can be materialized and used on demand. Nothing more; nothing less. I understand not everyone has a need for such a thing, but isn't that true of most features in PostgreSQL? If you're saying we can implement the standard's global temporary tables in a way that performs better than current temporary tables, that's cool. That would be a nice "bonus" in addition to the application programmer convenience and having another tick-mark on the standards compliance charts. Do you think that's feasible? If not, the feature would be useful to some with the same performance that temporary tables currently provide. -Kevin
On Wed, Apr 29, 2009 at 7:57 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > But the scaling issues are there already with temp tables. This > actually makes it better, not worse, because the table can be > materialized once per session, not once per request. Currently you have to issue CREATE TABLE and associated DDL manually. That makes it clear to the user that they're executing DDL and should expect it to behave like DDL. I don't understand what you mean by a cost once per request. You only have to create the temporary table on the first request. If you can't tell which is the first request you only have to test whether it exists which doesn't incur the consequences that ddl incurs. What we're talking about means that when someone issues "SELECT * FROM cache" they're going to -- unexpectedly -- be a) turning their virtual transaction id into a real transaction id b) creating a new entry in pg_catalog and its indexes c) WAL-logging the new pg_catalog entry (including having to fsync at commit time) d) acquiring an exclusive lock on the new entry. There have been posts by people who were bitten by expecting that they could create temporary work tables for short frequently run queries who didn't realize that would mean pg_class would have to be vacuumed every few minutes and that it would slow down every index lookup for table names. I don't see it as friendly to make that the implicit behaviour for innocent-looking DML operations. -- greg
Kevin Grittner wrote: > contexts. I don't think the updates to the system tables have the > same magnitude of performance hit as creating these tables, especially > if write barriers are on. > Wouldn't it be cleaner just to defer creation of real files to support the structures associated with a temp table until it is necessary to spill the data from the backend's RAM? This data doesn't need to be in shared memory and the tables and data aren't visible to any other session, so can't they live entirely in the backend's RAM most of the time (or all the time, if the data in them is short-lived)?
Greg Stark <stark@enterprisedb.com> wrote: >> creating/deleting a few dozen rows in the system catalogs shouldn't >> really be something that autovacuum can't deal with. > > I don't see why it's limited to a few dozen rows. Moderately busy > web sites these days count their traffic in hundreds of page views > per second. Sure. We're there. And many of those hits run ten to twenty queries. We'd be insane to get a new connection for each one rather than use a connection pool; and this overhead only occurs once per referenced table per connection. -Kevin
Greg Stark <stark@enterprisedb.com> wrote: > On Wed, Apr 29, 2009 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I agree with Kevin's objection that you are unfairly raising the >> bar for this feature by demanding a performance improvement to go >> along with a functionality change. > > I think having the feature is making a promise that we can't keep. > Having a feature which meets the letter of the rules but fails to > actually work as users have a right to expect is going to trap > people unaware that they're writing code that works in testing but > will never scale. But the scaling issues are there already with temp tables. This actually makes it better, not worse, because the table can be materialized once per session, not once per request. >> The use-case for this feature is to >> simplify application logic by allowing apps to assume that a temp >> table exists without having to create it at the start of a session. >> That's particularly handy in connection-pooling scenarios, for >> instance. Currently, you have to have some sort of "if exists" >> check, and you pay just as much in catalog thrashing as you would >> if the feature was present without any catalog optimization. > > That seems like a trivial little annoyance. And a potentially large performance booster. >> It would be great to find a way to avoid the catalog thrashing, >> but I completely disagree with a point of view that says we can't >> have this without solving that first. It's an improvement on the >> current state of affairs anyway. > > Not if it promises something we can't deliver. My claim is that the > whole point of having a persistent catalog definition is *precisely* > to avoid the catalog thrashing and that's obvious to users who would > be using this feature. I've re-read the spec on this several times now, and I can't see where that is implied. > I still maintain that this feature is not primarily about programmer > convenience, but rather comes with an expectation that the schema > definition is being given in advance so that the database doesn't > have to incur the costs of issuing the ddl for every session. I think many would be satisfied not to have those costs on every *request* on the connection. -Kevin
2009/4/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Greg Stark <stark@enterprisedb.com> writes:
>> Well I claim it's not just a nice bonus but is the difference between
>> implementing something which falls technically within the standard's
>> rules but fails to actually be useful for the standard's intended
>> purpose.
>
> I agree with Kevin's objection that you are unfairly raising the bar
> for this feature by demanding a performance improvement to go along
> with a functionality change. The use-case for this feature is to
> simplify application logic by allowing apps to assume that a temp
> table exists without having to create it at the start of a session.
> That's particularly handy in connection-pooling scenarios, for instance.
> Currently, you have to have some sort of "if exists" check, and you
> pay just as much in catalog thrashing as you would if the feature
> was present without any catalog optimization.

exactly

> It would be great to find a way to avoid the catalog thrashing,
> but I completely disagree with a point of view that says we can't
> have this without solving that first. It's an improvement on the
> current state of affairs anyway.
>
>> I've been thinking about Alvaro's idea of a separate smgr. If you had
>> a single pg_class entry for all sessions but the smgr knew to store
>> the actual data for it in a session-local file, either in a
>> session-specific tablespace or using the same mechanism the temporary
>> files use to direct data then the backend would basically never know
>> it wasn't a regular table.
>
> 1. pg_statistic.
>
> 2. How you going to have transaction-safe behavior for things like
> TRUNCATE, if you don't have an updatable private catalog entry to keep
> the current relfilenode in?
>
>> It could still use local buffers but it could use the global relcache,
>> invalidation, locks, etc.
>
> Locks would be another big problem: if only smgr knows that different
> instances of the table are different, then different backends' locks
> would conflict, which would be Bad. This might not matter for simple
> read/update, but again TRUNCATE is a counterexample of something that
> is likely to be needed and should not cause cross-backend conflicts.

I thought about some techniques for eliminating changes to pg_class and pg_statistic. Theoretically, we could overwrite some columns (or complete rows) of these tables with values stored in memory. My last (and not successful) prototype was based on some alchemy over the syscache; it was the wrong way. Maybe we could do something like:

int
get_relpages(oid)
{
    tuple = read_tuple_pg_class(oid);
    if (is_global(tuple))
    {
        tuple2 = find_global(oid);
        if (tuple2 == NULL)
        {
            store_global(tuple);
            return relpages(tuple);
        }
        else
            return relpages(tuple2);
    }
    else
        return relpages(tuple);
}

But a question about MVCC: is it necessary to use MVCC on pg_statistic and some columns from pg_proc?

regards
Pavel Stehule

> regards, tom lane
Greg Stark <stark@enterprisedb.com> writes: > Well I claim it's not just a nice bonus but is the difference between > implementing something which falls technically within the standard's > rules but fails to actually be useful for the standard's intended > purpose. I agree with Kevin's objection that you are unfairly raising the bar for this feature by demanding a performance improvement to go along with a functionality change. The use-case for this feature is to simplify application logic by allowing apps to assume that a temp table exists without having to create it at the start of a session. That's particularly handy in connection-pooling scenarios, for instance. Currently, you have to have some sort of "if exists" check, and you pay just as much in catalog thrashing as you would if the feature was present without any catalog optimization. It would be great to find a way to avoid the catalog thrashing, but I completely disagree with a point of view that says we can't have this without solving that first. It's an improvement on the current state of affairs anyway. > I've been thinking about Alvaro's idea of a separate smgr. If you had > a single pg_class entry for all sessions but the smgr knew to store > the actual data for it in a session-local file, either in a > session-specific tablespace or using the same mechanism the temporary > files use to direct data then the backend would basically never know > it wasn't a regular table. 1. pg_statistic. 2. How are you going to have transaction-safe behavior for things like TRUNCATE, if you don't have an updatable private catalog entry to keep the current relfilenode in? > It could still use local buffers but it could use the global relcache, > invalidation, locks, etc. Locks would be another big problem: if only smgr knows that different instances of the table are different, then different backends' locks would conflict, which would be Bad. This might not matter for simple read/update, but again TRUNCATE is a counterexample of something that is likely to be needed and should not cause cross-backend conflicts. regards, tom lane
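For context, a minimal sketch of the "if exists" dance an application has to do today (the cache table and the helper function are hypothetical; the test could equally be done from the client side):

    CREATE OR REPLACE FUNCTION ensure_session_cache() RETURNS void AS $$
    BEGIN
        PERFORM 1
           FROM pg_catalog.pg_class c
          WHERE c.relname = 'cache'
            AND c.relnamespace = pg_my_temp_schema();
        IF NOT FOUND THEN
            CREATE TEMP TABLE cache (key text PRIMARY KEY, val text);
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    SELECT ensure_session_cache();   -- cheap when the table already exists in this session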
On Wed, Apr 29, 2009 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@enterprisedb.com> writes: >> Well I claim it's not just a nice bonus but is the difference between >> implementing something which falls technically within the standard's >> rules but fails to actually be useful for the standard's intended >> purpose. > > I agree with Kevin's objection that you are unfairly raising the bar > for this feature by demanding a performance improvement to go along > with a functionality change. I think having the feature is making a promise that we can't keep. Having a feature which meets the letter of the rules but fails to actually work as users have a right to expect is going to trap people unaware that they're writing code that works in testing but will never scale. > The use-case for this feature is to > simplify application logic by allowing apps to assume that a temp > table exists without having to create it at the start of a session. > That's particularly handy in connection-pooling scenarios, for instance. > Currently, you have to have some sort of "if exists" check, and you > pay just as much in catalog thrashing as you would if the feature > was present without any catalog optimization. That seems like a trivial little annoyance. Spending effort fixing that in a way that will only have to be replaced if we ever want to support using temporary tables for heavy OLTP load is a waste of effort. > It would be great to find a way to avoid the catalog thrashing, > but I completely disagree with a point of view that says we can't > have this without solving that first. It's an improvement on the > current state of affairs anyway. Not if it promises something we can't deliver. My claim is that the whole point of having a persistent catalog definition is *precisely* to avoid the catalog thrashing and that's obvious to users who would be using this feature. This is just like the idea of prepared queries -- nowhere in the standard does it say that prepared queries have to cache the prepared plan and run any faster than non-prepared queries but if we didn't it would be somewhat broken. Now in that case planning queries is relatively cheap, but how happy do you think users would be if we supported prepared queries and they built their application around that fact only to discover that every time any session executed a PREPARE it created new records in catalog tables? We would be better off not supporting the command at all so at least users would know not to head down that dead-end path. > 1. pg_statistic. > 2. How you going to have transaction-safe behavior for things like > TRUNCATE Hm, truncate is a bit of a special case. I had been thinking that ddl would basically require preventing other backends from using the table at the same time. A lot of people don't realize truncate behaves like ddl but in any case I agree it sure seems like it would be important for temporary tables. I still maintain that this feature is not primarily about programmer convenience, but rather comes with an expectation that the schema definition is being given in advance so that the database doesn't have to incur the costs of issuing the ddl for every session. If we provide the interface but not the expected behaviour it could be worse than not having the interface at all. -- greg
Peter Eisentraut <peter_e@gmx.net> wrote: > There are actually two orthogonal properties at work here: How the > table is visible with respect to modules (LOCAL/GLOBAL) and whether > the table disappears at the end of the session (currently yes, > proposed new behavior optionally no). Pavel's request and the standard always have the temporary table disappearing at the end of the session; the second question for these is whether the *definition* of the table disappears at the end of the session, or remains in the schema to generate an instance for another session on demand. There was a digression about a possible feature other than what Pavel requested, which is not defined in the standard, which I referred to as "permanent work tables", which would be distinguished from normal tables only by the fact that no effort would be made to log the data for recovery, and perhaps the table should be truncated on PostgreSQL startup. If we want to pursue that discussion, it should probably be on a different thread. > On the matter of LOCAL/GLOBAL, I think the correct thing to do is to > reject LOCAL and accept GLOBAL as equivalent to the default. +1 > And then invent a separate setting, say EPHEMERAL/PERSISTENT that > controls the behavior that Pavel requested. While I probably wouldn't have chosen the syntax that the standard did, the SQL spec does specify a syntax to do what Pavel requested. I don't think it's so bad that we should ignore it and invent our own alternative. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Any chance that some of these improvements could be applied to temp > tables created with the PostgreSQL-specific syntax while we're at it? You mean the Postgres-specific behavior, no? Trying to support a table without *any* pre-existing catalog entries seems even harder than doing it without changing the pre-existing catalog entries. It's something we've thought about before and failed to find a decent solution for. I think there was some discussion of allowing pg_class and other relevant tables to implicitly have session-local child tables that would hold session-local catalog entries (and be stored in the local bufmgr); but AFAIR we never worked all the bugs out of the concept. > The need for several tables to be created on disk to materialize a > single temp table currently causes performance problems in some > contexts. I don't think the updates to the system tables have the > same magnitude of performance hit as creating these tables, especially > if write barriers are on. True; it's pure supposition that avoiding the catalog thrashing is actually important in context of everything else that has to happen. regards, tom lane
On Wed, Apr 29, 2009 at 4:24 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > I understand not everyone has a > need for such a thing, but isn't that true of most features in > PostgreSQL? Well I think implementing a feature which only works if it's used at low transaction rates as a convenience wrapper for a single command seems like a lot of work for little gain. > If you're saying we can implement the standard's global temporary > tables in a way that performs better than current temporary tables, > that's cool. That would be a nice "bonus" in addition to the > application programmer convenience and having another tick-mark on the > standards compliance charts. Well I claim it's not just a nice bonus but is the difference between implementing something which falls technically within the standard's rules but fails to actually be useful for the standard's intended purpose. I claim there's an implied expectation that by predefining these schema definitions you eliminate the overhead of DDL creating and dropping tables on the fly. That you can basically explicitly code up algorithms which might be too complex or detailed for an SQL query which get executed as high transaction rate DML using temporary storage just as our SQL engine uses it in materialize nodes and sort nodes. Just to give a real-world example, think of web pages that do paging of moderately complex query results. You often want to calculate the total number of matches and then also return a subset of those matches. Currently the only practical way to do it is to execute the query twice. Creating a temporary table for this purpose would transform your simple read-only DML into a complex DDL operation you can't expose to the masses without a lot of precautions and extra consequences. > Do you think that's feasible? If not, > the feature would be useful to some with the same performance that > temporary tables currently provide. I've been thinking about Alvaro's idea of a separate smgr. If you had a single pg_class entry for all sessions but the smgr knew to store the actual data for it in a session-local file, either in a session-specific tablespace or using the same mechanism the temporary files use to direct data then the backend would basically never know it wasn't a regular table. It could still use local buffers but it could use the global relcache, invalidation, locks, etc. I think we would have to take a session-level access lock as soon as we put any data in our local store. And each DDL operation would have to be visited to see whether it needs special behaviour for locally stored tables. I suspect most of them will only be able to be handled if there are no active sessions using the table so they'll basically be no-ops except for the catalog changes. -- greg
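A sketch of the paging pattern Greg mentions, assuming a pre-declared global temporary table search_results materialized per session (the documents table and its columns are invented for illustration):

    -- One web request, pure DML against the session's private instance:
    INSERT INTO search_results (doc_id, score)
        SELECT d.id, d.relevance
          FROM documents d
         WHERE d.title ILIKE '%temp table%';

    SELECT count(*) FROM search_results;                       -- total number of matches

    SELECT doc_id FROM search_results
     ORDER BY score DESC LIMIT 20 OFFSET 40;                   -- one page of results

    TRUNCATE search_results;                                   -- reset before the next request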
The archives for this thread http://archives.postgresql.org//pgsql-hackers/2009-04/threads.php#01329 show a bunch of missing messages. Were they being stored in a temporary table? Anyway, here is what I had meant to say but only got through to a few ... On Tuesday 28 April 2009 19:38:25 Tom Lane wrote: > One thing I just noticed is that the spec does not consider GLOBAL/LOCAL > to be optional --- per spec you *must* write one or the other in front > of TEMPORARY. So we could adopt the view that omitting this keyword > implies our current non-spec behavior (which is far too useful to give > up, spec compliance or no) while writing one or the other selects the > spec behavior. However, if we're going to do that then we should start > throwing warnings for use of the keywords, preferably before the release > in which they actually start doing something different. There are actually two orthogonal properties at work here: How the table is visible with respect to modules (LOCAL/GLOBAL) and whether the table disappears at the end of the session (currently yes, proposed new behavior optionally no). We should have two separate settings for these. On the matter of LOCAL/GLOBAL, I think the correct thing to do is to reject LOCAL and accept GLOBAL as equivalent to the default. And then invent a separate setting, say EPHEMERAL/PERSISTENT that controls the behavior that Pavel requested.
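A sketch of how Peter's two orthogonal settings might read; the EPHEMERAL/PERSISTENT keywords are purely illustrative and appear neither in the standard nor in any implementation:

    CREATE GLOBAL EPHEMERAL TEMPORARY TABLE t1 (id int);   -- definition dropped at session end (today's behavior)
    CREATE GLOBAL PERSISTENT TEMPORARY TABLE t2 (id int);  -- definition kept; contents still private per session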
James Mansion <james@mansionfamily.plus.com> wrote: > Wouldn't it be cleaner just to defer creation of real files to support the > structures associated with a temp table until it i snecessary to spill the > data from the backend's RAM? This data doesn't need to be in > shared memory and the tables and data aren't visible to any other > session, so can't they run out of RAM most of the time (or all the > time if the data in them is short lived)? See this thread for a discussion of the idea: http://archives.postgresql.org/pgsql-hackers/2008-11/msg00342.php -Kevin
On 28.04.09 16:59, Alvaro Herrera wrote: > Pavel Stehule wrote: > > Maybe we could make this work by fiddling with a different smgr -- on > it, smgr_sync would be a noop, as would smgr_immedsync, and we could > kludge something up to truncate relations during recovery. Maybe setting a path like <db oid>/pg_temp/<session id>/<table oid> would do the work for this kind of table. Zdenek
Peter Eisentraut <peter_e@gmx.net> wrote: > On the matter of LOCAL/GLOBAL, I think the correct thing to do is to > reject LOCAL and accept GLOBAL as equivalent to the default. Oops. Read that too quickly before my first reply. I think that we should, as Tom said, warn on *both* for 8.4, and treat GLOBAL per the standard in 8.5. I don't know whether we can do anything useful with LOCAL for 8.5, or whether it should be rejected at that point. Sorry for my confusion about what you were saying. -Kevin
On Thu, Apr 30, 2009 at 8:36 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > The archives for this thread > > http://archives.postgresql.org//pgsql-hackers/2009-04/threads.php#01329 > > show a bunch of missing messages. Were they being stored in a temporary > table? > > Anywhere, here is what I had meant to say but only got through to a few ... I am getting a lot of messages from this list out of order the last few days. I'll get later messages in a thread and then a day or two later I'll get the messages to which they were replies. ...Robert
Robert Haas wrote: > On Thu, Apr 30, 2009 at 8:36 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > > The archives for this thread > > > > http://archives.postgresql.org//pgsql-hackers/2009-04/threads.php#01329 > > > > show a bunch of missing messages. Were they being stored in a temporary > > table? > > > > Anyway, here is what I had meant to say but only got through to a few ... > > I am getting a lot of messages from this list out of order the last > few days. I'll get later messages in a thread and then a day or two > later I'll get the messages to which they were replies. When we get to a conclusion, would someone please add a TODO entry? Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +