Thread: GLOBAL vs LOCAL temp tables
I've just been having an informative off-list conversation with Mike Sykes. As he pointed out in a message that Marc forwarded to the list http://archives.postgresql.org/pgsql-hackers/2003-04/msg00411.php we shouldn't feel bad about the fact that our temp table implementation doesn't conform to the standard's semantics for temp tables, because almost no one else does it the spec's way either. Oracle and DB2, to name a couple of big players, do it effectively the same way we do. But he also points out that we are confused about the difference between GLOBAL and LOCAL temporary tables. In the spec, this distinction does *not* mean cross-session vs session-private temp tables, as we wrote in the documentation. In fact, there are no cross-session temp tables at all in SQL92. GLOBAL means there is one instance per session, while LOCAL means there is one instance per module invocation (which is thus necessarily within a session). The text of the spec is clear: ... Global and created local temporary tables are effectively materialized only when referenced in an SQL-session. Every <module> in every SQL-session that references a created local temporary table causes a distinctinstance of that created local temporary table to be materialized. That is, the contents of a globaltemporary table or a created local temporary table cannot be shared between SQL-sessions. In addition, the contentsof a cre- ated local temporary table cannot be shared between <module>s of a single SQL-session. Since we don't have modules, the distinction between GLOBAL and LOCAL temp tables is meaningless for us. However, if we were to someday implement modules, we would probably expect that the existing flavor of temp tables would remain globally visible throughout each session. That is, the temp tables we have more nearly approximate the spec's GLOBAL temp tables than LOCAL temp tables. As Mike pointed out in the message referenced above, Oracle's and DB2's Postgres-equivalent syntax uses GLOBAL not LOCAL to describe temp tables. So it now seems clear to me that we are in error to reject CREATE GLOBAL TEMP TABLE; we ought to accept that. What I am wondering now is if we should flip the logic to reject CREATE LOCAL TEMP TABLE? Or should we just silently accept both? I'm leaning towards the latter, on the grounds of backward compatibility. regards, tom lane
Tom Lane wrote: > So it now seems clear to me that we are in error to reject CREATE GLOBAL > TEMP TABLE; we ought to accept that. > > What I am wondering now is if we should flip the logic to reject CREATE > LOCAL TEMP TABLE? Or should we just silently accept both? I'm leaning > towards the latter, on the grounds of backward compatibility. Well, since we don't support modules, I think we should allow LOCAL. If we had modules, we should reject it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> What I am wondering now is if we should flip the logic to reject CREATE >> LOCAL TEMP TABLE? Or should we just silently accept both? I'm leaning >> towards the latter, on the grounds of backward compatibility. > Well, since we don't support modules, I think we should allow LOCAL. If > we had modules, we should reject it. Huh? If we had modules, we'd probably actually implement it. If you want to look ahead that far, the question is whether rejecting LOCAL or treating it as a noise word, today, will provide the easiest update path to full support for module-LOCAL temp tables. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> What I am wondering now is if we should flip the logic to reject CREATE > >> LOCAL TEMP TABLE? Or should we just silently accept both? I'm leaning > >> towards the latter, on the grounds of backward compatibility. > > > Well, since we don't support modules, I think we should allow LOCAL. If > > we had modules, we should reject it. > > Huh? If we had modules, we'd probably actually implement it. > > If you want to look ahead that far, the question is whether rejecting > LOCAL or treating it as a noise word, today, will provide the easiest > update path to full support for module-LOCAL temp tables. Seems so. I was saying we would remove LOCAL _only_ if we had modules and didn't support LOCAL for them. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Seems so. I was saying we would remove LOCAL _only_ if we had modules > and didn't support LOCAL for them. Okay. For now, I think backwards compatibility is a sufficiently good reason to accept LOCAL as a noise word, anyway. regards, tom lane
The real problem with current temp tables is the implementation. I see very quick growth of system catalogs with heavy use of temp tables(some hundred mega bytes per week on a busy system for example). To fix the system catalogs, we have to stop postmaster and have to do reindex. This is truly a pain. -- Tatsuo Ishii
On Wed, Apr 16, 2003 at 10:09:38AM +0900, Tatsuo Ishii wrote: > The real problem with current temp tables is the implementation. I see > very quick growth of system catalogs with heavy use of temp > tables(some hundred mega bytes per week on a busy system for > example). To fix the system catalogs, we have to stop postmaster and > have to do reindex. This is truly a pain. This is fixed in 7.4 already. It wasn't a problem with temp tables, but with btree indexes. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Endurecerse, pero jamas perder la ternura" (E. Guevara)
Alvaro Herrera wrote: > On Wed, Apr 16, 2003 at 10:09:38AM +0900, Tatsuo Ishii wrote: > > The real problem with current temp tables is the implementation. I see > > very quick growth of system catalogs with heavy use of temp > > tables(some hundred mega bytes per week on a busy system for > > example). To fix the system catalogs, we have to stop postmaster and > > have to do reindex. This is truly a pain. > > This is fixed in 7.4 already. It wasn't a problem with temp tables, but > with btree indexes. Yes, it is fixed partly, but I want to point out that the fix somewhat asymetric. For example, it would be nice to tell people that they should either use VACUUM several times a day _or_ run VACUUM FULL nightly. The problem with this simplification is indexes --- VACUUM records free indx pages, while VACUUM FULL doesn't do anything with empty index pages. Is there anything we can do to improve this situation? Should VACUUM FULL record free index pages? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, Apr 15, 2003 at 11:17:59PM -0400, Bruce Momjian wrote: > Alvaro Herrera wrote: > > This is fixed in 7.4 already. It wasn't a problem with temp tables, but > > with btree indexes. > > Yes, it is fixed partly, but I want to point out that the fix somewhat > asymetric. > > For example, it would be nice to tell people that they should either use > VACUUM several times a day _or_ run VACUUM FULL nightly. The problem > with this simplification is indexes --- VACUUM records free indx pages, > while VACUUM FULL doesn't do anything with empty index pages. > > Is there anything we can do to improve this situation? Should VACUUM > FULL record free index pages? Maybe VACUUM FULL could reorder index pages and truncate to recover disk space. But this is very costly, so there could be a "VACUUM REALLY FULL" or something :-) I think VACUUM FULL should at least record free index pages just like VACUUM. I thought it did that. Anyway I think the "right" solution would be to integrate the vacuum daemon in the backend. Isn't this being worked on? Maybe this daemon could not only vacuum and analyze automatically when needed, but also reorder btree pages and truncate the index file when the server load is low. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Entristecido, Wutra echa a Freyr a rodar y a nosotros al mar" (cancion de Las Barreras)
> On Wed, Apr 16, 2003 at 10:09:38AM +0900, Tatsuo Ishii wrote: > > The real problem with current temp tables is the implementation. I see > > very quick growth of system catalogs with heavy use of temp > > tables(some hundred mega bytes per week on a busy system for > > example). To fix the system catalogs, we have to stop postmaster and > > have to do reindex. This is truly a pain. > > This is fixed in 7.4 already. It wasn't a problem with temp tables, but > with btree indexes. I thought Tom has not finished his work yet, no? -- Tatsuo Ishii
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> This is fixed in 7.4 already. It wasn't a problem with temp tables, but >> with btree indexes. > Yes, it is fixed partly, but I want to point out that the fix somewhat > asymetric. Have you actually run any experiments to prove that the current implementation has a problem? regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Maybe VACUUM FULL could reorder index pages and truncate to recover disk > space. But this is very costly, so there could be a "VACUUM REALLY > FULL" or something :-) At that point I would think REINDEX would be a better answer. regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > The real problem with current temp tables is the implementation. I see > very quick growth of system catalogs with heavy use of temp > tables(some hundred mega bytes per week on a busy system for > example). To fix the system catalogs, we have to stop postmaster and > have to do reindex. This is truly a pain. I believe the btree compaction logic in CVS tip will fix this. It would be nice to see in-the-field proof though. Don't suppose you want to run a test system with CVS tip? regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> This is fixed in 7.4 already. It wasn't a problem with temp tables, but > >> with btree indexes. > > > Yes, it is fixed partly, but I want to point out that the fix somewhat > > asymetric. > > Have you actually run any experiments to prove that the current > implementation has a problem? I am asking more from a theoretical perspective --- can we say VACUUM regularly or VACUUM FULL are the same in terms of index recovery, or at least as similar as FULL/non-FULL are? I don't remember the btree index compaction fix in CVS --- I just remember the recording of index free space by VACUUM --- did I forget something? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I am asking more from a theoretical perspective --- can we say VACUUM > regularly or VACUUM FULL are the same in terms of index recovery, or at > least as similar as FULL/non-FULL are? See the comments in nbtree.c's btvacuumcleanup(). FULL is able to recycle empty pages faster than non-FULL, since it knows there can be no other transactions with open indexscans. So a freshly emptied index page can be added to the FSM freelist immediately, whereas in the non-FULL case it will need to wait till the next VACUUM (possibly even longer if you have long-running transactions). Also, VACUUM FULL will truncate off any free pages at the end of the index, though I doubt this is very effective since it won't move data across pages. As I commented to Alvaro, I don't really see a need for an intermediate level of cleanup between what VACUUM FULL does now and REINDEX. Moving data in an index is slow, would certainly require exclusive lock, and helps to degrade the physical ordering of the index. REINDEX gives you a nice new freshly-sorted index and would probably be what you'd want if you were going to lock down the index for a long period anyway. > I don't remember the btree index > compaction fix in CVS --- I just remember the recording of index free > space by VACUUM --- did I forget something? It's in there. regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > The real problem with current temp tables is the implementation. I see > > very quick growth of system catalogs with heavy use of temp > > tables(some hundred mega bytes per week on a busy system for > > example). To fix the system catalogs, we have to stop postmaster and > > have to do reindex. This is truly a pain. > > I believe the btree compaction logic in CVS tip will fix this. > It would be nice to see in-the-field proof though. Don't suppose you > want to run a test system with CVS tip? No problem with the testing. Let me report later. BTW, do you have any document for the btree compaction logic you have implemented? I see your proposal in the mailing list, but not sure about your actual implementaion... -- Tatsuo Ishii
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I am asking more from a theoretical perspective --- can we say VACUUM > > regularly or VACUUM FULL are the same in terms of index recovery, or at > > least as similar as FULL/non-FULL are? > > See the comments in nbtree.c's btvacuumcleanup(). FULL is able to > recycle empty pages faster than non-FULL, since it knows there can be no > other transactions with open indexscans. So a freshly emptied index > page can be added to the FSM freelist immediately, whereas in the > non-FULL case it will need to wait till the next VACUUM (possibly even > longer if you have long-running transactions). Also, VACUUM FULL will > truncate off any free pages at the end of the index, though I doubt this > is very effective since it won't move data across pages. > > As I commented to Alvaro, I don't really see a need for an intermediate > level of cleanup between what VACUUM FULL does now and REINDEX. Moving > data in an index is slow, would certainly require exclusive lock, and > helps to degrade the physical ordering of the index. REINDEX gives you > a nice new freshly-sorted index and would probably be what you'd want > if you were going to lock down the index for a long period anyway. Hmm...could VACUUM FULL then take an additional option, REINDEX (not sure if calling it that will be possible if it'll cause ambiguities in the parser), so that you don't have to do REINDEX separately? The only reason I mention this is that VACUUM FULL doesn't require an argument and will operate on an entire database in that situation, and doesn't require that you be in standalone mode, whereas REINDEX DATABASE must operate in standalone mode and other REINDEX modes require a specific table or index as arguments. At least, that is the case under 7.3.x. Ignore this if REINDEX can reindex a database without being in standalone mode in 7.4... The situations where you'd do a VACUUM FULL are the same situations in which you're likely to want to REINDEX as well, so it seems reasonable to make a REINDEX operation an option to VACUUM FULL, especially since VACUUM FULL acquires an exclusive lock on the table (so the indexes associated with that table won't be in use during the VACUUM -- a perfect time to recreate them). In fact, since VACUUM FULL is scanning the entire heap, rebuilding the indexes while the VACUUM FULL scan is going seems like it would be a significant performance win over doing separate VACUUM FULLs and REINDEXes. On a slightly different note, from what I've seen of the source, REINDEX TABLE acquires an access exclusive lock on the table, thus preventing concurrent SELECTs against the table while REINDEX TABLE is running. But shouldn't those SELECTs be allowed until the newly-built index is ready for operation (at which point the regular exclusive lock on the table could be upgraded to a shared exclusive lock)? If the DBA doesn't want anyone to be doing SELECTs against the table he can lock the table in access exclusive mode himself and then issue the REINDEX TABLE from within that same transaction, right? I mean, if REINDEX TABLE is supposed to be a regularly scheduled maintenance command, then shouldn't its impact on normal operations be as minimal as possible? Or am I missing something fundamentally important? Searching the archives didn't prove very enlightening... -- Kevin Brown kevin@sysexperts.com
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > BTW, do you have any document for the btree compaction logic you have > implemented? There's some stuff about it in src/backend/access/nbtree/README, but it's at the implementor's-guide level. Not sure what would be useful to say in the user documentation. regards, tom lane
On Wed, Apr 16, 2003 at 06:42:36PM -0700, Kevin Brown wrote: > Hmm...could VACUUM FULL then take an additional option, REINDEX (not > sure if calling it that will be possible if it'll cause ambiguities in > the parser), so that you don't have to do REINDEX separately? Actually, the need to REINDEX should be pretty minimal. The command was originally invented to recover from corrupted indexes, but was used as a regular maintenance because of the ever-growing indexes problem. The problem is mostly gone now, and REINDEX is again only meant to the corrupted indexes scenario. VACUUM generally won't free disk space, but it will mark pages free so subsequent index growth will use them. Periodic VACUUM usage should be enough to keep indexes in control. > Ignore this if REINDEX can reindex a database without being in > standalone mode in 7.4... You should not need to use REINDEX anymore. > On a slightly different note, from what I've seen of the source, > REINDEX TABLE acquires an access exclusive lock on the table, thus > preventing concurrent SELECTs against the table while REINDEX TABLE is > running. Yeah, it's difficult to recreate an index keeping concurrency. I had originally meant to implement this (concurrent index rebuild), but got stuck in the freelist thing, and now concurrent index rebuild is not needed as much. Because of this I selected a different project that is probably more useful. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La primera ley de las demostraciones en vivo es: no trate de usar el sistema. Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
Alvaro Herrera wrote: > On Wed, Apr 16, 2003 at 06:42:36PM -0700, Kevin Brown wrote: > > > Hmm...could VACUUM FULL then take an additional option, REINDEX (not > > sure if calling it that will be possible if it'll cause ambiguities in > > the parser), so that you don't have to do REINDEX separately? > > Actually, the need to REINDEX should be pretty minimal. The command was > originally invented to recover from corrupted indexes, but was used as > a regular maintenance because of the ever-growing indexes problem. The > problem is mostly gone now, and REINDEX is again only meant to the > corrupted indexes scenario. But as Tom pointed out, if you delete a bunch of data from a table then insert a fresh set of data, but don't end up inserting much data with roughly the same keys that were in the original batch of data, you'll get a lot of empty areas in your index that are unused. VACUUM marks them as being available for reuse, of course, but that doesn't help you unless you insert data containing values that are appropriate to the unused areas. An example would be an index on a serial column, where a periodic delete of old data happens. The values in the serial column are never reused, so their locations in the index never get reused, either, right? So the index on that serial column grows without bound. I imagine that situation is actually rather common. Unless that behavior has changed, you will need to reindex on a periodic basis to keep your index from growing indefinitely, won't you? > VACUUM generally won't free disk space, but it will mark pages free so > subsequent index growth will use them. Periodic VACUUM usage should be > enough to keep indexes in control. Even in situations as I describe above? > > On a slightly different note, from what I've seen of the source, > > REINDEX TABLE acquires an access exclusive lock on the table, thus > > preventing concurrent SELECTs against the table while REINDEX TABLE is > > running. > > Yeah, it's difficult to recreate an index keeping concurrency. I don't quite understand this. I'm certainly not advocating allowing writes while reindexing! The data in the table isn't changing at all because you've acquired an exclusive lock on the table (which permits reads but not writes) -- but not a shared exclusive lock (which would prevent reads). Why would you need to prevent reads of the data in the table or the old index, aside from performance or corruption considerations (which is an evaluation that belongs in the hands the DBA)? I can certainly see the need to prevent reads (and thus acquire a shared exclusive lock) once the new index is built and needs to be swapped in for the old one, but not before then. > I had originally meant to implement this (concurrent index rebuild), > but got stuck in the freelist thing, and now concurrent index > rebuild is not needed as much. Because of this I selected a > different project that is probably more useful. I can certainly understand that. :-) I'm just wondering what would happen if you changed the code to acquire a standard exclusive lock on the table instead of a shared exclusive one, and added another line to acquire a shared exclusive lock once the new index is built... -- Kevin Brown kevin@sysexperts.com
Kevin Brown <kevin@sysexperts.com> writes: > But as Tom pointed out, if you delete a bunch of data from a table > then insert a fresh set of data, but don't end up inserting much data > with roughly the same keys that were in the original batch of data, > you'll get a lot of empty areas in your index that are unused. VACUUM > marks them as being available for reuse, of course, but that doesn't > help you unless you insert data containing values that are appropriate > to the unused areas. No, you misunderstood. That is the problem in existing releases --- but in CVS tip, VACUUM can actually remove unused sections from the b-tree and make that space available for re-use in other key ranges. regards, tom lane
Tom Lane wrote: > Kevin Brown <kevin@sysexperts.com> writes: > > But as Tom pointed out, if you delete a bunch of data from a table > > then insert a fresh set of data, but don't end up inserting much data > > with roughly the same keys that were in the original batch of data, > > you'll get a lot of empty areas in your index that are unused. VACUUM > > marks them as being available for reuse, of course, but that doesn't > > help you unless you insert data containing values that are appropriate > > to the unused areas. > > No, you misunderstood. That is the problem in existing releases --- but > in CVS tip, VACUUM can actually remove unused sections from the b-tree > and make that space available for re-use in other key ranges. Ah, okay. That's quite a feat, actually. 7.4 is going to be one sweet release... -- Kevin Brown kevin@sysexperts.com