Thread: update on global temporary and unlogged tables
I haven't had a chance to do a great deal of work on this project, but I'm hoping to get back to it at some point and, in the meantime, thought that it might be useful to circulate a few thoughts I've had so far. 1. As common architecture for both features, I think that it might make sense to replace the existing relistemp (bool) field with a relpersistence (char) field. The current tests against rel->rd_istemp can be replaced with macros testing the exact property we really care about in that situation. I'm thinking of RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(). 2. With respect to global temporary tables, I've hit a fairly serious roadblock in the form of relfrozenxid. For a permanent table, VACUUM by any backend can advance relfrozenxid; for a backend-local temporary table, VACUUM by the owning backend can advance relfrozenxid. But for a global temporary table, the proper value for relfrozenxid is the earliest value for any backend that has stored tuples into the table. I'm not immediately sure what to do about this. Tom's previous suggestion of cloning the catalog entries for each backend that tries to access the table is one possible alternative, but I don't like that much for reasons previously discussed. Incidentally, per some previous discussion, I took a look at what Oracle does with respect to DDL on global temp tables, and I gather that they allow it if no session has "bound" (instantiated?) the table. Maybe some infrastructure along those lines would be useful in dealing with the VACUUM problem also; not sure. 3. With respect to unlogged tables, the major obstacle seems to be figuring out a way for these to get automatically truncated at startup time. As with temporary table cleanup in general, the problem here is that you can't do the obvious thing of iterating through pg_class and truncating each unlogged table you find without greatly complicating the startup sequence. However, I think there's a fairly easy way around this problem: truncating a table basically means removing all segments and relation forks other than the first segment of the main fork, and truncating that one to zero length. So we could do it the same way we clean up temporary files - namely, based on the file name - if we made the filenames for unlogged tables distinguishable from those for regular and temporary tables. What I'm thinking about is reserving a backend ID of -2 for this purpose via some suitable constant definition, just as -1 (InvalidBackendId) represents a permanent table in this context. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, Sep 6, 2010 at 10:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 3. With respect to unlogged tables, the major obstacle seems to be > figuring out a way for these to get automatically truncated at startup > time. As with temporary table cleanup in general, the problem here is > that you can't do the obvious thing of iterating through pg_class and > truncating each unlogged table you find without greatly complicating > the startup sequence. However, I think there's a fairly easy way > around this problem: truncating a table basically means removing all > segments and relation forks other than the first segment of the main > fork, and truncating that one to zero length. So we could do it the > same way we clean up temporary files - namely, based on the file name > - if we made the filenames for unlogged tables distinguishable from > those for regular and temporary tables. What I'm thinking about is > reserving a backend ID of -2 for this purpose via some suitable > constant definition, just as -1 (InvalidBackendId) represents a > permanent table in this context. I tried this approach and got fairly far with it, but ran into a snag in the buffer manager. It's fairly obvious that the buffer manager has to know whether a particular buffer is from an unlogged relation or not; for example, FlushBuffer() should skip the XLOG flush for an unlogged buffer, and must pass the correct backend ID to smgropen(). So my first thought was just to define a bit BM_IS_UNLOGGED, with the obvious interpretation. That's not quite good enough, though, because GetNewRelFileNode doesn't guarantee that the OID chosen is absolutely unique; it just guarantees that it's unique within the space defined by the database ID and backend ID. So it's possible that you could have a logged relation and an unlogged relation with the same value for pg_class.relfilenode, which means that the buffer manager can't store the unlogged status as a random bit someplace, but actually needs to have it as part of the buffer tag (otherwise, a buffer descriptor hash table lookup might find the wrong buffer). You could maybe work around this problem by having GetNewRelFileNode(), when generating an OID for either a permanent or unlogged relation, check that the OID isn't in use for either of those things already, but that requires an extra system call, so it doesn't seem ideal. I'd be willing to go that route if people think it's cheap enough and more desirable for some reason, though. So I went looking for bit-space in the buffer tag and quickly found some. ForkNumber is an enum which I suppose means a 32-bit integer, but we've only got three forks right now and it's hard to imagine more than a handful of additional ones, so what I'm tempted to do is change this from an enum to a 2-byte integer and replace the enum values with #defines. That frees up 2 bytes in the buffer tag which is more than plenty. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 13/09/10 05:49, Robert Haas wrote: > On Mon, Sep 6, 2010 at 10:55 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> 3. With respect to unlogged tables, the major obstacle seems to be >> figuring out a way for these to get automatically truncated at startup >> time. As with temporary table cleanup in general, the problem here is >> that you can't do the obvious thing of iterating through pg_class and >> truncating each unlogged table you find without greatly complicating >> the startup sequence. However, I think there's a fairly easy way >> around this problem: truncating a table basically means removing all >> segments and relation forks other than the first segment of the main >> fork, and truncating that one to zero length. So we could do it the >> same way we clean up temporary files - namely, based on the file name >> - if we made the filenames for unlogged tables distinguishable from >> those for regular and temporary tables. What I'm thinking about is >> reserving a backend ID of -2 for this purpose via some suitable >> constant definition, just as -1 (InvalidBackendId) represents a >> permanent table in this context. > > I tried this approach and got fairly far with it, but ran into a snag > in the buffer manager. It's fairly obvious that the buffer manager > has to know whether a particular buffer is from an unlogged relation > or not; for example, FlushBuffer() should skip the XLOG flush for an > unlogged buffer, and must pass the correct backend ID to smgropen(). > So my first thought was just to define a bit BM_IS_UNLOGGED, with the > obvious interpretation. The LSNs on all pages in an unlogged relation should be zero, and XLogFlush() will do nothing. That's what we rely on at the moment for pages that are not WAL-logged for some reason, I don't think you need any extra flag for that. > So I went looking for bit-space in the buffer tag and quickly found > some. ForkNumber is an enum which I suppose means a 32-bit integer, > but we've only got three forks right now and it's hard to imagine more > than a handful of additional ones, so what I'm tempted to do is change > this from an enum to a 2-byte integer and replace the enum values with > #defines. That frees up 2 bytes in the buffer tag which is more than > plenty. I haven't been following the discussion so I don't understand why you need the extra bits, but no objections to reducing the fork number field. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Sep 13, 2010 at 5:24 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > The LSNs on all pages in an unlogged relation should be zero, and > XLogFlush() will do nothing. That's what we rely on at the moment for pages > that are not WAL-logged for some reason, I don't think you need any extra > flag for that. Ah, interesting. I wonder if I should add a cross-check for that and elog(LOG) if it isn't the case, or some such. >> So I went looking for bit-space in the buffer tag and quickly found >> some. ForkNumber is an enum which I suppose means a 32-bit integer, >> but we've only got three forks right now and it's hard to imagine more >> than a handful of additional ones, so what I'm tempted to do is change >> this from an enum to a 2-byte integer and replace the enum values with >> #defines. That frees up 2 bytes in the buffer tag which is more than >> plenty. > > I haven't been following the discussion so I don't understand why you need > the extra bits, but no objections to reducing the fork number field. Well, the idea is that unlogged table files are named differently than regular table files (I'm thinking, just insert a "u" at the beginning) so that we can truncate them at startup time without needing to look at any catalog entries. So the point is you could have data/base/16384/u124141421 block 173 and data/base/16384/124141421 block 173 in the buffer cache at the same time. The alternative is to try to make sure that you never create both of those files in the first place, but that requires an extra system call per GetNewRelFileNode() - and that could get worse in the future if for some reason we find it advantageous to have more than two "namespaces". (The already-committed RelFileNodeBackend patch already creates one "namespace" per backend for temporary tables plus one for permanent tables; but that doesn't run into this problem because temporary tables use backend-local buffers - i.e. the relevant BackendId can be inferred strictly from which set of buffers we're looking at.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
<div class="gmail_quote">On Mon, Sep 6, 2010 at 7:55 PM, Robert Haas <span dir="ltr"><<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> 3. With respect to unloggedtables, the major obstacle seems to be<br /> figuring out a way for these to get automatically truncated at startup<br/> time.<br /></blockquote></div><br />(please forgive what is probably a stupid question)<br />By truncate domean reduce the table to a very small number (or zero) number of pages? Is there a case to be made for instead somehowmarking all pages as available for reuse? Deallocating and reallocating space can be expensive.<br clear="all" /><br/>-- <br />Rob Wultsch<br /><a href="mailto:wultsch@gmail.com">wultsch@gmail.com</a><br />
On Mon, Sep 13, 2010 at 9:06 PM, Rob Wultsch <wultsch@gmail.com> wrote: > On Mon, Sep 6, 2010 at 7:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> 3. With respect to unlogged tables, the major obstacle seems to be >> figuring out a way for these to get automatically truncated at startup >> time. > > (please forgive what is probably a stupid question) > By truncate do mean reduce the table to a very small number (or zero) number > of pages? Is there a case to be made for instead somehow marking all pages > as available for reuse? Deallocating and reallocating space can be > expensive. I think it's probably actually cheaper to truncate them, but since it only happens at startup time it's probably not worth worrying about.... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company