Thread: [WIP] The relminxid addition, try 3
This patch is not ready for application, but read below because I need some input. Here is a reworked relminxid patch. I added XLOG operations for heap unfreeze and database unfreeze. The latter happens when someone connects to a frozen database. The former happens when a LockRelation() is called on a frozen relation, and the lock is stronger than AccessShare. A database is only frozen when somebody calls VACUUM FREEZE in a standalone backend. A table is frozen when somebody calls VACUUM FREEZE on it, which acquires ExclusiveLock. I'm not too sure about the XLOG routines -- I don't understand very well the business about attaching the changes to a buffer; I thought at first that since all the changes go to a tuple, they all belong to the buffer, so I assigned a single XLogRecData struct with all the info and the buffer containing the tuple; but then on replay, I got "PANIC: invalid xlog record length 0" So I went to read the code for that case and noticed that it said that the test "is somewhat fishy" but none of the callers used that case currently. So I decided to review what others routines are doing. I thought that heap_insert should be doing more or less the same that heap_unfreeze, since all the changes in the tuple go into the buffer, right? But for some reason it doesn't. So I just added a second XLogRecData, without any data but attached to the buffer, and I removed the reference to the buffer in the first XLogRecData. This is probably wrong, but I'd like to know why and what's the correct way to do it :-) The replay routines also appear to work, but I'm not too sure about them -- backup blocks and stuff, I'm not sure what is really happening. I appreciate any advice. The patch is not complete because I need to fiddle with relvacuumxid so that it follows the same behavior as relminxid. That comes next. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > I'm not too sure about the XLOG routines -- I don't understand very well > the business about attaching the changes to a buffer; I thought at first > that since all the changes go to a tuple, they all belong to the buffer, > so I assigned a single XLogRecData struct with all the info and the > buffer containing the tuple; but then on replay, I got "PANIC: invalid > xlog record length 0" Generally you want an xlog record to consist of some fixed overhead plus attached data. The attached data is the part that should link to the buffer (normally it's data that is/will be actually stored into that buffer). The fixed overhead isn't in the buffer and doesn't link. But why do you need your own xlogging at all? Shouldn't these actions be perfectly ordinary updates of the relevant catalog tuples? regards, tom lane
Tom Lane wrote: > But why do you need your own xlogging at all? Shouldn't these actions > be perfectly ordinary updates of the relevant catalog tuples? The XLog entry can be smaller, AFAICT (we need to store the whole new tuple in a heap_update, right?). If that's not a worthy goal I'll gladly rewrite this piece of code. Ah, there's another reason, and it's that I'm rewriting the tuple in place, not calling heap_update. I'm not sure if I can reuse log_heap_update for this purpose -- I'll take a look. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Ah, there's another reason, and it's that I'm rewriting the tuple in > place, not calling heap_update. Is that really a good idea, as compared to using heap_update? (Now, if you're combining this with the very grotty relpages/reltuples update code, then I'm all for making that xlog properly --- we've gotten away without it so far but it really should xlog the change.) regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Ah, there's another reason, and it's that I'm rewriting the tuple in > > place, not calling heap_update. > > Is that really a good idea, as compared to using heap_update? Not sure -- we would leave dead tuples after the VACUUM is finished if we used heap_update, no? ISTM it's undesirable. > (Now, if you're combining this with the very grotty relpages/reltuples > update code, then I'm all for making that xlog properly --- we've gotten > away without it so far but it really should xlog the change.) I hadn't done that, but I'll see what I can do. Notice however that I'm doing this in both pg_class _and_ pg_database. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> (Now, if you're combining this with the very grotty relpages/reltuples >> update code, then I'm all for making that xlog properly --- we've gotten >> away without it so far but it really should xlog the change.) > I hadn't done that, but I'll see what I can do. Notice however that I'm > doing this in both pg_class _and_ pg_database. BTW, one thing I was looking at last week was whether we couldn't refactor the relpages/reltuples updates to be done in a cleaner place. Right now UpdateStats is called (for indexes) directly from the index AM, which sucks from a modularity point of view, and what's worse it forces two successive updates of the same tuple during CREATE INDEX. We should reorganize things so this is done once at the outer level. It'd require some change of the ambuild() API, but considering we're hacking several other aspects of the AM API in this development cycle, that doesn't bother me. regards, tom lane
Tom Lane wrote: > BTW, one thing I was looking at last week was whether we couldn't > refactor the relpages/reltuples updates to be done in a cleaner place. > Right now UpdateStats is called (for indexes) directly from the index > AM, which sucks from a modularity point of view, and what's worse it > forces two successive updates of the same tuple during CREATE INDEX. > We should reorganize things so this is done once at the outer level. > It'd require some change of the ambuild() API, but considering we're > hacking several other aspects of the AM API in this development cycle, > that doesn't bother me. I'm rather stuck in Mammoth Replicator currently :-( so I'm not sure when I could devote some time to this task. If you want to do it, go ahead and I can adapt the relminxid patch to your changes. Or you can take the relminxid patch from where it currently is, if you feel so inclined. If you don't, I'll eventually come back to it, but I'm not sure when will that be. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> We should reorganize things so this is done once at the outer level. >> It'd require some change of the ambuild() API, but considering we're >> hacking several other aspects of the AM API in this development cycle, >> that doesn't bother me. > I'm rather stuck in Mammoth Replicator currently :-( so I'm not sure > when I could devote some time to this task. OK, I'll try to get to it later this week. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> (Now, if you're combining this with the very grotty relpages/reltuples >> update code, then I'm all for making that xlog properly --- we've gotten >> away without it so far but it really should xlog the change.) > I hadn't done that, but I'll see what I can do. Notice however that I'm > doing this in both pg_class _and_ pg_database. It strikes me that the cleanest way to deal with this is to invent a single new type of xlog record, something like HEAP_UPDATE_IN_PLACE, which just replaces tuple contents with a new tuple of the same size. This would serve for both stats updates and unfreezing in both pg_class and pg_database, and might have other uses in future for non-transactional updates. It's a little bit more xlog space than your unfreeze-specific operations, but I don't think we need to sweat a few bytes for those; they're not likely to be common. Barring objections, I'll add this when I refactor UpdateStats. Got some other work I need to get back to right now, though. regards, tom lane
Alvaro Herrera wrote: Hi, > Here is a reworked relminxid patch. I added XLOG operations for heap > unfreeze and database unfreeze. The latter happens when someone > connects to a frozen database. The former happens when a LockRelation() > is called on a frozen relation, and the lock is stronger than > AccessShare. I've been further reworking this patch. It is basically as discussed: tables are unfrozen when LockRelation() is called, databases are unfrozen when they are connected to. Those operations are performed using heap_inplace_update, so we no longer need to take particular steps to XLog them. There are two new columns in pg_class (relminxid and relvacuumxid). Both of them can be set to FrozenTransactionId. The minimum of each is used to determine database-wide datminxid and datvacuumxid. Those can also be to FrozenXid. (Note that it seems to be problematic that all databases have FrozenXids; however, this can only happen in a standalone backend, when you VACUUM FREEZE a database and all other databases are already frozen. So we can truncate to CurrentTransactionId without problem.) However there's a big problem once again but I don't know how to fix it. I don't know why this didn't show up before. Consider the following scenario: - pg_attribute is frozen - pg_class is frozen CREATE TABLE foo (a int); for some unknown reason, an inval message involving relation foo seems to be emitted. heap_unfreeze(pg_class) CommandCounterIncrement() heap_unfreeze(pg_attribute) CommandCounterIncrement() ... insert the pg_attribute rows ... During one of those CommandCounterIncrement calls, the "foo" relation receives an invalidation message and the system tries to rebuild the RelationDesc. But at this point the pg_attribute rows are not there yet, so the rebuild fails because it can't find them: $ runpg 10relminxid client Bienvenido a psql 8.2devel, la terminal interactiva de PostgreSQL. Digite: \copyright para ver los términos de distribución \h para ayuda de comandos SQL \? para ayuda de comandos psql \g o or termine con punto y coma para ejecutar una consulta \q para salir alvherre=# vacuum freeze ; alvherre=# \q $ runpg 10relminxid client Bienvenido a psql 8.2devel, la terminal interactiva de PostgreSQL. Digite: \copyright para ver los términos de distribución \h para ayuda de comandos SQL \? para ayuda de comandos psql \g o or termine con punto y coma para ejecutar una consulta \q para salir alvherre=# create table hmmmm (a int); NOTICE: unfreezing pg_type NOTICE: unfreezing pg_depend NOTICE: unfreezing pg_shdepend NOTICE: unfreezing pg_attribute ERROR: catalog is missing 1 attribute(s) for relid 16409 Strangely this doesn't happen if I create the table right away at the same connection that did the VACUUM FREEZE. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > CREATE TABLE foo (a int); > for some unknown reason, an inval message involving relation foo seems > to be emitted. > heap_unfreeze(pg_class) > CommandCounterIncrement() > heap_unfreeze(pg_attribute) > CommandCounterIncrement() > ... insert the pg_attribute rows ... Where did all these CommandCounterIncrement calls come from? I don't think it's going to work if you are throwing in CCIs at random places; this problem with the relcache will be the least of your worries. Why do you think you need that anyway? regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > CREATE TABLE foo (a int); > > > for some unknown reason, an inval message involving relation foo seems > > to be emitted. > > > heap_unfreeze(pg_class) > > CommandCounterIncrement() > > heap_unfreeze(pg_attribute) > > CommandCounterIncrement() > > ... insert the pg_attribute rows ... > > Where did all these CommandCounterIncrement calls come from? > I don't think it's going to work if you are throwing in CCIs > at random places; this problem with the relcache will be the > least of your worries. Why do you think you need that anyway? I added them in heap_unfreeze precisely because I want the relation to be frozen exactly once, and this doesn't seem to happen if I don't CCI there -- I was seeing multiple occurences of the NOTICE I put in heap_unfreeze for the same relation for a single CREATE TABLE (multiple unfreezes of pg_class and pg_attribute, for example). Maybe the problem is elsewhere. I'll investigate more. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Where did all these CommandCounterIncrement calls come from? > I added them in heap_unfreeze precisely because I want the relation to > be frozen exactly once, and this doesn't seem to happen if I don't CCI > there -- I was seeing multiple occurences of the NOTICE I put in > heap_unfreeze for the same relation for a single CREATE TABLE (multiple > unfreezes of pg_class and pg_attribute, for example). Well, that needs rethinking. The unfreeze has to be a non-transactional update (if our transaction rolls back, the unfreeze still has to "stick", because we may have put dead tuples into the rel). Therefore, a CCI is neither necessary nor useful. I didn't look at your patch in any detail ... didn't you modify it to use the non-transactional update code I put in heapam.c recently? It might be that we need to broadcast an sinval message for the tuple when we update it this way ... although sinval believes updates are transactional, so that's not going to work all that well. Maybe we have to legislate that catcache/syscache entries shouldn't be trusted to have up-to-date values of these fields. regards, tom lane
I wrote: > Well, that needs rethinking. The unfreeze has to be a non-transactional > update (if our transaction rolls back, the unfreeze still has to > "stick", because we may have put dead tuples into the rel). Actually, this seems even messier than I thought. Consider a transaction that does something transactional to a table's schema, thereby generating a new pg_class row (but not touching any data within the table), and then alters the table contents, requiring an unfreeze. An update to the apparently-current pg_class tuple is not good because that tuple might be rolled back. An update to the last committed version doesn't work either. This seems real close to the recent discussions about how to put sequence data into a single one-row-per-sequence system catalog, specifically about how there were some parts of the sequence catalog data that should be transactional and some that should not be. I'm wondering if we need a second pg_class-derived catalog that carries just the nontransactional columns. regards, tom lane
Tom Lane wrote: > I wrote: > > Well, that needs rethinking. The unfreeze has to be a non-transactional > > update (if our transaction rolls back, the unfreeze still has to > > "stick", because we may have put dead tuples into the rel). > > Actually, this seems even messier than I thought. Consider a > transaction that does something transactional to a table's schema, > thereby generating a new pg_class row (but not touching any data within > the table), and then alters the table contents, requiring an unfreeze. > An update to the apparently-current pg_class tuple is not good because > that tuple might be rolled back. An update to the last committed > version doesn't work either. Well, if a transaction modifies a table in some way, even without changing the data, should generate an unfreeze event, because it will need to lock the table; for example AlterTable locks the affected relation with AccessExclusiveLock. It's important for the non-transactional change to the pg_class tuple be the very first in the transaction, because otherwise the change could be lost; but other than this, I don't think there's any problem. Not that I had actually considered this problem, to be frank; but it seems a nice side effect of how the unfreezing works. > This seems real close to the recent discussions about how to put > sequence data into a single one-row-per-sequence system catalog, > specifically about how there were some parts of the sequence catalog > data that should be transactional and some that should not be. > I'm wondering if we need a second pg_class-derived catalog that carries > just the nontransactional columns. I hope we don't need to do this because ISTM it will be a very big change. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Well, if a transaction modifies a table in some way, even without > changing the data, should generate an unfreeze event, because it will > need to lock the table; for example AlterTable locks the affected > relation with AccessExclusiveLock. It's important for the > non-transactional change to the pg_class tuple be the very first in the > transaction, because otherwise the change could be lost; but other than > this, I don't think there's any problem. You can't guarantee that. Consider for instance manual updates to pg_class: BEGIN; UPDATE pg_class SET reltriggers = 0 WHERE relname = ... ... alter table contents ... COMMIT or ROLLBACK; I believe there are actually patterns like this in some pg_dump output. Will you hack every UPDATE operation to test whether it's changing pg_class and if so force an "unfreeze" operation before changing any row? No thanks :-( >> I'm wondering if we need a second pg_class-derived catalog that carries >> just the nontransactional columns. > I hope we don't need to do this because ISTM it will be a very big change. (Yawn...) We've made far bigger changes than that. The important thing is to get it right. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Well, if a transaction modifies a table in some way, even without > > changing the data, should generate an unfreeze event, because it will > > need to lock the table; for example AlterTable locks the affected > > relation with AccessExclusiveLock. It's important for the > > non-transactional change to the pg_class tuple be the very first in the > > transaction, because otherwise the change could be lost; but other than > > this, I don't think there's any problem. > > You can't guarantee that. Consider for instance manual updates to > pg_class: > > BEGIN; > UPDATE pg_class SET reltriggers = 0 WHERE relname = ... > ... alter table contents ... > COMMIT or ROLLBACK; > > I believe there are actually patterns like this in some pg_dump output. > Will you hack every UPDATE operation to test whether it's changing > pg_class and if so force an "unfreeze" operation before changing any > row? No thanks :-( Oh, true, I hadn't thought of direct updates to pg_class. > >> I'm wondering if we need a second pg_class-derived catalog that carries > >> just the nontransactional columns. > > > I hope we don't need to do this because ISTM it will be a very big change. > > (Yawn...) We've made far bigger changes than that. The important > thing is to get it right. Yeah, I know -- I've been involved in some of them. I hereby volunteer to do it for 8.2 because I'd really like to see this patch in. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support