Thread: Latest version of Hot Standby patch
http://wiki.postgresql.org/wiki/Hot_Standby now contains a link to latest version of this patch. This is still at "v5", just brought forward to CVS HEAD. I will be doing further work on the patch from here and will reply to this post each time I submit a new version. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: > http://wiki.postgresql.org/wiki/Hot_Standby > > now contains a link to latest version of this patch. This is still at > "v5", just brought forward to CVS HEAD. > > I will be doing further work on the patch from here and will reply to > this post each time I submit a new version. New version (already!) v5 - this time slightly broken down to aid review -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Can't we use the existing backendid in place of the slot id? (sorry if this has been discussed already; can't remember) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2008-12-18 at 15:13 +0200, Heikki Linnakangas wrote: > Can't we use the existing backendid in place of the slot id? > > (sorry if this has been discussed already; can't remember) Exactly the sort of question we need, but unfortunately I'm a little hazy; I just woke up, so maybe some coffee will change that answer later. They're certainly related and I did try it initially. SlotId was not present in early versions of the patch up to 13 Oct, though backendId was. IIRC there were a couple of reasons:

* corner case behaviour of backendids - bgwriter writes checkpoint WAL records. Has no backendid, but needs a slotid (possibly others)
* slotids are assigned once and never changed, so allowing them to be used as array lookups directly

I think it would be possible to use slotids as backendids, but not the other way around. Anyway, it seemed like a great way to induce bugs into the sinval code, so I didn't try too hard to make them identical. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: > http://wiki.postgresql.org/wiki/Hot_Standby > > now contains a link to latest version of this patch. This is still at > "v5", just brought forward to CVS HEAD. > > I will be doing further work on the patch from here and will reply to > this post each time I submit a new version. New version: patch apply fixed, doc typos corrected. First half of Wiki docs checked and updated to explain User and Admin Overview exactly as currently implemented. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > * corner case behaviour of backendids - bgwriter writes checkpoint WAL > records. Has no backendid, but needs a slotid (possibly others) Why does bgwriter need a slotid? It doesn't run any transactions. > * slotids are assigned once and never changed, so allowing them to be > used as array lookups directly So are backend ids. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2008-12-19 at 10:59 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > * corner case behaviour of backendids - bgwriter writes checkpoint WAL > > records. Has no backendid, but needs a slotid (possibly others) > > Why does bgwriter need a slotid? It doesn't run any transactions. > > > * slotids are assigned once and never changed, so allowing them to be > > used as array lookups directly > > So are backend ids. I'm a little hazy, to be sure. I'm pretty sure there was a blocker, but if I cannot recall it we should assume it doesn't exist. Where are you going with the thought? Remove slotId from each proc and then use backendId to identify the recovery proc? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Fri, 2008-12-19 at 10:59 +0200, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> * corner case behaviour of backendids - bgwriter writes checkpoint WAL >>> records. Has no backendid, but needs a slotid (possibly others) >> Why does bgwriter need a slotid? It doesn't run any transactions. >> >>> * slotids are assigned once and never changed, so allowing them to be >>> used as array lookups directly >> So are backend ids. > > I'm a little hazy, to be sure. I'm pretty sure there was a blocker, but > if I cannot recall it we should assume it doesn't exist. > > Where are you going with the thought? Remove slotId from each proc and > then use backendId to identify the recovery proc? Yep. Well, to be honest, I don't much like the whole notion of tracking the slots. I think we should just rely on the XLOG_RECOVERY_END records to purge stale PGPROC entries, belonging to backends that died without writing an abort record. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Well, to be honest, I don't much like the whole notion of tracking the > slots. I think we should just rely on the XLOG_RECOVERY_END records to > purge stale PGPROC entries, belonging to backends that died without > writing an abort record. Sorry, I meant XLOG_XACT_RUNNING_XACTS, not XLOG_RECOVERY_END. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2008-12-19 at 14:00 +0200, Heikki Linnakangas wrote: > Heikki Linnakangas wrote: > > Well, to be honest, I don't much like the whole notion of tracking the > > slots. I think we should just rely on the XLOG_RECOVERY_END records to > > purge stale PGPROC entries, belonging to backends that died without > > writing an abort record. > > Sorry, I meant XLOG_XACT_RUNNING_XACTS, not XLOG_RECOVERY_END. OK, previous response aborted, but the problem is fairly similar. If we just assign a recovery proc to each new transaction started then

* we would need an expandable number of recovery procs to cope with the fact that we have potentially more emulated transactions than procs. This info needs to be in shared memory, so however many you allocate, it can still run out. Then what do you do? PANIC recovery and permanently fail? Kick off all queries until an XLOG_XACT_RUNNING_XACTS arrives?
* there would be no direct mapping between the commit record and the proc, so each commit would need to scan the whole proc array to get the correct proc (which is potentially getting bigger and bigger because of the first point)

The slotid (or backendid) is essential to keeping the number of emulated transactions bounded at all times. I don't see that it costs much, if anything, and the resulting code is at least as simple as the alternatives. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
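For illustration only (hypothetical names, not the patch code), here is a sketch of the bounded O(1) property the slotid buys: with a slot id carried in each WAL record, replay finds the right recovery proc by direct array index, while without one every commit record must search an array that the first point above is meanwhile forcing to grow.

#include <stddef.h>

#define MAX_RECOVERY_PROCS 64           /* fixed at startup: bounded shared memory */

/* RecoveryProc and both lookup functions are hypothetical illustration. */
typedef struct RecoveryProc
{
    unsigned int xid;                   /* emulated transaction, 0 = free */
} RecoveryProc;

static RecoveryProc recoveryProcs[MAX_RECOVERY_PROCS];

/* With a slotid in each WAL record, commit replay is a direct lookup. */
static RecoveryProc *
proc_by_slotid(int slotid)
{
    return &recoveryProcs[slotid];
}

/* Without it, each commit record forces a scan of the whole array. */
static RecoveryProc *
proc_by_xid_scan(unsigned int xid)
{
    for (int i = 0; i < MAX_RECOVERY_PROCS; i++)
    {
        if (recoveryProcs[i].xid == xid)
            return &recoveryProcs[i];
    }
    return NULL;                        /* not found */
}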
On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: > http://wiki.postgresql.org/wiki/Hot_Standby > > now contains a link to latest version of this patch. v6 of Hot Standby now uploaded to Wiki (link above), with these changes:

* Must ignore_killed_tuples and never kill_prior_tuple during index scans in recovery (v6)
* XLOG_BTREE_DELETE records handled correctly (v6)
* btree VACUUM code - must scan every block of index (v6)
* BEGIN TRANSACTION READ WRITE should throw error (v6)

New test cycle starting with this patch over next few days. Work continues on other items. Happy New Year everyone, -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: >> http://wiki.postgresql.org/wiki/Hot_Standby >> >> now contains a link to latest version of this patch. > > v6 of Hot Standby now uploaded to Wiki (link above), with these changes: > > * Must ignore_killed_tuples and never kill_prior_tuple during index > scans in recovery (v6) > * XLOG_BTREE_DELETE records handled correctly (v6) > * btree VACUUM code - must scan every block of index (v6) > * BEGIN TRANSACTION READ WRITE should throw error (v6) > > New test cycle starting with this patch over next few days. > I use latest CVS version. I tried to apply the patches and I have the following error :

./hs.apply.sh: line 4: hs.prevent.v5.patch

I think you forgot to update the script. hs.prevent.v5.patch doesn't exist in the tar file but hs.prevent.v6.patch does. Not sure we really need this new file because I have a compilation error if I use the new one. I don't have this error when I don't use the hs.prevent.v6.patch file. Compilation error is:

utility.c: In function ‘ProcessUtility’:
utility.c:292: error: expected ‘)’ before ‘{’ token
utility.c:306: error: expected expression before ‘}’ token

And one file didn't want to get patched:

patching file src/include/catalog/pg_proc.h
Hunk #1 FAILED at 3223.
1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.h.rej

Not sure why. I did patch it manually, but I still have my compilation error. Regards. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com
On Fri, 2009-01-02 at 11:02 +0100, Guillaume Lelarge wrote: > Simon Riggs wrote: > > On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: > >> http://wiki.postgresql.org/wiki/Hot_Standby > >> > >> now contains a link to latest version of this patch. > > > > v6 of Hot Standby now uploaded to Wiki (link above), with these changes: > > > > * Must ignore_killed_tuples and never kill_prior_tuple during index > > scans in recovery (v6) > > * XLOG_BTREE_DELETE records handled correctly (v6) > > * btree VACUUM code - must scan every block of index (v6) > > * BEGIN TRANSACTION READ WRITE should throw error (v6) > > > > New test cycle starting with this patch over next few days. > > > > I use latest CVS version. I tried to apply the patches and I have the > following error : Thanks, will fix. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Fri, 2009-01-02 at 17:35 +0000, Simon Riggs wrote: > > I use latest CVS version. I tried to apply the patches and I have the > > following error : > > Thanks, will fix. Fixed various bit rots and re-packaged. v6a now up, v6 unlinked. Thanks, -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Fri, 2009-01-02 at 17:35 +0000, Simon Riggs wrote: > >>> I use latest CVS version. I tried to apply the patches and I have the >>> following error : >> Thanks, will fix. > > Fixed various bit rots and re-packaged. v6a now up, v6 unlinked. > Thanks. I only did a few checks and it worked great for me. I will try to put some more time on it. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com
On Sun, 2009-01-04 at 21:03 +1300, Mark Kirkwood wrote: > bench=# select now(),count(*) from history; > ERROR: could not open relation base/16384/16394: No such file or > directory Thanks for the report. I'm attempting to recreate now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: >> http://wiki.postgresql.org/wiki/Hot_Standby >> >> now contains a link to latest version of this patch. > > v6 of Hot Standby now uploaded to Wiki (link above), with these changes: > > * Must ignore_killed_tuples and never kill_prior_tuple during index > scans in recovery (v6) > * XLOG_BTREE_DELETE records handled correctly (v6) > * btree VACUUM code - must scan every block of index (v6) > * BEGIN TRANSACTION READ WRITE should throw error (v6) > > New test cycle starting with this patch over next few days. I'm running some tests on v6a. The setup is:

- install master, setup standby as usual, start standby
- create database bench on master
- initialize pgbench dataset size 100 on master
- start 4 clients doing 500000 transactions each

After about 100000 transactions have been processed on the master, query the standby:

bench=# \d history
        Table "public.history"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 tid    | integer                     |
 bid    | integer                     |
 aid    | integer                     |
 delta  | integer                     |
 mtime  | timestamp without time zone |
 filler | character(22)               |

bench=# select now(),count(*) from history;
ERROR: could not open relation base/16384/16394: No such file or directory

regards Mark
On Sun, 2009-01-04 at 21:03 +1300, Mark Kirkwood wrote:

> bench=# \d history
>         Table "public.history"
>  Column |            Type             | Modifiers
> --------+-----------------------------+-----------
>  tid    | integer                     |
>  bid    | integer                     |
>  aid    | integer                     |
>  delta  | integer                     |
>  mtime  | timestamp without time zone |
>  filler | character(22)               |
>
> bench=# select now(),count(*) from history;
> ERROR: could not open relation base/16384/16394: No such file or directory

From my recreating your test case, the oids are consistent with the History table. So the cache looks good. md.c should be caching the file descriptor so the second use of the file should not be reopening it. I've not touched smgr/md so a missing file error is a surprise. I wonder if this is an error associated with large file handling and file forks? Smells like an FSM or VM error. Is the file actually missing? i.e. ls -l mydatadir/base/16384/16394* -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Mark Kirkwood wrote: > Simon Riggs wrote: >> On Sun, 2009-01-04 at 21:03 +1300, Mark Kirkwood wrote:
>>
>>> bench=# \d history
>>>         Table "public.history"
>>>  Column |            Type             | Modifiers
>>> --------+-----------------------------+-----------
>>>  tid    | integer                     |
>>>  bid    | integer                     |
>>>  aid    | integer                     |
>>>  delta  | integer                     |
>>>  mtime  | timestamp without time zone |
>>>  filler | character(22)               |
>>>
>>> bench=# select now(),count(*) from history;
>>> ERROR: could not open relation base/16384/16394: No such file or directory
>>
>> From my recreating your test case, the oids are consistent with the History table. So the cache looks good.
>>
>> md.c should be caching the file descriptor so the second use of the file should not be reopening it. I've not touched smgr/md so a missing file error is a surprise.
>>
>> I wonder if this is an error associated with large file handling and file forks? Smells like an FSM or VM error.
>>
>> Is the file actually missing? i.e. ls -l mydatadir/base/16384/16394*
>
> Yeah -
> $ ls -l $PGDATA/base/16384/16394*
> ls: /data0/pgslave/8.4/base/16384/16394*: No such file or directory

This might be useful: the other tables in the dataset (accounts, branches, tellers) all behave as expected:

bench=# select now(),count(*) from branches;
              now              | count
-------------------------------+-------
 2009-01-04 22:17:00.298597+13 |   100
(1 row)

I'm guessing something tied up with the fact that history has no rows to start with...
On Sun, 2009-01-04 at 22:13 +1300, Mark Kirkwood wrote: > Simon Riggs wrote: > > > > Is the file actually missing? i.e. ls -l mydatadir/base/16384/16394* > > > > > Yeah - > $ ls -l $PGDATA/base/16384/16394* > ls: /data0/pgslave/8.4/base/16384/16394*: No such file or directory What else is missing? Files, Directories etc? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Sun, 2009-01-04 at 21:03 +1300, Mark Kirkwood wrote:
>
>> bench=# \d history
>>         Table "public.history"
>>  Column |            Type             | Modifiers
>> --------+-----------------------------+-----------
>>  tid    | integer                     |
>>  bid    | integer                     |
>>  aid    | integer                     |
>>  delta  | integer                     |
>>  mtime  | timestamp without time zone |
>>  filler | character(22)               |
>>
>> bench=# select now(),count(*) from history;
>> ERROR: could not open relation base/16384/16394: No such file or directory
>
> From my recreating your test case, the oids are consistent with the History table. So the cache looks good. md.c should be caching the file descriptor so the second use of the file should not be reopening it. I've not touched smgr/md so a missing file error is a surprise. I wonder if this is an error associated with large file handling and file forks? Smells like an FSM or VM error. Is the file actually missing? i.e. ls -l mydatadir/base/16384/16394*

Yeah -

$ ls -l $PGDATA/base/16384/16394*
ls: /data0/pgslave/8.4/base/16384/16394*: No such file or directory

regards Mark
On Sun, 2009-01-04 at 22:18 +1300, Mark Kirkwood wrote: > >>> > >>> bench=# select now(),count(*) from history; > >>> ERROR: could not open relation base/16384/16394: No such file or > >>> directory > >>> > I'm guessing something tied up with the fact that history has no rows > to > start with... Good guess, thanks. I can recreate the error now, though not by following the actions in the order you mentioned. I guess the files hadn't applied fully before you ran the test. The problem I can re-create looks like this:

1. Create standby set-up, with both primary and standby active
2. Create new table on primary, but don't add data; wait for apply
3. Attempt to access new table on standby, throws ERROR as shown
4. Add 1 row on primary; wait for apply
5. Attempt to access new table on standby, no ERROR

It looks to me like WAL for CREATE TABLE doesn't actually create a file; we just rely on the ability of mdextend() to create the file if required during recovery. So it looks to me like either an outstanding error with the current system, or a new error introduced with recent-ish md/smgr changes. Second opinion please Heikki, if you are available? I'll come back to this in a few hours, but I have some other things I need to do right now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Sun, 2009-01-04 at 22:18 +1300, Mark Kirkwood wrote: >>>>> bench=# select now(),count(*) from history; >>>>> ERROR: could not open relation base/16384/16394: No such file or >>>>> directory >>>>> > >> I'm guessing something tied up with the fact that history has no rows >> to >> start with... > > Good guess, thanks. I can recreate the error now, though not by > following the actions in the order you mentioned. I guess the files > hadn't applied fully before you ran the test. > > The problem I can re-create looks like this: > > 1. Create standby set-up, with both primary and standby active > 2. Create new table on primary, but don't add data; wait for apply > 3. Attempt to access new table on standby, throws ERROR as shown > 4. Add 1 row on primary; wait for apply > 5. Attempt to access new table on standby, no ERROR > > It looks to me like WAL for CREATE TABLE doesn't actually create a file, > we just rely on the ability of mdextend() to create the file if required > during recovery. > > So it looks to me like either an outstanding error with current system, > or a new error introduced with recent-ish md/smgr changes. Second > opinion please Heikki, if you are available? Hmm, that's odd. Table creation calls RelationCreateStorage, which calls smgrcreate and writes the WAL record. smgr_redo certainly does call smgrcreate. I can reproduce that too with CVS HEAD, so it's clearly a bug. I probably introduced it with the recent smgr changes; I'll try to hunt it down. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > I can reproduce that too with CVS HEAD, so it's clearly a bug. I > probably introduced it with the recent smgr changes; I'll try to hunt it > down. Now that was an embarrassingly simple bug:

--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -106,7 +106,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
 	srel = smgropen(rnode);
 	smgrcreate(srel, MAIN_FORKNUM, false);
 
-	if (istemp)
+	if (!istemp)
 	{
 		/*
 		 * Make an XLOG entry showing the file creation. If we abort, the file

Fixed, as well as the same bug in RelationTruncate. Thanks for report, I'll keep this brown paper bag on for a few days... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
What I find interesting about this is that whereas I had been concerned that adding hot standby late in the development cycle might destabilize the tree and add lots of time to the release cycle, it seems having it might actually increase our ability to see problems in the recovery code, which was previously quite hard to test. -- Greg

On 4 Jan 2009, at 09:59, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Heikki Linnakangas wrote:
>> I can reproduce that too with CVS HEAD, so it's clearly a bug. I
>> probably introduced it with the recent smgr changes; I'll try to
>> hunt it down.
>
> Now that was an embarrassingly simple bug:
>
> --- a/src/backend/catalog/storage.c
> +++ b/src/backend/catalog/storage.c
> @@ -106,7 +106,7 @@ RelationCreateStorage(RelFileNode rnode, bool istemp)
>  	srel = smgropen(rnode);
>  	smgrcreate(srel, MAIN_FORKNUM, false);
>
> -	if (istemp)
> +	if (!istemp)
>  	{
>  		/*
>  		 * Make an XLOG entry showing the file creation. If we abort, the file
>
> Fixed, as well as the same bug in RelationTruncate. Thanks for report, I'll keep this brown paper bag on for a few days...
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > * btree VACUUM code - must scan every block of index (v6) Need to unlock them too.

--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -472,7 +472,7 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
 
 	/*
-	 * We need to ensure everyy block is unpinned between the
+	 * We need to ensure every block is pinned between the
 	 * lastBlockVacuumed and the current block, if there are any.
 	 * This ensures that every block in the index is touched during
 	 * VACUUM as required to ensure scans work correctly.
@@ -482,7 +482,11 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 		BlockNumber blkno = xlrec->lastBlockVacuumed + 1;
 
 		for (; blkno < xlrec->block; blkno++)
+		{
 			buffer = XLogReadBufferForCleanup(xlrec->node, blkno, false);
+			if (BufferIsValid(buffer))
+				UnlockReleaseBuffer(buffer);
+		}
 	}
 
 	/*

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-01-07 at 11:35 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > * btree VACUUM code - must scan every block of index (v6) > > Need to unlock them too. Oh c**p. Thanks. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
There's still something wrong with the way subtransactions are handled. I got:

postgres=# SELECT * FROM foo;
ERROR: could not access status of transaction 118649
DETAIL: Could not open file "pg_subtrans/0001": No such file or directory.

in the standby after some testing. I created a lot of subtransactions in the master, each inserting a row to table 'foo', and left the transaction open. In another session, I did a lot of dummy activity (truncate bar; insert into bar ...) to generate WAL, and also checkpoints and pg_switch_xlog() calls. I then restarted the standby, and got the above error. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-01-07 at 13:18 +0200, Heikki Linnakangas wrote: > There's still something wrong with the way subtransactions are handled. > I got: > > postgres=# SELECT * FROM foo; > ERROR: could not access status of transaction 118649 > DETAIL: Could not open file "pg_subtrans/0001": No such file or directory. > > in the standby after some testing. Please tell me some more. Was 118649 active etc? 118649 should be in pg_subtrans/057 shouldn't it?? Hmmm. > I created a lot of subtransactions in the master, each inserting a row > to table 'foo', and left the transaction open. In another session, I did > a lot of dummy activity (truncate bar; insert into bar ...) to generate > WAL, and also checkpoints and pg_switch_xlog() calls. I then restarted > the standby, and got the above error. Can you confirm this works in normal running? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2008-12-17 at 15:21 +0000, Simon Riggs wrote: > http://wiki.postgresql.org/wiki/Hot_Standby > > now contains a link to latest version of this patch. v6b now available via Wiki (link above), fixes 5 reported issues. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Another annoyance I noticed while testing the case of a lot of subtransactions (overflowing the procarray cache) is that when you have a transaction with a lot of subtransactions open, getting the initial snapshot fails, and the standby doesn't open for read-only queries. Normally, GetRunningTransactionData determines the latest running xid by scanning the procarray. If the subxid cache has overflowed, it simply gives up. Comment there suggests that it could call ReadNewTransactionId() instead, like it does when there's no active xids in proc array. I think we should do that, or something else to alleviate the problem. When there's no xids in the procarray, couldn't we just use latestCompletedXid instead of calling ReadNewTransactionId()? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: > Another annoyance I noticed while testing I'm sorry this has annoyed you. Thanks for testing. > the case of a lot of > subtransactions (overflowing the procarray cache) is that when you have > a transaction with a lot of subtransactions open, getting the initial > snapshot fails, ...on that attempt only, yes. > and the standby doesn't open for read-only queries. It will eventually do so, the first time there is no overflow, so in most cases the wait will be fairly short. I thought it was code that would run so seldom that it was unlikely to be bug-free. I would prefer to record this as a possible enhancement once committed rather than an essential fix; would you agree? > Normally, GetRunningTransactionData determines the latest running xid > by scanning the procarray. If the subxid cache has > overflowed, it simply gives up. Comment there suggests that it could > call ReadNewTransactionId() instead, like it does when there's no active > xids in proc array. I think we should do that, or something else to > alleviate the problem. I mention in comments that I was worried about the contention that would cause, since this runs in all servers. > When there's no xids in the procarray, couldn't we just use > latestCompletedXid instead of calling ReadNewTransactionId()? latestCompletedXid is protected by ProcArrayLock so not much difference between those two. If you're saying you'd prefer latestCompletedXid then I can make that change. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: >> When there's no xids in the procarray, couldn't we just use >> latestCompletedXid instead of calling ReadNewTransactionId()? > > latestCompletedXid is protected by ProcArrayLock so not much difference > between those two. The big difference is that we're already holding ProcArrayLock. You could read the value of latestCompletedXid before releasing ProcArrayLock, and wouldn't need the retry logic. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
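A minimal sketch of the pattern Heikki describes, as it might sit inside GetRunningTransactionData(); the helper name is hypothetical, while latestCompletedXid, ProcArrayLock and the lock calls are the real machinery. The point is that the fallback value is read under the ProcArrayLock the function already holds, so no ReadNewTransactionId() call or retry loop is needed.

#include "postgres.h"
#include "access/transam.h"
#include "storage/lwlock.h"
#include "storage/procarray.h"

/* Hypothetical helper; stands in for the existing procarray scan. */
extern TransactionId MaxRunningXidInProcArray(void);

static TransactionId
latest_running_xid_sketch(void)
{
	TransactionId xid;

	LWLockAcquire(ProcArrayLock, LW_SHARED);

	xid = MaxRunningXidInProcArray();

	/*
	 * No running xids: fall back to latestCompletedXid, which is also
	 * protected by ProcArrayLock, so we can read it before releasing
	 * the lock we already hold.
	 */
	if (!TransactionIdIsValid(xid))
		xid = ShmemVariableCache->latestCompletedXid;

	LWLockRelease(ProcArrayLock);
	return xid;
}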
On Wed, 2009-01-07 at 23:56 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: > >> When there's no xids in the procarray, couldn't we just use > >> latestCompletedXid instead of calling ReadNewTransactionId()? > > > > latestCompletedXid is protected by ProcArrayLock so not much difference > > between those two. > > The big difference is that we're already holding ProcArrayLock. You > could read the value of latestCompletedXid before releasing > ProcArrayLock, and wouldn't need the retry logic. Sounds good to me then. Will rework. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2009-01-07 at 22:08 +0000, Simon Riggs wrote: > On Wed, 2009-01-07 at 23:56 +0200, Heikki Linnakangas wrote: > > Simon Riggs wrote: > > > On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: > > >> When there's no xids in the procarray, couldn't we just use > > >> latestCompletedXid instead of calling ReadNewTransactionId()? > > > > > > latestCompletedXid is protected by ProcArrayLock so not much difference > > > between those two. > > > > The big difference is that we're already holding ProcArrayLock. You > > could read the value of latestCompletedXid before releasing > > ProcArrayLock, and wouldn't need the retry logic. > > Sounds good to me then. Will rework. Applies brakes suddenly. I realise this is a subtle trap I almost fell into the first time I coded it. The function in question is GetRunningTransactionData(), so we are interested in the latest running xid, not the latest completed xid. The latter is sufficient for snapshots, but the information derived by GetRunningTransactionData() is used to maintain UnobservedXids. Now we might have a discussion about whether we *need* that information, but the code is correct as it currently stands. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Wed, 2009-01-07 at 22:08 +0000, Simon Riggs wrote: >> On Wed, 2009-01-07 at 23:56 +0200, Heikki Linnakangas wrote: >>> Simon Riggs wrote: >>>> On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: >>>>> When there's no xids in the procarray, couldn't we just use >>>>> latestCompletedXid instead of calling ReadNewTransactionId()? >>>> latestCompletedXid is protected by ProcArrayLock so not much difference >>>> between those two. >>> The big difference is that we're already holding ProcArrayLock. You >>> could read the value of latestCompletedXid before releasing >>> ProcArrayLock, and wouldn't need the retry logic. >> Sounds good to me then. Will rework. > > Applies brakes suddenly. > > I realise this is a subtle trap I almost fell into the first time I coded > it. The function in question is GetRunningTransactionData(), so we are > interested in the latest running xid, not the latest completed xid. The > latter is sufficient for snapshots, but the information derived by > GetRunningTransactionData() is used to maintain UnobservedXids. If there are no transactions running, the latest completed xid is just what we need. When there are transactions in the procarray, we should take the max xid of those, as the patch already does. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-01-07 at 13:18 +0200, Heikki Linnakangas wrote: > There's still something wrong with the way subtransactions are handled. > I got: > > postgres=# SELECT * FROM foo; > ERROR: could not access status of transaction 118649 > DETAIL: Could not open file "pg_subtrans/0001": No such file or directory. > > in the standby after some testing. Currently, we don't run TruncateSubtrans() when we perform restartpoints. That means that it's impossible for HS to see an xid as running for which its segment has been deleted. The only other possibility is that the segment has not yet been created. subtrans is not WAL logged, so new subtrans pages are never created during recovery, so "not yet created" is a typical case. This looks to be essentially the same error I had already fixed, just that my fix in slru.c is not sufficient. The reason for this is that initially this error occurred in the startup process when attempting to record a subtransaction start. Now that has been optimised away, the same error occurs, but now while reading rather than writing. We don't wish to introduce WAL logging of subtrans, so the correct fix is to simply widen the error-checking in SlruPhysicalReadPage() from using InRecovery to IsRecoveryProcessingMode(). That then causes us to return zeroes for the page requested rather than error out. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
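A simplified, self-contained sketch of that widened check; the real code lives in SlruPhysicalReadPage() in slru.c and the names below are stand-ins for illustration. The behavioural change is only which "are we in recovery" test guards the zero-page fallback.

#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

/* Stand-in for IsRecoveryProcessingMode(): true for *any* backend during
 * recovery, where InRecovery was true only in the startup process. */
static bool in_recovery_processing_mode;

/*
 * Sketch of the open-failure path: a missing segment file during
 * recovery is answered with a page of zeroes instead of an error.
 */
static bool
slru_read_page_sketch(int fd, int open_errno, char *page_buffer)
{
	if (fd < 0)
	{
		if (open_errno != ENOENT || !in_recovery_processing_mode)
			return false;               /* genuine failure: caller reports it */

		memset(page_buffer, 0, BLCKSZ); /* missing in recovery: zeroed page */
		return true;
	}

	/* ... normal read path elided ... */
	return true;
}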
On Thu, 2009-01-08 at 12:12 +0200, Heikki Linnakangas wrote: > >> Sounds good to me then. Will rework. > > > > Applies brakes suddenly. > > > > I realise this is a subtle trap I almost fell into the first time I coded > > it. The function in question is GetRunningTransactionData(), so we are > > interested in the latest running xid, not the latest completed xid. The > > latter is sufficient for snapshots, but the information derived by > > GetRunningTransactionData() is used to maintain UnobservedXids. > > If there are no transactions running, the latest completed xid is just what we > need. > When there are transactions in the procarray, we should take the > max xid of those, as the patch already does. OK, I don't now see the need for the special case in the way I've done it. There could still be problems there, but if there are they should apply to all cases, not just the no-transactions-running case. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2009-01-07 at 15:43 +0200, Heikki Linnakangas wrote: > Normally, GetRunningTransactionData determines the latest running xid > by scanning the procarray. If the subxid cache has > overflowed, it simply gives up. Comment there suggests that it could > call ReadNewTransactionId() instead, like it does when there's no > active xids in proc array. I think we should do that, or something > else to alleviate the problem. > > When there's no xids in the procarray, couldn't we just use > latestCompletedXid instead of calling ReadNewTransactionId()? Done. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Wed, 2009-01-07 at 13:18 +0200, Heikki Linnakangas wrote: > There's still something wrong with the way subtransactions are handled. > I got: > > postgres=# SELECT * FROM foo; > ERROR: could not access status of transaction 118649 > DETAIL: Could not open file "pg_subtrans/0001": No such file or directory. > > in the standby after some testing. > > I created a lot of subtransactions in the master, each inserting a row > to table 'foo', and left the transaction open. In another session, I did > a lot of dummy activity (truncate bar; insert into bar ...) to generate > WAL, and also checkpoints and pg_switch_xlog() calls. I then restarted > the standby, and got the above error. I fixed this by ignoring I/O errors on pg_subtrans, but I think that's the wrong approach and could mask errors. ExtendSubtrans() doesn't generate WAL records, but it could. I don't want to do that either, for performance reasons. Best way seems to be to do (almost) the same thing as ExtendSubtrans() during recovery, so we zero out pages at the point that recovering transactions get written to pg_subtrans. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
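A hedged sketch of that idea, written as if inside subtrans.c; the function and its page bookkeeping are hypothetical, while SimpleLruZeroPage(), SubTransCtl and SubtransControlLock are the existing SLRU machinery it would lean on.

/* Hypothetical sketch, as if added to src/backend/access/transam/subtrans.c */

static int	last_zeroed_pageno = -1;	/* recovery-side bookkeeping (hypothetical) */

static void
RecoveryExtendSubtrans(TransactionId xid)
{
	int		pageno = TransactionIdToPage(xid);	/* existing macro in subtrans.c */

	/*
	 * pg_subtrans is not WAL-logged, so zero the page here, mirroring
	 * what the normal-running extend code does at xid assignment time.
	 */
	if (pageno == last_zeroed_pageno)
		return;

	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
	SimpleLruZeroPage(SubTransCtl, pageno);
	last_zeroed_pageno = pageno;
	LWLockRelease(SubtransControlLock);
}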
Thanks for picking this up, even though my report was so vague. Simon Riggs wrote: > Best way seems to be to do (almost) the same thing as ExtendSubtrans() > during recovery, so we zero out pages at the point that recovering > transactions get written to pg_subtrans. Yep. Do we have the same bug with ExtendCLOG? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-01-14 at 08:27 +0200, Heikki Linnakangas wrote: > Thanks for picking this up, even though my report was so vague. > > Simon Riggs wrote: > > Best way seems to be to do (almost) the same thing as ExtendSubtrans() > > during recovery, so we zero out pages at the point that recovering > > transactions get written to pg_subtrans. > > Yep. > > Do we have the same bug with ExtendCLOG? No, the clog issues WAL records for that. 8 times less than subtrans would if we allowed it to, so it's not as bad performance-wise either. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
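For contrast, the clog side, paraphrasing the shape of ZeroCLOGPage() in src/backend/access/transam/clog.c (treat the details as approximate): extending clog emits a WAL record for each newly zeroed page, so recovery replays the zeroing itself and never finds a clog page missing.

/* Paraphrase of clog.c's page-zeroing (approximate, for illustration) */
static int
ZeroCLOGPage(int pageno, bool writeXlog)
{
	int		slotno = SimpleLruZeroPage(ClogCtl, pageno);

	if (writeXlog)
		WriteZeroPageXlogRec(pageno);	/* WAL record replayed by clog_redo */

	return slotno;
}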