Thread: Multiple table entries?
Hi, I recently upgraded to 8.4 and everything went great. All databases are working as they are supposed to, no problems seen. Today, however, I did a \d on a database and was surprised to see sets of 5 identical table entries for each one that is supposed to be there. Here's the databases in question (used to store apache web logs). Several other of my databases (but not all) show the same 5 entries for each real table symptom. _postgresql@heinlein:/backup/postgresql $ psql pglogd psql (8.4.0) Type "help" for help. pglogd=# \d++ List of relations Schema | Name | Type | Owner | Size | Description --------+--------------+-------+-------------+---------+------------- public | full_entries | table | _postgresql | 528 kB | public | full_entries | table | _postgresql | 528 kB | public | full_entries | table | _postgresql | 528 kB | public | full_entries | table | _postgresql | 528 kB | public | full_entries | table | _postgresql | 528 kB | public | full_temp | table | jross | 1904 kB | public | full_temp | table | jross | 1904 kB | public | full_temp | table | jross | 1904 kB | public | full_temp | table | jross | 1904 kB | public | full_temp | table | jross | 1904 kB | public | log_entries | table | _postgresql | 1648 kB | public | log_entries | table | _postgresql | 1648 kB | public | log_entries | table | _postgresql | 1648 kB | public | log_entries | table | _postgresql | 1648 kB | public | log_entries | table | _postgresql | 1648 kB | public | page_hits | table | _postgresql | 5296 kB | public | page_hits | table | _postgresql | 5296 kB | public | page_hits | table | _postgresql | 5296 kB | public | page_hits | table | _postgresql | 5296 kB | public | page_hits | table | _postgresql | 5296 kB | public | total_hits | table | _postgresql | 16 kB | public | total_hits | table | _postgresql | 16 kB | public | total_hits | table | _postgresql | 16 kB | public | total_hits | table | _postgresql | 16 kB | public | total_hits | table | _postgresql | 16 kB | (25 rows) \l correctly lists the individual databases, and when I took a look at yesterday's dump file I only see one create table statement, not 5. I browsed through the system catalogs but haven't found anything yet that can shine some light on this. System is running OpenBSD -current. Thanks, Jeff Ross
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote: > Hi, > > I recently upgraded to 8.4 and everything went great. All databases are > working as they are supposed to, no problems seen. > > Today, however, I did a \d on a database and was surprised to see sets of 5 > identical table entries for each one that is supposed to be there. Ugh. What does select xmin,xmax,ctid,oid,* from pg_class return? -- greg http://mit.edu/~gsstark/resume.pdf
On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote: > I browsed through the system catalogs but haven't found anything yet that > can shine some light on this. Actually, I wonder if this isn't more likely to show the problem -- it would explain why *all* your tables are showing up with duplicates rather than just one. select xmin,xmax,ctid,oid,* from pg_namespace -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote: > >> Hi, >> >> I recently upgraded to 8.4 and everything went great. �All databases are >> working as they are supposed to, no problems seen. >> >> Today, however, I did a \d on a database and was surprised to see sets of 5 >> identical table entries for each one that is supposed to be there. >> > > Ugh. > > What does > > select xmin,xmax,ctid,oid,* from pg_class > > return? > > 216 rows worth, but I only see one entry per table, not 5. For easier reading, I put the output at http://openvistas.net/pg_class_query.html Thanks! Jeff
Greg Stark wrote: > On Sat, Aug 22, 2009 at 9:31 PM, Jeff Ross<jross@wykids.org> wrote: > >> I browsed through the system catalogs but haven't found anything yet that >> can shine some light on this. >> > > Actually, I wonder if this isn't more likely to show the problem -- it > would explain why *all* your tables are showing up with duplicates > rather than just one. > > select xmin,xmax,ctid,oid,* from pg_namespace > > > http://openvistas.net/pg_namespace_query.html Jeff
On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote: > Greg Stark wrote: >> Actually, I wonder if this isn't more likely to show the problem -- it >> would explain why *all* your tables are showing up with duplicates >> rather than just one. >> >> select xmin,xmax,ctid,oid,* from pg_namespace > > http://openvistas.net/pg_namespace_query.html Yeah, that's a problem. Would you be able to load the pageinspect contrib module and run a query? select (h).* from (select heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x; -- greg http://mit.edu/~gsstark/resume.pdf
On Sun, Aug 23, 2009 at 4:40 AM, Greg Stark<gsstark@mit.edu> wrote: > On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote: >> Greg Stark wrote: > > Yeah, that's a problem. Would you be able to load the pageinspect > contrib module and run a query? > > select (h).* from (select > heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x; Also, do you have the WAL log files going back to the database creation? If so *please* preserve them! You're the second person to report similar symptoms so it's starting to look like there may be a serious bug here. And the nature of the problem is such that I think it may require someone sifting through the xlog WAL files to see what sequence of events happened. The wal files are in a subdirection of the database root called pg_xlog. If you still have one named 000000010000000000000000 then you have all of them. Please, if possible, copy them to a backup directory. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > On Sun, Aug 23, 2009 at 4:06 AM, Jeff Ross<jross@wykids.org> wrote: > >> Greg Stark wrote: >> > > >>> Actually, I wonder if this isn't more likely to show the problem -- it >>> would explain why *all* your tables are showing up with duplicates >>> rather than just one. >>> >>> select xmin,xmax,ctid,oid,* from pg_namespace >>> >> http://openvistas.net/pg_namespace_query.html >> > > Yeah, that's a problem. Would you be able to load the pageinspect > contrib module and run a query? > > select (h).* from (select > heap_page_items(get_raw_page('pg_namespace',0)) as h from p) as x; > > > > I had to modify your query slightly to make it run--hope I got what you are after! select (h).* from (select heap_page_items(get_raw_page('pg_namespace',0)) as h) as x; http://www.openvistas.net/pageinspect.html Thanks, Jeff
On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote: > I had to modify your query slightly to make it run--hope I got what you are > after! > > select (h).* from (select > heap_page_items(get_raw_page('pg_namespace',0)) as h) as x; > > http://www.openvistas.net/pageinspect.html Incidentally, may as well ask the usual questions: Did you upgrade this via pg_dump/pg_restore or via pg_migrator? Are you running with fsync=off? Are you running with synchronous_commit=off? Have you had any system crashes? Have you had any Postgres crashes? Of the above only fsync=off combined with a system crash ought to be unsafe, but it could be good to know which other events have occurred to know which code paths have been exercised. Do you still have pg_clog/0000 (that's kept longer than the xlog so I would expect you would), how large is it? could you send that file? It has a record of which transactions have committed or aborted, no private data. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote: > >> I had to modify your query slightly to make it run--hope I got what you are >> after! >> >> select (h).* from (select >> heap_page_items(get_raw_page('pg_namespace',0)) as h) as x; >> >> http://www.openvistas.net/pageinspect.html >> > > Incidentally, may as well ask the usual questions: > > Did you upgrade this via pg_dump/pg_restore or via pg_migrator? > pg_dump > Are you running with fsync=off? > No. > Are you running with synchronous_commit=off? > No. > Have you had any system crashes? > No. > Have you had any Postgres crashes? > No. > Of the above only fsync=off combined with a system crash ought to be > unsafe, but it could be good to know which other events have occurred > to know which code paths have been exercised. > > Do you still have pg_clog/0000 (that's kept longer than the xlog so I > would expect you would), how large is it? could you send that file? It > has a record of which transactions have committed or aborted, no > private data. > > pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog Jeff
Greg Stark wrote: > On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote: > >> On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote: >> Incidentally, may as well ask the usual questions: >> > > And just for reference, what does pg_controldata print? > > _postgresql@heinlein:/var/postgresql $ pg_controldata pg_control version number: 843 Catalog version number: 200904091 Database system identifier: 5368117850282091454 Database cluster state: in production pg_control last modified: Sun Aug 23 10:37:39 2009 Latest checkpoint location: 0/4203B43C Prior checkpoint location: 0/4202F978 Latest checkpoint's REDO location: 0/4203B43C Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/116884 Latest checkpoint's NextOID: 866525 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Time of latest checkpoint: Sun Aug 23 10:37:38 2009 Minimum recovery ending location: 0/0 Maximum data alignment: 4 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 2000 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by reference
On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote: > > pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog Sorry, I'm getting a 404 -- greg http://mit.edu/~gsstark/resume.pdf
On Sun, Aug 23, 2009 at 6:23 PM, Greg Stark<gsstark@mit.edu> wrote: > On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote: >> >> pg_clog is 32K. I've put it at http://www.openvistas.net/pg_clog > > Sorry, I'm getting a 404 For what it's worth this is what the heap dump shows. I don't understand why you're seeing these tuples show up if they're all properly marked as updated and XMAX_COMMITTED. The last tuple is marked strangely I think. I don't think it's supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also, I don't understand why it's marked as UPDATED if ctid and xmax aren't set. Could you also send select (h).* from (select page_header(get_raw_page('pg_namespace',0)) as h) as x; I'm wondering if the page allvisible flag is set. The visibility map is one of the few 8.4 patches which impact transaction visibility. lp lp_off flags lp_len t_xmin t_xmax field3 t_ctid t_info2 t_info 3 7616 1 141 2 897 0 (0,6) 16387 9482 HOT_UPDATED, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED 6 7340 1 129 897 898 0 (0,7) -16381 9482 HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED 7 7220 1 117 898 899 0 (0,9) -16381 9482 HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED 9 7088 1 129 899 900 0 (0,10) -16381 9482 HOT_UPDATED, HEAP_ONLY_TUPLE, UPDATED, XMAX_COMMITTED, XMIN_COMMITTED 10 6944 1 141 900 0 0 (0,10) -32765 10506 UPDATED, XMAX_INVALID, XMIN_COMMITTED -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > The last tuple is marked strangely I think. I don't think it's > supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also, > I don't understand why it's marked as UPDATED if ctid and xmax aren't > set. No, that all looks right to me. UPDATE sets HEAP_UPDATED on the newer version of the row, not the older one. What looks interesting to me is that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the same HOT chain. Why is that I wonder ... > I'm wondering if the page allvisible flag is set. The visibility map > is one of the few 8.4 patches which impact transaction visibility. But we don't use that while examining individual tuples, do we? regards, tom lane
On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> The last tuple is marked strangely I think. I don't think it's >> supposed to have XMAX_INVALID if xmax is 0 but I could be wrong. Also, >> I don't understand why it's marked as UPDATED if ctid and xmax aren't >> set. > > No, that all looks right to me. UPDATE sets HEAP_UPDATED on the newer > version of the row, not the older one. What looks interesting to me is > that the last update isn't marked HEAP_ONLY_TUPLE, ie, it's not in the > same HOT chain. Why is that I wonder ... > >> I'm wondering if the page allvisible flag is set. The visibility map >> is one of the few 8.4 patches which impact transaction visibility. > > But we don't use that while examining individual tuples, do we? We don't use the visibility map itself but we *do* use the page header's all visible bit. On a sequential scan we skip the visibility check for tuples on the page if the page header bit is set. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> But we don't use that while examining individual tuples, do we? > We don't use the visibility map itself but we *do* use the page > header's all visible bit. On a sequential scan we skip the visibility > check for tuples on the page if the page header bit is set. Urgh. I bet that's where the problem is then. Some path is failing to clear that bit, or maybe there's a race condition that allows it to become set incorrectly (ie right after somebody else adds a tuple). regards, tom lane
On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> On Sun, Aug 23, 2009 at 7:00 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> But we don't use that while examining individual tuples, do we? > >> We don't use the visibility map itself but we *do* use the page >> header's all visible bit. On a sequential scan we skip the visibility >> check for tuples on the page if the page header bit is set. > > Urgh. I bet that's where the problem is then. Some path is failing > to clear that bit, or maybe there's a race condition that allows it > to become set incorrectly (ie right after somebody else adds a tuple). Well it only gets set in vacuum when we have the vacuum cleanout lock so I can't see it getting set incorrectly unless there's a simple logic error in vacuum. I just scanned it and it looks right to me. I can imagine it failing to be unset. Or being unset but allowing a race condition window before it's unset when it's wrong. That doesn't explain why it the HEAP_ONLY_TUPLE flag would be missing though. Also, vacuum logs a warning if the flag is found to be set and shouldn't be. I suppose that could easily be missed though if it's autovacuum that ran it. Also, the flag might fail to be cleared but then be cleared later by another operation. I wonder how much damage it can do while it's wrong. -- greg http://mit.edu/~gsstark/resume.pdf
On Sun, Aug 23, 2009 at 5:02 PM, Greg Stark<gsstark@mit.edu> wrote: > On Sun, Aug 23, 2009 at 2:18 PM, Jeff Ross<jross@wykids.org> wrote: > Incidentally, may as well ask the usual questions: And just for reference, what does pg_controldata print? -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Urgh. I bet that's where the problem is then. Some path is failing >> to clear that bit, or maybe there's a race condition that allows it >> to become set incorrectly (ie right after somebody else adds a tuple). > Well it only gets set in vacuum when we have the vacuum cleanout lock > so I can't see it getting set incorrectly unless there's a simple > logic error in vacuum. I just scanned it and it looks right to me. heap_update is broken. Details left as an exercise for the reader (hint: Heikki has blatantly ignored the WAL-logging rules in transam/README) regards, tom lane
Greg Stark wrote: > On Sun, Aug 23, 2009 at 5:37 PM, Jeff Ross<jross@wykids.org> wrote: > >> pg_clog is 32K. �I've put it at http://www.openvistas.net/pg_clog >> > > Sorry, I'm getting a 404 > > > Oops--fixed now. Jeff
Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > >> On Sun, Aug 23, 2009 at 7:34 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> >>> Urgh. I bet that's where the problem is then. Some path is failing >>> to clear that bit, or maybe there's a race condition that allows it >>> to become set incorrectly (ie right after somebody else adds a tuple). >>> > > >> Well it only gets set in vacuum when we have the vacuum cleanout lock >> so I can't see it getting set incorrectly unless there's a simple >> logic error in vacuum. I just scanned it and it looks right to me. >> > > heap_update is broken. Details left as an exercise for the reader > (hint: Heikki has blatantly ignored the WAL-logging rules in transam/README) > > regards, tom lane > > Well, as the reader that started this all ;-) should I be worried? I don't see any obvious problem with the way 8.4.0 is working. Nearly every web page on my server is stored in postgres, and we're still churning out pages, no problem. Should I do a pg_dump and reinstall? Roll back to 8.3.7? Or just relax, don't worry and have a sparkling adult beverage? Thanks, Jeff (proceeding with relaxing, not worrying, and so on)
On Sun, Aug 23, 2009 at 9:06 PM, Jeff Ross<jross@wykids.org> wrote: > pglogd=# select (h).* from (select > page_header(get_raw_page('pg_namespace',0)) > pglogd(# as h) as x; > lsn | tli | flags | lower | upper | special | pagesize | version | > prune_xid > -----------+-----+-------+-------+-------+---------+----------+---------+----------- > 0/1A15904 | 1 | 5 | 64 | 6944 | 8192 | 8192 | 4 | > 897 Well this confirms that the bug involves the all-visible flag as the 5 is PD_ALL_VISIBLE + PD_HAS_FREE_LINES It's been overtaken by events as Tom has spotted a problem already. -- greg http://mit.edu/~gsstark/resume.pdf
Jeff Ross <jross@wykids.org> writes: > Tom Lane wrote: >> heap_update is broken. Details left as an exercise for the reader > Well, as the reader that started this all ;-) should I be worried? > Should I do a pg_dump and reinstall? Roll back to 8.3.7? Or just > relax, don't worry and have a sparkling adult beverage? Well, it's a pretty bad bug but as far as I can see a simple "VACUUM table" command should fix it up --- would you confirm? regards, tom lane
Tom Lane wrote: > Jeff Ross <jross@wykids.org> writes: > >> Tom Lane wrote: >> >>> heap_update is broken. Details left as an exercise for the reader >>> > > >> Well, as the reader that started this all ;-) should I be worried? >> Should I do a pg_dump and reinstall? Roll back to 8.3.7? Or just >> relax, don't worry and have a sparkling adult beverage? >> > > Well, it's a pretty bad bug but as far as I can see a simple "VACUUM > table" command should fix it up --- would you confirm? > > regards, tom lane > > Hah! It did indeed clear it up! pglogd=# \d List of relations Schema | Name | Type | Owner --------+--------------+-------+------------- public | full_entries | table | _postgresql public | full_temp | table | jross public | log_entries | table | _postgresql public | page_hits | table | _postgresql public | total_hits | table | _postgresql (5 rows) Thanks, Tom and Greg. Looking forward to 8.4.1 :-) Jeff
Jeff Ross <jross@wykids.org> writes: > Tom Lane wrote: >> Well, it's a pretty bad bug but as far as I can see a simple "VACUUM >> table" command should fix it up --- would you confirm? > Hah! It did indeed clear it up! [ thinks... ] Actually, that only proves that the PD_ALL_VISIBLE fixup logic in vacuumlazy.c does what it's supposed to do; it doesn't in itself show anything about how the flag got to be wrong. The path that I'm seeing is enabled by the bogus coding that messes with the flag after having done XLogInsert --- that's forbidden by our coding rules, and the reason is that if XLogInsert chooses to log the entire page then the replay routine will assume that all the required changes got applied already. It's fairly easy to demonstrate the problem: create table foo (f1 int, f2 int); insert into foo values (1,2); insert into foo values (3,4); select ctid,xmin,xmax,* from foo; vacuum foo; vacuum foo; checkpoint; update foo set f2 = f2 + 1; select ctid,xmin,xmax,* from foo; -- now cause a crash, eg kill -9 on this backend After recovery, there'll be two visible copies of the two rows. However, this theory requires that you had a backend crash, and you averred upthread that you hadn't ... regards, tom lane
On Mon, Aug 24, 2009 at 2:03 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Jeff Ross <jross@wykids.org> writes: >> Tom Lane wrote: >>> heap_update is broken. Details left as an exercise for the reader > >> Well, as the reader that started this all ;-) should I be worried? >> Should I do a pg_dump and reinstall? Roll back to 8.3.7? Or just >> relax, don't worry and have a sparkling adult beverage? > > Well, it's a pretty bad bug but as far as I can see a simple "VACUUM > table" command should fix it up --- would you confirm? At the very least taking regular pg_dumps is probably wise. That's probably wise even if there aren't Postges bugs though since it's the most flexible type of backup to deal with application bugs with. The answer to whether you should roll back until 8.4.1 comes out will depend on how valuable your data is, how critical the downtime to repair any corruption would be, versus the time that you'll spend on rolling it back. That's a complicated calculus which will be different for every user. The bug found should only affect recovery though. So unless you have a standby slave database or have postgres or system crashes it shouldn't be relevant. -- greg http://mit.edu/~gsstark/resume.pdf