Thread: Maintenance question / DB size anomaly...
Gang,

Hoping you all can help me with a rather bizarre issue that I've run across. I don't really need a solution (I think I have one), but I'd really like to run it by everyone in case I'm headed in the wrong direction.

I'm running a small Slony (v1.1.5)/PostgreSQL 8.0.4 cluster (on RedHat) that contains one master database and two slaves. db1 (the master) has been up for about 1.5 years, db2 (slave 1) for about 9 months, and db3 (second slave) for about two months. I do a VACUUM ANALYZE every morning on all three databases. However, the vacuum on db1 takes approximately 4.5 hours, while on the slaves it takes about half an hour. As far as I can tell, my FSM settings are correct. This is concerning because the vacuum on db1 is starting to run into production hours. The master receives all inserts, updates and deletes (as well as a fair number of selects); the slaves are select-only.

In my investigation of this anomaly, I noticed that the data/ dir on db1 (the master) is around 60 GB, while the data directory on the slaves is around 25 GB. After about 3 months of head scratching, someone on the IRC channel suggested that it may be due to index bloat. Doing some research, though, it would seem that those problems were resolved in 7.4(ish), and it wouldn't account for one database being 2.5x bigger. Another unknown is Slony overhead (both in size and vacuum times).

The ONLY thing I can think of is that I DROPped a large number of tables from db1 a few months ago (they weren't getting replicated). This is on the order of 1700+ fairly largish (50,000+ row) tables. I do not remember doing a VACUUM FULL after dropping them, so perhaps that's my problem.

I'm planning on doing some maintenance this weekend, during which I will take the whole system down, then on db1 run a VACUUM FULL ANALYZE on the whole database, then a REINDEX on my very large tables. I may drop and recreate the indexes on my big tables, as I hear that may be faster than a REINDEX. I will probably run a VACUUM FULL ANALYZE on the slaves as well.

Thoughts? Suggestions? Anyone think this will actually help my problem of size and vacuum times? Do I need to take Slony down while I do this? Will the VACUUM FULL table locking interfere with Slony?

Thanks for any light you all can shed on these issues...

/kurt
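[For reference, the FSM settings mentioned above can be read straight from psql; a quick sanity check might look like the following sketch, using the standard 8.0 parameter names.]

SHOW max_fsm_pages;
SHOW max_fsm_relations;
-- max_fsm_pages needs enough slots to track every page with reclaimable
-- space across the whole cluster; if it is too small, the daily VACUUM
-- quietly stops keeping up and tables keep growing.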
Kurt Overberg <kurt@hotdogrecords.com> writes: > In my investigation of this anomaly, I noticed that the data/ dir on > db1 (the master) is around 60 Gigs. The data directory on the slaves > is around 25Gb. After about 3 months of head scratching, someone on > the irc channel suggested that it may be due to index bloat. This is not something you need to guess about. Compare the table and index sizes, one by one, between the master and slaves. Do a VACUUM VERBOSE on the one(s) that are radically bigger on the master, and look at what it has to say. regards, tom lane
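[One way to do the comparison Tom suggests is a catalog query run on both the master and a slave, then diffing the output. A minimal sketch; it only touches pg_class, so it works fine on 8.0.]

SELECT relname, relkind, relpages
FROM pg_class
WHERE relkind IN ('r', 'i')   -- 'r' = tables, 'i' = indexes
ORDER BY relname;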
Kurt Overberg wrote: > > In my investigation of this anomaly, I noticed that the data/ dir on db1 > (the master) is around 60 Gigs. The data directory on the slaves is > around 25Gb. After about 3 months of head scratching, someone on the > irc channel suggested that it may be due to index bloat. Although, > doing some research, it would seem that those problems were resolved in > 7.4(ish), and it wouldn't account for one database being 2.5x bigger. > Another unknown is Slony overhead (both in size and vacuum times). Check the oid2name/dbsize utilities in the contrib RPM for 8.0.x http://www.postgresql.org/docs/8.0/static/diskusage.html Shouldn't be too hard to find out where the disk space is going. Oh and 8.0.13 is the latest release of 8.0 series, so you'll want to use your maintenance window to upgrade too. Lots of good bugfixes there. > The ONLY thing I can think of is that I DROPped a large number of tables > from db1 a few months ago (they weren't getting replicated). This is on > the order of 1700+ fairly largeish (50,000+ row) tables. I do not > remember doing a vacuum full after dropping them, so perhaps that's my > problem. I'm planning on doing some maintenance this weekend, during > which I will take the whole system down, then on db1, run a VACUUM FULL > ANALYZE on the whole database, then a REINDEX on my very large tables. > I may drop and recreate the indexes on my big tables, as I hear that may > be faster than a REINDEX. I will probably run a VACUUM FULL ANALYZE on > the slaves as well. You'll probably find CLUSTER to be quicker than VACUUM FULL, although you need enough disk-space free for temporary copies of the table/indexes concerned. Dropping and recreating indexes should prove much faster than VACUUMING with them. Shouldn't matter for CLUSTER afaict. > Thoughts? Suggestions? Anyone think this will actually help my problem > of size and vacuum times? Do I need to take Slony down while I do > this? Will the VACUUM FULL table locking interfere with Slony? Well, I'd take the opportunity to uninstall/reinstall slony just to check my scripts/procedures are working. -- Richard Huxton Archonet Ltd
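[For the CLUSTER route Richard mentions, the 8.0-era syntax names the index first. A sketch with hypothetical table/index names; substitute your own.]

-- Rewrites the table in index order and rebuilds its other indexes;
-- needs enough free disk for a temporary full copy of the table.
CLUSTER mybigtable_pkey ON mybigtable;
-- Re-analyze afterwards: the physical ordering (correlation) has changed.
ANALYZE mybigtable;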
kurt@hotdogrecords.com (Kurt Overberg) writes:
> In my investigation of this anomaly, I noticed that the data/ dir on db1 (the master) is around 60 Gigs. The data directory on the slaves is around 25Gb. After about 3 months of head scratching, someone on the irc channel suggested that it may be due to index bloat. Although, doing some research, it would seem that those problems were resolved in 7.4(ish), and it wouldn't account for one database being 2.5x bigger. Another unknown is Slony overhead (both in size and vacuum times).

There are three tables in Slony-I that would be of interest; on the master, do a VACUUM VERBOSE on:

- [clustername].sl_log_1
- [clustername].sl_log_2
- [clustername].sl_seqlog

If one or another is really bloated, that could be the cause of *some* problems. Though that shouldn't account for 35GB of space :-).

> The ONLY thing I can think of is that I DROPped a large number of tables from db1 a few months ago (they weren't getting replicated). This is on the order of 1700+ fairly largeish (50,000+ row) tables. I do not remember doing a vacuum full after dropping them, so perhaps that's my problem. I'm planning on doing some maintenance this weekend, during which I will take the whole system down, then on db1, run a VACUUM FULL ANALYZE on the whole database, then a REINDEX on my very large tables. I may drop and recreate the indexes on my big tables, as I hear that may be faster than a REINDEX. I will probably run a VACUUM FULL ANALYZE on the slaves as well.

When tables are dropped, so are the data files. So even if they were bloated, they should have simply disappeared. So I don't think that's the problem.

> Thoughts? Suggestions? Anyone think this will actually help my problem of size and vacuum times? Do I need to take Slony down while I do this? Will the VACUUM FULL table locking interfere with Slony?

I'd be inclined to head to the filesystem level, and try to see what tables are bloated *there*. You should be able to search for bloated tables via the command:

$ find $PGDATA/base -name "[0-9]+\.[0-9]+"

That would be likely to give you a listing of filenames that look something like:

12341.1
12341.2
12341.3
12341.4
12341.5
12341.6
231441.1
231441.2
231441.3

which indicates all table (or index) data files that had to be extended past 1GB. In the above, the relation with OID 12341 would be >6GB in size, because it has been extended to have 6 additional files (in addition to the "bare" filename, 12341).

You can then go into a psql session, and run the query:

select * from pg_class where oid = 12341;

and thereby figure out what table is involved.

I'll bet that if you do this on the "origin" node, you'll find that there is some small number of tables that have *way* more 1GB partitions than there are on the subscriber nodes. Those are the tables that will need attention.

You could probably accomplish the reorganization more quickly via the "CLUSTER" statement; that will reorganize the table based on the ordering of one specified index, and then regenerate all the other indices. It's not MVCC-safe, so if you have reports running concurrently, this could confuse them, but if you take the apps down, as you surely should, it won't be a problem.

You don't forcibly have to take Slony-I down during this, but the locks taken out on tables by CLUSTER/VACUUM FULL will block slons from doing any work until those transactions complete.
I wouldn't think you need to do VACUUM FULL or CLUSTER against the subscribers if they haven't actually bloated (and based on what you have said, there is no indication that they have). -- output = reverse("ofni.secnanifxunil" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/linuxdistributions.html The quickest way to a man's heart is through his chest, with an axe.
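[One aside on the pg_class lookup a few lines up: the number in the on-disk file name is pg_class.relfilenode. It usually equals the OID, but not always (for example after a CLUSTER or REINDEX), so a slightly safer lookup is the following sketch.]

SELECT relname, relkind, relpages
FROM pg_class
WHERE relfilenode = 12341;   -- the number taken from the on-disk file name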
Chris Browne <cbbrowne@acm.org> writes: > kurt@hotdogrecords.com (Kurt Overberg) writes: >> In my investigation of this anomaly, I noticed that the data/ dir on >> db1 (the master) is around 60 Gigs. The data directory on the slaves >> is around 25Gb. After about 3 months of head scratching, someone on >> the irc channel suggested that it may be due to index bloat. > I'd be inclined to head to the filesystem level, and try to see what > tables are bloated *there*. At least as a first cut, it should be sufficient to look at pg_class.relpages, which'd be far easier to correlate with table names ;-). The relpages entry should be accurate as of the most recent VACUUM on each table, which ought to be close enough unless I missed something about the problem situation. regards, tom lane
Richard,

Thanks for the feedback! I found oid2name and have been mucking about with it, but haven't really found anything that stands out yet. Most of the tables/indexes I'm comparing across machines seem to take up a similar amount of disk space. I think I'm going to have to get fancy and write some shell scripts.

Regarding the slony configuration scripts, you're assuming that I have such scripts. Our slony install was originally installed by a contractor, and modified since then, so "getting my act together with respect to slony" is kinda beyond the scope of what I'm trying to accomplish with this maintenance. I really just want to figure out what's going on with db1, and want to do so in a way that won't ruin slony, since right now it runs pretty well, and I doubt I'd be able to fix it if it seriously broke.

Upon a cursory pass with oid2name, it seems that my sl_log_1_idx1 index is out of hand:

-bash-3.00$ oid2name -d mydb -f 955960160
From database "mydb":
  Filenode     Table Name
--------------------------
 955960160  sl_log_1_idx1

-bash-3.00$ ls -al 955960160*
-rw-------  1 postgres postgres 1073741824 Jun 19 11:08 955960160
-rw-------  1 postgres postgres 1073741824 Jun 13  2006 955960160.1
-rw-------  1 postgres postgres  909844480 Jun 19 10:47 955960160.10
-rw-------  1 postgres postgres 1073741824 Jul 31  2006 955960160.2
-rw-------  1 postgres postgres 1073741824 Sep 12  2006 955960160.3
-rw-------  1 postgres postgres 1073741824 Oct 19  2006 955960160.4
-rw-------  1 postgres postgres 1073741824 Nov 27  2006 955960160.5
-rw-------  1 postgres postgres 1073741824 Feb  3 12:57 955960160.6
-rw-------  1 postgres postgres 1073741824 Mar  2 11:57 955960160.7
-rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.8
-rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.9

I know that slony runs its vacuuming in the background, but it doesn't seem to be cleaning this stuff up. Interestingly, from my VACUUM pgfouine output, that index doesn't take that long at all to vacuum analyze (compared to my other, much larger tables). Am I making the OID->filename translation properly?

Running this:

SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;

...gives me...

 sl_log_1_idx1        | 1421785
 xrefmembergroup      | 1023460
 answerselectinstance |  565343

...does this jibe with what I'm seeing above? I guess I'll run a full vacuum on the slony tables too? I figured something else would jump out bigger than this. FWIW, the same table on db2 and db3 is very small, like zero. I guess this is looking like it is overhead from slony? Should I take this problem over to the slony group?

Thanks again, gang-

/kurt

On Jun 19, 2007, at 10:13 AM, Richard Huxton wrote:

> Kurt Overberg wrote:
>> In my investigation of this anomaly, I noticed that the data/ dir on db1 (the master) is around 60 Gigs. The data directory on the slaves is around 25Gb. After about 3 months of head scratching, someone on the irc channel suggested that it may be due to index bloat. Although, doing some research, it would seem that those problems were resolved in 7.4(ish), and it wouldn't account for one database being 2.5x bigger. Another unknown is Slony overhead (both in size and vacuum times).
>
> Check the oid2name/dbsize utilities in the contrib RPM for 8.0.x
> http://www.postgresql.org/docs/8.0/static/diskusage.html
> Shouldn't be too hard to find out where the disk space is going.
>
> Oh and 8.0.13 is the latest release of 8.0 series, so you'll want to use your maintenance window to upgrade too. Lots of good bugfixes there.
>
>> The ONLY thing I can think of is that I DROPped a large number of tables from db1 a few months ago (they weren't getting replicated). This is on the order of 1700+ fairly largeish (50,000+ row) tables. I do not remember doing a vacuum full after dropping them, so perhaps that's my problem. I'm planning on doing some maintenance this weekend, during which I will take the whole system down, then on db1, run a VACUUM FULL ANALYZE on the whole database, then a REINDEX on my very large tables. I may drop and recreate the indexes on my big tables, as I hear that may be faster than a REINDEX. I will probably run a VACUUM FULL ANALYZE on the slaves as well.
>
> You'll probably find CLUSTER to be quicker than VACUUM FULL, although you need enough disk-space free for temporary copies of the table/indexes concerned.
>
> Dropping and recreating indexes should prove much faster than VACUUMING with them. Shouldn't matter for CLUSTER afaict.
>
>> Thoughts? Suggestions? Anyone think this will actually help my problem of size and vacuum times? Do I need to take Slony down while I do this? Will the VACUUM FULL table locking interfere with Slony?
>
> Well, I'd take the opportunity to uninstall/reinstall slony just to check my scripts/procedures are working.
>
> --
> Richard Huxton
> Archonet Ltd
Kurt Overberg wrote: > Richard, > > Thanks for the feedback! I found oid2name and have been mucking about > with it, but haven't really found anything that stands out yet. Most of > the tables/indexes I'm comparing across machines seem to take up a > similar amount of disk space. I think I'm going to have to get fancy > and write some shell scripts. Regarding the slony configuration > scripts, you're assuming that I have such scripts. Our slony install > was originally installed by a contractor, and modified since then so > "getting my act together with respect to slony" is kinda beyond the > scope of what I'm trying to accomplish with this maintenance. I really > just want to figure out whats going on with db1, and want to do so in a > way that won't ruin slony since right now it runs pretty well, and I > doubt I'd be able to fix it if it seriously broke. > > Upon a cursory pass with oid2name, it seems that my sl_log_1_idx1 index > is out of hand: If the sl_log_1 table is large too, it'll be worth reading throught the FAQ to see if any of its notes apply. http://cbbrowne.com/info/faq.html > -bash-3.00$ oid2name -d mydb -f 955960160 > From database "mydb": > Filenode Table Name > -------------------------- > 955960160 sl_log_1_idx1 > > -bash-3.00$ ls -al 955960160* > -rw------- 1 postgres postgres 1073741824 Jun 19 11:08 955960160 > -rw------- 1 postgres postgres 1073741824 Jun 13 2006 955960160.1 > -rw------- 1 postgres postgres 909844480 Jun 19 10:47 955960160.10 > -rw------- 1 postgres postgres 1073741824 Jul 31 2006 955960160.2 > -rw------- 1 postgres postgres 1073741824 Sep 12 2006 955960160.3 > -rw------- 1 postgres postgres 1073741824 Oct 19 2006 955960160.4 > -rw------- 1 postgres postgres 1073741824 Nov 27 2006 955960160.5 > -rw------- 1 postgres postgres 1073741824 Feb 3 12:57 955960160.6 > -rw------- 1 postgres postgres 1073741824 Mar 2 11:57 955960160.7 > -rw------- 1 postgres postgres 1073741824 Mar 29 09:46 955960160.8 > -rw------- 1 postgres postgres 1073741824 Mar 29 09:46 955960160.9 > > > I know that slony runs its vacuuming in the background, but it doesn't > seem to be cleaning this stuff up. Interestingly, from my VACUUM > pgfouine output, > that index doesn't take that long at all to vacuum analyze (compared to > my other, much larger tables). Am I making the OID->filename > translation properly? Looks OK to me > Running this: > SELECT relname, relpages FROM pg_class ORDER BY relpages DESC; > ...gives me... > > sl_log_1_idx1 | 1421785 > xrefmembergroup | 1023460 > answerselectinstance | 565343 > > ...does this jibe with what I'm seeing above? I guess I'll run a full > vacuum on the slony tables too? I figured something would else would > jump out bigger than this. FWIW, the same table on db2 and db3 is very > small, like zero. I guess this is looking like it is overhead from > slony? Should I take this problem over to the slony group? Well, pages are 8KB each (by default), so that'd be about 10.8GB, which seems to match your filesizes above. Read through the FAQ I linked to - for some reason Slony's not clearing out transactions it's replicated to your slaves (they *are* in sync, aren't they?). Could be a transaction preventing vacuuming, or perhaps a partially dropped node? Check the size of the sl_log_1 table and see if that tallies. -- Richard Huxton Archonet Ltd
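[The arithmetic Richard is doing can also be pushed into the query itself; a sketch assuming the default 8KB block size.]

SELECT relname,
       relpages,
       round(relpages * 8192.0 / (1024 * 1024 * 1024), 1) AS approx_gb
FROM pg_class
ORDER BY relpages DESC
LIMIT 10;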
Chris,

I took your advice, and I found that sl_log_1 seems to be causing some of the problem. Here's the result of a VACUUM VERBOSE:

mydb=# vacuum verbose _my_cluster.sl_log_1 ;
INFO:  vacuuming "_my_cluster.sl_log_1"
INFO:  index "sl_log_1_idx1" now contains 309404 row versions in 1421785 pages
DETAIL:  455001 index row versions were removed.
1419592 index pages have been deleted, 1416435 are currently reusable.
CPU 16.83s/5.07u sec elapsed 339.19 sec.
INFO:  index "sl_log_1_idx2" now contains 312864 row versions in 507196 pages
DETAIL:  455001 index row versions were removed.
506295 index pages have been deleted, 504998 are currently reusable.
CPU 6.44s/2.27u sec elapsed 138.70 sec.
INFO:  "sl_log_1": removed 455001 row versions in 7567 pages
DETAIL:  CPU 0.56s/0.40u sec elapsed 6.63 sec.
INFO:  "sl_log_1": found 455001 removable, 309318 nonremovable row versions in 13764 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 51972 unused item pointers.
0 pages are entirely empty.
CPU 24.13s/7.85u sec elapsed 486.49 sec.
INFO:  vacuuming "pg_toast.pg_toast_955960155"
INFO:  index "pg_toast_955960155_index" now contains 9 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_955960155": found 0 removable, 9 nonremovable row versions in 3 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 3 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

...I then checked the disk and those pages are still there. If I do a:

select count(*) from _my_cluster.sl_log_1;
 count
-------
  6366
(1 row)

Would a VACUUM FULL take care of this? It seems to me that it's not clearing up the indexes properly. You are correct in that I do see things getting much bigger on the master than on the subscriber nodes. Could this cause my slony replication to bog down?

Also, I have a question about this comment:

> You don't forcibly have to take Slony-I down during this, but the locks taken out on tables by CLUSTER/VACUUM FULL will block slons from doing any work until those transactions complete.

That's because no writing will be done to the tables, thus no slony triggers will get triggered, correct? I'd rather not shut down slony if I don't have to, but will if it "is safer/better/more badass".

For those playing along at home,

> $ find $PGDATA/base -name "[0-9]+\.[0-9]+"

...I had to use:

find $PGDATA/base -name "[0-9]*\.[0-9]*"

...since -name matches shell glob patterns rather than regular expressions, the version with the pluses doesn't match anything. Still a much better way than how I was doing it.

Thanks again for helping me with this, it's greatly appreciated!

/kurt
Kurt Overberg <kurt@hotdogrecords.com> writes: > mydb # vacuum verbose _my_cluster.sl_log_1 ; > INFO: "sl_log_1": found 455001 removable, 309318 nonremovable row > versions in 13764 pages > DETAIL: 0 dead row versions cannot be removed yet. Hmm. So you don't have a long-running-transactions problem (else that DETAIL number would have been large). What you do have is a failure to vacuum sl_log_1 on a regular basis (because there are so many dead/removable rows). I suspect also some sort of Slony problem, because AFAIK a properly operating Slony system shouldn't have that many live rows in sl_log_1 either --- don't they all represent as-yet-unpropagated events? I'm no Slony expert though. You probably should ask about that on the Slony lists. > ...I then checked the disk and those pages are still there. Yes, regular VACUUM doesn't try very hard to shorten the disk file. > Would a VACUUM FULL take care of this? It would, but it will take an unpleasantly long time with so many live rows to reshuffle. I'd advise first working to see if you can get the table down to a few live rows. Then a VACUUM FULL will be a snap. Also, you might want to do REINDEX after VACUUM FULL to compress the indexes --- VACUUM FULL isn't good at that. regards, tom lane
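[Spelled out, the sequence Tom describes would look something like the following sketch; both statements take locks that block writes, so the slons will stall until they finish.]

-- after getting sl_log_1 down to a handful of live rows:
VACUUM FULL VERBOSE _my_cluster.sl_log_1;
-- VACUUM FULL shrinks the heap but leaves the indexes bloated,
-- so rebuild them afterwards:
REINDEX TABLE _my_cluster.sl_log_1;
ANALYZE _my_cluster.sl_log_1;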
That's the thing thats kinda blowing my mind here, when I look at that table: db1=# select count(*) from _my_cluster.sl_log_1 ; count ------- 6788 (1 row) As far as my DB is concerned, there's only ~7000 rows (on average) when I look in there (it does fluctuate, I've seen it go as high as around 12k, but then its gone back down, so I know events are moving around in there). So from what I can tell- from the disk point of view, there's ~11Gb of data; from the vacuum point of view there's 309318 rows. From the psql point of view, there's only around 7,000. Am I missing something? Unless there's something going on under the hood that I don't know about (more than likely), it seems like my sl_log_1 table is munged or somehow otherwise very screwed up. I fear that a re-shuffling or dropping/recreating the index will mess it up further. Maybe when I take my production systems down for maintenance, can I wait until sl_log_1 clears out, so then I can just drop that table altogether (and re-create it of course)? Thanks! /kurt On Jun 19, 2007, at 5:33 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> mydb # vacuum verbose _my_cluster.sl_log_1 ; >> INFO: "sl_log_1": found 455001 removable, 309318 nonremovable row >> versions in 13764 pages >> DETAIL: 0 dead row versions cannot be removed yet. > > Hmm. So you don't have a long-running-transactions problem (else that > DETAIL number would have been large). What you do have is a failure > to vacuum sl_log_1 on a regular basis (because there are so many > dead/removable rows). I suspect also some sort of Slony problem, > because AFAIK a properly operating Slony system shouldn't have that > many live rows in sl_log_1 either --- don't they all represent > as-yet-unpropagated events? I'm no Slony expert though. You probably > should ask about that on the Slony lists. > >> ...I then checked the disk and those pages are still there. > > Yes, regular VACUUM doesn't try very hard to shorten the disk file. > >> Would a VACUUM FULL take care of this? > > It would, but it will take an unpleasantly long time with so many live > rows to reshuffle. I'd advise first working to see if you can get the > table down to a few live rows. Then a VACUUM FULL will be a snap. > Also, you might want to do REINDEX after VACUUM FULL to compress the > indexes --- VACUUM FULL isn't good at that. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org
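[To put the numbers side by side: pg_class holds the estimates from the last VACUUM/ANALYZE, while count(*) reflects what a normal MVCC snapshot can see. A quick sketch of the comparison:]

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('sl_log_1', 'sl_log_1_idx1', 'sl_log_1_idx2');

SELECT count(*) FROM _my_cluster.sl_log_1;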
Kurt Overberg <kurt@hotdogrecords.com> writes: > That's the thing thats kinda blowing my mind here, when I look at > that table: > db1=# select count(*) from _my_cluster.sl_log_1 ; > count > ------- > 6788 > (1 row) Well, that's real interesting. AFAICS there are only two possibilities: 1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at the code shows that these are counted the same as plain live tuples, but they'd not be visible to other transactions. I wonder if you could have any really old open transactions that might have inserted all those tuples? 2. The other 300k tuples are committed good, but they are not seen as valid by a normal MVCC-aware transaction, probably because of transaction wraparound. This would require the sl_log_1 table to have escaped vacuuming for more than 2 billion transactions, which seems a bit improbable but maybe not impossible. (You did say you were running PG 8.0.x, right? That's the last version without any strong defenses against transaction wraparound...) The way to get some facts, instead of speculating, would be to get hold of the appropriate version of pg_filedump from http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it (probably the -i option would be sufficient), then take a close look at the tuples that aren't visible to other transactions. (You could do "select ctid from sl_log_1" to determine which ones are visible.) regards, tom lane
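[Concretely, the check Tom outlines might look like this; the path elements in angle brackets are placeholders to be filled in from your own system.]

mydb=# SELECT relfilenode FROM pg_class WHERE relname = 'sl_log_1' AND relkind = 'r';
mydb=# SELECT ctid FROM _my_cluster.sl_log_1;
-bash-3.00$ pg_filedump -i $PGDATA/base/<database-oid>/<relfilenode> | less

Any (block, item) combinations that show up in the pg_filedump output but not in the ctid list are the tuples worth a closer look.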
Kurt Overberg <kurt@hotdogrecords.com> wrote: > > That's the thing thats kinda blowing my mind here, when I look at > that table: > > db1=# select count(*) from _my_cluster.sl_log_1 ; > count > ------- > 6788 > (1 row) > > As far as my DB is concerned, there's only ~7000 rows (on average) > when I look > in there (it does fluctuate, I've seen it go as high as around 12k, > but then its > gone back down, so I know events are moving around in there). This is consistent with my experience with Slony and sl_log_[12] I'm pretty sure that the slon processes vacuum sl_log_* on a fairly regular basis. I'm absolutely positive that slon occasionally switches from using sl_log_1, to sl_log_2, then truncates sl_log_1 (then, after some time, does the same in reverse) So, in order for you to get massive bloat of the sl_log_* tables, you must be doing a LOT of transactions in the time before it switches logs and truncates the unused version. Either that, or something is going wrong. > So from what I can tell- from the disk point of view, there's ~11Gb > of data; from the > vacuum point of view there's 309318 rows. From the psql point of > view, there's only > around 7,000. Am I missing something? Something seems wrong here. Correct me if I'm missing something, but you're saying the table takes up 11G on disk, but vacuum says there are ~14000 pages. That would mean your page size is ~800K. Doesn't seem right. > Unless there's something > going on under the > hood that I don't know about (more than likely), it seems like my > sl_log_1 table is munged or > somehow otherwise very screwed up. I fear that a re-shuffling or > dropping/recreating > the index will mess it up further. Maybe when I take my production > systems down for > maintenance, can I wait until sl_log_1 clears out, so then I can just > drop that > table altogether (and re-create it of course)? Possibly drop this node from the Slony cluster and re-add it. Unless it's the origin node, in which case you'll have to switchover, then redo the origin then switch back ... > > Thanks! > > /kurt > > > > > On Jun 19, 2007, at 5:33 PM, Tom Lane wrote: > > > Kurt Overberg <kurt@hotdogrecords.com> writes: > >> mydb # vacuum verbose _my_cluster.sl_log_1 ; > >> INFO: "sl_log_1": found 455001 removable, 309318 nonremovable row > >> versions in 13764 pages > >> DETAIL: 0 dead row versions cannot be removed yet. > > > > Hmm. So you don't have a long-running-transactions problem (else that > > DETAIL number would have been large). What you do have is a failure > > to vacuum sl_log_1 on a regular basis (because there are so many > > dead/removable rows). I suspect also some sort of Slony problem, > > because AFAIK a properly operating Slony system shouldn't have that > > many live rows in sl_log_1 either --- don't they all represent > > as-yet-unpropagated events? I'm no Slony expert though. You probably > > should ask about that on the Slony lists. > > > >> ...I then checked the disk and those pages are still there. > > > > Yes, regular VACUUM doesn't try very hard to shorten the disk file. > > > >> Would a VACUUM FULL take care of this? > > > > It would, but it will take an unpleasantly long time with so many live > > rows to reshuffle. I'd advise first working to see if you can get the > > table down to a few live rows. Then a VACUUM FULL will be a snap. > > Also, you might want to do REINDEX after VACUUM FULL to compress the > > indexes --- VACUUM FULL isn't good at that. 
> > regards, tom lane

-- 
Bill Moran
Collaborative Fusion Inc.
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
On Jun 19, 2007, at 7:26 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> That's the thing thats kinda blowing my mind here, when I look at >> that table: > >> db1=# select count(*) from _my_cluster.sl_log_1 ; >> count >> ------- >> 6788 >> (1 row) > > Well, that's real interesting. AFAICS there are only two > possibilities: > > 1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at > the code shows that these are counted the same as plain live tuples, > but they'd not be visible to other transactions. I wonder if you > could > have any really old open transactions that might have inserted all > those > tuples? > Unlikely- the database has been stopped and restarted, which I think closes out transactions? Or could that cause the problems? > 2. The other 300k tuples are committed good, but they are not seen as > valid by a normal MVCC-aware transaction, probably because of > transaction wraparound. This would require the sl_log_1 table to have > escaped vacuuming for more than 2 billion transactions, which seems a > bit improbable but maybe not impossible. (You did say you were > running > PG 8.0.x, right? That's the last version without any strong defenses > against transaction wraparound...) Yep, this 8.0.4. It has been running for over a year, fairly heavy updates, so I would guess its possible. > The way to get some facts, instead of speculating, would be to get > hold > of the appropriate version of pg_filedump from > http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it > (probably the -i option would be sufficient), then take a close look > at the tuples that aren't visible to other transactions. (You could > do "select ctid from sl_log_1" to determine which ones are visible.) > Okay, I've grabbed pg_filedump and got it running on the appropriate server. I really have No Idea how to read its output though. Where does the ctid from sl_log_1 appear in the following listing? Block 0 ******************************************************** <Header> ----- Block Offset: 0x00000000 Offsets: Lower 20 (0x0014) Block: Size 8192 Version 2 Upper 8176 (0x1ff0) LSN: logid 949 recoff 0xae63b06c Special 8176 (0x1ff0) Items: 0 Free Space: 8156 Length (including item array): 24 BTree Meta Data: Magic (0x00053162) Version (2) Root: Block (1174413) Level (3) FastRoot: Block (4622) Level (1) <Data> ------ Empty block - no items listed <Special Section> ----- BTree Index Section: Flags: 0x0008 (META) Blocks: Previous (0) Next (0) Level (0) .../this was taken from the first page file (955960160.0 I guess you could call it). Does this look interesting to you, Tom? FWIW- this IS on my master DB. I've been slowly preparing an upgrade to 8.2, I guess I'd better get that inta gear, hmmm? :-( /kurt > regards, tom lane
Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, I've grabbed pg_filedump and got it running on the appropriate > server. > I really have No Idea how to read its output though. Where does the > ctid from sl_log_1 > appear in the following listing? ctid is (block number, item number) > Block 0 ******************************************************** > BTree Meta Data: Magic (0x00053162) Version (2) > Root: Block (1174413) Level (3) > FastRoot: Block (4622) Level (1) This seems to be an index, not the sl_log_1 table. regards, tom lane
OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just wanted to restate things in an effort to clear the problem in my head.

Okay, so the sl_log_1 TABLE looks okay. It's the indexes that seem to be messed up; specifically, sl_log_1_idx1 seems to think that there's > 300,000 rows in the table it's associated with. I just want to fix the index, really. So my question remains:

Is it okay to dump and recreate that index (or reindex it) while the servers are down and the database is not being accessed?

Tom, Bill, Chris and Richard, thank you so much for your thoughts on this matter so far. It helps to not feel "so alone" when dealing with difficult issues (for me anyway) on a system I don't know so much about.

Thanks guys,

/kurt

On Jun 19, 2007, at 10:51 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> Okay, I've grabbed pg_filedump and got it running on the appropriate server. I really have No Idea how to read its output though. Where does the ctid from sl_log_1 appear in the following listing?
>
> ctid is (block number, item number)
>
>> Block 0 ********************************************************
>> BTree Meta Data: Magic (0x00053162) Version (2)
>> Root: Block (1174413) Level (3)
>> FastRoot: Block (4622) Level (1)
>
> This seems to be an index, not the sl_log_1 table.
>
> regards, tom lane
In response to Kurt Overberg <kurt@hotdogrecords.com>:

> OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just wanted to restate things in an effort to clear the problem in my head.
>
> Okay, so the sl_log_1 TABLE looks okay. It's the indexes that seem to be messed up; specifically, sl_log_1_idx1 seems to think that there's > 300,000 rows in the table it's associated with. I just want to fix the index, really. So my question remains:

Apologies, I must have misunderstood some previous message.

> Is it okay to dump and recreate that index (or reindex it) while the servers are down and the database is not being accessed?

There are people here who know _way_ more than me -- but I can't see any reason why you couldn't just REINDEX it while everything is running. There may be some performance slowdown during the reindex, but everything should continue to chug along. A drop/recreate of the index should be OK as well.

> Tom, Bill, Chris and Richard, thank you so much for your thoughts on this matter so far. It helps to not feel "so alone" when dealing with difficult issues (for me anyway) on a system I don't know so much about.

:D Isn't Open Source great!

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
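[For reference, the in-place rebuild Bill describes is just the following; keep in mind that REINDEX blocks writes to sl_log_1 while it runs, which is another reason to do it in a quiet window.]

REINDEX INDEX _my_cluster.sl_log_1_idx1;
REINDEX INDEX _my_cluster.sl_log_1_idx2;

The drop-and-recreate alternative needs the index definitions Slony created, so if you go that way, grab them from pg_indexes (or \d sl_log_1 in psql) first.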
Kurt Overberg wrote:
> OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just wanted to restate things in an effort to clear the problem in my head.
>
> Okay, so the sl_log_1 TABLE looks okay. It's the indexes that seem to be messed up; specifically, sl_log_1_idx1 seems to think that there's > 300,000 rows in the table it's associated with. I just want to fix the index, really. So my question remains:
>
> Is it okay to dump and recreate that index (or reindex it) while the servers are down and the database is not being accessed?

Well, I would probably stop the slon daemons: dropping indexes that slony needs can lead to quite a slowdown, and worse, the slowdown happens because the database server is doing things the wrong way. But that's mostly what you need to do.

OTOH, depending upon the size of your database, you might consider starting out from a scratch database.

Andreas

> Tom, Bill, Chris and Richard, thank you so much for your thoughts on this matter so far. It helps to not feel "so alone" when dealing with difficult issues (for me anyway) on a system I don't know so much about.

#slony@Freenode.net, #postgresql@freenode.net are quite helpful, and sometimes faster than mail.

Andreas
Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to > be messed up, specifically sl_log_1_idx1 seems to think that there's >>> 300,000 rows in the table its associated with. I just want to fix > the index, really. I'm not sure how you arrive at that conclusion. The VACUUM VERBOSE output you provided here: http://archives.postgresql.org/pgsql-performance/2007-06/msg00370.php shows clearly that there are lots of rows in the table as well as the indexes. A REINDEX would certainly cut the size of the indexes but it isn't going to do anything about the extraneous rows. When last heard from, you were working on getting pg_filedump output for some of the bogus rows --- what was the result? regards, tom lane
Dang it, Tom, don't you ever get tired of being right? I guess I had been focusing on the index numbers since they came up first, and its the index files that are > 10Gb. Okay, so I did some digging with pg_filedump, and found the following: . . . . Block 406 ******************************************************** <Header> ----- Block Offset: 0x0032c000 Offsets: Lower 208 (0x00d0) Block: Size 8192 Version 2 Upper 332 (0x014c) LSN: logid 950 recoff 0x9ebcc6e4 Special 8192 (0x2000) Items: 47 Free Space: 124 Length (including item array): 212 <Data> ------ Item 1 -- Length: 472 Offset: 7720 (0x1e28) Flags: USED XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 2 -- Length: 185 Offset: 7532 (0x1d6c) Flags: USED XMIN: 1489323584 CMIN: 4 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 2 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 3 -- Length: 129 Offset: 7400 (0x1ce8) Flags: USED XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 3 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 4 -- Length: 77 Offset: 7320 (0x1c98) Flags: USED XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 4 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ...I then looked in the DB: mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,2)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,3)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) ...is this what you were looking for, Tom? The only thing that stands out to me is the XMAX_INVALID mask. Thoughts? Thanks, /kurt On Jun 20, 2007, at 11:22 AM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to >> be messed up, specifically sl_log_1_idx1 seems to think that there's >>>> 300,000 rows in the table its associated with. I just want to fix >> the index, really. > > I'm not sure how you arrive at that conclusion. The VACUUM VERBOSE > output you provided here: > http://archives.postgresql.org/pgsql-performance/2007-06/msg00370.php > shows clearly that there are lots of rows in the table as well as > the indexes. A REINDEX would certainly cut the size of the indexes > but it isn't going to do anything about the extraneous rows. > > When last heard from, you were working on getting pg_filedump > output for > some of the bogus rows --- what was the result? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, so I did some digging with pg_filedump, and found the following: > Block 406 ******************************************************** > Item 1 -- Length: 472 Offset: 7720 (0x1e28) Flags: USED > XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > Block Id: 406 linp Index: 1 Attributes: 6 Size: 32 > infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) This is pretty much what you'd expect for a never-updated tuple... > mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)'; > log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | > log_cmddata > ------------+---------+-------------+---------------+------------- > +------------- > (0 rows) so I have to conclude that you've got a wraparound problem. What is the current XID counter? (pg_controldata will give you that, along with a lot of other junk.) It might also be interesting to take a look at "ls -l $PGDATA/pg_clog"; the mod times on the files in there would give us an idea how fast XIDs are being consumed. regards, tom lane
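[The two things Tom asks for can be pulled like this; a sketch assuming PGDATA points at the cluster directory, as elsewhere in this thread.]

-bash-3.00$ pg_controldata $PGDATA | grep -i nextxid
-bash-3.00$ ls -lt $PGDATA/pg_clog | head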
Drat! I'm wrong again. I thought for sure there wouldn't be a wraparound problem. So does this affect the entire database server, or just this table? Is best way to proceed to immediately ditch this db and promote one of my slaves to a master? I'm just concerned about the data integrity. Note that I don't use OID for anything really, so I'm hoping I'll be safe. Thanks again, Tom. /kurt pg_controldata output: -bash-3.00$ pg_controldata pg_control version number: 74 Catalog version number: 200411041 Database system identifier: 4903924957417782767 Database cluster state: in production pg_control last modified: Wed 20 Jun 2007 03:19:52 PM CDT Current log file ID: 952 Next log file segment: 154 Latest checkpoint location: 3B8/920F0D78 Prior checkpoint location: 3B8/8328E4A4 Latest checkpoint's REDO location: 3B8/9200BBF0 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 1490547335 Latest checkpoint's NextOID: 3714961319 Time of latest checkpoint: Wed 20 Jun 2007 03:17:50 PM CDT Database block size: 8192 Blocks per segment of large relation: 131072 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 -bash-3.00$ echo $PGDATA Here's the list from pg_clog for June: -rw------- 1 postgres postgres 262144 Jun 1 03:36 054D -rw------- 1 postgres postgres 262144 Jun 1 08:16 054E -rw------- 1 postgres postgres 262144 Jun 1 10:24 054F -rw------- 1 postgres postgres 262144 Jun 1 17:03 0550 -rw------- 1 postgres postgres 262144 Jun 2 03:32 0551 -rw------- 1 postgres postgres 262144 Jun 2 10:04 0552 -rw------- 1 postgres postgres 262144 Jun 2 19:24 0553 -rw------- 1 postgres postgres 262144 Jun 3 03:38 0554 -rw------- 1 postgres postgres 262144 Jun 3 13:19 0555 -rw------- 1 postgres postgres 262144 Jun 4 00:02 0556 -rw------- 1 postgres postgres 262144 Jun 4 07:12 0557 -rw------- 1 postgres postgres 262144 Jun 4 12:37 0558 -rw------- 1 postgres postgres 262144 Jun 4 19:46 0559 -rw------- 1 postgres postgres 262144 Jun 5 03:36 055A -rw------- 1 postgres postgres 262144 Jun 5 10:54 055B -rw------- 1 postgres postgres 262144 Jun 5 18:11 055C -rw------- 1 postgres postgres 262144 Jun 6 03:38 055D -rw------- 1 postgres postgres 262144 Jun 6 10:15 055E -rw------- 1 postgres postgres 262144 Jun 6 15:10 055F -rw------- 1 postgres postgres 262144 Jun 6 23:21 0560 -rw------- 1 postgres postgres 262144 Jun 7 07:15 0561 -rw------- 1 postgres postgres 262144 Jun 7 13:43 0562 -rw------- 1 postgres postgres 262144 Jun 7 22:53 0563 -rw------- 1 postgres postgres 262144 Jun 8 07:12 0564 -rw------- 1 postgres postgres 262144 Jun 8 14:42 0565 -rw------- 1 postgres postgres 262144 Jun 9 01:30 0566 -rw------- 1 postgres postgres 262144 Jun 9 09:19 0567 -rw------- 1 postgres postgres 262144 Jun 9 20:19 0568 -rw------- 1 postgres postgres 262144 Jun 10 03:39 0569 -rw------- 1 postgres postgres 262144 Jun 10 15:38 056A -rw------- 1 postgres postgres 262144 Jun 11 03:34 056B -rw------- 1 postgres postgres 262144 Jun 11 09:14 056C -rw------- 1 postgres postgres 262144 Jun 11 13:59 056D -rw------- 1 postgres postgres 262144 Jun 11 19:41 056E -rw------- 1 postgres postgres 262144 Jun 12 03:37 056F -rw------- 1 postgres postgres 262144 Jun 12 09:59 0570 -rw------- 1 postgres postgres 262144 Jun 12 17:23 0571 -rw------- 1 postgres postgres 262144 Jun 13 03:32 0572 -rw------- 1 postgres postgres 262144 Jun 13 
09:16 0573 -rw------- 1 postgres postgres 262144 Jun 13 16:25 0574 -rw------- 1 postgres postgres 262144 Jun 14 01:28 0575 -rw------- 1 postgres postgres 262144 Jun 14 08:40 0576 -rw------- 1 postgres postgres 262144 Jun 14 15:07 0577 -rw------- 1 postgres postgres 262144 Jun 14 22:00 0578 -rw------- 1 postgres postgres 262144 Jun 15 03:36 0579 -rw------- 1 postgres postgres 262144 Jun 15 12:21 057A -rw------- 1 postgres postgres 262144 Jun 15 18:10 057B -rw------- 1 postgres postgres 262144 Jun 16 03:32 057C -rw------- 1 postgres postgres 262144 Jun 16 09:17 057D -rw------- 1 postgres postgres 262144 Jun 16 19:32 057E -rw------- 1 postgres postgres 262144 Jun 17 03:39 057F -rw------- 1 postgres postgres 262144 Jun 17 13:26 0580 -rw------- 1 postgres postgres 262144 Jun 17 23:11 0581 -rw------- 1 postgres postgres 262144 Jun 18 04:40 0582 -rw------- 1 postgres postgres 262144 Jun 18 12:23 0583 -rw------- 1 postgres postgres 262144 Jun 18 17:22 0584 -rw------- 1 postgres postgres 262144 Jun 18 19:40 0585 -rw------- 1 postgres postgres 262144 Jun 19 03:38 0586 -rw------- 1 postgres postgres 262144 Jun 19 09:30 0587 -rw------- 1 postgres postgres 262144 Jun 19 10:23 0588 -rw------- 1 postgres postgres 262144 Jun 19 16:10 0589 -rw------- 1 postgres postgres 262144 Jun 19 21:45 058A -rw------- 1 postgres postgres 262144 Jun 20 03:38 058B -rw------- 1 postgres postgres 262144 Jun 20 12:17 058C -rw------- 1 postgres postgres 131072 Jun 20 15:13 058D On Jun 20, 2007, at 2:37 PM, Tom Lane wrote: > > so I have to conclude that you've got a wraparound problem. What > is the > current XID counter? (pg_controldata will give you that, along with a > lot of other junk.) It might also be interesting to take a look at > "ls -l $PGDATA/pg_clog"; the mod times on the files in there would > give > us an idea how fast XIDs are being consumed. > > regards, tom lane
Kurt Overberg <kurt@hotdogrecords.com> writes: > Drat! I'm wrong again. I thought for sure there wouldn't be a > wraparound problem. Well, I'm not sure what it is now. You showed some invisible tuples with XMINs of XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 but the nextXID is 1490547335 which is not that far ahead of those --- about 1.2 million transactions, or less than a day's time according to the clog timestamps, which suggest that you're burning several million XIDs a day. Perhaps you've wrapped past them since your earlier check --- if you try the same "select where ctid = " queries now, do they show rows? The other thing that's strange here is that an 8.0 installation should be pretty aggressive about recycling pg_clog segments, and yet you've got a bunch there. How far back do the files in pg_clog go --- what's the numeric range of the filenames, and the date range of their mod times? Have you checked the postmaster log to see if you're getting any complaints about checkpoint failures or anything like that? It would also be useful to look at the output of select datname, age(datfrozenxid) from pg_database; regards, tom lane
Okay, select * from _my_cluster.sl_log_1 where ctid = '(1,1)'; select * from _my_cluster.sl_log_1 where ctid = '(1,2)'; select * from _my_cluster.sl_log_1 where ctid = '(1,3)'; select * from _my_cluster.sl_log_1 where ctid = '(1,4)'; all returns zero rows. When I do a dump of that file, I get: Block 1 ******************************************************** <Header> ----- Block Offset: 0x00002000 Offsets: Lower 408 (0x0198) Block: Size 8192 Version 2 Upper 7680 (0x1e00) LSN: logid 955 recoff 0x0daed68c Special 8192 (0x2000) Items: 97 Free Space: 7272 Length (including item array): 412 <Data> ------ Item 1 -- Length: 121 Offset: 8068 (0x1f84) Flags: USED XMIN: 1491480520 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 1 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ...the fact that they weren't in the table, but in the file (I did the filedump first, then the query), then redid the filedump, the results are the same, the rows are still in the file. I have no idea how frequently these files are getting written to, I assume frequently. I also looked at the last block listed in the file, 6445, and also looked for items 1-4, and also did not find them in the table using a similar select as above. That seems kinda strange, since there's right this second 11,000 items in that table, but I'll roll with it for awhile. Intrigued, I wanted to see what a filedump looked like of a row that WAS in the table: ctid | log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | (7,1) | 10 | 1491481037 | 8 | 473490934 | I | (memberid,answerid,taskinstanceid) values ('144854','148707','0') Block 7 ******************************************************** <Header> ----- Block Offset: 0x0000e000 Offsets: Lower 424 (0x01a8) Block: Size 8192 Version 2 Upper 508 (0x01fc) LSN: logid 955 recoff 0x0dc4bcc0 Special 8192 (0x2000) Items: 101 Free Space: 84 Length (including item array): 428 <Data> ------ Item 1 -- Length: 129 Offset: 8060 (0x1f7c) Flags: USED XMIN: 1491481037 CMIN: 7 XMAX: 0 CMAX|XVAC: 0 Block Id: 7 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ...the NextID was (taken about 5 mins after the previous filedump): Latest checkpoint's NextXID: 1491498183 I don't see any real differences in the file entry for a row that is in the table, and one that I don't see in the table. I hope I'm getting this right, its totally fascinating seeing how all this works. About your other questions: 1. I have pg_clog segments all the way back to the start of the database, all the way back to March 14th, 2006 (most likely when the database was first brought up on this machine). The numeric names start at 0000 and go to 058E. I checked the recent (within last 8 days) and saw no errors containing the word 'checkpoint'. In fact, very few errors at all. The dang thing seems to be running pretty well, just a little slow. mydb=# select datname, age(datfrozenxid) from pg_database; datname | age -----------+------------ template1 | 1491520270 template0 | 1491520270 postgres | 1491520270 mydb | 1076194904 Oooooooo..... thats not good, is it? Thanks for taking an interest, Tom. I'm most likely going to promote one of my subscribers to be master, then nuke this database, but I have no problems keeping it around if you think I may have found some obscure bug that could help someone debug. Again, this DB gets vacuumed every day, and in the beginning, I think I remember doing a vacuum full every day. 
Thanks, /kurt On Jun 20, 2007, at 5:08 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> Drat! I'm wrong again. I thought for sure there wouldn't be a >> wraparound problem. > > Well, I'm not sure what it is now. You showed some invisible tuples > with XMINs of > XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 > XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > but the nextXID is > 1490547335 > which is not that far ahead of those --- about 1.2 million > transactions, > or less than a day's time according to the clog timestamps, which > suggest that you're burning several million XIDs a day. Perhaps > you've > wrapped past them since your earlier check --- if you try the same > "select where ctid = " queries now, do they show rows? > > The other thing that's strange here is that an 8.0 installation should > be pretty aggressive about recycling pg_clog segments, and yet you've > got a bunch there. How far back do the files in pg_clog go --- what's > the numeric range of the filenames, and the date range of their mod > times? Have you checked the postmaster log to see if you're > getting any > complaints about checkpoint failures or anything like that? It would > also be useful to look at the output of > select datname, age(datfrozenxid) from pg_database; > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq