Thread: Maintenance question / DB size anomaly...

Maintenance question / DB size anomaly...

From: Kurt Overberg
Gang,

Hoping you all can help me with a rather bizarre issue that I've run
across.  I don't really need a solution, I think I have one, but I'd
really like to run it by everyone in case I'm headed in the wrong
direction.

I'm running a small Slony (v1.1.5)/postgresql 8.0.4 cluster (on
RedHat) that contains one master database, and two slaves.  The db1
(the master) has been up for about 1.5 years, db2 (slave 1) for about
9 months, and db3 (second slave) for about two months.  I do a VACUUM
ANALYZE every morning on all three databases.  However, the vacuum on
db1 takes approximately 4.5 hours, and on the slaves it takes about
1/2 hour.  As far as I can tell, my FSM settings are correct.  This
is concerning because the vacuum on db1 is starting to run into
production hours.  The master receives all inserts, updates and
deletes (as well as a fair number of selects).  The slaves are select-
only.

In my investigation of this anomaly, I noticed that the data/ dir on
db1 (the master) is around 60 Gigs.  The data directory on the slaves
is around 25Gb.  After about 3 months of head scratching, someone on
the irc channel suggested that it may be due to index bloat.
Although, doing some research, it would seem that those problems were
resolved in 7.4(ish), and it wouldn't account for one database being
2.5x bigger.  Another unknown is Slony overhead (both in size and
vacuum times).

The ONLY thing I can think of is that I DROPped a large number of
tables from db1 a few months ago (they weren't getting replicated).
This is on the order of 1700+ fairly largeish (50,000+ row) tables.
I do not remember doing a vacuum full after dropping them, so perhaps
that's my problem.  I'm planning on doing some maintenance this
weekend, during which I will take the whole system down, then on db1,
run a VACUUM FULL ANALYZE on the whole database, then a REINDEX on my
very large tables.  I may drop and recreate the indexes on my big
tables, as I hear that may be faster than a REINDEX.  I will probably
run a VACUUM FULL ANALYZE on the slaves as well.

Thoughts?  Suggestions?  Anyone think this will actually help my
problem of size and vacuum times?  Do I need to take Slony down while
I do this?  Will the VACUUM FULL table locking interfere with Slony?

Thanks for any light you all can shed on these issues...

/kurt


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> In my investigation of this anomaly, I noticed that the data/ dir on
> db1 (the master) is around 60 Gigs.  The data directory on the slaves
> is around 25Gb.  After about 3 months of head scratching, someone on
> the irc channel suggested that it may be due to index bloat.

This is not something you need to guess about.  Compare the table and
index sizes, one by one, between the master and slaves.  Do a VACUUM
VERBOSE on the one(s) that are radically bigger on the master, and look
at what it has to say.
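
For instance, a catalog query along these lines, run on the master and
then on a slave and diffed, gives the per-table and per-index page
counts side by side (relpages is only as fresh as the last VACUUM or
ANALYZE of each relation):

  SELECT c.relname  AS table_name,
         c.relpages AS table_pages,
         i.relname  AS index_name,
         i.relpages AS index_pages
  FROM pg_class c
  LEFT JOIN pg_index x ON x.indrelid = c.oid
  LEFT JOIN pg_class i ON i.oid = x.indexrelid
  WHERE c.relkind = 'r'
  ORDER BY c.relpages DESC, i.relpages DESC
  LIMIT 40;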

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Richard Huxton
Kurt Overberg wrote:
>
> In my investigation of this anomaly, I noticed that the data/ dir on db1
> (the master) is around 60 Gigs.  The data directory on the slaves is
> around 25Gb.  After about 3 months of head scratching, someone on the
> irc channel suggested that it may be due to index bloat.  Although,
> doing some research, it would seem that those problems were resolved in
> 7.4(ish), and it wouldn't account for one database being 2.5x bigger.
> Another unknown is Slony overhead (both in size and vacuum times).

Check the oid2name/dbsize utilities in the contrib RPM for 8.0.x
http://www.postgresql.org/docs/8.0/static/diskusage.html
Shouldn't be too hard to find out where the disk space is going.
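
If you'd rather stay inside psql, something along these lines gives a
rough picture (relpages times the default 8KB block size, so it's only
approximate and only as current as the last vacuum/analyze):

  SELECT relname, relkind, relpages,
         relpages::bigint * 8192 AS approx_bytes
  FROM pg_class
  ORDER BY relpages DESC
  LIMIT 20;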

Oh and 8.0.13 is the latest release of 8.0 series, so you'll want to use
your maintenance window to upgrade too. Lots of good bugfixes there.

> The ONLY thing I can think of is that I DROPped a large number of tables
> from db1 a few months ago (they weren't getting replicated).  This is on
> the order of 1700+ fairly largeish (50,000+ row) tables.  I do not
> remember doing a vacuum full after dropping them, so perhaps that's my
> problem.  I'm planning on doing some maintenance this weekend, during
> which I will take the whole system down, then on db1, run a VACUUM FULL
> ANALYZE on the whole database, then a REINDEX on my very large tables.
> I may drop and recreate the indexes on my big tables, as I hear that may
> be faster than a REINDEX.  I will probably run a VACUUM FULL ANALYZE on
> the slaves as well.

You'll probably find CLUSTER to be quicker than VACUUM FULL, although
you need enough disk-space free for temporary copies of the
table/indexes concerned.

Dropping and recreating indexes should prove much faster than VACUUMING
with them. Shouldn't matter for CLUSTER afaict.
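
For example (the table and index names here are just placeholders; on
8.0 the syntax puts the index name first):

  -- rewrites the table in the order of the named index and rebuilds
  -- all of its indexes; takes an exclusive lock while it runs
  CLUSTER some_pkey_index ON some_big_table;
  ANALYZE some_big_table;  -- CLUSTER doesn't refresh planner stats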

> Thoughts?  Suggestions?  Anyone think this will actually help my problem
> of size and vacuum times?  Do I need to take Slony down while I do
> this?  Will the VACUUM FULL table locking interfere with Slony?

Well, I'd take the opportunity to uninstall/reinstall slony just to
check my scripts/procedures are working.

--
   Richard Huxton
   Archonet Ltd

Re: Maintenance question / DB size anomaly...

From: Chris Browne
kurt@hotdogrecords.com (Kurt Overberg) writes:
> In my investigation of this anomaly, I noticed that the data/ dir on
> db1 (the master) is around 60 Gigs.  The data directory on the slaves
> is around 25Gb.  After about 3 months of head scratching, someone on
> the irc channel suggested that it may be due to index bloat.
> Although, doing some research, it would seem that those problems were
> resolved in 7.4(ish), and it wouldn't account for one database being
> 2.5x bigger.  Another unknown is Slony overhead (both in size and
> vacuum times).

There are three tables in Slony-I that would be of interest; on the
master, do a VACUUM VERBOSE on:

 - [clustername].sl_log_1
 - [clustername].sl_log_2
 - [clustername].sl_seqlog

If one or another is really bloated, that could be the cause of *some*
problems.  Though that shouldn't account for 35GB of space :-).
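
For example, if the cluster were named "my_cluster" (Slony keeps its
tables in a schema named after the cluster, prefixed with an
underscore), that would be:

  VACUUM VERBOSE _my_cluster.sl_log_1;
  VACUUM VERBOSE _my_cluster.sl_log_2;
  VACUUM VERBOSE _my_cluster.sl_seqlog;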

> The ONLY thing I can think of is that I DROPped a large number of
> tables from db1 a few months ago (they weren't getting replicated).
> This is on the order of 1700+ fairly largeish (50,000+ row) tables.
> I do not remember doing a vacuum full after dropping them, so perhaps
> that's my problem.  I'm planning on doing some maintenance this
> weekend, during which I will take the whole system down, then on db1,
> run a VACUUM FULL ANALYZE on the whole database, then a REINDEX on my
> very large tables.  I may drop and recreate the indexes on my big
> tables, as I hear that may be faster than a REINDEX.  I will probably
> run a VACUUM FULL ANALYZE on the slaves as well.

When tables are dropped, so are the data files.  So even if they were
bloated, they should have simply disappeared.  So I don't think that's
the problem.

> Thoughts?  Suggestions?  Anyone think this will actually help my
> problem of size and vacuum times?  Do I need to take Slony down while
> I do this?  Will the VACUUM FULL table locking interfere with Slony?

I'd be inclined to head to the filesystem level, and try to see what
tables are bloated *there*.

You should be able to search for bloated tables via the command:

$ find $PGDATA/base -name "[0-9]+\.[0-9]+"

That would be likely to give you a listing of filenames that look
something like:

12341.1
12341.2
12341.3
12341.4
12341.5
12341.6
231441.1
231441.2
231441.3

which indicates all table (or index) data files that had to be
extended past 1GB.

In the above, the relation whose files are named 12341 would be >6GB
in size, because it has been extended to have 6 additional files (in
addition to the "bare" filename, 12341).

You can then go into a psql session, and run the query:
  select * from pg_class where relfilenode = 12341;
and thereby figure out what table is involved.  (The number in the
file name is the relation's relfilenode; it usually matches the OID,
but not after operations that rewrite the relation.)

I'll bet that if you do this on the "origin" node, you'll find that
there is some small number of tables that have *way* more 1GB
partitions than there are on the subscriber nodes.

Those are the tables that will need attention.

You could probably accomplish the reorganization more quickly via the
"CLUSTER" statement; that will reorganize the table based on
the ordering of one specified index, and then regenerate all the other
indices.  It's not MVCC-safe, so if you have reports running
concurrently, this could confuse them, but if you take the apps down,
as you surely should, it won't be a problem.

You don't forcibly have to take Slony-I down during this, but the
locks taken out on tables by CLUSTER/VACUUM FULL will block slons from
doing any work until those transactions complete.

I wouldn't think you need to do VACUUM FULL or CLUSTER against the
subscribers if they haven't actually bloated (and based on what you
have said, there is no indication that they have).
--
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
The quickest way to a man's heart is through his chest, with an axe.

Re: Maintenance question / DB size anomaly...

From: Tom Lane
Chris Browne <cbbrowne@acm.org> writes:
> kurt@hotdogrecords.com (Kurt Overberg) writes:
>> In my investigation of this anomaly, I noticed that the data/ dir on
>> db1 (the master) is around 60 Gigs.  The data directory on the slaves
>> is around 25Gb.  After about 3 months of head scratching, someone on
>> the irc channel suggested that it may be due to index bloat.

> I'd be inclined to head to the filesystem level, and try to see what
> tables are bloated *there*.

At least as a first cut, it should be sufficient to look at
pg_class.relpages, which'd be far easier to correlate with table names
;-).  The relpages entry should be accurate as of the most recent VACUUM
on each table, which ought to be close enough unless I missed something
about the problem situation.

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
Richard,

Thanks for the feedback!  I found oid2name and have been mucking
about with it, but haven't really found anything that stands out
yet.  Most of the tables/indexes I'm comparing across machines seem
to take up a similar amount of disk space.  I think I'm going to have
to get fancy and write some shell scripts.  Regarding the slony
configuration scripts, you're assuming that I have such scripts.  Our
slony install was originally installed by a contractor, and modified
since then so "getting my act together with respect to slony" is
kinda beyond the scope of what I'm trying to accomplish with this
maintenance.  I really just want to figure out what's going on with
db1, and want to do so in a way that won't ruin slony since right now
it runs pretty well, and I doubt I'd be able to fix it if it
seriously broke.

Upon a cursory pass with oid2name, it seems that my  sl_log_1_idx1
index is out of hand:


-bash-3.00$ oid2name -d mydb -f 955960160
 From database "mydb":
    Filenode     Table Name
--------------------------
   955960160  sl_log_1_idx1

-bash-3.00$ ls -al 955960160*
-rw-------  1 postgres postgres 1073741824 Jun 19 11:08 955960160
-rw-------  1 postgres postgres 1073741824 Jun 13  2006 955960160.1
-rw-------  1 postgres postgres  909844480 Jun 19 10:47 955960160.10
-rw-------  1 postgres postgres 1073741824 Jul 31  2006 955960160.2
-rw-------  1 postgres postgres 1073741824 Sep 12  2006 955960160.3
-rw-------  1 postgres postgres 1073741824 Oct 19  2006 955960160.4
-rw-------  1 postgres postgres 1073741824 Nov 27  2006 955960160.5
-rw-------  1 postgres postgres 1073741824 Feb  3 12:57 955960160.6
-rw-------  1 postgres postgres 1073741824 Mar  2 11:57 955960160.7
-rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.8
-rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.9


I know that slony runs its vacuuming in the background, but it
doesn't seem to be cleaning this stuff up.  Interestingly, from my
VACUUM pgfouine output, that index doesn't take that long at all to
vacuum analyze (compared to my other, much larger tables).  Am I
making the OID->filename translation properly?

Running this:
SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
...gives me...

sl_log_1_idx1                                                   |  1421785
xrefmembergroup                                                 |  1023460
answerselectinstance                                            |   565343

...does this jibe with what I'm seeing above?  I guess I'll run a full
vacuum on the slony tables too?  I figured something else would jump
out bigger than this.  FWIW, the same table on db2 and db3 is very
small, like zero.  I guess this is looking like it is overhead from
slony?  Should I take this problem over to the slony group?

Thanks again, gang-

/kurt



On Jun 19, 2007, at 10:13 AM, Richard Huxton wrote:

> Kurt Overberg wrote:
>> In my investigation of this anomaly, I noticed that the data/ dir
>> on db1 (the master) is around 60 Gigs.  The data directory on the
>> slaves is around 25Gb.  After about 3 months of head scratching,
>> someone on the irc channel suggested that it may be due to index
>> bloat.  Although, doing some research, it would seem that those
>> problems were resolved in 7.4(ish), and it wouldn't account for
>> one database being 2.5x bigger.  Another unknown is Slony overhead
>> (both in size and vacuum times).
>
> Check the oid2name/dbsize utilities in the contrib RPM for 8.0.x
> http://www.postgresql.org/docs/8.0/static/diskusage.html
> Shouldn't be too hard to find out where the disk space is going.
>
> Oh and 8.0.13 is the latest release of 8.0 series, so you'll want
> to use your maintenance window to upgrade too. Lots of good
> bugfixes there.
>
>> The ONLY thing I can think of is that I DROPped a large number of
>> tables from db1 a few months ago (they weren't getting
>> replicated).  This is on the order of 1700+ fairly largeish (50,000
>> + row) tables.  I do not remember doing a vacuum full after
>> dropping them, so perhaps that's my problem.  I'm planning on
>> doing some maintenance this weekend, during which I will take the
>> whole system down, then on db1, run a VACUUM FULL ANALYZE on the
>> whole database, then a REINDEX on my very large tables.  I may
>> drop and recreate the indexes on my big tables, as I hear that may
>> be faster than a REINDEX.  I will probably run a VACUUM FULL
>> ANALYZE on the slaves as well.
>
> You'll probably find CLUSTER to be quicker than VACUUM FULL,
> although you need enough disk-space free for temporary copies of
> the table/indexes concerned.
>
> Dropping and recreating indexes should prove much faster than
> VACUUMING with them. Shouldn't matter for CLUSTER afaict.
>
>> Thoughts?  Suggestions?  Anyone think this will actually help my
>> problem of size and vacuum times?  Do I need to take Slony down
>> while I do this?  Will the VACUUM FULL table locking interfere
>> with Slony?
>
> Well, I'd take the opportunity to uninstall/reinstall slony just to
> check my scripts/procedures are working.
>
> --
>   Richard Huxton
>   Archonet Ltd


Re: Maintenance question / DB size anomaly...

From: Richard Huxton
Kurt Overberg wrote:
> Richard,
>
> Thanks for the feedback!  I found oid2name and have been mucking about
> with it, but haven't really found anything that stands out yet.  Most of
> the tables/indexes I'm comparing across machines seem to take up a
> similar amount of disk space.  I think I'm going to have to get fancy
> and write some shell scripts.  Regarding the slony configuration
> scripts, you're assuming that I have such scripts.  Our slony install
> was originally installed by a contractor, and modified since then so
> "getting my act together with respect to slony" is kinda beyond the
> scope of what I'm trying to accomplish with this maintenance.  I really
> just want to figure out what's going on with db1, and want to do so in a
> way that won't ruin slony since right now it runs pretty well, and I
> doubt I'd be able to fix it if it seriously broke.
>
> Upon a cursory pass with oid2name, it seems that my  sl_log_1_idx1 index
> is out of hand:

If the sl_log_1 table is large too, it'll be worth reading through the
FAQ to see if any of its notes apply.

http://cbbrowne.com/info/faq.html

> -bash-3.00$ oid2name -d mydb -f 955960160
>  From database "mydb":
>    Filenode     Table Name
> --------------------------
>   955960160  sl_log_1_idx1
>
> -bash-3.00$ ls -al 955960160*
> -rw-------  1 postgres postgres 1073741824 Jun 19 11:08 955960160
> -rw-------  1 postgres postgres 1073741824 Jun 13  2006 955960160.1
> -rw-------  1 postgres postgres  909844480 Jun 19 10:47 955960160.10
> -rw-------  1 postgres postgres 1073741824 Jul 31  2006 955960160.2
> -rw-------  1 postgres postgres 1073741824 Sep 12  2006 955960160.3
> -rw-------  1 postgres postgres 1073741824 Oct 19  2006 955960160.4
> -rw-------  1 postgres postgres 1073741824 Nov 27  2006 955960160.5
> -rw-------  1 postgres postgres 1073741824 Feb  3 12:57 955960160.6
> -rw-------  1 postgres postgres 1073741824 Mar  2 11:57 955960160.7
> -rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.8
> -rw-------  1 postgres postgres 1073741824 Mar 29 09:46 955960160.9
>
>
> I know that slony runs its vacuuming in the background, but it doesn't
> seem to be cleaning this stuff up.  Interestingly, from my VACUUM
> pgfouine output,
> that index doesn't take that long at all to vacuum analyze (compared to
> my other, much larger tables).  Am I making the OID->filename
> translation properly?

Looks OK to me

> Running this:
> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
> ...gives me...
>
> sl_log_1_idx1                                                   |  1421785
> xrefmembergroup                                                 |  1023460
> answerselectinstance                                            |   565343
>
> ...does this jibe with what I'm seeing above?  I guess I'll run a full
> vacuum on the slony tables too?  I figured something else would jump
> out bigger than this.  FWIW, the same table on db2 and db3 is very
> small, like zero.  I guess this is looking like it is overhead from
> slony?  Should I take this problem over to the slony group?

Well, pages are 8KB each (by default), so that'd be about 10.8GB, which
seems to match your filesizes above.
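(That is, 1,421,785 pages * 8,192 bytes/page is roughly 11.6 billion
bytes, or about 10.8 GiB, which lines up with the ten full 1GB segment
files plus the partial one in the ls listing.)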

Read through the FAQ I linked to - for some reason Slony's not clearing
out transactions it's replicated to your slaves (they *are* in sync,
aren't they?).  Could be a transaction preventing vacuuming, or
perhaps a partially dropped node?

Check the size of the sl_log_1 table and see if that tallies.

--
   Richard Huxton
   Archonet Ltd

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
Chris,

I took your advice, and found that sl_log_1 seems to be causing some
of the problem.  Here's the result of a VACUUM VERBOSE:

mydb # vacuum verbose _my_cluster.sl_log_1 ;
INFO:  vacuuming "_my_cluster.sl_log_1"
INFO:  index "sl_log_1_idx1" now contains 309404 row versions in
1421785 pages
DETAIL:  455001 index row versions were removed.
1419592 index pages have been deleted, 1416435 are currently reusable.
CPU 16.83s/5.07u sec elapsed 339.19 sec.
INFO:  index "sl_log_1_idx2" now contains 312864 row versions
in 507196 pages
DETAIL:  455001 index row versions were removed.
506295 index pages have been deleted, 504998 are currently reusable.
CPU 6.44s/2.27u sec elapsed 138.70 sec.
INFO:  "sl_log_1": removed 455001 row versions in 7567 pages
DETAIL:  CPU 0.56s/0.40u sec elapsed 6.63 sec.
INFO:  "sl_log_1": found 455001 removable, 309318 nonremovable row
versions in 13764 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 51972 unused item pointers.
0 pages are entirely empty.
CPU 24.13s/7.85u sec elapsed 486.49 sec.
INFO:  vacuuming "pg_toast.pg_toast_955960155"
INFO:  index "pg_toast_955960155_index" now contains 9 row versions
in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_955960155": found 0 removable, 9 nonremovable row
versions in 3 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 3 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

...I then checked the disk and those pages are still there.  If I do a:

select count(*) from  _my_cluster.sl_log_1;
count
-------
   6366
(1 row)

Would a VACUUM FULL take care of this?  It seems to me that it's not
clearing up the indexes properly.  You are correct in that
I do see things getting much bigger on the master than on the
subscriber nodes.  Could this cause my slony replication to bog down?

Also- I have a question about this comment:

>
> You don't forcibly have to take Slony-I down during this, but the
> locks taken out on tables by CLUSTER/VACUUM FULL will block slons from
> doing any work until those transactions complete.

That's because no writing will be done to the tables, thus no slony
triggers will get triggered, correct?  I'd rather not shut down slony
if I don't have to, but will if it "is safer/better/more badass".

For those playing along at home,

> $ find $PGDATA/base -name "[0-9]+\.[0-9]+"
>

...I had to use:

find $PGDATA/base -name "[0-9]*\.[0-9]*"

...the pluses don't actually work there, though, since -name matches
shell glob patterns rather than regular expressions.  Still a much
better way than how I was doing it.  Thanks again for helping me with
this, it's greatly appreciated!

/kurt

Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> mydb # vacuum verbose _my_cluster.sl_log_1 ;
> INFO:  "sl_log_1": found 455001 removable, 309318 nonremovable row
> versions in 13764 pages
> DETAIL:  0 dead row versions cannot be removed yet.

Hmm.  So you don't have a long-running-transactions problem (else that
DETAIL number would have been large).  What you do have is a failure
to vacuum sl_log_1 on a regular basis (because there are so many
dead/removable rows).  I suspect also some sort of Slony problem,
because AFAIK a properly operating Slony system shouldn't have that
many live rows in sl_log_1 either --- don't they all represent
as-yet-unpropagated events?  I'm no Slony expert though.  You probably
should ask about that on the Slony lists.

> ...I then checked the disk and those pages are still there.

Yes, regular VACUUM doesn't try very hard to shorten the disk file.

> Would a VACUUM FULL take care of this?

It would, but it will take an unpleasantly long time with so many live
rows to reshuffle.  I'd advise first working to see if you can get the
table down to a few live rows.  Then a VACUUM FULL will be a snap.
Also, you might want to do REINDEX after VACUUM FULL to compress the
indexes --- VACUUM FULL isn't good at that.
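
In other words, once sl_log_1 is down to a handful of live rows,
something like this (schema-qualified as in your earlier output)
should be quick:

  VACUUM FULL VERBOSE _my_cluster.sl_log_1;
  REINDEX TABLE _my_cluster.sl_log_1;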

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
That's the thing that's kinda blowing my mind here, when I look at
that table:

db1=# select count(*) from _my_cluster.sl_log_1 ;
count
-------
   6788
(1 row)

As far as my DB is concerned, there's only ~7000 rows (on average)
when I look in there (it does fluctuate, I've seen it go as high as
around 12k, but then it's gone back down, so I know events are moving
around in there).

So from what I can tell - from the disk point of view, there's ~11Gb
of data; from the vacuum point of view there's 309318 rows.  From the
psql point of view, there's only around 7,000.  Am I missing
something?  Unless there's something going on under the hood that I
don't know about (more than likely), it seems like my sl_log_1 table
is munged or somehow otherwise very screwed up.  I fear that a
re-shuffling or dropping/recreating the index will mess it up further.
Maybe when I take my production systems down for maintenance, I can
wait until sl_log_1 clears out, so then I can just drop that table
altogether (and re-create it of course)?

Thanks!

/kurt




On Jun 19, 2007, at 5:33 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> mydb # vacuum verbose _my_cluster.sl_log_1 ;
>> INFO:  "sl_log_1": found 455001 removable, 309318 nonremovable row
>> versions in 13764 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
>
> Hmm.  So you don't have a long-running-transactions problem (else that
> DETAIL number would have been large).  What you do have is a failure
> to vacuum sl_log_1 on a regular basis (because there are so many
> dead/removable rows).  I suspect also some sort of Slony problem,
> because AFAIK a properly operating Slony system shouldn't have that
> many live rows in sl_log_1 either --- don't they all represent
> as-yet-unpropagated events?  I'm no Slony expert though.  You probably
> should ask about that on the Slony lists.
>
>> ...I then checked the disk and those pages are still there.
>
> Yes, regular VACUUM doesn't try very hard to shorten the disk file.
>
>> Would a VACUUM FULL take care of this?
>
> It would, but it will take an unpleasantly long time with so many live
> rows to reshuffle.  I'd advise first working to see if you can get the
> table down to a few live rows.  Then a VACUUM FULL will be a snap.
> Also, you might want to do REINDEX after VACUUM FULL to compress the
> indexes --- VACUUM FULL isn't good at that.
>
>             regards, tom lane


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> That's the thing thats kinda blowing my mind here, when I look at
> that table:

> db1=# select count(*) from _my_cluster.sl_log_1 ;
> count
> -------
>    6788
> (1 row)

Well, that's real interesting.  AFAICS there are only two possibilities:

1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at
the code shows that these are counted the same as plain live tuples,
but they'd not be visible to other transactions.  I wonder if you could
have any really old open transactions that might have inserted all those
tuples?

2. The other 300k tuples are committed good, but they are not seen as
valid by a normal MVCC-aware transaction, probably because of
transaction wraparound.  This would require the sl_log_1 table to have
escaped vacuuming for more than 2 billion transactions, which seems a
bit improbable but maybe not impossible.  (You did say you were running
PG 8.0.x, right?  That's the last version without any strong defenses
against transaction wraparound...)

The way to get some facts, instead of speculating, would be to get hold
of the appropriate version of pg_filedump from
http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it
(probably the -i option would be sufficient), then take a close look
at the tuples that aren't visible to other transactions.  (You could
do "select ctid from sl_log_1" to determine which ones are visible.)

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Bill Moran
Kurt Overberg <kurt@hotdogrecords.com> wrote:
>
> That's the thing thats kinda blowing my mind here, when I look at
> that table:
>
> db1=# select count(*) from _my_cluster.sl_log_1 ;
> count
> -------
>    6788
> (1 row)
>
> As far as my DB is concerned, there's only ~7000 rows (on average)
> when I look
> in there (it does fluctuate, I've seen it go as high as around 12k,
> but then its
> gone back down, so I know events are moving around in there).

This is consistent with my experience with Slony and sl_log_[12].

I'm pretty sure that the slon processes vacuum sl_log_* on a fairly
regular basis.  I'm absolutely positive that slon occasionally switches
from using sl_log_1, to sl_log_2, then truncates sl_log_1 (then, after
some time, does the same in reverse).

So, in order for you to get massive bloat of the sl_log_* tables, you
must be doing a LOT of transactions in the time before it switches
logs and truncates the unused version.  Either that, or something is
going wrong.
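
A quick sanity check, assuming your Slony schema has both log tables
(Chris's earlier list suggests it should), is to see how many rows
each one is carrying right now:

  SELECT 'sl_log_1' AS log_table, count(*) FROM _my_cluster.sl_log_1
  UNION ALL
  SELECT 'sl_log_2', count(*) FROM _my_cluster.sl_log_2;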

> So from what I can tell- from the disk point of view, there's ~11Gb
> of data; from the
> vacuum point of view there's 309318 rows.  From the psql point of
> view, there's only
> around 7,000.  Am I missing something?

Something seems wrong here.  Correct me if I'm missing something, but
you're saying the table takes up 11G on disk, but vacuum says there are
~14000 pages.  That would mean your page size is ~800K.  Doesn't seem
right.

> Unless there's something
> going on under the
> hood that I don't know about (more than likely), it seems like my
> sl_log_1 table is munged or
> somehow otherwise very screwed up.  I fear that a re-shuffling or
> dropping/recreating
> the index will mess it up further.  Maybe when I take my production
> systems down for
> maintenance, can I wait until sl_log_1 clears out, so then I can just
> drop that
> table altogether (and re-create it of course)?

Possibly drop this node from the Slony cluster and re-add it.  Unless
it's the origin node, in which case you'll have to switch over, then
redo the origin, then switch back ...

>
> Thanks!
>
> /kurt
>
>
>
>
> On Jun 19, 2007, at 5:33 PM, Tom Lane wrote:
>
> > Kurt Overberg <kurt@hotdogrecords.com> writes:
> >> mydb # vacuum verbose _my_cluster.sl_log_1 ;
> >> INFO:  "sl_log_1": found 455001 removable, 309318 nonremovable row
> >> versions in 13764 pages
> >> DETAIL:  0 dead row versions cannot be removed yet.
> >
> > Hmm.  So you don't have a long-running-transactions problem (else that
> > DETAIL number would have been large).  What you do have is a failure
> > to vacuum sl_log_1 on a regular basis (because there are so many
> > dead/removable rows).  I suspect also some sort of Slony problem,
> > because AFAIK a properly operating Slony system shouldn't have that
> > many live rows in sl_log_1 either --- don't they all represent
> > as-yet-unpropagated events?  I'm no Slony expert though.  You probably
> > should ask about that on the Slony lists.
> >
> >> ...I then checked the disk and those pages are still there.
> >
> > Yes, regular VACUUM doesn't try very hard to shorten the disk file.
> >
> >> Would a VACUUM FULL take care of this?
> >
> > It would, but it will take an unpleasantly long time with so many live
> > rows to reshuffle.  I'd advise first working to see if you can get the
> > table down to a few live rows.  Then a VACUUM FULL will be a snap.
> > Also, you might want to do REINDEX after VACUUM FULL to compress the
> > indexes --- VACUUM FULL isn't good at that.
> >
> >             regards, tom lane


--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023


Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
On Jun 19, 2007, at 7:26 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> That's the thing thats kinda blowing my mind here, when I look at
>> that table:
>
>> db1=# select count(*) from _my_cluster.sl_log_1 ;
>> count
>> -------
>>    6788
>> (1 row)
>
> Well, that's real interesting.  AFAICS there are only two
> possibilities:
>
> 1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at
> the code shows that these are counted the same as plain live tuples,
> but they'd not be visible to other transactions.  I wonder if you
> could
> have any really old open transactions that might have inserted all
> those
> tuples?
>

Unlikely - the database has been stopped and restarted, which I think
closes out transactions?  Or could that cause the problems?


> 2. The other 300k tuples are committed good, but they are not seen as
> valid by a normal MVCC-aware transaction, probably because of
> transaction wraparound.  This would require the sl_log_1 table to have
> escaped vacuuming for more than 2 billion transactions, which seems a
> bit improbable but maybe not impossible.  (You did say you were
> running
> PG 8.0.x, right?  That's the last version without any strong defenses
> against transaction wraparound...)

Yep, this is 8.0.4.  It has been running for over a year with fairly
heavy updates, so I would guess it's possible.

> The way to get some facts, instead of speculating, would be to get
> hold
> of the appropriate version of pg_filedump from
> http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it
> (probably the -i option would be sufficient), then take a close look
> at the tuples that aren't visible to other transactions.  (You could
> do "select ctid from sl_log_1" to determine which ones are visible.)
>

Okay, I've grabbed pg_filedump and got it running on the appropriate
server.  I really have No Idea how to read its output though.  Where
does the ctid from sl_log_1 appear in the following listing?


Block    0 ********************************************************
<Header> -----
Block Offset: 0x00000000         Offsets: Lower      20 (0x0014)
Block: Size 8192  Version    2            Upper    8176 (0x1ff0)
LSN:  logid    949 recoff 0xae63b06c      Special  8176 (0x1ff0)
Items:    0                   Free Space: 8156
Length (including item array): 24

BTree Meta Data:  Magic (0x00053162)   Version (2)
                    Root:     Block (1174413)  Level (3)
                    FastRoot: Block (4622)  Level (1)

<Data> ------
Empty block - no items listed

<Special Section> -----
BTree Index Section:
   Flags: 0x0008 (META)
   Blocks: Previous (0)  Next (0)  Level (0)


...this was taken from the first page file (955960160.0 I guess you
could call it).  Does this look interesting to you, Tom?

FWIW - this IS on my master DB.  I've been slowly preparing an upgrade
to 8.2; I guess I'd better get that into gear, hmmm?  :-(

/kurt



>             regards, tom lane


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> Okay, I've grabbed pg_filedump and got it running on the appropriate
> server.
> I really have No Idea how to read its output though.  Where does the
> ctid from sl_log_1
> appear in the following listing?

ctid is (block number, item number)

> Block    0 ********************************************************
> BTree Meta Data:  Magic (0x00053162)   Version (2)
>                     Root:     Block (1174413)  Level (3)
>                     FastRoot: Block (4622)  Level (1)

This seems to be an index, not the sl_log_1 table.

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
OOooooookaaaaaaaaay.   Since the discussion has wandered a bit I just
wanted to restate things in an effort to clear the problem in my head.

Okay, so the sl_log_1 TABLE looks okay.  It's the indexes that seem to
be messed up; specifically, sl_log_1_idx1 seems to think that there are
more than 300,000 rows in the table it's associated with.  I just want
to fix the index, really.  So my question remains:

Is it okay to dump and recreate that index (or reindex it) while the
servers are down and the database is not being accessed?

Tom, Bill, Chris and Richard, thank you so much for your thoughts on
this matter so far.  It helps to not feel "so alone" when dealing
with difficult issues (for me anyway) on a system I don't know so
much about.

Thanks guys,

/kurt

On Jun 19, 2007, at 10:51 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> Okay, I've grabbed pg_filedump and got it running on the appropriate
>> server.
>> I really have No Idea how to read its output though.  Where does the
>> ctid from sl_log_1
>> appear in the following listing?
>
> ctid is (block number, item number)
>
>> Block    0 ********************************************************
>> BTree Meta Data:  Magic (0x00053162)   Version (2)
>>                     Root:     Block (1174413)  Level (3)
>>                     FastRoot: Block (4622)  Level (1)
>
> This seems to be an index, not the sl_log_1 table.
>
>             regards, tom lane


Re: Maintenance question / DB size anomaly...

From: Bill Moran
In response to Kurt Overberg <kurt@hotdogrecords.com>:

> OOooooookaaaaaaaaay.   Since the discussion has wandered a bit I just
> wanted to restate things in an effort to clear the problem in my head.
>
> Okay, so the sl_log_1 TABLE looks okay.  It's the indexes that seem to
> be messed up; specifically, sl_log_1_idx1 seems to think that there are
> more than 300,000 rows in the table it's associated with.  I just want
> to fix the index, really.  So my question remains:

Apologies, I must have misunderstood some previous message.

> Is it okay to dump and recreate that index (or reindex it) while the
> servers are down and the database is not being accessed?

There are people here who know _way_ more than me -- but I can't see any
reason why you couldn't just REINDEX it while everything is running.  There
may be some performance slowdown during the reindex, but everything should
continue to chug along.  A drop/recreate of the index should be OK as well.
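
For the record, that would just be something like the following (index
and table names taken from your oid2name output; bear in mind REINDEX
locks out writes to the table while it runs):

  REINDEX INDEX _my_cluster.sl_log_1_idx1;
  -- or, to rebuild all of the table's indexes in one pass:
  REINDEX TABLE _my_cluster.sl_log_1;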

> Tom, Bill, Chris and Richard, thank you so much for your thoughts on
> this matter so far.  It helps to not feel "so alone" when dealing
> with difficult issues (for me anyway) on a system I don't know so
> much about.

:D  Isn't Open Source great!

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023


Re: Maintenance question / DB size anomaly...

From: Andreas Kostyrka

Kurt Overberg wrote:
> OOooooookaaaaaaaaay.   Since the discussion has wandered a bit I just
> wanted to restate things in an effort to clear the problem in my head.
>
> Okay, so the sl_log_1 TABLE looks okay.  It's the indexes that seem to be
> messed up; specifically, sl_log_1_idx1 seems to think that there are more
> than 300,000 rows in the table it's associated with.  I just want to fix
> the index, really.  So my question remains:
>
> Is it okay to dump and recreate that index (or reindex it) while the
> servers are down and the database is not being accessed?

Well, I would probably stop the slon daemons first: dropping indexes
which slony needs can lead to quite a slowdown, and worse, the slowdown
happens because the database server is doing things the wrong way.  But
that's mostly what you need to do.

OTOH, depending upon the size of your database, you might consider
starting out from a scratch database.

Andreas

>
> Tom, Bill, Chris and Richard, thank you so much for your thoughts on
> this matter so far.  It helps to not feel "so alone" when dealing
> with difficult issues (for me anyway) on a system I don't know so much
> about.

#slony@Freenode.net, #postgresql@freenode.net are quite helpful, and
sometimes faster than mail.

Andreas


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> Okay, so the sl_log_1 TABLE looks okay.  It's the indexes that seem to
> be messed up; specifically, sl_log_1_idx1 seems to think that there are
> more than 300,000 rows in the table it's associated with.  I just want
> to fix the index, really.

I'm not sure how you arrive at that conclusion.  The VACUUM VERBOSE
output you provided here:
http://archives.postgresql.org/pgsql-performance/2007-06/msg00370.php
shows clearly that there are lots of rows in the table as well as
the indexes.  A REINDEX would certainly cut the size of the indexes
but it isn't going to do anything about the extraneous rows.

When last heard from, you were working on getting pg_filedump output for
some of the bogus rows --- what was the result?

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From
Kurt Overberg
Date:
Dang it, Tom, don't you ever get tired of being right?  I guess I had
been focusing on the index numbers since they came up first, and it's
the index files that are > 10Gb.

Okay, so I did some digging with pg_filedump, and found the following:

.
.
.
.
Block  406 ********************************************************
<Header> -----
Block Offset: 0x0032c000         Offsets: Lower     208 (0x00d0)
Block: Size 8192  Version    2            Upper     332 (0x014c)
LSN:  logid    950 recoff 0x9ebcc6e4      Special  8192 (0x2000)
Items:   47                   Free Space:  124
Length (including item array): 212

<Data> ------
Item   1 -- Length:  472  Offset: 7720 (0x1e28)  Flags: USED
   XMIN: 1489323584  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
   Block Id: 406  linp Index: 1   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

Item   2 -- Length:  185  Offset: 7532 (0x1d6c)  Flags: USED
   XMIN: 1489323584  CMIN: 4  XMAX: 0  CMAX|XVAC: 0
   Block Id: 406  linp Index: 2   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

Item   3 -- Length:  129  Offset: 7400 (0x1ce8)  Flags: USED
   XMIN: 1489323590  CMIN: 2  XMAX: 0  CMAX|XVAC: 0
   Block Id: 406  linp Index: 3   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

Item   4 -- Length:   77  Offset: 7320 (0x1c98)  Flags: USED
   XMIN: 1489323592  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
   Block Id: 406  linp Index: 4   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)


...I then looked in the DB:

mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)';
log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype |
log_cmddata
------------+---------+-------------+---------------+-------------
+-------------
(0 rows)

mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,2)';
log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype |
log_cmddata
------------+---------+-------------+---------------+-------------
+-------------
(0 rows)

mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,3)';
log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype |
log_cmddata
------------+---------+-------------+---------------+-------------
+-------------
(0 rows)


...is this what you were looking for, Tom?  The only thing that stands
out to me is the XMAX_INVALID mask.  Thoughts?

Thanks,

/kurt





On Jun 20, 2007, at 11:22 AM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> Okay, so the sl_log_1 TABLE looks okay.  It's the indexes that seem to
>> be messed up; specifically, sl_log_1_idx1 seems to think that there are
>> more than 300,000 rows in the table it's associated with.  I just want
>> to fix the index, really.
>
> I'm not sure how you arrive at that conclusion.  The VACUUM VERBOSE
> output you provided here:
> http://archives.postgresql.org/pgsql-performance/2007-06/msg00370.php
> shows clearly that there are lots of rows in the table as well as
> the indexes.  A REINDEX would certainly cut the size of the indexes
> but it isn't going to do anything about the extraneous rows.
>
> When last heard from, you were working on getting pg_filedump
> output for
> some of the bogus rows --- what was the result?
>
>             regards, tom lane


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> Okay, so I did some digging with pg_filedump, and found the following:

> Block  406 ********************************************************
> Item   1 -- Length:  472  Offset: 7720 (0x1e28)  Flags: USED
>    XMIN: 1489323584  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
>    Block Id: 406  linp Index: 1   Attributes: 6   Size: 32
>    infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

This is pretty much what you'd expect for a never-updated tuple...

> mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)';
> log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype |
> log_cmddata
> ------------+---------+-------------+---------------+-------------
> +-------------
> (0 rows)

so I have to conclude that you've got a wraparound problem.  What is the
current XID counter?  (pg_controldata will give you that, along with a
lot of other junk.)  It might also be interesting to take a look at
"ls -l $PGDATA/pg_clog"; the mod times on the files in there would give
us an idea how fast XIDs are being consumed.

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
Drat!  I'm wrong again.  I thought for sure there wouldn't be a
wraparound problem.  So does this affect the entire database server, or
just this table?  Is the best way to proceed to immediately ditch this
db and promote one of my slaves to a master?  I'm just concerned about
the data integrity.  Note that I don't use OIDs for anything really, so
I'm hoping I'll be safe.

Thanks again, Tom.

/kurt


pg_controldata output:

-bash-3.00$ pg_controldata
pg_control version number:            74
Catalog version number:               200411041
Database system identifier:           4903924957417782767
Database cluster state:               in production
pg_control last modified:             Wed 20 Jun 2007 03:19:52 PM CDT
Current log file ID:                  952
Next log file segment:                154
Latest checkpoint location:           3B8/920F0D78
Prior checkpoint location:            3B8/8328E4A4
Latest checkpoint's REDO location:    3B8/9200BBF0
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          1490547335
Latest checkpoint's NextOID:          3714961319
Time of latest checkpoint:            Wed 20 Jun 2007 03:17:50 PM CDT
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8
LC_CTYPE:                             en_US.UTF-8
-bash-3.00$ echo $PGDATA


Here's the list from pg_clog for June:

-rw-------  1 postgres postgres 262144 Jun  1 03:36 054D
-rw-------  1 postgres postgres 262144 Jun  1 08:16 054E
-rw-------  1 postgres postgres 262144 Jun  1 10:24 054F
-rw-------  1 postgres postgres 262144 Jun  1 17:03 0550
-rw-------  1 postgres postgres 262144 Jun  2 03:32 0551
-rw-------  1 postgres postgres 262144 Jun  2 10:04 0552
-rw-------  1 postgres postgres 262144 Jun  2 19:24 0553
-rw-------  1 postgres postgres 262144 Jun  3 03:38 0554
-rw-------  1 postgres postgres 262144 Jun  3 13:19 0555
-rw-------  1 postgres postgres 262144 Jun  4 00:02 0556
-rw-------  1 postgres postgres 262144 Jun  4 07:12 0557
-rw-------  1 postgres postgres 262144 Jun  4 12:37 0558
-rw-------  1 postgres postgres 262144 Jun  4 19:46 0559
-rw-------  1 postgres postgres 262144 Jun  5 03:36 055A
-rw-------  1 postgres postgres 262144 Jun  5 10:54 055B
-rw-------  1 postgres postgres 262144 Jun  5 18:11 055C
-rw-------  1 postgres postgres 262144 Jun  6 03:38 055D
-rw-------  1 postgres postgres 262144 Jun  6 10:15 055E
-rw-------  1 postgres postgres 262144 Jun  6 15:10 055F
-rw-------  1 postgres postgres 262144 Jun  6 23:21 0560
-rw-------  1 postgres postgres 262144 Jun  7 07:15 0561
-rw-------  1 postgres postgres 262144 Jun  7 13:43 0562
-rw-------  1 postgres postgres 262144 Jun  7 22:53 0563
-rw-------  1 postgres postgres 262144 Jun  8 07:12 0564
-rw-------  1 postgres postgres 262144 Jun  8 14:42 0565
-rw-------  1 postgres postgres 262144 Jun  9 01:30 0566
-rw-------  1 postgres postgres 262144 Jun  9 09:19 0567
-rw-------  1 postgres postgres 262144 Jun  9 20:19 0568
-rw-------  1 postgres postgres 262144 Jun 10 03:39 0569
-rw-------  1 postgres postgres 262144 Jun 10 15:38 056A
-rw-------  1 postgres postgres 262144 Jun 11 03:34 056B
-rw-------  1 postgres postgres 262144 Jun 11 09:14 056C
-rw-------  1 postgres postgres 262144 Jun 11 13:59 056D
-rw-------  1 postgres postgres 262144 Jun 11 19:41 056E
-rw-------  1 postgres postgres 262144 Jun 12 03:37 056F
-rw-------  1 postgres postgres 262144 Jun 12 09:59 0570
-rw-------  1 postgres postgres 262144 Jun 12 17:23 0571
-rw-------  1 postgres postgres 262144 Jun 13 03:32 0572
-rw-------  1 postgres postgres 262144 Jun 13 09:16 0573
-rw-------  1 postgres postgres 262144 Jun 13 16:25 0574
-rw-------  1 postgres postgres 262144 Jun 14 01:28 0575
-rw-------  1 postgres postgres 262144 Jun 14 08:40 0576
-rw-------  1 postgres postgres 262144 Jun 14 15:07 0577
-rw-------  1 postgres postgres 262144 Jun 14 22:00 0578
-rw-------  1 postgres postgres 262144 Jun 15 03:36 0579
-rw-------  1 postgres postgres 262144 Jun 15 12:21 057A
-rw-------  1 postgres postgres 262144 Jun 15 18:10 057B
-rw-------  1 postgres postgres 262144 Jun 16 03:32 057C
-rw-------  1 postgres postgres 262144 Jun 16 09:17 057D
-rw-------  1 postgres postgres 262144 Jun 16 19:32 057E
-rw-------  1 postgres postgres 262144 Jun 17 03:39 057F
-rw-------  1 postgres postgres 262144 Jun 17 13:26 0580
-rw-------  1 postgres postgres 262144 Jun 17 23:11 0581
-rw-------  1 postgres postgres 262144 Jun 18 04:40 0582
-rw-------  1 postgres postgres 262144 Jun 18 12:23 0583
-rw-------  1 postgres postgres 262144 Jun 18 17:22 0584
-rw-------  1 postgres postgres 262144 Jun 18 19:40 0585
-rw-------  1 postgres postgres 262144 Jun 19 03:38 0586
-rw-------  1 postgres postgres 262144 Jun 19 09:30 0587
-rw-------  1 postgres postgres 262144 Jun 19 10:23 0588
-rw-------  1 postgres postgres 262144 Jun 19 16:10 0589
-rw-------  1 postgres postgres 262144 Jun 19 21:45 058A
-rw-------  1 postgres postgres 262144 Jun 20 03:38 058B
-rw-------  1 postgres postgres 262144 Jun 20 12:17 058C
-rw-------  1 postgres postgres 131072 Jun 20 15:13 058D


On Jun 20, 2007, at 2:37 PM, Tom Lane wrote:
>
> so I have to conclude that you've got a wraparound problem.  What
> is the
> current XID counter?  (pg_controldata will give you that, along with a
> lot of other junk.)  It might also be interesting to take a look at
> "ls -l $PGDATA/pg_clog"; the mod times on the files in there would
> give
> us an idea how fast XIDs are being consumed.
>
>             regards, tom lane


Re: Maintenance question / DB size anomaly...

From: Tom Lane
Kurt Overberg <kurt@hotdogrecords.com> writes:
> Drat!  I'm wrong again.  I thought for sure there wouldn't be a
> wraparound problem.

Well, I'm not sure what it is now.  You showed some invisible tuples
with XMINs of
   XMIN: 1489323584  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
   XMIN: 1489323590  CMIN: 2  XMAX: 0  CMAX|XVAC: 0
   XMIN: 1489323592  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
but the nextXID is
         1490547335
which is not that far ahead of those --- about 1.2 million transactions,
or less than a day's time according to the clog timestamps, which
suggest that you're burning several million XIDs a day.  Perhaps you've
wrapped past them since your earlier check --- if you try the same
"select where ctid = " queries now, do they show rows?

The other thing that's strange here is that an 8.0 installation should
be pretty aggressive about recycling pg_clog segments, and yet you've
got a bunch there.  How far back do the files in pg_clog go --- what's
the numeric range of the filenames, and the date range of their mod
times?  Have you checked the postmaster log to see if you're getting any
complaints about checkpoint failures or anything like that?  It would
also be useful to look at the output of
select datname, age(datfrozenxid) from pg_database;

            regards, tom lane

Re: Maintenance question / DB size anomaly...

From: Kurt Overberg
Okay,


  select * from _my_cluster.sl_log_1 where ctid = '(1,1)';
  select * from _my_cluster.sl_log_1 where ctid = '(1,2)';
  select * from _my_cluster.sl_log_1 where ctid = '(1,3)';
  select * from _my_cluster.sl_log_1 where ctid = '(1,4)';

all return zero rows.  When I do a dump of that file, I get:

Block    1 ********************************************************
<Header> -----
Block Offset: 0x00002000         Offsets: Lower     408 (0x0198)
Block: Size 8192  Version    2            Upper    7680 (0x1e00)
LSN:  logid    955 recoff 0x0daed68c      Special  8192 (0x2000)
Items:   97                   Free Space: 7272
Length (including item array): 412

<Data> ------
Item   1 -- Length:  121  Offset: 8068 (0x1f84)  Flags: USED
   XMIN: 1491480520  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
   Block Id: 1  linp Index: 1   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

...so the rows weren't in the table, but they were in the file (I did
the filedump first, then the query).  I then redid the filedump and the
results are the same: the rows are still in the file.  I have no idea
how frequently these files are getting written to; I assume frequently.
I also looked at the last block listed in the file, 6445, and looked
for items 1-4, and also did not find them in the table using a similar
select as above.  That seems kinda strange, since there's right this
second 11,000 items in that table, but I'll roll with it for awhile.

Intrigued, I wanted to see what a filedump looked like of a row that
WAS in the table:

  ctid | log_origin |  log_xid   | log_tableid | log_actionseq | log_cmdtype | log_cmddata
-------+------------+------------+-------------+---------------+-------------+-------------
 (7,1) |         10 | 1491481037 |           8 |     473490934 | I           | (memberid,answerid,taskinstanceid) values ('144854','148707','0')



Block    7 ********************************************************
<Header> -----
Block Offset: 0x0000e000         Offsets: Lower     424 (0x01a8)
Block: Size 8192  Version    2            Upper     508 (0x01fc)
LSN:  logid    955 recoff 0x0dc4bcc0      Special  8192 (0x2000)
Items:  101                   Free Space:   84
Length (including item array): 428

<Data> ------
Item   1 -- Length:  129  Offset: 8060 (0x1f7c)  Flags: USED
   XMIN: 1491481037  CMIN: 7  XMAX: 0  CMAX|XVAC: 0
   Block Id: 7  linp Index: 1   Attributes: 6   Size: 32
   infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)


...the NextXID was (taken about 5 mins after the previous filedump):
  Latest checkpoint's NextXID:          1491498183


I don't see any real differences in the file entry for a row that is
in the table and one that I don't see in the table.  I hope I'm
getting this right; it's totally fascinating seeing how all this
works.

About your other questions:

1.  I have pg_clog segments all the way back to the start of the
database, March 14th, 2006 (most likely when the database was first
brought up on this machine).  The numeric names start at 0000 and go
to 058E.  I checked the recent postmaster logs (within the last 8
days) and saw no errors containing the word 'checkpoint'.  In fact,
very few errors at all.  The dang thing seems to be running pretty
well, just a little slow.

mydb=# select datname, age(datfrozenxid) from pg_database;
  datname  |    age
-----------+------------
 template1 | 1491520270
 template0 | 1491520270
 postgres  | 1491520270
 mydb      | 1076194904


Oooooooo..... that's not good, is it?  Thanks for taking an interest,
Tom.  I'm most likely going to promote one of my subscribers to be
master, then nuke this database, but I have no problem keeping it
around if you think I may have found some obscure bug that could help
someone debug.  Again, this DB gets vacuumed every day, and in the
beginning I think I remember doing a vacuum full every day.
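
(For scale: the wraparound failure point Tom mentioned is around 2
billion transactions, i.e. 2^31, so an age of 1,491,520,270 leaves
roughly 650 million XIDs of headroom; at the several-million-a-day
burn rate Tom estimated from pg_clog, that's on the order of a few
months.)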

Thanks,

/kurt


On Jun 20, 2007, at 5:08 PM, Tom Lane wrote:

> Kurt Overberg <kurt@hotdogrecords.com> writes:
>> Drat!  I'm wrong again.  I thought for sure there wouldn't be a
>> wraparound problem.
>
> Well, I'm not sure what it is now.  You showed some invisible tuples
> with XMINs of
>    XMIN: 1489323584  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
>    XMIN: 1489323590  CMIN: 2  XMAX: 0  CMAX|XVAC: 0
>    XMIN: 1489323592  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
> but the nextXID is
>          1490547335
> which is not that far ahead of those --- about 1.2 million
> transactions,
> or less than a day's time according to the clog timestamps, which
> suggest that you're burning several million XIDs a day.  Perhaps
> you've
> wrapped past them since your earlier check --- if you try the same
> "select where ctid = " queries now, do they show rows?
>
> The other thing that's strange here is that an 8.0 installation should
> be pretty aggressive about recycling pg_clog segments, and yet you've
> got a bunch there.  How far back do the files in pg_clog go --- what's
> the numeric range of the filenames, and the date range of their mod
> times?  Have you checked the postmaster log to see if you're
> getting any
> complaints about checkpoint failures or anything like that?  It would
> also be useful to look at the output of
> select datname, age(datfrozenxid) from pg_database;
>
>             regards, tom lane