Thread: WAL partition overloaded--by autovacuum?

WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
Hi everyone,

I'm running 8.4.2 on a CentOS machine, and postgres recently died with signal 6 because the pg_xlog partition filled up
(33GB) on 7/4/10 10:34:23 (perfect timing, as I was hiking in the mountains in the remotest parts of our country).  I
did some digging and found the following:

-- The current state of /db/data/pg_xlog (after postgres died on 7/4/10) indicates there are 2056 files, all created
between 7/4/10 10:01 and 7/4/10 10:34, and a quick bash-kung-fu check (one-liner below) shows that these files were
created anywhere from 3 per minute (10:22) to 106 per minute (10:12)
-- wal_buffers=8MB; commit_siblings=200; commit_delay=0; checkpoint_segments=16
-- All other WAL-related parameters are either defaults or commented out
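
(The bash-kung-fu check in question was roughly the following -- reconstructed here, and assuming GNU ls:)

    # count WAL segment files by modification minute (mtime is the closest proxy to creation time here)
    ls -l --time-style='+%m/%d %H:%M' /db/data/pg_xlog \
      | awk '/^-/ {print $6, $7}' | sort | uniq -c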

-- syslog shows that on 7/2/10 from 10:16:11 to 10:21:51, messages similar to this occurred 29 times:
> Jul  2 10:16:11 db4.sac postgres[20526]: [56-1] 2010-07-02 10:16:11.117 PDT [user=,db=  PID:20526 XID:0]LOG:  checkpoints are occurring too frequently (4 seconds apart)
> Jul  2 10:16:11 db4.sac postgres[20526]: [56-2] 2010-07-02 10:16:11.117 PDT [user=,db=  PID:20526 XID:0]HINT:  Consider increasing the configuration parameter "checkpoint_segments".

-- On 7/4/10 from 09:09:02 to 09:10:08, the same type of messages occurred 21 times in syslog
-- This message did not occur on 7/3/10 at all

-- Preceding the "checkpoints are occurring too frequently" syslog entries are autovacuum entries:
> Jul  2 10:16:04 db4.sac postgres[11357]: [7-1] 2010-07-02 10:16:04.576 PDT [user=,db=  PID:11357 XID:0]LOG:  automatic vacuum of table "tii._sac_global.sl_event": index scans: 1
> Jul  2 10:16:04 db4.sac postgres[11357]: [7-2]     pages: 0 removed, 25 remain
> Jul  2 10:16:04 db4.sac postgres[11357]: [7-3]     tuples: 676 removed, 955 remain
> Jul  2 10:16:04 db4.sac postgres[11357]: [7-4]     system usage: CPU 0.01s/0.00u sec elapsed 0.02 sec
> Jul  2 10:16:04 db4.sac postgres[11357]: [8-1] 2010-07-02 10:16:04.580 PDT [user=,db=  PID:11357 XID:197431667]LOG:  automatic analyze of table "tii._sac_global.sl_event" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> Jul  2 10:16:04 db4.sac postgres[11357]: [9-1] 2010-07-02 10:16:04.965 PDT [user=,db=  PID:11357 XID:0]LOG:  automatic vacuum of table "tii._sac_global.sl_confirm": index scans: 1
> Jul  2 10:16:04 db4.sac postgres[11357]: [9-2]     pages: 0 removed, 154 remain
> Jul  2 10:16:04 db4.sac postgres[11357]: [9-3]     tuples: 1834 removed, 8385 remain
> Jul  2 10:16:04 db4.sac postgres[11357]: [9-4]     system usage: CPU 0.32s/0.04u sec elapsed 0.37 sec

> Jul  4 09:08:56 db4.sac postgres[21798]: [13-1] 2010-07-04 09:08:56.107 PDT [user=,db=  PID:21798 XID:0]LOG:  automatic vacuum of table "tii.public.city": index scans: 0
> Jul  4 09:08:56 db4.sac postgres[21798]: [13-2]    pages: 0 removed, 151 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [13-3]    tuples: 0 removed, 20223 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [13-4]    system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec
> Jul  4 09:08:56 db4.sac postgres[21798]: [14-1] 2010-07-04 09:08:56.118 PDT [user=,db=  PID:21798 XID:0]LOG:  automatic vacuum of table "tii.public.country": index scans: 1
> Jul  4 09:08:56 db4.sac postgres[21798]: [14-2]    pages: 0 removed, 2 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [14-3]    tuples: 77 removed, 185 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [14-4]    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> Jul  4 09:08:56 db4.sac postgres[21798]: [15-1] 2010-07-04 09:08:56.335 PDT [user=,db=  PID:21798 XID:0]LOG:  automatic vacuum of table "tii.public.gm3_clipboard": index scans: 1
> Jul  4 09:08:56 db4.sac postgres[21798]: [15-2]    pages: 0 removed, 897 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [15-3]    tuples: 2 removed, 121594 remain
> Jul  4 09:08:56 db4.sac postgres[21798]: [15-4]    system usage: CPU 0.07s/0.08u sec elapsed 0.20 sec
...snip...
> Jul  4 09:10:25 db4.sac postgres[22066]: [23-1] 2010-07-04 09:10:25.921 PDT [user=,db=  PID:22066 XID:0]LOG:  automatic vacuum of table "tii.public.pm_assignment": index scans: 1
> Jul  4 09:10:25 db4.sac postgres[22066]: [23-2]    pages: 0 removed, 995 remain
> Jul  4 09:10:25 db4.sac postgres[22066]: [23-3]    tuples: 323 removed, 83964 remain
> Jul  4 09:10:25 db4.sac postgres[22066]: [23-4]    system usage: CPU 0.01s/0.09u sec elapsed 0.52 sec
> Jul  4 09:10:25 db4.sac postgres[22073]: [7-1] 2010-07-04 09:10:25.978 PDT [user=,db=  PID:22073 XID:0]LOG:  automatic vacuum of table "tii.public.pm_question_type": index scans: 0
> Jul  4 09:10:25 db4.sac postgres[22073]: [7-2]     pages: 0 removed, 1 remain
> Jul  4 09:10:25 db4.sac postgres[22073]: [7-3]     tuples: 0 removed, 2 remain
> Jul  4 09:10:25 db4.sac postgres[22073]: [7-4]     system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> Jul  4 09:10:26 db4.sac postgres[22066]: [24-1] 2010-07-04 09:10:26.301 PDT [user=,db=  PID:22066 XID:0]LOG:  automatic vacuum of table "tii.public.pm_association_rule": index scans: 1
> Jul  4 09:10:26 db4.sac postgres[22066]: [24-2]    pages: 0 removed, 286 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [24-3]    tuples: 46 removed, 51321 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [24-4]    system usage: CPU 0.00s/0.03u sec elapsed 0.34 sec
> Jul  4 09:10:26 db4.sac postgres[22066]: [25-1] 2010-07-04 09:10:26.328 PDT [user=,db=  PID:22066 XID:0]LOG:  automatic vacuum of table "tii.public.role": index scans: 0
> Jul  4 09:10:26 db4.sac postgres[22066]: [25-2]    pages: 0 removed, 1 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [25-3]    tuples: 0 removed, 5 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [25-4]    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> Jul  4 09:10:26 db4.sac postgres[22066]: [26-1] 2010-07-04 09:10:26.373 PDT [user=,db=  PID:22066 XID:0]LOG:  automatic vacuum of table "tii.public.pm_review_type": index scans: 0
> Jul  4 09:10:26 db4.sac postgres[22066]: [26-2]    pages: 0 removed, 1 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [26-3]    tuples: 0 removed, 4 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [26-4]    system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
> Jul  4 09:10:26 db4.sac postgres[22066]: [27-1] 2010-07-04 09:10:26.428 PDT [user=,db=  PID:22066 XID:0]LOG:  automatic vacuum of table "tii.public.permission": index scans: 0
> Jul  4 09:10:26 db4.sac postgres[22066]: [27-2]    pages: 0 removed, 1 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [27-3]    tuples: 0 removed, 8 remain
> Jul  4 09:10:26 db4.sac postgres[22066]: [27-4]    system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec

-- log_min_duration_statement=1000, so there is a chance that the volume of really-fast write traffic might be masked,
but I'm doubtful, as this is our low-usage season (we run a web service for the education sector).

-- Another tidbit of information is that I am using slony 2.0.3 for replication, and my sync_interval is 500.  However,
I'm doubtful that slony traffic is responsible because my other nodes use the exact same config and hardware, and none
of them had this issue.

-- My last administrative action on this machine was to SIGHUP postgres to push changes to pg_hba.conf on 6/29/10.
Just now, I diff-ed my postgresql.conf file with a copy that I keep in svn--no changes at all.

This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would
flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk.  I'm wondering if anyone else out there
might be able to give me some insight or comments on my assessment--is it accurate?  Any input would be helpful, and
I'll try to make necessary architectural changes to keep this from happening again.

Your help is much appreciated!
--Richard

Re: WAL partition overloaded--by autovacuum?

From
Scott Marlowe
Date:
On Tue, Jul 6, 2010 at 7:10 PM, Richard Yen <richyen@iparadigms.com> wrote:
> This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would
> flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk.  I'm wondering if anyone else out there
> might be able to give me some insight or comments on my assessment--is it accurate?  Any input would be helpful, and
> I'll try to make necessary architectural changes to keep this from happening again.

I tend to agree.  What kind of disk setup is under your data store?
If it's RAID-5 or RAID-6, is it possibly degraded?

I'd run iostat -xd 10 while the write load is high to see how your
xlog partition %util compares to the main data store partition.
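
Something like this, left running through a busy stretch (the -k just
makes the throughput columns easier to read):

    # extended per-device stats in kB, every 10 seconds, 30 samples;
    # compare %util, await and wkB/s between the pg_xlog device and the data device
    iostat -dxk 10 30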

Do you have your main data store partition under a battery backed
caching RAID controller?  Tell us what you can about your hardware
setup.

Re: WAL partition overloaded--by autovacuum?

From
Scott Marlowe
Date:
On Tue, Jul 6, 2010 at 7:10 PM, Richard Yen <richyen@iparadigms.com> wrote:

One more thing: do you have long-running transactions during these periods?

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Mark Kirkwood
Date:
On 07/07/10 13:10, Richard Yen wrote:
>
> This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would
> flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk.  I'm wondering if anyone else out there
> might be able to give me some insight or comments on my assessment--is it accurate?  Any input would be helpful, and
> I'll try to make necessary architectural changes to keep this from happening again.
>

Do you have WAL archiving enabled? (if so, let's see your archive_command).

Cheers

Mark

Re: WAL partition overloaded--by autovacuum?

From
"Kevin Grittner"
Date:
Richard Yen wrote:

> the pg_xlog partition filled up (33GB)

> checkpoint_segments=16

> a sudden flurry of write activity

Was this database bulk-loaded in some way (like by restoring the
output of pg_dump, for example)?  If so, all rows inserted into all
tables would have the same (or very nearly the same) xmin value. At
some later time, virtually all tuples would need to be rewritten to
freeze them. This would be a logged operation, so WAL files would be
getting created rapidly.  If you have WAL archiving turned on, and
you can't copy the files out as fast as they're being created, this
might happen.

To avoid such a crushing mass freeze at an unpredictable time, we
always run VACUUM FREEZE ANALYZE on a bulk-loaded database before
turning on WAL archiving.
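
If it helps, here is a rough sketch of both checks (8.4-compatible; "mydb" is just a placeholder
for the bulk-loaded database):

    # how old is each database's frozen xid?  autovacuum forces anti-wraparound
    # vacuums as age(datfrozenxid) approaches autovacuum_freeze_max_age
    psql -At -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

    # after a bulk load, freeze everything up front (run in the loaded database):
    psql -d mydb -c "VACUUM FREEZE ANALYZE;"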

Of course, if this database wasn't bulk-loaded, this can't be your
problem....

-Kevin


Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
Sorry, I forgot to mention that archive_mode is "off" and commented out, and archive_command is '' and commented out.

Thanks for following up!
-- Richard

On Jul 7, 2010, at 1:58, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:

> On 07/07/10 13:10, Richard Yen wrote:
>>
>> This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would
>> flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk.  I'm wondering if anyone else out there
>> might be able to give me some insight or comments on my assessment--is it accurate?  Any input would be helpful, and
>> I'll try to make necessary architectural changes to keep this from happening again.
>>
>
> Do you have WAL archiving enabled? (if so, let's see your archive_command).
>
> Cheers
>
> Mark
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
On Jul 6, 2010, at 8:25 PM, Scott Marlowe wrote:

> Tell us what you can about your hardware setup.

Sorry, I made the bad assumption that the hardware setup would be irrelevant--dunno why I thought that.

My hardware setup is 2 FusionIO 160GB drives in a RAID-1 configuration, running on an HP DL360 G5

I think I figured out the problem:

-- I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there
are no spindles.
-- However, I didn't take into account the fact that pg_xlog might grow in size when autovacuum does its work when
vacuuming to prevent XID wraparound.  I *just* discovered this when one of my other replication nodes decided to die on
me and fill up its disk.
-- Unfortunately, my db is 114GB (including indexes) or 60GB (without indexes), leaving ~37GB for pg_xlog (since they
are sharing a partition).  So I'm guessing what happened was that when autovacuum ran to prevent XID wraparound, it
went through each table and froze the old XIDs, and all of that got recorded in WAL, causing WAL to bloat.  Is this the
correct understanding?

Question for now is, the documentation says:
> There will always be at least one WAL segment file, and will normally not be more than (2 +
> checkpoint_completion_target) * checkpoint_segments + 1 files. Each segment file is normally 16 MB (though this size
> can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when
> old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered
> sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment
> files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit.

This means my pg_xlog partition should normally not exceed (2 + checkpoint_completion_target) * checkpoint_segments + 1
= 41 files, or 656MB.  Then, if there are more than 49 files, unneeded segment files will be deleted, but in this case
all segment files are needed, so they never got deleted.  Perhaps we should add in the docs that pg_xlog should be the
size of the DB or larger?
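
For the record, the arithmetic, and what was actually on disk (checkpoint_completion_target is at the default 0.5 here):

    # documented ceiling for this config:
    awk 'BEGIN { n = (2 + 0.5) * 16 + 1; printf "%d files, %d MB\n", n, n * 16 }'
    # -> 41 files, 656 MB

    # segment files actually present after the crash:
    ls /db/data/pg_xlog | grep -c -E '^[0-9A-F]{24}$'
    # -> 2056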

--Richard




Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Robert Haas
Date:
On Wed, Jul 7, 2010 at 3:32 PM, Richard Yen <richyen@iparadigms.com> wrote:
> On Jul 6, 2010, at 8:25 PM, Scott Marlowe wrote:
>
>> Tell us what you can about your hardware setup.
>
> Sorry, I made the bad assumption that the hardware setup would be irrelevant--dunno why I thought that.
>
> My hardware setup is 2 FusionIO 160GB drives in a RAID-1 configuration, running on an HP DL360 G5
>
> I think I figured out the problem:
>
> -- I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when
> there are no spindles.
> -- However, I didn't take into account the fact that pg_xlog might grow in size when autovacuum does its work when
> vacuuming to prevent XID wraparound.  I *just* discovered this when one of my other replication nodes decided to die on
> me and fill up its disk.
> -- Unfortunately, my db is 114GB (including indexes) or 60GB (without indexes), leaving ~37GB for pg_xlog (since they
> are sharing a partition).  So I'm guessing what happened was that when autovacuum ran to prevent XID wraparound, it
> went through each table and froze the old XIDs, and all of that got recorded in WAL, causing WAL to bloat.  Is this the
> correct understanding?

That seems logical (and un-fun), but I don't understand how you
managed to fill up 37GB of disk with WAL files.  Every time you fill
up checkpoint_segments * 16MB of WAL files, you ought to get a
checkpoint.  When it's complete, WAL segments completely written
before the start of the checkpoint should be recyclable.  Unless I'm
confused, which apparently I am.
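
(One way to sanity-check this on a running server would be pg_stat_bgwriter -- it is there in 8.4 -- to see
whether checkpoints were being requested and completing, e.g.:)

    # requested vs. timed checkpoints, and buffers written by checkpoints, since stats reset
    psql -At -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint FROM pg_stat_bgwriter;"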

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> I don't understand how you managed to fill up 37GB of disk with
> WAL files.  Every time you fill up checkpoint_segments * 16MB of
> WAL files, you ought to get a checkpoint.  When it's complete, WAL
> segments completely written before the start of the checkpoint
> should be recyclable.  Unless I'm confused, which apparently I am.

You're not alone.  At first I was assuming that it was because of
archiving, but the OP says that's turned off.  Unless it had been on
and there wasn't a *restart* after changing the configuration, I
can't see how this could happen, and was hoping someone else could
cast some light on the issue.

The one setting that gave me pause was:

commit_siblings=200

but it doesn't seem like that should matter with:

commit_delay=0;

-Kevin

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
On Jul 8, 2010, at 12:04 PM, Kevin Grittner wrote:

> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I don't understand how you managed to fill up 37GB of disk with
>> WAL files.  Every time you fill up checkpoint_segments * 16MB of
>> WAL files, you ought to get a checkpoint.  When it's complete, WAL
>> segments completely written before the start of the checkpoint
>> should be recyclable.  Unless I'm confused, which apparently I am.
>
> You're not alone.  At first I was assuming that it was because of
> archiving, but the OP says that's turned off.  Unless it had been on
> and there wasn't a *restart* after changing the configuration, I
> can't see how this could happen, and was hoping someone else could
> cast some light on the issue.

I'm fairly confused myself.  I'm beginning to think that because data/base and data/pg_xlog were on the same partition
(/db), when the /db partition filled up, the WAL files couldn't get flushed to data/base, thereby preventing
data/pg_xlog from being emptied out, as per the documentation.

My concern is that--as in the original post--there were moments where 129 WAL files were generated in one minute.  Is
it plausible that this autovacuum could be responsible for this?
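
For scale, 129 segments in a minute works out to roughly:

    # 129 x 16 MB segments per minute, as a sustained write rate
    awk 'BEGIN { printf "%.1f GB/min, %.1f MB/s\n", 129 * 16 / 1024, 129 * 16 / 60 }'
    # -> 2.0 GB/min, 34.4 MB/s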

--Richard

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> You're not alone.  At first I was assuming that it was because of
> archiving, but the OP says that's turned off.  Unless it had been on
> and there wasn't a *restart* after changing the configuration,

Yeah, I was less than convinced by his eyeball report of that, too.
"show archive_mode" would be a much more convincing check of the
server's state.  Or would have been, if the server hadn't been restarted
since the problem occurred.
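
That is, something like:

    # ask the running server, not the config file
    psql -At -c "SHOW archive_mode;"
    psql -At -c "SHOW archive_command;"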

archive_mode on with a bad archive_command would lead directly to the
reported problem ... although it should also lead to pretty obvious
complaints in the postmaster log.

(Hmm ... but those complaints are logged at level WARNING, which as
discussed elsewhere is really lower than LOG.  Should we change them?)

            regards, tom lane

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
On Jul 8, 2010, at 12:27 PM, Tom Lane wrote:
>
> (Hmm ... but those complaints are logged at level WARNING, which as
> discussed elsewhere is really lower than LOG.  Should we change them?)

Hmm, I did a grep on "WARNING" in my log, and the only entries that turn up are the "WARNING:  terminating connection
because of crash of another server process" ones from when postgres died and shut down when the disks filled up.

Would this be conclusive evidence that archive_mode=off?

--Richard

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
"Kevin Grittner"
Date:
Richard Yen <richyen@iparadigms.com> wrote:

> there were moments where 129 WAL files were generated in one
> minute.  Is it plausible that this autovacuum could be responsible
> for this?

I don't remember seeing your autovacuum and vacuum config settings,
or an answer to my question about whether there was a bulk load of a
significant portion of current data.  With aggressive autovacuum
settings, hitting the freeze threshold for bulk-loaded data could do
that.

-Kevin

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Tom Lane
Date:
Richard Yen <richyen@iparadigms.com> writes:
> My concern is that--as in the original post--there were moments where 129 WAL files were generated in one minute.  Is
> it plausible that this autovacuum could be responsible for this?

That's not a particularly surprising WAL creation rate for a busy
database.  I wouldn't expect autovacuum to cause it by itself, but
that's true only because autovacuum processing is typically throttled
by autovacuum_vacuum_cost_delay.  Perhaps you had that set to zero?

            regards, tom lane

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
On Jul 8, 2010, at 12:50 PM, Tom Lane wrote:

> Richard Yen <richyen@iparadigms.com> writes:
>> My concern is that--as in the original post--there were moments where 129 WAL files were generated in one minute.
>> Is it plausible that this autovacuum could be responsible for this?
>
> That's not a particularly surprising WAL creation rate for a busy
> database.  I wouldn't expect autovacuum to cause it by itself, but
> that's true only because autovacuum processing is typically throttled
> by autovacuum_vacuum_cost_delay.  Perhaps you had that set to zero?
>

Ah, yes, autovacuum_vacuum_cost_delay = 0 in my config.  That explains it--guess I'm playing with knives if I set
things that way...
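
For the archives: putting the throttle back is just a postgresql.conf edit plus a reload; the values below are simply
the 8.4 defaults, not a recommendation:

    # in postgresql.conf:
    #   autovacuum_vacuum_cost_delay = 20ms
    #   autovacuum_vacuum_cost_limit = -1    # i.e. fall back to vacuum_cost_limit
    # then have the server re-read its config:
    pg_ctl reload -D /db/data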

--Richard


Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Richard Yen
Date:
On Jul 8, 2010, at 12:50 PM, Kevin Grittner wrote:

> Richard Yen <richyen@iparadigms.com> wrote:
>
>> there were moments where 129 WAL files were generated in one
>> minute.  Is it plausible that this autovacuum could be responsible
>> for this?
>
> I don't remember seeing your autovacuum and vacuum config settings,
> or an answer to my question about whether there was a bulk load of a
> significant portion of current data.  With aggressive autovacuum
> settings, hitting the freeze threshold for bulk-loaded data could do
> that.
>

Yeah, autovacuum is pretty aggressive, as I recall:
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 400
autovacuum_analyze_threshold = 200
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = -1

vacuum_cost_delay = 0
vacuum_cost_limit = 200
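
(For reference, the tables next in line for an anti-wraparound vacuum can be listed with something like this --
the database name is taken from the log excerpts above:)

    # ten tables with the oldest relfrozenxid, i.e. closest to autovacuum_freeze_max_age
    psql -d tii -At -c "SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC LIMIT 10;"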

When you say "bulk-loaded," I suppose that also includes loading the data via slony ADD NODE as well--correct?  I
created this node maybe 6 months ago via slony ADD NODE.

Thanks much for your time,
--Richard

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
"Kevin Grittner"
Date:
Richard Yen <richyen@iparadigms.com> wrote:

> When you say "bulk-loaded," I suppose that also includes loading
> the data via slony ADD NODE as well--correct?

I would think so.  Anything which loads a lot of data in relatively
few database transactions would qualify; I would think slony would
do this, as it's generally required to get decent performance.

-Kevin

Re: [Slony1-general] WAL partition overloaded--by autovacuum?

From
Greg Smith
Date:
Richard Yen wrote:
> I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there
> are no spindles.
>


(Rolls eyes) Please be careful about how much SSD Kool-Aid you drink,
and be skeptical of vendor claims. They don't just make latency go away,
particularly on heavy write workloads where the technology is at its
weakest.

Also, random note, I'm seeing way too many FusionIO drive setups where
people don't have any redundancy to cope with a drive failure, because
the individual drives are so expensive they don't have more than one.
Make sure that if you lose one of the drives, you won't have a massive
data loss. Replication might help with that, if you can stand a little
bit of data loss when the SSD dies. Not if--when. Even if you have a
good one they don't last forever.

> This means my pg_xlog partition should normally not exceed (2 + checkpoint_completion_target) * checkpoint_segments
> + 1 = 41 files, or 656MB.  Then, if there are more than 49 files, unneeded segment files will be deleted, but in this
> case all segment files are needed, so they never got deleted.  Perhaps we should add in the docs that pg_xlog should
> be the size of the DB or larger?
>

Excessive write volume beyond the capacity of the hardware can end up
delaying the normal checkpoint that would have cleaned up all the xlog
files. There's a nasty spiral you can get into, which I've seen a couple of
times in a form similar to what you reported. The pg_xlog should never
exceed the size computed by that formula for very long, but it can burst
above its normal size limits for a little bit. This is already mentioned
as a possibility in the manual: "If, due to a short-term peak of log
output rate, there are more than 3 * checkpoint_segments + 1 segment
files, the unneeded segment files will be deleted instead of recycled
until the system gets back under this limit." Autovacuum is an easy way
to get the sort of activity needed to cause this problem, but I don't
know if it's a necessary component to see the problem. You have to be in
an unusual situation before the sum of the xlog files is anywhere close
to the size of the database, though.
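
If you want an early warning next time, a trivial cron check along these lines is usually enough (the path is from
your setup, and the threshold is just the documented soft limit):

    #!/bin/sh
    # warn when pg_xlog has grown past the documented 3 * checkpoint_segments + 1 ceiling
    SEGS=$(psql -At -c "SHOW checkpoint_segments;")
    LIMIT=$(( 3 * SEGS + 1 ))
    COUNT=$(ls /db/data/pg_xlog | grep -c -E '^[0-9A-F]{24}$')
    if [ "$COUNT" -gt "$LIMIT" ]; then
        echo "pg_xlog has $COUNT segment files (soft limit $LIMIT)"
    fi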

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us