Re: pg_xlog becomes extremely large during CREATE INDEX - Mailing list pgsql-general

From Jeffrey W. Baker
Subject Re: pg_xlog becomes extremely large during CREATE INDEX
Date
Msg-id 1084422703.15941.11.camel@noodles
Whole thread Raw
In response to Re: pg_xlog becomes extremely large during CREATE INDEX  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_xlog becomes extremely large during CREATE INDEX
Re: pg_xlog becomes extremely large during CREATE INDEX
List pgsql-general
On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > I have configured 24 checkpoint segments,
> > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
>
> Postgres version?  AFAIR 7.1.2 or thereabouts would act that way,
> because it wouldn't discard log segments containing work of open
> transactions.

This is 7.4.1.

> The only reason I can think of for such behavior in a current version
> is if the checkpoint isn't completing for some reason.  Have you looked
> in the postmaster log to see if checkpoint is bleating about anything?

Oh sure, it's bleating.  Apparently my computer is too fast:

syslog:May 12 21:20:32 mistral postgres[17458]: [214-1] LOG:  checkpoints are occurring too frequently (12 seconds
apart)
syslog:May 12 21:20:32 mistral postgres[17458]: [214-2] HINT:  Consider increasing the configuration parameter
"checkpoint_segments".

That's also what preceeded the most recent crash:

May 12 16:36:53 mistral postgres[506]: [173-1] LOG:  checkpoints are occurring too frequently (16 seconds apart)
May 12 16:36:53 mistral postgres[506]: [173-2] HINT:  Consider increasing the configuration parameter
"checkpoint_segments".
May 12 16:37:08 mistral postgres[506]: [174-1] LOG:  server process (PID 16403) was terminated by signal 6
May 12 16:37:08 mistral postgres[506]: [175-1] LOG:  terminating any other active server processes
May 12 16:37:08 mistral postgres[26212]: [3-1] WARNING:  terminating connection because of crash of another server
process
May 12 16:37:08 mistral postgres[26212]: [3-2] DETAIL:  The postmaster has commanded this server process to roll back
thecurrent transaction and exit, because another server 
May 12 16:37:08 mistral postgres[26212]: [3-3]  process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[26212]: [3-4] HINT:  In a moment you should be able to reconnect to the database and
repeatyour command. 
May 12 16:37:08 mistral postgres[14396]: [1-1] WARNING:  terminating connection because of crash of another server
process
May 12 16:37:08 mistral postgres[14396]: [1-2] DETAIL:  The postmaster has commanded this server process to roll back
thecurrent transaction and exit, because another server 
May 12 16:37:08 mistral postgres[14396]: [1-3]  process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[14396]: [1-4] HINT:  In a moment you should be able to reconnect to the database and
repeatyour command. 
May 12 16:37:08 mistral postgres[14396]: [1-5] CONTEXT:  writing block 56889 of relation 17142/289600495
May 12 16:37:08 mistral postgres[506]: [176-1] LOG:  all server processes terminated; reinitializing
May 12 16:37:08 mistral postgres[17086]: [177-1] LOG:  database system was interrupted at 2004-05-12 16:36:53 PDT
May 12 16:37:08 mistral postgres[17086]: [178-1] LOG:  checkpoint record is at 6/92B32D48
May 12 16:37:08 mistral postgres[17086]: [179-1] LOG:  redo record is at 6/92B32D48; undo record is at 0/0; shutdown
FALSE
May 12 16:37:08 mistral postgres[17086]: [180-1] LOG:  next transaction ID: 10232; next OID: 289600632
May 12 16:37:08 mistral postgres[17086]: [181-1] LOG:  database system was not properly shut down; automatic recovery
inprogress 
May 12 16:37:08 mistral postgres[17086]: [182-1] LOG:  redo starts at 6/92B32D88
May 12 16:45:29 mistral postgres[17086]: [183-1] LOG:  could not open file
"/var/lib/postgres/data/pg_xlog/0000000700000071"(log file 7, segment 113): No such file or directory 
May 12 16:45:29 mistral postgres[17086]: [184-1] LOG:  redo done at 7/70FFC400
May 12 16:45:50 mistral postgres[506]: [177-1] LOG:  startup process (PID 17086) was terminated by signal 6
May 12 16:45:50 mistral postgres[506]: [178-1] LOG:  aborting startup due to startup process failure

I also have many "recycled" messages:

May 12 13:10:01 mistral postgres[16373]: [20-1] LOG:  recycled transaction log file "0000000600000078"
May 12 13:10:01 mistral postgres[16373]: [21-1] LOG:  recycled transaction log file "0000000600000079"
May 12 13:10:01 mistral postgres[16373]: [22-1] LOG:  recycled transaction log file "000000060000007A"
May 12 13:10:01 mistral postgres[16373]: [23-1] LOG:  recycled transaction log file "000000060000007B"
May 12 13:10:01 mistral postgres[16373]: [24-1] LOG:  recycled transaction log file "000000060000007C"
May 12 13:20:03 mistral postgres[16408]: [1-1] LOG:  removing transaction log file "0000000600000090"
May 12 13:20:03 mistral postgres[16408]: [2-1] LOG:  removing transaction log file "0000000600000091"

> Note that your calculation is off anyway: per documentation, recent
> versions will allow up to 2*checkpoint_segments+1 WAL files to exist.
> This is because we save the last two checkpoints worth of data.

That's why I gave it an order of magnitude leeway.  Such calculations
are always off by a factor or two.

It's doing it again at this very moment, I get the feeling that pg_xlog
is converging on the same size as $PGDATA/base/$OID/pgsql_tmp:

mistral:/backup_temp# du -hs pg_xlog /var/lib/postgres/data/base/17142/pgsql_tmp/
4.8G    pg_xlog
5.4G    /var/lib/postgres/data/base/17142/pgsql_tmp

-jwb

pgsql-general by date:

Previous
From: Dennis Gearon
Date:
Subject: Functionality in database or external langauge
Next
From: "Jeffrey W. Baker"
Date:
Subject: Re: pg_xlog becomes extremely large during CREATE INDEX