Thread: pg_xlog becomes extremely large during CREATE INDEX

pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
Greetings,

I have a 23GB data table upon which I am building a primary key of three
columns.  The data is mounted in a 137GB device and pg_xlog is mounted
on a separate 3.5GB device.  I have configured 24 checkpoint segments,
which I expect gives me a worst-case usage in pg_xlog of 384MB.
Unfortunately, during the CREATE INDEX, pg_xlog becomes full!

[There's an ls(1) listing at the end of this mail, if you care to see it.]

The operation created 222 files in pg_xlog before it filled the device,
and there's no indication that it was planning to stop there.  Is there
some factor, other than checkpoint segments, which controls the size of
pg_xlog, or is there some other operation that also writes files in
pg_xlog, or, is the server just running away?

I have reset the xlogs and I'm trying again.

Regards,
Jeffrey

mistral:/pg_xlog_full# ls -l
total 3641692
-rw-------    1 postgres postgres 16777216 May 12 15:51 0000000600000092
-rw-------    1 postgres postgres 16777216 May 12 15:51 0000000600000093
-rw-------    1 postgres postgres 16777216 May 12 15:51 0000000600000094
-rw-------    1 postgres postgres 16777216 May 12 15:51 0000000600000095
-rw-------    1 postgres postgres 16777216 May 12 15:52 0000000600000096
-rw-------    1 postgres postgres 16777216 May 12 15:52 0000000600000097
-rw-------    1 postgres postgres 16777216 May 12 15:52 0000000600000098
-rw-------    1 postgres postgres 16777216 May 12 15:52 0000000600000099
-rw-------    1 postgres postgres 16777216 May 12 15:52 000000060000009A
-rw-------    1 postgres postgres 16777216 May 12 15:52 000000060000009B
-rw-------    1 postgres postgres 16777216 May 12 15:53 000000060000009C
-rw-------    1 postgres postgres 16777216 May 12 15:53 000000060000009D
-rw-------    1 postgres postgres 16777216 May 12 15:53 000000060000009E
-rw-------    1 postgres postgres 16777216 May 12 15:53 000000060000009F
-rw-------    1 postgres postgres 16777216 May 12 15:53 00000006000000A0
-rw-------    1 postgres postgres 16777216 May 12 15:53 00000006000000A1
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A2
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A3
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A4
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A5
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A6
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A7
-rw-------    1 postgres postgres 16777216 May 12 15:54 00000006000000A8
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000A9
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000AA
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000AB
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000AC
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000AD
-rw-------    1 postgres postgres 16777216 May 12 15:55 00000006000000AE
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000AF
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000B0
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000B1
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000B2
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000B3
-rw-------    1 postgres postgres 16777216 May 12 15:56 00000006000000B4
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000B5
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000B6
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000B7
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000B8
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000B9
-rw-------    1 postgres postgres 16777216 May 12 15:57 00000006000000BA
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000BB
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000BC
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000BD
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000BE
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000BF
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000C0
-rw-------    1 postgres postgres 16777216 May 12 15:58 00000006000000C1
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C2
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C3
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C4
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C5
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C6
-rw-------    1 postgres postgres 16777216 May 12 15:59 00000006000000C7
-rw-------    1 postgres postgres 16777216 May 12 16:00 00000006000000C8
-rw-------    1 postgres postgres 16777216 May 12 16:00 00000006000000C9
-rw-------    1 postgres postgres 16777216 May 12 16:00 00000006000000CA
-rw-------    1 postgres postgres 16777216 May 12 16:00 00000006000000CB
-rw-------    1 postgres postgres 16777216 May 12 16:00 00000006000000CC
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000CD
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000CE
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000CF
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000D0
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000D1
-rw-------    1 postgres postgres 16777216 May 12 16:01 00000006000000D2
-rw-------    1 postgres postgres 16777216 May 12 16:02 00000006000000D3
-rw-------    1 postgres postgres 16777216 May 12 16:02 00000006000000D4
-rw-------    1 postgres postgres 16777216 May 12 16:02 00000006000000D5
-rw-------    1 postgres postgres 16777216 May 12 16:02 00000006000000D6
-rw-------    1 postgres postgres 16777216 May 12 16:02 00000006000000D7
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000D8
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000D9
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000DA
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000DB
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000DC
-rw-------    1 postgres postgres 16777216 May 12 16:03 00000006000000DD
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000DE
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000DF
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000E0
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000E1
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000E2
-rw-------    1 postgres postgres 16777216 May 12 16:04 00000006000000E3
-rw-------    1 postgres postgres 16777216 May 12 16:05 00000006000000E4
-rw-------    1 postgres postgres 16777216 May 12 16:05 00000006000000E5
-rw-------    1 postgres postgres 16777216 May 12 16:05 00000006000000E6
-rw-------    1 postgres postgres 16777216 May 12 16:05 00000006000000E7
-rw-------    1 postgres postgres 16777216 May 12 16:05 00000006000000E8
-rw-------    1 postgres postgres 16777216 May 12 16:06 00000006000000E9
-rw-------    1 postgres postgres 16777216 May 12 16:06 00000006000000EA
-rw-------    1 postgres postgres 16777216 May 12 16:06 00000006000000EB
-rw-------    1 postgres postgres 16777216 May 12 16:06 00000006000000EC
-rw-------    1 postgres postgres 16777216 May 12 16:07 00000006000000ED
-rw-------    1 postgres postgres 16777216 May 12 16:07 00000006000000EE
-rw-------    1 postgres postgres 16777216 May 12 16:07 00000006000000EF
-rw-------    1 postgres postgres 16777216 May 12 16:07 00000006000000F0
-rw-------    1 postgres postgres 16777216 May 12 16:07 00000006000000F1
-rw-------    1 postgres postgres 16777216 May 12 16:08 00000006000000F2
-rw-------    1 postgres postgres 16777216 May 12 16:08 00000006000000F3
-rw-------    1 postgres postgres 16777216 May 12 16:08 00000006000000F4
-rw-------    1 postgres postgres 16777216 May 12 16:08 00000006000000F5
-rw-------    1 postgres postgres 16777216 May 12 16:08 00000006000000F6
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000F7
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000F8
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000F9
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000FA
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000FB
-rw-------    1 postgres postgres 16777216 May 12 16:09 00000006000000FC
-rw-------    1 postgres postgres 16777216 May 12 16:10 00000006000000FD
-rw-------    1 postgres postgres 16777216 May 12 16:10 00000006000000FE
-rw-------    1 postgres postgres 16777216 May 12 16:10 0000000700000000
-rw-------    1 postgres postgres 16777216 May 12 16:10 0000000700000001
-rw-------    1 postgres postgres 16777216 May 12 16:10 0000000700000002
-rw-------    1 postgres postgres 16777216 May 12 16:11 0000000700000003
-rw-------    1 postgres postgres 16777216 May 12 16:11 0000000700000004
-rw-------    1 postgres postgres 16777216 May 12 16:11 0000000700000005
-rw-------    1 postgres postgres 16777216 May 12 16:11 0000000700000006
-rw-------    1 postgres postgres 16777216 May 12 16:12 0000000700000007
-rw-------    1 postgres postgres 16777216 May 12 16:12 0000000700000008
-rw-------    1 postgres postgres 16777216 May 12 16:12 0000000700000009
-rw-------    1 postgres postgres 16777216 May 12 16:12 000000070000000A
-rw-------    1 postgres postgres 16777216 May 12 16:12 000000070000000B
-rw-------    1 postgres postgres 16777216 May 12 16:13 000000070000000C
-rw-------    1 postgres postgres 16777216 May 12 16:13 000000070000000D
-rw-------    1 postgres postgres 16777216 May 12 16:13 000000070000000E
-rw-------    1 postgres postgres 16777216 May 12 16:13 000000070000000F
-rw-------    1 postgres postgres 16777216 May 12 16:13 0000000700000010
-rw-------    1 postgres postgres 16777216 May 12 16:14 0000000700000011
-rw-------    1 postgres postgres 16777216 May 12 16:14 0000000700000012
-rw-------    1 postgres postgres 16777216 May 12 16:14 0000000700000013
-rw-------    1 postgres postgres 16777216 May 12 16:14 0000000700000014
-rw-------    1 postgres postgres 16777216 May 12 16:15 0000000700000015
-rw-------    1 postgres postgres 16777216 May 12 16:15 0000000700000016
-rw-------    1 postgres postgres 16777216 May 12 16:15 0000000700000017
-rw-------    1 postgres postgres 16777216 May 12 16:15 0000000700000018
-rw-------    1 postgres postgres 16777216 May 12 16:15 0000000700000019
-rw-------    1 postgres postgres 16777216 May 12 16:16 000000070000001A
-rw-------    1 postgres postgres 16777216 May 12 16:16 000000070000001B
-rw-------    1 postgres postgres 16777216 May 12 16:16 000000070000001C
-rw-------    1 postgres postgres 16777216 May 12 16:16 000000070000001D
-rw-------    1 postgres postgres 16777216 May 12 16:17 000000070000001E
-rw-------    1 postgres postgres 16777216 May 12 16:17 000000070000001F
-rw-------    1 postgres postgres 16777216 May 12 16:17 0000000700000020
-rw-------    1 postgres postgres 16777216 May 12 16:17 0000000700000021
-rw-------    1 postgres postgres 16777216 May 12 16:18 0000000700000022
-rw-------    1 postgres postgres 16777216 May 12 16:18 0000000700000023
-rw-------    1 postgres postgres 16777216 May 12 16:18 0000000700000024
-rw-------    1 postgres postgres 16777216 May 12 16:18 0000000700000025
-rw-------    1 postgres postgres 16777216 May 12 16:19 0000000700000026
-rw-------    1 postgres postgres 16777216 May 12 16:19 0000000700000027
-rw-------    1 postgres postgres 16777216 May 12 16:19 0000000700000028
-rw-------    1 postgres postgres 16777216 May 12 16:19 0000000700000029
-rw-------    1 postgres postgres 16777216 May 12 16:20 000000070000002A
-rw-------    1 postgres postgres 16777216 May 12 16:20 000000070000002B
-rw-------    1 postgres postgres 16777216 May 12 16:20 000000070000002C
-rw-------    1 postgres postgres 16777216 May 12 16:20 000000070000002D
-rw-------    1 postgres postgres 16777216 May 12 16:20 000000070000002E
-rw-------    1 postgres postgres 16777216 May 12 16:21 000000070000002F
-rw-------    1 postgres postgres 16777216 May 12 16:21 0000000700000030
-rw-------    1 postgres postgres 16777216 May 12 16:21 0000000700000031
-rw-------    1 postgres postgres 16777216 May 12 16:21 0000000700000032
-rw-------    1 postgres postgres 16777216 May 12 16:22 0000000700000033
-rw-------    1 postgres postgres 16777216 May 12 16:22 0000000700000034
-rw-------    1 postgres postgres 16777216 May 12 16:22 0000000700000035
-rw-------    1 postgres postgres 16777216 May 12 16:22 0000000700000036
-rw-------    1 postgres postgres 16777216 May 12 16:23 0000000700000037
-rw-------    1 postgres postgres 16777216 May 12 16:23 0000000700000038
-rw-------    1 postgres postgres 16777216 May 12 16:23 0000000700000039
-rw-------    1 postgres postgres 16777216 May 12 16:23 000000070000003A
-rw-------    1 postgres postgres 16777216 May 12 16:24 000000070000003B
-rw-------    1 postgres postgres 16777216 May 12 16:24 000000070000003C
-rw-------    1 postgres postgres 16777216 May 12 16:24 000000070000003D
-rw-------    1 postgres postgres 16777216 May 12 16:24 000000070000003E
-rw-------    1 postgres postgres 16777216 May 12 16:25 000000070000003F
-rw-------    1 postgres postgres 16777216 May 12 16:25 0000000700000040
-rw-------    1 postgres postgres 16777216 May 12 16:25 0000000700000041
-rw-------    1 postgres postgres 16777216 May 12 16:25 0000000700000042
-rw-------    1 postgres postgres 16777216 May 12 16:26 0000000700000043
-rw-------    1 postgres postgres 16777216 May 12 16:26 0000000700000044
-rw-------    1 postgres postgres 16777216 May 12 16:26 0000000700000045
-rw-------    1 postgres postgres 16777216 May 12 16:26 0000000700000046
-rw-------    1 postgres postgres 16777216 May 12 16:26 0000000700000047
-rw-------    1 postgres postgres 16777216 May 12 16:27 0000000700000048
-rw-------    1 postgres postgres 16777216 May 12 16:27 0000000700000049
-rw-------    1 postgres postgres 16777216 May 12 16:27 000000070000004A
-rw-------    1 postgres postgres 16777216 May 12 16:27 000000070000004B
-rw-------    1 postgres postgres 16777216 May 12 16:28 000000070000004C
-rw-------    1 postgres postgres 16777216 May 12 16:28 000000070000004D
-rw-------    1 postgres postgres 16777216 May 12 16:28 000000070000004E
-rw-------    1 postgres postgres 16777216 May 12 16:28 000000070000004F
-rw-------    1 postgres postgres 16777216 May 12 16:29 0000000700000050
-rw-------    1 postgres postgres 16777216 May 12 16:29 0000000700000051
-rw-------    1 postgres postgres 16777216 May 12 16:29 0000000700000052
-rw-------    1 postgres postgres 16777216 May 12 16:29 0000000700000053
-rw-------    1 postgres postgres 16777216 May 12 16:30 0000000700000054
-rw-------    1 postgres postgres 16777216 May 12 16:30 0000000700000055
-rw-------    1 postgres postgres 16777216 May 12 16:30 0000000700000056
-rw-------    1 postgres postgres 16777216 May 12 16:30 0000000700000057
-rw-------    1 postgres postgres 16777216 May 12 16:31 0000000700000058
-rw-------    1 postgres postgres 16777216 May 12 16:31 0000000700000059
-rw-------    1 postgres postgres 16777216 May 12 16:31 000000070000005A
-rw-------    1 postgres postgres 16777216 May 12 16:32 000000070000005B
-rw-------    1 postgres postgres 16777216 May 12 16:32 000000070000005C
-rw-------    1 postgres postgres 16777216 May 12 16:32 000000070000005D
-rw-------    1 postgres postgres 16777216 May 12 16:32 000000070000005E
-rw-------    1 postgres postgres 16777216 May 12 16:33 000000070000005F
-rw-------    1 postgres postgres 16777216 May 12 16:33 0000000700000060
-rw-------    1 postgres postgres 16777216 May 12 16:33 0000000700000061
-rw-------    1 postgres postgres 16777216 May 12 16:33 0000000700000062
-rw-------    1 postgres postgres 16777216 May 12 16:34 0000000700000063
-rw-------    1 postgres postgres 16777216 May 12 16:34 0000000700000064
-rw-------    1 postgres postgres 16777216 May 12 16:34 0000000700000065
-rw-------    1 postgres postgres 16777216 May 12 16:34 0000000700000066
-rw-------    1 postgres postgres 16777216 May 12 16:34 0000000700000067
-rw-------    1 postgres postgres 16777216 May 12 16:35 0000000700000068
-rw-------    1 postgres postgres 16777216 May 12 16:35 0000000700000069
-rw-------    1 postgres postgres 16777216 May 12 16:35 000000070000006A
-rw-------    1 postgres postgres 16777216 May 12 16:35 000000070000006B
-rw-------    1 postgres postgres 16777216 May 12 16:36 000000070000006C
-rw-------    1 postgres postgres 16777216 May 12 16:36 000000070000006D
-rw-------    1 postgres postgres 16777216 May 12 16:36 000000070000006E
-rw-------    1 postgres postgres 16777216 May 12 16:36 000000070000006F
-rw-------    1 postgres postgres 16777216 May 12 16:45 0000000700000070



Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> I have configured 24 checkpoint segments,
> which I expect gives me a worst-case usage in pg_xlog of 384MB.
> Unfortunately, during the CREATE INDEX, pg_xlog becomes full!

Postgres version?  AFAIR 7.1.2 or thereabouts would act that way,
because it wouldn't discard log segments containing work of open
transactions.

The only reason I can think of for such behavior in a current version
is if the checkpoint isn't completing for some reason.  Have you looked
in the postmaster log to see if checkpoint is bleating about anything?

Note that your calculation is off anyway: per documentation, recent
versions will allow up to 2*checkpoint_segments+1 WAL files to exist.
This is because we save the last two checkpoints worth of data.
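
For concreteness, with checkpoint_segments = 24 and the standard 16MB
segment size, that ceiling works out to roughly 784MB, not 384MB:

$ echo $(( (2 * 24 + 1) * 16 ))      # MB
784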

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > I have configured 24 checkpoint segments,
> > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
>
> Postgres version?  AFAIR 7.1.2 or thereabouts would act that way,
> because it wouldn't discard log segments containing work of open
> transactions.

This is 7.4.1.

> The only reason I can think of for such behavior in a current version
> is if the checkpoint isn't completing for some reason.  Have you looked
> in the postmaster log to see if checkpoint is bleating about anything?

Oh sure, it's bleating.  Apparently my computer is too fast:

syslog:May 12 21:20:32 mistral postgres[17458]: [214-1] LOG:  checkpoints are occurring too frequently (12 seconds apart)
syslog:May 12 21:20:32 mistral postgres[17458]: [214-2] HINT:  Consider increasing the configuration parameter "checkpoint_segments".

That same message is also what preceded the most recent crash:

May 12 16:36:53 mistral postgres[506]: [173-1] LOG:  checkpoints are occurring too frequently (16 seconds apart)
May 12 16:36:53 mistral postgres[506]: [173-2] HINT:  Consider increasing the configuration parameter "checkpoint_segments".
May 12 16:37:08 mistral postgres[506]: [174-1] LOG:  server process (PID 16403) was terminated by signal 6
May 12 16:37:08 mistral postgres[506]: [175-1] LOG:  terminating any other active server processes
May 12 16:37:08 mistral postgres[26212]: [3-1] WARNING:  terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[26212]: [3-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[26212]: [3-3]  process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[26212]: [3-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-1] WARNING:  terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[14396]: [1-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[14396]: [1-3]  process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[14396]: [1-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-5] CONTEXT:  writing block 56889 of relation 17142/289600495
May 12 16:37:08 mistral postgres[506]: [176-1] LOG:  all server processes terminated; reinitializing
May 12 16:37:08 mistral postgres[17086]: [177-1] LOG:  database system was interrupted at 2004-05-12 16:36:53 PDT
May 12 16:37:08 mistral postgres[17086]: [178-1] LOG:  checkpoint record is at 6/92B32D48
May 12 16:37:08 mistral postgres[17086]: [179-1] LOG:  redo record is at 6/92B32D48; undo record is at 0/0; shutdown FALSE
May 12 16:37:08 mistral postgres[17086]: [180-1] LOG:  next transaction ID: 10232; next OID: 289600632
May 12 16:37:08 mistral postgres[17086]: [181-1] LOG:  database system was not properly shut down; automatic recovery in progress
May 12 16:37:08 mistral postgres[17086]: [182-1] LOG:  redo starts at 6/92B32D88
May 12 16:45:29 mistral postgres[17086]: [183-1] LOG:  could not open file "/var/lib/postgres/data/pg_xlog/0000000700000071" (log file 7, segment 113): No such file or directory
May 12 16:45:29 mistral postgres[17086]: [184-1] LOG:  redo done at 7/70FFC400
May 12 16:45:50 mistral postgres[506]: [177-1] LOG:  startup process (PID 17086) was terminated by signal 6
May 12 16:45:50 mistral postgres[506]: [178-1] LOG:  aborting startup due to startup process failure

I also have many "recycled" messages:

May 12 13:10:01 mistral postgres[16373]: [20-1] LOG:  recycled transaction log file "0000000600000078"
May 12 13:10:01 mistral postgres[16373]: [21-1] LOG:  recycled transaction log file "0000000600000079"
May 12 13:10:01 mistral postgres[16373]: [22-1] LOG:  recycled transaction log file "000000060000007A"
May 12 13:10:01 mistral postgres[16373]: [23-1] LOG:  recycled transaction log file "000000060000007B"
May 12 13:10:01 mistral postgres[16373]: [24-1] LOG:  recycled transaction log file "000000060000007C"
May 12 13:20:03 mistral postgres[16408]: [1-1] LOG:  removing transaction log file "0000000600000090"
May 12 13:20:03 mistral postgres[16408]: [2-1] LOG:  removing transaction log file "0000000600000091"

> Note that your calculation is off anyway: per documentation, recent
> versions will allow up to 2*checkpoint_segments+1 WAL files to exist.
> This is because we save the last two checkpoints worth of data.

That's why I gave it an order of magnitude leeway.  Such calculations
are always off by a factor or two.

It's doing it again at this very moment; I get the feeling that pg_xlog
is converging on the same size as $PGDATA/base/$OID/pgsql_tmp:

mistral:/backup_temp# du -hs pg_xlog /var/lib/postgres/data/base/17142/pgsql_tmp/
4.8G    pg_xlog
5.4G    /var/lib/postgres/data/base/17142/pgsql_tmp

-jwb

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Wed, 2004-05-12 at 21:31, Jeffrey W. Baker wrote:
> On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> > "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > > I have configured 24 checkpoint segments,
> > > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
> >
> > Postgres version?  AFAIR 7.1.2 or thereabouts would act that way,
> > because it wouldn't discard log segments containing work of open
> > transactions.

The CREATE INDEX finally returned, and none of the xlog space was
reclaimed.  The final size of pg_xlog is 6.1GB.  Interestingly, there is a
continuous group of xlogs that haven't been touched since data started
being copied from pgsql_tmp to ../  (What do you call this stage of the
index build?)  In reverse order of modification time:

May 12 21:55 00000008000000F7
May 12 21:49 00000008000000F6
May 12 21:49 00000008000000F5
[... decreasing by 1 until ...]
May 12 20:40 000000070000008C
May 12 20:40 000000070000008B
May 12 20:40 000000070000008A
[the ones below have higher numbers, but aren't being written to!]
May 12 20:40 0000000900000010
May 12 20:40 000000090000000F
May 12 20:39 000000090000000E
May 12 20:39 000000090000000D
May 12 20:39 000000090000000C
May 12 20:39 000000090000000B
May 12 20:39 000000090000000A
May 12 20:39 0000000900000009
May 12 20:39 0000000900000008
May 12 20:39 0000000900000007
May 12 20:39 0000000900000006
May 12 20:39 0000000900000005
May 12 20:39 0000000900000004
May 12 20:38 0000000900000003
May 12 20:38 0000000900000002
May 12 20:38 0000000900000001
May 12 20:38 0000000900000000
May 12 20:38 00000008000000FE
May 12 20:38 00000008000000FD
May 12 20:38 00000008000000FC
May 12 20:38 00000008000000FB
May 12 20:38 00000008000000FA
May 12 20:38 00000008000000F9
May 12 20:37 00000008000000F8

Not coincidentally, there are 24 of these oddballs and 24 is also the
number of checkpoint segments I have configured.

-jwb

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Oh sure, it's bleating.  Apparently my computer is too fast:

I don't think the checkpoint process is completing.

> May 12 16:37:08 mistral postgres[506]: [174-1] LOG:  server process (PID 16403) was terminated by signal 6

You need to find out why these aborts (presumably Assert failures) are
occurring.  Having just looked at the source code, I see that Assert
messages are never sent to syslog, only to stderr.  So you'll want to
set things up to capture the postmaster's stderr instead of discarding it.
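
A minimal sketch, assuming you use pg_ctl to start the server (the paths
here are only examples):

pg_ctl start -D /var/lib/postgres/data -l /var/lib/postgres/postmaster.log
# or, if you start the postmaster by hand:
postmaster -D /var/lib/postgres/data >>/var/lib/postgres/postmaster.log 2>&1 &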

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Thu, 2004-05-13 at 06:10, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Oh sure, it's bleating.  Apparently my computer is too fast:
>
> I don't think the checkpoint process is completing.
>
> > May 12 16:37:08 mistral postgres[506]: [174-1] LOG:  server process (PID 16403) was terminated by signal 6
>
> You need to find out why these aborts (presumably Assert failures) are
> occurring.  Having just looked at the source code, I see that Assert
> messages are never sent to syslog, only to stderr.  So you'll want to
> set things up to capture the postmaster's stderr instead of discarding it.

The server process aborted when the device filled up.

-jwb

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Sorry, my last mail got cut off.  The server aborted because it couldn't
> write the xlog.  Looks like I omitted this from my last mail:

Selective quoting of the log output?  Naughty naughty.

However, that still doesn't explain how you got into the current state.
Had you once had checkpoint_segments set much higher than the current
value of 24?  On looking at the code I see that it doesn't make any
attempt to prune future log segments after a decrease in
checkpoint_segments, so if a previous misconfiguration had allowed the
number of future segments to get really large, that could be the root of
the issue.
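
An easy way to see whether that's the situation is to compare the file
count in pg_xlog against the 2*checkpoint_segments+1 ceiling (path taken
from your log output above):

ls /var/lib/postgres/data/pg_xlog | wc -l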

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Thu, 2004-05-13 at 09:28, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Sorry, my last mail got cut off.  The server aborted because it couldn't
> > write the xlog.  Looks like I omitted this from my last mail:
>
> Selective quoting of the log output?  Naughty naughty.
>
> However, that still doesn't explain how you got into the current state.
> Had you once had checkpoint_segments set much higher than the current
> value of 24?  On looking at the code I see that it doesn't make any
> attempt to prune future log segments after a decrease in
> checkpoint_segments, so if a previous misconfiguration had allowed the
> number of future segments to get really large, that could be the root of
> the issue.

The database where this happened was freshly initialized and this was
one of its first operations.  I think I will write a small example
script so you can reproduce it locally.  Give me a few hours or so.

-jwb

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
[For whatever reason this got delayed in delivery.  Sending again from another route.  -jwb]

On Thu, 2004-05-13 at 09:28, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Sorry, my last mail got cut off.  The server aborted because it couldn't
> > write the xlog.  Looks like I omitted this from my last mail:
>
> Selective quoting of the log output?  Naughty naughty.
>
> However, that still doesn't explain how you got into the current state.
> Had you once had checkpoint_segments set much higher than the current
> value of 24?  On looking at the code I see that it doesn't make any
> attempt to prune future log segments after a decrease in
> checkpoint_segments, so if a previous misconfiguration had allowed the
> number of future segments to get really large, that could be the root of
> the issue.

Okay, I installed a fresh, completely stock 7.4.2 and did the following:

#!/bin/sh

createdb growxlog
echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
echo "create unique index data_pkey on data(a,b,c)" | psql growxlog

The result was a table with 100 million rows, a 5.3GB data table, a
2.7GB index, and 2.6GB in pg_xlog.  Reproducible every time.

For the less patient, the problem is also reproducible at only 10
million rows (xlog = 337MB), but not at 1 million rows (presumably the
whole mess fits inside the usual # of segments).

You'll need approximately rows*125 bytes of free space to run the test.
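
A rough sanity check of that estimate for the smaller run:

ROWS=10000000
echo "$ROWS rows -> approx $(( ROWS * 125 / 1024 / 1024 )) MB of free space needed"
# prints: 10000000 rows -> approx 1192 MB of free space needed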

-jwb


Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Okay, I installed a fresh, completely stock 7.4.2 and did the following:

> createdb growxlog
> echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
> perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
> echo "create unique index data_pkey on data(a,b,c)" | psql growxlog

I tried this locally, and what I see happening is that a checkpoint
process starts shortly after the CREATE INDEX begins whomping out the
index data --- but it doesn't finish until after the CREATE INDEX does.
AFAICS there is not any sort of locking problem, it's just that the
CREATE INDEX is chewing all the I/O bandwidth.  If we could get some
more checkpoints pushed through then the xlog would get truncated, but
it's not happening.
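
A crude way to watch that from outside while the test script runs
(assuming $PGDATA points at the data directory):

while true; do du -sh $PGDATA/pg_xlog; sleep 30; done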

I'm running this on a development machine with an ok CPU and junk
consumer-grade-IDE disk drive, so lack of I/O bandwidth is hardly
surprising; can anyone confirm the observation on better hardware?

I think one reason for the problem is that btree CREATE INDEX is as bad
as VACUUM, if not worse, with respect to chewing all available shared
buffers.  It will fill shared buffers with newly-created dirty pages ---
and I'll bet the CHECKPOINT process is getting stuck with dumping the
majority of those pages to disk.  It might be worth tweaking CREATE
INDEX so it pushes out the newly created pages for itself, thus limiting
the number of dirtied buffers it creates.

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Thu, 2004-05-13 at 21:00, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Okay, I installed a fresh, completely stock 7.4.2 and did the following:
>
> > createdb growxlog
> > echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
> > perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
> > echo "create unique index data_pkey on data(a,b,c)" | psql growxlog
>
> I tried this locally, and what I see happening is that a checkpoint
> process starts shortly after the CREATE INDEX begins whomping out the
> index data --- but it doesn't finish until after the CREATE INDEX does.
> AFAICS there is not any sort of locking problem, it's just that the
> CREATE INDEX is chewing all the I/O bandwidth.  If we could get some
> more checkpoints pushed through then the xlog would get truncated, but
> it's not happening.
>
> I'm running this on a development machine with an ok CPU and junk
> consumer-grade-IDE disk drive, so lack of I/O bandwidth is hardly
> surprising; can anyone confirm the observation on better hardware?

We're running it on some pretty manly hardware (dual opterons, 8gb main
memory, 4-way SATA RAIDs), so it happens on both ends of the spectrum.

-jwb


Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> On Thu, 2004-05-13 at 21:00, Tom Lane wrote:
>> I tried this locally, and what I see happening is that a checkpoint
>> process starts shortly after the CREATE INDEX begins whomping out the
>> index data --- but it doesn't finish until after the CREATE INDEX does.
>> AFAICS there is not any sort of locking problem, it's just that the
>> CREATE INDEX is chewing all the I/O bandwidth.  If we could get some
>> more checkpoints pushed through then the xlog would get truncated, but
>> it's not happening.
>>
>> I'm running this on a development machine with an ok CPU and junk
>> consumer-grade-IDE disk drive, so lack of I/O bandwidth is hardly
>> surprising; can anyone confirm the observation on better hardware?

> We're running it on some pretty manly hardware (dual opterons, 8gb main
> memory, 4-way SATA RAIDs), so it happens on both ends of the spectrum.

What I wanted to know was whether you see a checkpoint process
persisting throughout the write-the-index phase of the CREATE INDEX.
There probably will be one at pretty nearly all times, but is it
always the same one (same PID)?
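
Something like this, sampled a few times while the index data is being
written, would answer it; just compare PIDs between samples (how much the
command column tells you depends on whether your platform updates process
titles):

ps auxww | grep '[p]ostgres'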

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Vivek Khera
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> However, that still doesn't explain how you got into the current state.
TL> Had you once had checkpoint_segments set much higher than the current
TL> value of 24?  On looking at the code I see that it doesn't make any
TL> attempt to prune future log segments after a decrease in
TL> checkpoint_segments, so if a previous misconfiguration had allowed the
TL> number of future segments to get really large, that could be the root of
TL> the issue.

Wow... that explains it!

I bumped up checkpoint segments to 50 for a restore since it made it
run way faster.  In normal operation I don't need that many so I
dropped it back down but it didn't reclaim any space so I figured I
might as well keep it at 50...


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
Vivek Khera <khera@kcilink.com> writes:
> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> TL> ...  On looking at the code I see that it doesn't make any
> TL> attempt to prune future log segments after a decrease in
> TL> checkpoint_segments, so if a previous misconfiguration had allowed the
> TL> number of future segments to get really large, that could be the root of
> TL> the issue.

> Wow... that explains it!

> I bumped up checkpoint segments to 50 for a restore since it made it
> run way faster.  In normal operation I don't need that many so I
> dropped it back down but it didn't reclaim any space so I figured I
> might as well keep it at 50...

How long did you wait?  I believe the code will prune excess segments as
they come around to be recycled.  It just doesn't kill them immediately.

I think that part of what's going on in Jeff's example is that he's
looking at the state immediately after a spike in database traffic, and
not having any patience to see if the system will recover after some
period with more-normal traffic levels.
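
If you want to confirm that it's happening, the postmaster logs each
segment as it goes; with syslog output like Jeff's, something along these
lines should show the excess being recycled or removed over time (the log
path is an assumption):

grep -E 'recycled transaction log file|removing transaction log file' /var/log/syslog | tail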

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
I wrote:
> I tried this locally, and what I see happening is that a checkpoint
> process starts shortly after the CREATE INDEX begins whomping out the
> index data --- but it doesn't finish until after the CREATE INDEX does.
> AFAICS there is not any sort of locking problem,

I have to take that back: there definitely is a locking problem.
Perhaps there is an I/O bandwidth issue too.

What I see happening on closer analysis is that btree CREATE INDEX can
hold "exclusive context lock" on some shared buffers for significant
periods of time.  It tries to write all the levels of the btree in
parallel, so it is spitting out level-zero pages at a great rate,
level-one pages at a lesser rate, etc.  For a large index there could
be many btree levels, and pages in the higher levels will be held locked
in the shared buffer arena for considerable periods.

CHECKPOINT scans the shared buffer arena and tries to write every dirty
page it finds.  This requires getting shared context lock, and so will
be blocked by the lock CREATE INDEX is holding.

I am toying with the idea that CREATE INDEX shouldn't use the shared
buffer manager at all; there is no need for other backends to touch the
index until the creating transaction commits.  We'd need to be able to
fsync the index file before committing.  That isn't part of the smgr API
right now, but could easily be added.

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Alvaro Herrera
On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote:

> What I see happening on closer analysis is that btree CREATE INDEX can
> hold "exclusive context lock" on some shared buffers for significant
> periods of time.  It tries to write all the levels of the btree in
> parallel, so it is spitting out level-zero pages at a great rate,
> level-one pages at a lesser rate, etc.  For a large index there could
> be many btree levels, and pages in the higher levels will be held locked
> in the shared buffer arena for considerable periods.

I'm not sure how btree from-scratch building works, but could it be made
so that it first builds level 0 completely, scanning the heap; then builds
level 1 by scanning level 0, and so on?

I also wonder why index creation writes XLog information; it won't
be used to recreate the tree if the machine goes down while the index is
being created.  If you plan on not using the shared bufmgr and issuing a
sync before commit, you could get rid of logging during the build ...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)

Re: pg_xlog becomes extremely large during CREATE INDEX

From: Tom Lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote:
>> ... It tries to write all the levels of the btree in
>> parallel, so it is spitting out level-zero pages at a great rate,
>> level-one pages at a lesser rate, etc.  For a large index there could
>> be many btree levels, and pages in the higher levels will be held locked
>> in the shared buffer arena for considerable periods.

> I'm not sure how btree from-scratch-building work, but could it be made
> so that it first build level 0 completely, scanning the heap; then build
> level 1 scanning the level 0, and so on?

I don't think that would be a win; it would require an extra read pass
over nearly all of the index.  The idea here is to reduce the amount of
I/O, not increase it.

> I also wonder why index creation would write XLog information;

Yeah, that was implicit in my comment about not using the buffer
manager, and it was why I was concerned about being able to fsync
the index file.  Without xlog for backup, you'd better be able to
force the index to disk before you commit.

It turns out that btree index build currently writes each page three
times: once to extend the file, once into WAL, and then finally to write
the completed index page.  It is difficult to do any better as long as
you're using the shared buffer manager.  (When extending a file, the
buffer manager wants to allocate disk space before it will give you a
buffer to write into.  This is the right thing for heap insertions and
extending an existing index, because we don't force buffers to disk
before commit.  So if it didn't work that way, you might commit your
transaction before finding out there's no disk space to hold the data.
But we could dispense with that worry for index build if we ensure that
we write and fsync all the pages before commit.)  So I was thinking
about keeping the same index build process, but working in private
storage instead of shared buffers, and writing direct to smgr.  This
would not only eliminate the locking conflict against checkpoints,
but reduce the amount of I/O involved quite a lot.

            regards, tom lane

Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W. Baker"
On Fri, 2004-05-14 at 20:55, Tom Lane wrote:
> Vivek Khera <khera@kcilink.com> writes:
> > "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> > TL> ...  On looking at the code I see that it doesn't make any
> > TL> attempt to prune future log segments after a decrease in
> > TL> checkpoint_segments, so if a previous misconfiguration had allowed the
> > TL> number of future segments to get really large, that could be the root of
> > TL> the issue.
>
> > Wow... that explains it!
>
> > I bumped up checkpoint segments to 50 for a restore since it made it
> > run way faster.  In normal operation I don't need that many so I
> > dropped it back down but it didn't reclaim any space so I figured I
> > might as well keep it at 50...
>
> How long did you wait?  I believe the code will prune excess segments as
> they come around to be recycled.  It just doesn't kill them immediately.
>
> I think that part of what's going on in Jeff's example is that he's
> looking at the state immediately after a spike in database traffic, and
> not having any patience to see if the system will recover after some
> period with more-normal traffic levels.

It won't ever return to normal levels after the device fills up, the
server panics, and all the databases are inconsistent.

-jwb