Thread: pg_xlog becomes extremely large during CREATE INDEX
Greetings,

I have a 23GB data table upon which I am building a primary key of three columns. The data is mounted on a 137GB device and pg_xlog is mounted on a separate 3.5GB device. I have configured 24 checkpoint segments, which I expect gives me a worst-case usage in pg_xlog of 384MB. Unfortunately, during the CREATE INDEX, pg_xlog becomes full! [There's an ls(1) listing at the end of this mail, if you care to see.]

The operation created 222 files in pg_xlog before it filled the device, and there's no indication that it was planning to stop there. Is there some factor, other than checkpoint segments, which controls the size of pg_xlog? Is there some other operation that also writes files in pg_xlog? Or is the server just running away?

I have reset the xlogs and I'm trying again.

Regards,
Jeffrey

mistral:/pg_xlog_full# ls -l
total 3641692
[222 files, every one "-rw------- 1 postgres postgres 16777216", grouped here by modification time]
May 12 15:51  0000000600000092 .. 0000000600000095
May 12 15:52  0000000600000096 .. 000000060000009B
May 12 15:53  000000060000009C .. 00000006000000A1
May 12 15:54  00000006000000A2 .. 00000006000000A8
May 12 15:55  00000006000000A9 .. 00000006000000AE
May 12 15:56  00000006000000AF .. 00000006000000B4
May 12 15:57  00000006000000B5 .. 00000006000000BA
May 12 15:58  00000006000000BB .. 00000006000000C1
May 12 15:59  00000006000000C2 .. 00000006000000C7
May 12 16:00  00000006000000C8 .. 00000006000000CC
May 12 16:01  00000006000000CD .. 00000006000000D2
May 12 16:02  00000006000000D3 .. 00000006000000D7
May 12 16:03  00000006000000D8 .. 00000006000000DD
May 12 16:04  00000006000000DE .. 00000006000000E3
May 12 16:05  00000006000000E4 .. 00000006000000E8
May 12 16:06  00000006000000E9 .. 00000006000000EC
May 12 16:07  00000006000000ED .. 00000006000000F1
May 12 16:08  00000006000000F2 .. 00000006000000F6
May 12 16:09  00000006000000F7 .. 00000006000000FC
May 12 16:10  00000006000000FD .. 00000006000000FE
May 12 16:10  0000000700000000 .. 0000000700000002
May 12 16:11  0000000700000003 .. 0000000700000006
May 12 16:12  0000000700000007 .. 000000070000000B
May 12 16:13  000000070000000C .. 0000000700000010
May 12 16:14  0000000700000011 .. 0000000700000014
May 12 16:15  0000000700000015 .. 0000000700000019
May 12 16:16  000000070000001A .. 000000070000001D
May 12 16:17  000000070000001E .. 0000000700000021
May 12 16:18  0000000700000022 .. 0000000700000025
May 12 16:19  0000000700000026 .. 0000000700000029
May 12 16:20  000000070000002A .. 000000070000002E
May 12 16:21  000000070000002F .. 0000000700000032
May 12 16:22  0000000700000033 .. 0000000700000036
May 12 16:23  0000000700000037 .. 000000070000003A
May 12 16:24  000000070000003B .. 000000070000003E
May 12 16:25  000000070000003F .. 0000000700000042
May 12 16:26  0000000700000043 .. 0000000700000047
May 12 16:27  0000000700000048 .. 000000070000004B
May 12 16:28  000000070000004C .. 000000070000004F
May 12 16:29  0000000700000050 .. 0000000700000053
May 12 16:30  0000000700000054 .. 0000000700000057
May 12 16:31  0000000700000058 .. 000000070000005A
May 12 16:32  000000070000005B .. 000000070000005E
May 12 16:33  000000070000005F .. 0000000700000062
May 12 16:34  0000000700000063 .. 0000000700000067
May 12 16:35  0000000700000068 .. 000000070000006B
May 12 16:36  000000070000006C .. 000000070000006F
May 12 16:45  0000000700000070

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> I have configured 24 checkpoint segments,
> which I expect gives me a worst-case usage in pg_xlog of 384MB.
> Unfortunately, during the CREATE INDEX, pg_xlog becomes full!

Postgres version? AFAIR 7.1.2 or thereabouts would act that way, because it wouldn't discard log segments containing work of open transactions.

The only reason I can think of for such behavior in a current version is if the checkpoint isn't completing for some reason. Have you looked in the postmaster log to see if checkpoint is bleating about anything?

Note that your calculation is off anyway: per documentation, recent versions will allow up to 2*checkpoint_segments+1 WAL files to exist. This is because we save the last two checkpoints' worth of data.

			regards, tom lane
On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > I have configured 24 checkpoint segments,
> > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
>
> Postgres version? AFAIR 7.1.2 or thereabouts would act that way,
> because it wouldn't discard log segments containing work of open
> transactions.

This is 7.4.1.

> The only reason I can think of for such behavior in a current version
> is if the checkpoint isn't completing for some reason. Have you looked
> in the postmaster log to see if checkpoint is bleating about anything?

Oh sure, it's bleating. Apparently my computer is too fast:

syslog:May 12 21:20:32 mistral postgres[17458]: [214-1] LOG: checkpoints are occurring too frequently (12 seconds apart)
syslog:May 12 21:20:32 mistral postgres[17458]: [214-2] HINT: Consider increasing the configuration parameter "checkpoint_segments".

That's also what preceded the most recent crash:

May 12 16:36:53 mistral postgres[506]: [173-1] LOG: checkpoints are occurring too frequently (16 seconds apart)
May 12 16:36:53 mistral postgres[506]: [173-2] HINT: Consider increasing the configuration parameter "checkpoint_segments".
May 12 16:37:08 mistral postgres[506]: [174-1] LOG: server process (PID 16403) was terminated by signal 6
May 12 16:37:08 mistral postgres[506]: [175-1] LOG: terminating any other active server processes
May 12 16:37:08 mistral postgres[26212]: [3-1] WARNING: terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[26212]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[26212]: [3-3] process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[26212]: [3-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-1] WARNING: terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[14396]: [1-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[14396]: [1-3] process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[14396]: [1-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-5] CONTEXT: writing block 56889 of relation 17142/289600495
May 12 16:37:08 mistral postgres[506]: [176-1] LOG: all server processes terminated; reinitializing
May 12 16:37:08 mistral postgres[17086]: [177-1] LOG: database system was interrupted at 2004-05-12 16:36:53 PDT
May 12 16:37:08 mistral postgres[17086]: [178-1] LOG: checkpoint record is at 6/92B32D48
May 12 16:37:08 mistral postgres[17086]: [179-1] LOG: redo record is at 6/92B32D48; undo record is at 0/0; shutdown FALSE
May 12 16:37:08 mistral postgres[17086]: [180-1] LOG: next transaction ID: 10232; next OID: 289600632
May 12 16:37:08 mistral postgres[17086]: [181-1] LOG: database system was not properly shut down; automatic recovery in progress
May 12 16:37:08 mistral postgres[17086]: [182-1] LOG: redo starts at 6/92B32D88
May 12 16:45:29 mistral postgres[17086]: [183-1] LOG: could not open file "/var/lib/postgres/data/pg_xlog/0000000700000071" (log file 7, segment 113): No such file or directory
May 12 16:45:29 mistral postgres[17086]: [184-1] LOG: redo done at 7/70FFC400
May 12 16:45:50 mistral postgres[506]: [177-1] LOG: startup process (PID 17086) was terminated by signal 6
May 12 16:45:50 mistral postgres[506]: [178-1] LOG: aborting startup due to startup process failure

I also have many "recycled" messages:

May 12 13:10:01 mistral postgres[16373]: [20-1] LOG: recycled transaction log file "0000000600000078"
May 12 13:10:01 mistral postgres[16373]: [21-1] LOG: recycled transaction log file "0000000600000079"
May 12 13:10:01 mistral postgres[16373]: [22-1] LOG: recycled transaction log file "000000060000007A"
May 12 13:10:01 mistral postgres[16373]: [23-1] LOG: recycled transaction log file "000000060000007B"
May 12 13:10:01 mistral postgres[16373]: [24-1] LOG: recycled transaction log file "000000060000007C"
May 12 13:20:03 mistral postgres[16408]: [1-1] LOG: removing transaction log file "0000000600000090"
May 12 13:20:03 mistral postgres[16408]: [2-1] LOG: removing transaction log file "0000000600000091"

> Note that your calculation is off anyway: per documentation, recent
> versions will allow up to 2*checkpoint_segments+1 WAL files to exist.
> This is because we save the last two checkpoints' worth of data.

That's why I gave it an order of magnitude leeway. Such calculations are always off by a factor of two. It's doing it again at this very moment; I get the feeling that pg_xlog is converging on the same size as $PGDATA/base/$OID/pgsql_tmp:

mistral:/backup_temp# du -hs pg_xlog /var/lib/postgres/data/base/17142/pgsql_tmp/
4.8G    pg_xlog
5.4G    /var/lib/postgres/data/base/17142/pgsql_tmp

-jwb
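For concreteness, the numbers being argued over, assuming the stock 16MB WAL segment size; this is only a back-of-the-envelope sketch, with the 222-file count taken from the listing in the first message:

    # Expected pg_xlog ceiling versus what was actually observed.
    checkpoint_segments=24
    segment_mb=16
    echo "naive estimate, checkpoint_segments * 16MB:         $(( checkpoint_segments * segment_mb )) MB"
    echo "documented cap, (2*checkpoint_segments + 1) * 16MB: $(( (2 * checkpoint_segments + 1) * segment_mb )) MB"
    echo "observed, 222 segment files * 16MB:                 $(( 222 * segment_mb )) MB"

Even the documented cap (784MB) is a small fraction of the 3.5GB device, which is why growth to 222 segments points at checkpoints not completing rather than at the formula itself.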
On Wed, 2004-05-12 at 21:31, Jeffrey W. Baker wrote:
> On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> > "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > > I have configured 24 checkpoint segments,
> > > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
> >
> > Postgres version? AFAIR 7.1.2 or thereabouts would act that way,
> > because it wouldn't discard log segments containing work of open
> > transactions.

The CREATE INDEX finally returned, and none of the xlog space was reclaimed. The final size is 6.1GB. Interestingly, there is a contiguous group of xlogs that haven't been touched since data started being copied from pgsql_tmp to ../ (what do you call this stage of the index build?). In reverse order of modification time:

May 12 21:55 00000008000000F7
May 12 21:49 00000008000000F6
May 12 21:49 00000008000000F5
[... decreasing by 1 until ...]
May 12 20:40 000000070000008C
May 12 20:40 000000070000008B
May 12 20:40 000000070000008A

[the ones below have higher numbers, but aren't being written to!]

May 12 20:40 0000000900000010
May 12 20:40 000000090000000F
May 12 20:39 000000090000000E
May 12 20:39 000000090000000D
May 12 20:39 000000090000000C
May 12 20:39 000000090000000B
May 12 20:39 000000090000000A
May 12 20:39 0000000900000009
May 12 20:39 0000000900000008
May 12 20:39 0000000900000007
May 12 20:39 0000000900000006
May 12 20:39 0000000900000005
May 12 20:39 0000000900000004
May 12 20:38 0000000900000003
May 12 20:38 0000000900000002
May 12 20:38 0000000900000001
May 12 20:38 0000000900000000
May 12 20:38 00000008000000FE
May 12 20:38 00000008000000FD
May 12 20:38 00000008000000FC
May 12 20:38 00000008000000FB
May 12 20:38 00000008000000FA
May 12 20:38 00000008000000F9
May 12 20:37 00000008000000F8

Not coincidentally, there are 24 of these oddballs, and 24 is also the number of checkpoint segments I have configured.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Oh sure, it's bleating. Apparently my computer is too fast:

I don't think the checkpoint process is completing.

> May 12 16:37:08 mistral postgres[506]: [174-1] LOG: server process (PID 16403) was terminated by signal 6

You need to find out why these aborts (presumably Assert failures) are occurring. Having just looked at the source code, I see that Assert messages are never sent to syslog, only to stderr. So you'll want to set things up to capture the postmaster's stderr instead of discarding it.

			regards, tom lane
On Thu, 2004-05-13 at 06:10, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Oh sure, it's bleating. Apparently my computer is too fast:
>
> I don't think the checkpoint process is completing.
>
> > May 12 16:37:08 mistral postgres[506]: [174-1] LOG: server process (PID 16403) was terminated by signal 6
>
> You need to find out why these aborts (presumably Assert failures) are
> occurring. Having just looked at the source code, I see that Assert
> messages are never sent to syslog, only to stderr. So you'll want to
> set things up to capture the postmaster's stderr instead of discarding it.

The server process aborted when the device filled up.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Sorry, my last mail got cut off. The server aborted because it couldn't
> write the xlog. Looks like I omitted this from my last mail:

Selective quoting of the log output? Naughty naughty.

However, that still doesn't explain how you got into the current state. Had you once had checkpoint_segments set much higher than the current value of 24? On looking at the code I see that it doesn't make any attempt to prune future log segments after a decrease in checkpoint_segments, so if a previous misconfiguration had allowed the number of future segments to get really large, that could be the root of the issue.

			regards, tom lane
On Thu, 2004-05-13 at 09:28, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Sorry, my last mail got cut off. The server aborted because it couldn't
> > write the xlog. Looks like I omitted this from my last mail:
>
> Selective quoting of the log output? Naughty naughty.
>
> However, that still doesn't explain how you got into the current state.
> Had you once had checkpoint_segments set much higher than the current
> value of 24? On looking at the code I see that it doesn't make any
> attempt to prune future log segments after a decrease in
> checkpoint_segments, so if a previous misconfiguration had allowed the
> number of future segments to get really large, that could be the root of
> the issue.

The database where this happened was freshly initialized and this was one of its first operations. I think I will write a small example script so you can reproduce it locally. Give me a few hours or so.

-jwb
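Whatever the cause, the discrepancy Tom is probing for is easy to quantify directly. A sketch, assuming a 7.4-style layout where pg_xlog contains only segment files and psql can reach the server (adjust the path to your installation):

    # Compare the number of segment files on disk with the 2*checkpoint_segments+1 ceiling.
    PGXLOG=/var/lib/postgres/data/pg_xlog
    SEGS=$(psql -At -c 'SHOW checkpoint_segments' template1)
    FILES=$(ls "$PGXLOG" | wc -l)
    echo "segments on disk: $FILES    expected ceiling: $(( 2 * SEGS + 1 ))"
    # A count far above the ceiling means either leftovers from an earlier, larger
    # setting or, as in this thread, checkpoints that are not completing.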
[For whatever reason this got delayed in delivery. Sending again from another route. -jwb]

On Thu, 2004-05-13 at 09:28, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Sorry, my last mail got cut off. The server aborted because it couldn't
> > write the xlog. Looks like I omitted this from my last mail:
>
> Selective quoting of the log output? Naughty naughty.
>
> However, that still doesn't explain how you got into the current state.
> Had you once had checkpoint_segments set much higher than the current
> value of 24? On looking at the code I see that it doesn't make any
> attempt to prune future log segments after a decrease in
> checkpoint_segments, so if a previous misconfiguration had allowed the
> number of future segments to get really large, that could be the root of
> the issue.

Okay, I installed a fresh, completely stock 7.4.2 and did the following:

#!/bin/sh
createdb growxlog
echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
echo "create unique index data_pkey on data(a,b,c)" | psql growxlog

The result was a table with 100 million rows, a 5.3GB data table, a 2.7GB index, and 2.6GB in pg_xlog. Reproducible every time. For the less patient, the problem is also reproducible at only 10 million rows (xlog = 337MB), but not at 1 million rows (presumably the whole mess fits inside the usual number of segments). You'll need approximately rows*125 bytes of free space to run the test.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> Okay, I installed a fresh, completely stock 7.4.2 and did the following:

> createdb growxlog
> echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
> perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
> echo "create unique index data_pkey on data(a,b,c)" | psql growxlog

I tried this locally, and what I see happening is that a checkpoint process starts shortly after the CREATE INDEX begins whomping out the index data --- but it doesn't finish until after the CREATE INDEX does. AFAICS there is not any sort of locking problem, it's just that the CREATE INDEX is chewing all the I/O bandwidth. If we could get some more checkpoints pushed through then the xlog would get truncated, but it's not happening.

I'm running this on a development machine with an ok CPU and junk consumer-grade IDE disk drive, so lack of I/O bandwidth is hardly surprising; can anyone confirm the observation on better hardware?

I think one reason for the problem is that btree CREATE INDEX is as bad as VACUUM, if not worse, with respect to chewing all available shared buffers. It will fill shared buffers with newly-created dirty pages --- and I'll bet the CHECKPOINT process is getting stuck with dumping the majority of those pages to disk. It might be worth tweaking CREATE INDEX so it pushes out the newly created pages for itself, thus limiting the number of dirtied buffers it creates.

			regards, tom lane
On Thu, 2004-05-13 at 21:00, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Okay, I installed a fresh, completely stock 7.4.2 and did the following:
> >
> > createdb growxlog
> > echo "create table data (a int, b int, c int, d int, e int)" | psql growxlog
> > perl -e 'use POSIX qw(floor); print "COPY data FROM STDIN;\n"; for ($i = 0; $i < 100000000; $i++) {print(join("\t", $i, floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000), floor(rand()*1000000)), "\n")}' | psql growxlog
> > echo "create unique index data_pkey on data(a,b,c)" | psql growxlog
>
> I tried this locally, and what I see happening is that a checkpoint
> process starts shortly after the CREATE INDEX begins whomping out the
> index data --- but it doesn't finish until after the CREATE INDEX does.
> AFAICS there is not any sort of locking problem, it's just that the
> CREATE INDEX is chewing all the I/O bandwidth. If we could get some
> more checkpoints pushed through then the xlog would get truncated, but
> it's not happening.
>
> I'm running this on a development machine with an ok CPU and junk
> consumer-grade IDE disk drive, so lack of I/O bandwidth is hardly
> surprising; can anyone confirm the observation on better hardware?

We're running it on some pretty manly hardware (dual Opterons, 8GB main memory, 4-way SATA RAIDs), so it happens on both ends of the spectrum.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes:
> On Thu, 2004-05-13 at 21:00, Tom Lane wrote:
>> I tried this locally, and what I see happening is that a checkpoint
>> process starts shortly after the CREATE INDEX begins whomping out the
>> index data --- but it doesn't finish until after the CREATE INDEX does.
>> AFAICS there is not any sort of locking problem, it's just that the
>> CREATE INDEX is chewing all the I/O bandwidth. If we could get some
>> more checkpoints pushed through then the xlog would get truncated, but
>> it's not happening.
>>
>> I'm running this on a development machine with an ok CPU and junk
>> consumer-grade IDE disk drive, so lack of I/O bandwidth is hardly
>> surprising; can anyone confirm the observation on better hardware?

> We're running it on some pretty manly hardware (dual Opterons, 8GB main
> memory, 4-way SATA RAIDs), so it happens on both ends of the spectrum.

What I wanted to know was whether you see a checkpoint process persisting throughout the write-the-index phase of the CREATE INDEX. There probably will be one at pretty nearly all times, but is it always the same one (same PID)?

			regards, tom lane
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> However, that still doesn't explain how you got into the current state.
TL> Had you once had checkpoint_segments set much higher than the current
TL> value of 24? On looking at the code I see that it doesn't make any
TL> attempt to prune future log segments after a decrease in
TL> checkpoint_segments, so if a previous misconfiguration had allowed the
TL> number of future segments to get really large, that could be the root of
TL> the issue.

Wow... that explains it!

I bumped up checkpoint segments to 50 for a restore since it made it run way faster. In normal operation I don't need that many, so I dropped it back down, but it didn't reclaim any space, so I figured I might as well keep it at 50...

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.
Khera Communications, Inc.
Internet: khera@kciLink.com
Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera  Y!: vivek_khera
http://www.khera.org/~vivek/
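For reference, bumping the setting the way Vivek describes is a postgresql.conf change. A minimal sketch, assuming $PGDATA points at the data directory; the value 50 is only the figure from his mail, not a recommendation from the thread:

    # Raise checkpoint_segments so bulk loads don't force a checkpoint every few seconds.
    # Edit postgresql.conf to read:   checkpoint_segments = 50
    grep -n 'checkpoint_segments' "$PGDATA/postgresql.conf"
    kill -HUP "$(head -1 "$PGDATA/postmaster.pid")"   # ask the postmaster to re-read the file
                                                      # (or restart it if your version requires that)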
Vivek Khera <khera@kcilink.com> writes:
> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> TL> ... On looking at the code I see that it doesn't make any
> TL> attempt to prune future log segments after a decrease in
> TL> checkpoint_segments, so if a previous misconfiguration had allowed the
> TL> number of future segments to get really large, that could be the root of
> TL> the issue.

> Wow... that explains it!

> I bumped up checkpoint segments to 50 for a restore since it made it
> run way faster. In normal operation I don't need that many, so I
> dropped it back down, but it didn't reclaim any space, so I figured I
> might as well keep it at 50...

How long did you wait? I believe the code will prune excess segments as they come around to be recycled. It just doesn't kill them immediately.

I think that part of what's going on in Jeff's example is that he's looking at the state immediately after a spike in database traffic, and not having any patience to see if the system will recover after some period with more-normal traffic levels.

			regards, tom lane
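Whether the pruning Tom describes is actually happening shows up in the server log. A sketch, assuming messages go to syslog as in Jeff's setup (point it at the postmaster's stderr log otherwise):

    # "recycled" = an old segment renamed for future reuse; "removing" = excess segments deleted.
    tail -f /var/log/syslog | grep -E 'recycled transaction log file|removing transaction log file'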
I wrote:
> I tried this locally, and what I see happening is that a checkpoint
> process starts shortly after the CREATE INDEX begins whomping out the
> index data --- but it doesn't finish until after the CREATE INDEX does.
> AFAICS there is not any sort of locking problem,

I have to take that back: there definitely is a locking problem. Perhaps there is an I/O bandwidth issue too.

What I see happening on closer analysis is that btree CREATE INDEX can hold "exclusive context lock" on some shared buffers for significant periods of time. It tries to write all the levels of the btree in parallel, so it is spitting out level-zero pages at a great rate, level-one pages at a lesser rate, etc. For a large index there could be many btree levels, and pages in the higher levels will be held locked in the shared buffer arena for considerable periods.

CHECKPOINT scans the shared buffer arena and tries to write every dirty page it finds. This requires getting shared context lock, and so will be blocked by the lock CREATE INDEX is holding.

I am toying with the idea that CREATE INDEX shouldn't use the shared buffer manager at all; there is no need for other backends to touch the index until the creating transaction commits. We'd need to be able to fsync the index file before committing. That isn't part of the smgr API right now, but could easily be added.

			regards, tom lane
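For anyone reproducing this, one way to watch the conflict from the outside is to sample the process table while the index builds. This is only a sketch; it assumes a Linux box where the checkpoint subprocess is identifiable in the ps listing, and the column names are standard procps options:

    # A checkpoint subprocess whose PID never changes and whose wait channel (WCHAN)
    # shows it sleeping on a lock or semaphore, rather than in disk I/O, matches the
    # blocking described above.
    while sleep 5; do
        date
        ps -u postgres -o pid,stat,wchan:25,args | grep -v grep | grep -i -e checkpoint -e 'CREATE INDEX'
    done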
On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote:
> What I see happening on closer analysis is that btree CREATE INDEX can
> hold "exclusive context lock" on some shared buffers for significant
> periods of time. It tries to write all the levels of the btree in
> parallel, so it is spitting out level-zero pages at a great rate,
> level-one pages at a lesser rate, etc. For a large index there could
> be many btree levels, and pages in the higher levels will be held locked
> in the shared buffer arena for considerable periods.

I'm not sure how btree from-scratch building works, but could it be made so that it first builds level 0 completely, scanning the heap; then builds level 1 by scanning level 0, and so on?

I also wonder why index creation would write XLog information; it won't be used to recreate the tree if the machine goes down while the index is being created. If you plan on not using the shared bufmgr and issuing a sync before commit, you could get rid of logging during the build ...

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Research is what I'm doing when I don't know what I'm doing" (Wernher von Braun)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote:
>> ... It tries to write all the levels of the btree in
>> parallel, so it is spitting out level-zero pages at a great rate,
>> level-one pages at a lesser rate, etc. For a large index there could
>> be many btree levels, and pages in the higher levels will be held locked
>> in the shared buffer arena for considerable periods.

> I'm not sure how btree from-scratch building works, but could it be made
> so that it first builds level 0 completely, scanning the heap; then builds
> level 1 by scanning level 0, and so on?

I don't think that would be a win; it would require an extra read pass over nearly all of the index. The idea here is to reduce the amount of I/O, not increase it.

> I also wonder why index creation would write XLog information;

Yeah, that was implicit in my comment about not using the buffer manager, and it was why I was concerned about being able to fsync the index file. Without xlog for backup, you'd better be able to force the index to disk before you commit.

It turns out that btree index build currently writes each page three times: once to extend the file, once into WAL, and then finally to write the completed index page. It is difficult to do any better as long as you're using the shared buffer manager. (When extending a file, the buffer manager wants to allocate disk space before it will give you a buffer to write into. This is the right thing for heap insertions and extending an existing index, because we don't force buffers to disk before commit. So if it didn't work that way, you might commit your transaction before finding out there's no disk space to hold the data. But we could dispense with that worry for index build if we ensure that we write and fsync all the pages before commit.)

So I was thinking about keeping the same index build process, but working in private storage instead of shared buffers, and writing direct to smgr. This would not only eliminate the locking conflict against checkpoints, but reduce the amount of I/O involved quite a lot.

			regards, tom lane
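To put a rough number on the three-writes point, using the ~2.7GB index size reported for the 100M-row test case earlier in the thread; a back-of-the-envelope, not a measurement:

    # Approximate index-related write volume under the current code path.
    echo "2.7 * 3" | bc    # roughly 8.1 GB written for a 2.7 GB index, before the heap
                           # scan and the pgsql_tmp sort files are counted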
On Fri, 2004-05-14 at 20:55, Tom Lane wrote:
> Vivek Khera <khera@kcilink.com> writes:
> > "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> > TL> ... On looking at the code I see that it doesn't make any
> > TL> attempt to prune future log segments after a decrease in
> > TL> checkpoint_segments, so if a previous misconfiguration had allowed the
> > TL> number of future segments to get really large, that could be the root of
> > TL> the issue.
>
> > Wow... that explains it!
>
> > I bumped up checkpoint segments to 50 for a restore since it made it
> > run way faster. In normal operation I don't need that many, so I
> > dropped it back down, but it didn't reclaim any space, so I figured I
> > might as well keep it at 50...
>
> How long did you wait? I believe the code will prune excess segments as
> they come around to be recycled. It just doesn't kill them immediately.
>
> I think that part of what's going on in Jeff's example is that he's
> looking at the state immediately after a spike in database traffic, and
> not having any patience to see if the system will recover after some
> period with more-normal traffic levels.

It won't ever return to normal levels after the device fills up, the server panics, and all the databases are inconsistent.

-jwb
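Given how the thread ends, the practical defence on a 7.4-era server is simply to watch the xlog device while a big index build runs and stop the build before the device fills. A minimal sketch; the path follows the layout described earlier in the thread, so adjust it to your mounts:

    # Sample pg_xlog size, segment count, and free space on its device once a minute.
    PGXLOG=/var/lib/postgres/data/pg_xlog       # on this system, a separate 3.5GB mount
    while sleep 60; do
        printf '%s  segments=%s  ' "$(date '+%H:%M:%S')" "$(ls "$PGXLOG" | wc -l)"
        du -sh "$PGXLOG"
        df -h "$PGXLOG" | tail -1
    done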