Re: WAL recycling, Linux 2.4.18 - Mailing list pgsql-general
From | Doug Fields |
---|---|
Subject | Re: WAL recycling, Linux 2.4.18 |
Date | |
Msg-id | 5.1.0.14.2.20020708134721.02f23578@pop.pexicom.com |
In response to | WAL recycling, ext3, Linux 2.4.18 (Doug Fields <dfields-pg-general@pexicom.com>) |
Responses |
Re: WAL recycling, Linux 2.4.18
Hardware for PG |
List | pgsql-general |
Hi Tom, all,

>Also, could you do the checkpoint manually and get a stack trace from
>that backend while others are hung up?

Yes, see below.

>directly try to acquire ControlFileLock. In any case it's hard to
>credit that the recycling process could take 90 seconds to rename a
>dozen or so files. If you have a gdb attached to a process doing a
>manual checkpoint, it would be fairly easy to see how long
>MoveOfflineLogs() runs. (Set a breakpoint at its start, when control
>reaches the breakpoint issue "fin" and see how long it takes to come
>back.)

Two things:

1) Remounted my ext3 filesystems as ext2 to rule out an ext3fs related problem; the problems persist, so it's probably not an ext3 thing

2) Doing as you suggest, attaching to a /usr/lib/postgresql/bin/postgres process from which I run a manual checkpoint results in the following behavior:

I was able to set a breakpoint at CreateCheckPoint - gdb never found a MoveOfflineLogs for me to set a breakpoint.

Doing a single step from the breakpoint after the CreateCheckPoint takes quite a few moments (perhaps 10-30 seconds, but I didn't use a stopwatch). During this time control-C has no effect, and when it does take effect, it leaves me at the displayed location below.

Single stepping from mdsync() to smgrsync() also seems to take a few seconds (perhaps 3).

My GDB output is below. The checkpoint seems to take 42 seconds to complete:

pexicast_lg=# select now(); checkpoint; select now();
              now
-------------------------------
 2002-07-08 13:57:57.918766-04
(1 row)

CHECKPOINT
              now
-------------------------------
 2002-07-08 13:58:39.790787-04
(1 row)

I was wondering if it could be the sync function call, but I sit there on the server and type sync until I'm blue in the face and they seem to run so fast I don't even notice any delay.

Also, I can't seem to recompile PostgreSQL because Debian can't find a tclConfig.sh.

Thanks,

Doug

(gdb) c
Continuing.

Breakpoint 1, 0x08087ae5 in CreateCheckPoint ()
(gdb) where
#0  0x08087ae5 in CreateCheckPoint ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) n
Single stepping until exit from function CreateCheckPoint,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x40282967 in sync () from /lib/libc.so.6
(gdb) where
#0  0x40282967 in sync () from /lib/libc.so.6
#1  0x0810d167 in mdsync ()
#2  0x0810de5f in smgrsync ()
#3  0x081036d8 in FlushBufferPool ()
#4  0x08087d13 in CreateCheckPoint ()
#5  0x08111066 in ProcessUtility ()
#6  0x0810ecc5 in pg_exec_query_string ()
#7  0x0810fd5e in PostgresMain ()
#8  0x080f6d4e in ClosePostmasterPorts ()
#9  0x080f669f in ClosePostmasterPorts ()
#10 0x080f5882 in PostmasterMain ()
#11 0x080f5391 in PostmasterMain ()
#12 0x080d4e18 in main ()
#13 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) break MoveOfflineLogs
Function "MoveOfflineLogs" not defined.
(gdb) break MoveOfflineLog
Function "MoveOfflineLog" not defined.
(gdb) break MoveOfflineLogs()
Function "MoveOfflineLogs()" not defined.
(gdb) s
Single stepping until exit from function sync,
which has no line number information.
0x0810d167 in mdsync ()
(gdb) s
Single stepping until exit from function mdsync,
which has no line number information.
0x0810de5f in smgrsync ()
(gdb) s
Single stepping until exit from function smgrsync,
which has no line number information.
0x081036d8 in FlushBufferPool ()
(gdb) where
#0  0x081036d8 in FlushBufferPool ()
#1  0x08087d13 in CreateCheckPoint ()
#2  0x08111066 in ProcessUtility ()
#3  0x0810ecc5 in pg_exec_query_string ()
#4  0x0810fd5e in PostgresMain ()
#5  0x080f6d4e in ClosePostmasterPorts ()
#6  0x080f669f in ClosePostmasterPorts ()
#7  0x080f5882 in PostmasterMain ()
#8  0x080f5391 in PostmasterMain ()
#9  0x080d4e18 in main ()
#10 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function FlushBufferPool,
which has no line number information.
0x08087d13 in CreateCheckPoint ()
(gdb) where
#0  0x08087d13 in CreateCheckPoint ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function CreateCheckPoint,
which has no line number information.
0x08085ac4 in XLogFlush ()
(gdb) where
#0  0x08085ac4 in XLogFlush ()
#1  0x08111066 in ProcessUtility ()
#2  0x0810ecc5 in pg_exec_query_string ()
#3  0x0810fd5e in PostgresMain ()
#4  0x080f6d4e in ClosePostmasterPorts ()
#5  0x080f669f in ClosePostmasterPorts ()
#6  0x080f5882 in PostmasterMain ()
#7  0x080f5391 in PostmasterMain ()
#8  0x080d4e18 in main ()
#9  0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function XLogFlush,
which has no line number information.

Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0  0x402887f2 in recv () from /lib/libc.so.6
#1  0x080d42bc in StreamClose ()
#2  0x080d430d in pq_getbyte ()
#3  0x0810e7c8 in HandleFunctionRequest ()
#4  0x0810e837 in HandleFunctionRequest ()
#5  0x0810fc5e in PostgresMain ()
#6  0x080f6d4e in ClosePostmasterPorts ()
#7  0x080f669f in ClosePostmasterPorts ()
#8  0x080f5882 in PostmasterMain ()
#9  0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.
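
Since the backtrace above shows the backend sitting in sync() when interrupted, it may help to put numbers on how long sync() takes rather than judging it by eye at the shell. The following is only a rough sketch, not part of the original session: the helper names time_calls and time_sync are made up for illustration, and it assumes a Unix system with Python 3.3 or later, where os.sync() wraps the sync() system call.

```python
import os
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def time_sync(repeats=5):
    """Time repeated sync() system calls (hypothetical helper, Unix only)."""
    # os.sync() asks the kernel to flush dirty buffers to disk,
    # the same libc call that appears at frame #0 of the backtrace.
    return time_calls(os.sync, repeats)
```

Calling time_sync() on an idle machine will likely return very small values, because there is little dirty data to flush. A fairer comparison would probably be to run it while the database is under its normal write load, since the sync() inside the checkpoint runs right after the buffer pool has been flushed.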