Re: How to investiage slow insert problem - Mailing list pgsql-performance
From | Rural Hunter |
---|---|
Subject | Re: How to investiage slow insert problem |
Date | |
Msg-id | 521418DA.6020608@gmail.com Whole thread Raw |
In response to | How to investiage slow insert problem (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-performance |
Hi Jeff, Thanks a lot for such a detailed guide! 于 2013/8/21 8:24, Jeff Janes 写道: > > OK, that certainly does sound like network problems and not disk > contention. But what I don't see is why it would be listed as > "active" in pg_stat_activity. If it is blocked on a network > connection, I would think it would show 'idle'. > > strace and gdb are aggressive forms of monitoring, so you should talk > to your sys admins before using them (if your organization has that > type of segregation of duties) or read their documentation. > > It is best to run them on a test system if you can get the problem to > occur there. If you can't, then I would be willing to run them > against a hung backend on a production system, but I might be more > adventurous than most. > > I have occasionally seen strace cause the straced program to > seg-fault, but I've only seen this with GUI programs and never with > postgresql. > > In theory attaching gdb could make a backend pause while holding a > spinlock, and if left that way for long enough could cause a > postgresql-wide panic over a stuck spinlock. But I've never seen this > happen without intentionally causing. If the backend you attach is > truly hung and hasn't already causes a panic, then it almost surely > can't be happen, but just to be sure you should quit gdb as soon as > possible so that the backend can continue unimpeded. > > Anyway, both are fairly easy to run once you find the pid of one of a > stuck backend (either using ps, or using pg_stat_activity). Then you > give the pid to the debugging program with -p option. > > You probably have to run as the postgres user, or you won't have > permissions to attach to the backend. > > With strace, once you attach you will see a stream of system calls go > to your screen, until you hit ctrl-C to detach. But if the backend is > hung on a network connection, there should really only be one system > call, and then it just wait until you detach or the network connection > times out, like this: > > $ strace -p 21116 > Process 21116 attached - interrupt to quit > recvfrom(9, > > So it is waiting on a recvfrom call, and that call never returns until > I got sick of waiting and hit ctrl-C. Not very interesting, but it > does show it is indeed stuck on the network > > for gdb, it is similar to invoke: > > $ gdb -p 21116 > > and it then produces several screenfuls of diagnostic gibberish and > gives you an interactive command-line environment. Once attached, you > want to get a backtrace ("bt", return), and then quit promptly ("q", > return, "y"). > > That produces something like this: > > Loaded symbols for /lib64/libnss_files.so.2 > 0x00000032a80e9672 in __libc_recv (fd=<value optimized out>, > buf=0xb33f60, n=8192, flags=0) at > ../sysdeps/unix/sysv/linux/x86_64/recv.c:30 > 30 return INLINE_SYSCALL (recvfrom, 6, fd, buf, n, flags, > NULL, NULL); > (gdb) bt > #0 0x00000032a80e9672 in __libc_recv (fd=<value optimized out>, > buf=0xb33f60, n=8192, flags=0) at > ../sysdeps/unix/sysv/linux/x86_64/recv.c:30 > #1 0x00000000005a4846 in secure_read (port=0x22f1190, ptr=0xb33f60, > len=8192) at be-secure.c:304 > #2 0x00000000005ae33b in pq_recvbuf () at pqcomm.c:824 > #3 0x00000000005ae73b in pq_getbyte () at pqcomm.c:865 > #4 0x0000000000651c11 in SocketBackend (argc=<value optimized out>, > argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value > optimized out>) > at postgres.c:342 > #5 ReadCommand (argc=<value optimized out>, argv=<value optimized > out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at > postgres.c:490 > #6 PostgresMain (argc=<value optimized out>, argv=<value optimized > out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at > postgres.c:3923 > #7 0x000000000060e861 in BackendRun () at postmaster.c:3614 > #8 BackendStartup () at postmaster.c:3304 > #9 ServerLoop () at postmaster.c:1367 > #10 0x00000000006113b1 in PostmasterMain (argc=<value optimized out>, > argv=<value optimized out>) at postmaster.c:1127 > #11 0x00000000005b0440 in main (argc=3, argv=0x22d1cb0) at main.c:199 > (gdb) q > A debugging session is active. > > Inferior 1 [process 21116] will be detached. > > Quit anyway? (y or n) y > Detaching from program: /usr/local/pgsql9_2/bin/postgres, process 21116 > > Cheers, > > Jeff
pgsql-performance by date: