Stats collector performance improvement - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Stats collector performance improvement |
Date | |
Msg-id | 200601021840.k02Ieed21704@candle.pha.pa.us Whole thread Raw |
Responses |
Re: Stats collector performance improvement
Re: Stats collector performance improvement Re: Stats collector performance improvement Re: Stats collector performance improvement |
List | pgsql-patches |
Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: > > Further tests show that for this application > > the killer is stats_command_string, not stats_block_level or > > stats_row_level. > > I tried it with pgbench -c 10, and got these results: > 41% reduction in TPS rate for stats_command_string > 9% reduction in TPS rate for stats_block/row_level (any combination) > > strace'ing a backend confirms my belief that stats_block/row_level send > just one stats message per transaction (at least for the relatively > small number of tables touched per transaction by pgbench). However > stats_command_string sends 14(!) --- there are seven commands per > pgbench transaction and each results in sending a <command> message and > later an <IDLE> message. > > Given the rather lackadaisical way in which the stats collector makes > the data available, it seems like the backends are being much too > enthusiastic about posting their stats_command_string status > immediately. Might be worth thinking about how to cut back the > overhead by suppressing some of these messages. I did some research on this because the numbers Tom quotes indicate there is something wrong in the way we process stats_command_string statistics. I made a small test script: if [ ! -f /tmp/pgstat.sql ] then i=0 while [ $i -lt 10000 ] do i=`expr $i + 1` echo "SELECT 1;" done > /tmp/pgstat.sql fi time sql test </tmp/pgstat.sql >/dev/null This sends 10,000 "SELECT 1" queries to the backend, and reports the execution time. I found that without stats_command_string defined, it ran in 3.5 seconds. With stats_command_string defined, it took 5.5 seconds, meaning the command string is causing a 57% slowdown. That is way too much considering that the SELECT 1 has to be send from psql to the backend, parsed, optimized, and executed, and the result returned to the psql, while stats_command_string only has to send a string to a backend collector. There is _no_ way that collector should take 57% of the time it takes to run the actual query. With the test program, I tried various options. The basic code we have sends a UDP packet to a statistics buffer process, which recv()'s the packet, puts it into a memory queue buffer, and writes it to a pipe() that is read by the statistics collector process which processes the packet. I tried various ways of speeding up the buffer and collector processes. I found if I put a pg_usleep(100) in the buffer process the backend speed was good, but packets were lost. What I found worked well was to do multiple recv() calls in a loop. The previous code did a select(), then perhaps a recv() and pipe write() based on the results of the select(). This caused many small packets to be written to the pipe and the pipe write overhead seems fairly large. The best fix I found was to loop over the recv() call at most 25 times, collecting a group of packets that can then be sent to the collector in one pipe write. The recv() socket is non-blocking, so a zero return indicates there are no more packets available. Patch attached. This change reduced the stats_command_string time from 5.5 to 3.9, which is closer to the 3.5 seconds with stats_command_string off. A second improvement I discovered is that the statistics collector is calling gettimeofday() for every packet received, so it can determine the timeout for the select() call to write the flat file. I removed that behavior and instead used setitimer() to issue a SIGINT every 500ms, which was the original behavior. This eliminates the gettimeofday() call and makes the code cleaner. Second patch attached. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: src/backend/postmaster/pgstat.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v retrieving revision 1.116 diff -c -c -r1.116 pgstat.c *** src/backend/postmaster/pgstat.c 2 Jan 2006 00:58:00 -0000 1.116 --- src/backend/postmaster/pgstat.c 2 Jan 2006 18:36:43 -0000 *************** *** 1911,1916 **** --- 1911,1918 ---- */ for (;;) { + loop_again: + FD_ZERO(&rfds); FD_ZERO(&wfds); maxfd = -1; *************** *** 1970,2014 **** */ if (FD_ISSET(pgStatSock, &rfds)) { ! len = recv(pgStatSock, (char *) &input_buffer, ! sizeof(PgStat_Msg), 0); ! if (len < 0) ! ereport(ERROR, ! (errcode_for_socket_access(), ! errmsg("could not read statistics message: %m"))); ! ! /* ! * We ignore messages that are smaller than our common header ! */ ! if (len < sizeof(PgStat_MsgHdr)) ! continue; ! ! /* ! * The received length must match the length in the header ! */ ! if (input_buffer.msg_hdr.m_size != len) ! continue; ! /* ! * O.K. - we accept this message. Copy it to the circular ! * msgbuffer. */ ! frm = 0; ! while (len > 0) { ! xfr = PGSTAT_RECVBUFFERSZ - msg_recv; ! if (xfr > len) ! xfr = len; ! Assert(xfr > 0); ! memcpy(msgbuffer + msg_recv, ! ((char *) &input_buffer) + frm, ! xfr); ! msg_recv += xfr; ! if (msg_recv == PGSTAT_RECVBUFFERSZ) ! msg_recv = 0; ! msg_have += xfr; ! frm += xfr; ! len -= xfr; } } --- 1972,2033 ---- */ if (FD_ISSET(pgStatSock, &rfds)) { ! int loops = 0; ! /* ! * While pipewrite() can send multiple data packets, recv() pulls ! * only a single packet per call. For busy systems, doing ! * multiple recv() calls and then one pipewrite() can improve ! * query speed by 40%. 25 was chosen because 25 packets should ! * easily fit in a single pipewrite() call. recv()'s socket is ! * non-blocking. */ ! while (++loops < 25 && ! (len = recv(pgStatSock, (char *) &input_buffer, ! sizeof(PgStat_Msg), 0)) != 0) { ! if (len < 0) ! { ! if (errno == EAGAIN) ! continue; ! ereport(ERROR, ! (errcode_for_socket_access(), ! errmsg("could not read statistics message: %m"))); ! } ! ! /* ! * We ignore messages that are smaller than our common header ! */ ! if (len < sizeof(PgStat_MsgHdr)) ! goto loop_again; ! ! /* ! * The received length must match the length in the header ! */ ! if (input_buffer.msg_hdr.m_size != len) ! goto loop_again; ! ! /* ! * O.K. - we accept this message. Copy it to the circular ! * msgbuffer. ! */ ! frm = 0; ! while (len > 0) ! { ! xfr = PGSTAT_RECVBUFFERSZ - msg_recv; ! if (xfr > len) ! xfr = len; ! Assert(xfr > 0); ! memcpy(msgbuffer + msg_recv, ! ((char *) &input_buffer) + frm, ! xfr); ! msg_recv += xfr; ! if (msg_recv == PGSTAT_RECVBUFFERSZ) ! msg_recv = 0; ! msg_have += xfr; ! frm += xfr; ! len -= xfr; ! } } } *************** *** 2023,2029 **** * caught up, or because more data arrives so that we have more than * PIPE_BUF bytes buffered). This is not good, but is there any way * around it? We have no way to tell when the collector has caught ! * up... */ if (FD_ISSET(writePipe, &wfds)) { --- 2042,2048 ---- * caught up, or because more data arrives so that we have more than * PIPE_BUF bytes buffered). This is not good, but is there any way * around it? We have no way to tell when the collector has caught ! * up. Followup, the pipe rarely fills up. */ if (FD_ISSET(writePipe, &wfds)) { Index: src/backend/postmaster/pgstat.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v retrieving revision 1.116 diff -c -c -r1.116 pgstat.c *** src/backend/postmaster/pgstat.c 2 Jan 2006 00:58:00 -0000 1.116 --- src/backend/postmaster/pgstat.c 2 Jan 2006 18:21:28 -0000 *************** *** 145,150 **** --- 145,151 ---- static PgStat_StatBeEntry *pgStatBeTable = NULL; static int pgStatNumBackends = 0; + static volatile bool need_statwrite; /* ---------- * Local function forward declarations *************** *** 164,169 **** --- 165,171 ---- NON_EXEC_STATIC void PgstatBufferMain(int argc, char *argv[]); NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]); + static void force_statwrite(SIGNAL_ARGS); static void pgstat_recvbuffer(void); static void pgstat_exit(SIGNAL_ARGS); static void pgstat_die(SIGNAL_ARGS); *************** *** 1548,1560 **** PgStat_Msg msg; fd_set rfds; int readPipe; - int nready; int len = 0; ! struct timeval timeout; ! struct timeval next_statwrite; ! bool need_statwrite; HASHCTL hash_ctl; ! MyProcPid = getpid(); /* reset MyProcPid */ /* --- 1550,1560 ---- PgStat_Msg msg; fd_set rfds; int readPipe; int len = 0; ! struct itimerval timeval; HASHCTL hash_ctl; ! bool need_timer = false; ! MyProcPid = getpid(); /* reset MyProcPid */ /* *************** *** 1572,1578 **** /* kluge to allow buffer process to kill collector; FIXME */ pqsignal(SIGQUIT, pgstat_exit); #endif ! pqsignal(SIGALRM, SIG_IGN); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); --- 1572,1578 ---- /* kluge to allow buffer process to kill collector; FIXME */ pqsignal(SIGQUIT, pgstat_exit); #endif ! pqsignal(SIGALRM, force_statwrite); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); *************** *** 1597,1608 **** init_ps_display("stats collector process", "", ""); set_ps_display(""); - /* - * Arrange to write the initial status file right away - */ - gettimeofday(&next_statwrite, NULL); need_statwrite = TRUE; /* * Read in an existing statistics stats file or initialize the stats to * zero. --- 1597,1608 ---- init_ps_display("stats collector process", "", ""); set_ps_display(""); need_statwrite = TRUE; + MemSet(&timeval, 0, sizeof(struct itimerval)); + timeval.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000; + timeval.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000; + /* * Read in an existing statistics stats file or initialize the stats to * zero. *************** *** 1634,1667 **** */ for (;;) { - /* - * If we need to write the status file again (there have been changes - * in the statistics since we wrote it last) calculate the timeout - * until we have to do so. - */ if (need_statwrite) { ! struct timeval now; ! ! gettimeofday(&now, NULL); ! /* avoid assuming that tv_sec is signed */ ! if (now.tv_sec > next_statwrite.tv_sec || ! (now.tv_sec == next_statwrite.tv_sec && ! now.tv_usec >= next_statwrite.tv_usec)) ! { ! timeout.tv_sec = 0; ! timeout.tv_usec = 0; ! } ! else ! { ! timeout.tv_sec = next_statwrite.tv_sec - now.tv_sec; ! timeout.tv_usec = next_statwrite.tv_usec - now.tv_usec; ! if (timeout.tv_usec < 0) ! { ! timeout.tv_sec--; ! timeout.tv_usec += 1000000; ! } ! } } /* --- 1634,1644 ---- */ for (;;) { if (need_statwrite) { ! pgstat_write_statsfile(); ! need_statwrite = false; ! need_timer = true; } /* *************** *** 1673,1681 **** /* * Now wait for something to do. */ ! nready = select(readPipe + 1, &rfds, NULL, NULL, ! (need_statwrite) ? &timeout : NULL); ! if (nready < 0) { if (errno == EINTR) continue; --- 1650,1656 ---- /* * Now wait for something to do. */ ! if (select(readPipe + 1, &rfds, NULL, NULL, NULL) < 0) { if (errno == EINTR) continue; *************** *** 1685,1702 **** } /* - * If there are no descriptors ready, our timeout for writing the - * stats file happened. - */ - if (nready == 0) - { - pgstat_write_statsfile(); - need_statwrite = FALSE; - - continue; - } - - /* * Check if there is a new statistics message to collect. */ if (FD_ISSET(readPipe, &rfds)) --- 1660,1665 ---- *************** *** 1813,1829 **** */ pgStatNumMessages++; ! /* ! * If this is the first message after we wrote the stats file the ! * last time, setup the timeout that it'd be written. ! */ ! if (!need_statwrite) { ! gettimeofday(&next_statwrite, NULL); ! next_statwrite.tv_usec += ((PGSTAT_STAT_INTERVAL) * 1000); ! next_statwrite.tv_sec += (next_statwrite.tv_usec / 1000000); ! next_statwrite.tv_usec %= 1000000; ! need_statwrite = TRUE; } } --- 1776,1787 ---- */ pgStatNumMessages++; ! if (need_timer) { ! if (setitimer(ITIMER_REAL, &timeval, NULL)) ! ereport(ERROR, ! (errmsg("unable to set statistics collector timer: %m"))); ! need_timer = false; } } *************** *** 1848,1853 **** --- 1806,1818 ---- } + static void + force_statwrite(SIGNAL_ARGS) + { + need_statwrite = true; + } + + /* ---------- * pgstat_recvbuffer() - *
pgsql-patches by date: