Thread: Re: postgresql and process titles
The only way that I'm aware of for disabling this is at compile time ...
after running configure, you want to modify src/include/pg_config.h and
undef HAVE_SETPROCTITLE ... I'm CC'ing -hackers about this though, since
I think you are the first to point to setproctitle() as being a serious
performance bottleneck ...

On Sun, 11 Jun 2006, Kris Kennaway wrote:
> Why does postgresql change its process title so frequently and how can
> this be disabled?  Profiling suggests it's a fairly serious
> performance bottleneck.
>
> Kris

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org              MSN . scrappy@hub.org
Yahoo . yscrappy                     Skype: hub.org        ICQ . 7615664
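For readers who want to try Marc's suggestion, a minimal sketch of the edit
(the exact line generated by configure may differ between PostgreSQL
versions; treat this as illustrative only):

    /* src/include/pg_config.h, edited after running configure          */
    /* #define HAVE_SETPROCTITLE 1 */   /* line produced by configure   */
    #undef HAVE_SETPROCTITLE            /* force the non-setproctitle path */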
> On Sun, 11 Jun 2006, Kris Kennaway wrote:
>> Why does postgresql change its process title so frequently and how can
>> this be disabled?  Profiling suggests it's a fairly serious
>> performance bottleneck.

Let's see the evidence.

			regards, tom lane
Kris Kennaway <kris@obsecurity.org> writes:
> On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
>> Let's see the evidence.

> The calls to setproctitle() (it looks like 4 setproctitle syscalls per
> DB query) are causing contention on the Giant lock 25% of the time on
> a dual p4 + HTT.  Disabling process title setting completely gives an
> 8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case?  And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

			regards, tom lane
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> I think you misunderstood me: I asked for evidence, not interpretation.
> What are you measuring, and with what tool, and what are the numbers?
> On what benchmark case?

As I said, I'm using the super-smack select benchmark; presumably you
are aware of it.

> And what did you do to "disable process title setting completely"?

Added this to ps_status.c:

#undef PS_USE_SETPROCTITLE
#undef PS_USE_PSTAT
#undef PS_USE_PS_STRINGS
#undef PS_USE_CHANGE_ARGV
#undef PS_USE_CLOBBER_ARGV
#define PS_USE_NONE

Here are the queries/second data, before and after:

x pg
+ pg-noproctitle
[ministat distribution plot not reproduced]
    N           Min           Max        Median           Avg        Stddev
x  11       3399.29        3425.1       3413.93     3411.8418     9.4287675
+  10       3615.63       3699.32      3685.515      3679.344     24.894177
Difference at 95.0% confidence
        267.502 +/- 16.871
        7.8404% +/- 0.494483%
        (Student's t, pooled s = 18.4484)

Kris
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
> > On Sun, 11 Jun 2006, Kris Kennaway wrote:
> >> Why does postgresql change its process title so frequently and how can
> >> this be disabled?  Profiling suggests it's a fairly serious
> >> performance bottleneck.
>
> Let's see the evidence.

The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT.  Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

It's actually worse than that in stock FreeBSD because in certain
"mixed" workloads with other sources of Giant contention present they
will contend with each other to make overall machine performance much
worse; that's the "serious" aspect, although 8% performance loss just
in case someone should ever want to run 'ps' to check the state of the
database processes is nothing to laugh at either.

Kris
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
> > The reason I'm being doubting Thomas here is that I've never seen any
>
> Ba-da-bum!
>
> > indication on any other platform that ps_status is a major bottleneck.
> > Now maybe FreeBSD really sucks, or maybe you're onto something of
> > interest, but let's see the proof in a form that someone else can
> > check and reproduce.
>
> It's also important to find out what version of FreeBSD this is. A lot
> of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> possible this isn't an issue in newer versions.

It's still true in 6.x and 7.x.  I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).

When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact.  For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).

BTW, another promising performance/scalability change on BSD systems
would be to convert pgsql to use kqueue instead of select, since mutex
profiling traces show a lot of contention on the select lock in FreeBSD.

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Kris
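As an aside, the kqueue pattern Kris mentions would look roughly like the
following on FreeBSD.  This is only an illustrative sketch of the kqueue
API, not code taken from PostgreSQL; wait_readable and sock are made-up
names and error handling is omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Block until 'sock' is readable, using kqueue instead of select(). */
    static int
    wait_readable(int sock)
    {
        struct kevent change, event;
        int kq = kqueue();

        EV_SET(&change, sock, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);        /* register interest once */

        int n = kevent(kq, NULL, 0, &event, 1, NULL); /* wait for readiness */
        close(kq);
        return n;                                     /* 1 if readable, -1 on error */
    }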
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
> On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
> > It's also important to find out what version of FreeBSD this is. A lot
> > of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
> > possible this isn't an issue in newer versions.

Can you provide the actual commands you used to set up and run the test?
This would allow others to duplicate your results and debug the situation
on their own.  This is also important because we've generally found HTT
to be a loss (except on Windows), so it'd be good to see what impact this
has on AMD hardware.  It would also be very useful to have the raw test
result numbers you obtained.

> It's still true in 6.x and 7.x.  I have a patch that removes Giant
> from the sysctl in question, and I have also removed it from another
> relevant part of the kernel (semop() is bogusly acquiring Giant when
> it is supposed to be mpsafe).

What effect did that patch have on the numbers?  And where is it, in case
anyone here wants to try it?

> When it's possible to commit that patch (should be in time for 7.0,
> but not sure if it will make it into 6.2) it will eliminate the worst
> part of the problem, but it still leaves postgresql making thousands
> of syscalls per second to flip its process titles back and forth,
> which needs to be looked at carefully for a performance impact.  For
> now, users of FreeBSD who want that 8% should turn it off though (or
> maybe one of the alternative methods is usable).

We have a similar issue internally with stats_command_string.  The issue
is that it's very helpful to be able to see what a 'long running' (more
than a second or so) statement is doing, but there's very little value in
doing all that work for a statement that's only going to run for a few ms.
If there's a very cheap way to set some kind of timer that would update
this information once a statement's been around long enough, that might
be a way to handle this (I don't know if we're already using ALRM in the
backend, or if that's a cheap enough solution).

> FYI, the biggest source of contention is via semop() - it might be
> possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
> Can you provide the actual commands you used to set up and run the test?

I actually forget all the steps I needed to do to get super-smack
working with postgresql, since it required a lot of trial and error for
a database newbie like me (compiling it from the benchmarks/super-smack
port was trivial, but unlike mysql it required configuring the database
by hand - this should hopefully be more obvious to someone familiar with
pgsql though).

It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps.  Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.

The only thing I had to change on FreeBSD was to edit select-key.smack
and change "localhost" to "127.0.0.1" in two locations to avoid possibly
using IPv6 transport.

> This would allow others to duplicate your results and debug the
> situation on their own.  This is also important because we've generally
> found HTT to be a loss (except on Windows), so it'd be good to see what
> impact this has on AMD hardware.  It would also be very useful to have
> the raw test result numbers you obtained.

They were posted previously.  This is Intel hardware (AMD doesn't do
HTT), and I didn't retest without HTT.  I'll try to do so if I have the
time (however, previously when profiling mysql, HTT did give a small
positive change).

> What effect did that patch have on the numbers?  And where is it, in
> case anyone here wants to try it?

I didn't yet retest with the patch.  It's in my perforce branch:

http://perforce.freebsd.org/changeList.cgi?FSPC=//depot/user/kris/contention/...

although you probably need a combination of the changes in order for it
to be usable.

> We have a similar issue internally with stats_command_string.  The issue
> is that it's very helpful to be able to see what a 'long running' (more
> than a second or so) statement is doing, but there's very little value in
> doing all that work for a statement that's only going to run for a few ms.
> If there's a very cheap way to set some kind of timer that would update
> this information once a statement's been around long enough, that might
> be a way to handle this (I don't know if we're already using ALRM in the
> backend, or if that's a cheap enough solution).

I don't know what the best way to implement it would be, but limiting
the frequency of these updates does seem to be the way to go.

> Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
> of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each other
and with anything else on the system that wants Giant (e.g. IPv6, or the
USB stack, etc).  As I mentioned, Giant is not an issue here going
forward, but there is still as much lock contention just between semop()
calls running on different CPUs.  It may be possible for someone to
implement more fine-grained locking here, but I don't know if there is
available interest.

Kris
On Mon, Jun 12, 2006 at 11:38:01AM -0400, Kris Kennaway wrote:
> It would be great if someone on your end could make this easier, BTW -
> e.g. at least document the steps.  Also super-smack should be changed
> to allow use via a local socket with pgsql (this is the default with
> mysql) - this avoids measuring network stack overhead.

Unless supersmack has improved substantially, you're unlikely to find
much interest.  Last I heard it was a pretty brain-dead benchmark.
DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more realistic
(based on TPC-C and TPC-H).

> Part of that is Giant contention again; for example on 6.x semop() and
> setproctitle() both want to acquire it, so they'll fight with each other
> and with anything else on the system that wants Giant (e.g. IPv6, or the
> USB stack, etc).  As I mentioned, Giant is not an issue here going
> forward, but there is still as much lock contention just between semop()
> calls running on different CPUs.  It may be possible for someone to
> implement more fine-grained locking here, but I don't know if there is
> available interest.

FWIW, turning off stats_command_string substantially reduced this
contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends.  It seems that doesn't work so hot on FreeBSD. :(
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Jim C. Nasby wrote:
> FWIW, turning off stats_command_string substantially reduced this
> contention, so it appears the issue is somewhere in the stats code.
> This code sends stats messages to a different process via a socket (or
> is it UDP?), with the intention that if the system gets heavily loaded
> we'll lose some stats in the interest of not bogging down all the
> backends.  It seems that doesn't work so hot on FreeBSD. :(

I am working on a patch for 8.2 to fix that for all platforms.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
> I am working on a patch for 8.2 to fix that for all platforms.

Excellent.  Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Jim C. Nasby wrote:
> Excellent.  Did I miss discussion of that or have you not mentioned it?
> I'm curious as to how you're fixing it...

The patches are in the hold queue:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Title is "Stats collector performance improvement".  I need someone to
test my patches on a non-BSD platform to move forward.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Jim C. Nasby wrote:
>> Excellent.  Did I miss discussion of that or have you not mentioned it?
>> I'm curious as to how you're fixing it...

> The patches are in the hold queue:
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c).  But IIRC that patch is on hold because
nobody particularly liked the approach it's taking.  I think we should
investigate rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at all?  It'd
be interesting to see what happens if we just make the collector process
read the UDP socket directly.  Or alternatively drop the UDP socket in
favor of having the backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).

As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so).  The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work.  Not clear
that there's a win there.  (If you're using statement_timeout it might
not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.

			regards, tom lane
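A minimal sketch of the deferral Tom describes, assuming a one-shot 100ms
timer whose handler only sets a flag; the handler-safety concern above is
addressed by doing the real work at the next safe point in the backend's
main loop.  All names are invented and the PostgreSQL calls appear only as
placeholder comments, not as a claim about the actual backend code:

    #include <signal.h>
    #include <sys/time.h>

    static volatile sig_atomic_t report_due = 0;

    static void
    report_timeout(int signo)
    {
        report_due = 1;             /* async-signal-safe: only set a flag */
    }

    /* called when a new statement starts */
    static void
    arm_report_timer(void)
    {
        struct itimerval t = { { 0, 0 }, { 0, 100000 } };  /* one-shot, 100 ms */

        report_due = 0;
        signal(SIGALRM, report_timeout);
        setitimer(ITIMER_REAL, &t, NULL);
    }

    /* called at safe points in the main loop, never from the handler */
    static void
    maybe_report(const char *query_string)
    {
        if (report_due)
        {
            report_due = 0;
            /* ... update the ps title and send the stats command message
             * here, e.g. something like set_ps_display(query_string) ... */
        }
    }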
On Tue, Jun 13, 2006 at 04:35:24PM -0400, Tom Lane wrote:
> ... The main objection I see to this is that it replaces a kernel
> call that actually does some work with a kernel call to start a timer,
> plus possibly a later kernel call to actually do the work.  Not clear
> that there's a win there.

And of course it's an almost guaranteed loss on systems that don't
require a syscall to set the proc title.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Tom Lane wrote:
> That's talking about the stats code, which has approximately zip to do
> with setproctitle (ps_status.c).  But IIRC that patch is on hold because

I thought the bug reporter was asking about the stats code as well.

> nobody particularly liked the approach it's taking.  I think we should
> investigate rewriting the stats communication architecture entirely ---
> in particular, do we really need the stats buffer process at all?  It'd
> be interesting to see what happens if we just make the collector process
> read the UDP socket directly.

Agreed, that's what I would prefer, and I tested something like that,
but even pulling the packets into the buffer and throwing them away had
significant overhead, so I think the timeout trick has to be employed
as well as going to a single process.

> Also not clear how you make the necessary actions happen when the timer
> expires --- I seriously doubt it'd be safe to try to do either action
> directly in a signal handler.

Right.  What if the postmaster signals the backend once a second to do
their reporting?  I am not sure what the final solution will be, but we
_need_ one based on the performance numbers I and others have seen.
Could we have PGPROC have a reporting boolean that is set every second
and somehow checked by each backend?

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 05:05:31PM -0400, Bruce Momjian wrote:
> I thought the bug reporter was asking about the stats code as well.

It did get brought up...

> Right.  What if the postmaster signals the backend once a second to do
> their reporting?  I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen.
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

One second might be a bit more delay than some folks want... it would be
nice if this was tuneable.  Also, what would the overhead on this look
like if there's a large number of idle backends?  It does sound more
appealing than setting a timer every time you start a transaction,
though...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
> Part of that is Giant contention again; for example on 6.x semop() and
> setproctitle() both want to acquire it, so they'll fight with each other
> and with anything else on the system that wants Giant (e.g. IPv6, or the
> USB stack, etc).  As I mentioned, Giant is not an issue here going
> forward, but there is still as much lock contention just between semop()
> calls running on different CPUs.  It may be possible for someone to
> implement more fine-grained locking here, but I don't know if there is
> available interest.

BTW, there's another FBSD performance oddity I've run across.  Running

pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &

which dumps the email_contrib table to bzip2 then to disk, the OS won't
use more than 1 CPU on an SMP system... unless the data is cached.
According to both gstat and systat -v, the system isn't I/O bound; both
are reporting the RAID10 with that table on it as only about 10% busy.
If I let that command run for a bit then cancel it and re-start it so
that the beginning of that table is in cache, it will use one entire CPU
for bzip2, which is what I'd expect to happen.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Tue, 13 Jun 2006, Jim Nasby wrote:
> BTW, there's another FBSD performance oddity I've run across.  Running
>
> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>
> which dumps the email_contrib table to bzip2 then to disk, the OS won't
> use more than 1 CPU on an SMP system... unless the data is cached.

What version of FreeBSD are you dealing with here?  I'm guessing at
least 6.x, but just figured I'd clarify ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org              MSN . scrappy@hub.org
Yahoo . yscrappy                     Skype: hub.org        ICQ . 7615664
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Right.  What if the postmaster signals the backend once a second to do
> their reporting?  I am not sure what the final solution will be, but we
> _need_ one based on the performance numbers I and others have seen.
> Could we have PGPROC have a reporting boolean that is set every second
> and somehow checked by each backend?

I don't see any point in involving the postmaster in it.  What might be
interesting is to replace the current backend timer-interrupt handling
by a free-running cyclic interrupt every N milliseconds (the resolution
of the statement_timeout and deadlock_check delays would then be no
better than N milliseconds, since those actions would occur at the next
cyclic interrupt after the desired time expires).  We could possibly
drive stats reports and ps_status updates from this, as well as sampling
EXPLAIN ANALYZE if anyone cares to pursue that.

Complaints I can foresee:

* lots of cycles wasted in idle backends.  Possibly a backend that's not
received any command for a second or two could shut down its interrupt
until it next gets a command.

* not clear whether the interrupt happens when waiting for I/O.  I
already mentioned that this would be a problem for EXPLAIN ANALYZE, but
it might be no big deal for the other uses.

			regards, tom lane
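The free-running variant Tom sketches differs from the one-shot deferral
shown earlier only in that the timer repeats.  An illustrative fragment,
again not actual backend code and with invented names:

    #include <signal.h>
    #include <sys/time.h>

    static volatile sig_atomic_t cycle_pending = 0;

    static void
    cycle_interrupt(int signo)
    {
        cycle_pending = 1;
    }

    static void
    start_cyclic_interrupt(int interval_ms)
    {
        struct itimerval t;

        t.it_interval.tv_sec  = interval_ms / 1000;
        t.it_interval.tv_usec = (interval_ms % 1000) * 1000;
        t.it_value = t.it_interval;        /* first expiry after one interval */
        signal(SIGALRM, cycle_interrupt);
        setitimer(ITIMER_REAL, &t, NULL);
        /* At each safe point the backend checks cycle_pending and, when
         * set, drives stats reports, ps_status updates, and checks whether
         * statement_timeout or deadlock_check have expired -- so those get
         * at best interval_ms resolution, as noted above.  An idle backend
         * could call setitimer() with zeroed values to shut the timer off. */
    }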
On Tue, Jun 13, 2006 at 11:13:55PM -0300, Marc G. Fournier wrote:
> What version of FreeBSD are you dealing with here?  I'm guessing at
> least 6.x, but just figured I'd clarify ...

FreeBSD 6.0-STABLE #6: Fri Dec  9 19:14:19 UTC 2005
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Tue, Jun 13, 2006 at 10:36:27PM -0400, Tom Lane wrote:
> * not clear whether the interrupt happens when waiting for I/O.  I
> already mentioned that this would be a problem for EXPLAIN ANALYZE, but
> it might be no big deal for the other uses.

If we're going by UNIX signal semantics, in theory any interruptable
function (as listed on the signal manpage) would be interrupted every N
milliseconds to handle the interrupt.  So yes, if possible the interrupt
will be serviced anyway, even during I/O, by aborting and restarting the
I/O continuously.  Not terribly efficient.

With one big exception: sometimes I/O is non-interruptable (the good old
D state).  In this case the interrupts will happen but will simply be
queued, and in fact will all be dropped except the last one (non-realtime
signals are never stacked).  The handler will probably be called the
moment it returns to user space.

Note: Do we want SIGPROF, SIGALRM or SIGVTALRM?  There's apparently a
distinction, see: http://en.wikipedia.org/wiki/SIGPROF

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
> With one big exception: sometimes I/O is non-interruptable (the good old
> D state).  In this case the interrupts will happen but will simply be
> queued, and in fact will all be dropped except the last one (non-realtime
> signals are never stacked).  The handler will probably be called the
> moment it returns to user space.

If backends store their current status in shared memory then a separate
process entirely can receive the interrupts, scan through the shared
memory process states, and do the accounting.  It would never be
interrupting I/O, since that process would never be doing I/O.  It could
use real-time signals, reads on /dev/rtc, or whatever other methods exist
for scheduling periodic events, since it would be doing nothing but
handling these interrupts.

The neat thing about this is that it would be possible to look at a
process from outside instead of having to hijack the application to get
feedback.  You could find information like total time spent in I/O versus
CPU, aggregated across all queries from a single backend or for all
backends.

The downside is that to handle things like EXPLAIN ANALYZE you would have
to provide enough information about plans to this accounting process for
it to store the information.  Plans are currently purely local to the
backend running them.  Perhaps it would be enough to provide a unique id
for the plan (a sequential id per backend would be enough) and a unique
id for the plan node.  The accounting process wouldn't know anything more
about what that node represented, but the backend could later query to
fetch the information and associate it with the plan.

--
greg
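A rough sketch of the arrangement Greg describes, with made-up structure
and function names (this is not PostgreSQL's real shared-memory layout):
each backend cheaply updates its own slot with plain stores, and a
separate accounting process samples all slots from its own periodic
interrupt:

    #include <sys/types.h>

    #define MAX_BACKENDS 128            /* illustrative size */

    typedef struct
    {
        pid_t pid;                      /* owning backend, 0 = slot unused */
        int   plan_id;                  /* per-backend sequential plan id  */
        int   node_id;                  /* plan node currently executing   */
    } BackendStatusSlot;

    /* array placed in shared memory at startup */
    static BackendStatusSlot *backend_status;

    /* backend side: a couple of plain stores, no system call, no lock */
    static void
    report_current_node(int my_slot, int plan_id, int node_id)
    {
        backend_status[my_slot].plan_id = plan_id;
        backend_status[my_slot].node_id = node_id;
    }

    /* accounting-process side: runs from its periodic interrupt */
    static void
    sample_all_backends(void (*tally)(pid_t pid, int plan_id, int node_id))
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
            if (backend_status[i].pid != 0)
                tally(backend_status[i].pid,
                      backend_status[i].plan_id,
                      backend_status[i].node_id);
    }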
Greg Stark <gsstark@mit.edu> writes:
> If backends store their current status in shared memory then a separate
> process entirely can receive the interrupts, scan through the shared
> memory process states, and do the accounting.

This sounds good until you think about locking.  It'd be quite
impractical to implement anything as fine-grained as EXPLAIN ANALYZE
this way, because of the overhead involved in taking and releasing
spinlocks.  It could be practical as a replacement for
stats_command_string messages, though.

I'm not sure about replacing ps_status with this.  I don't think there
is a way for one process to set another's status (on most platforms
anyway).  You might argue that we could abandon ps_status reporting
altogether if we had something better, but I'm unconvinced ...

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> This sounds good until you think about locking.  It'd be quite
> impractical to implement anything as fine-grained as EXPLAIN ANALYZE
> this way, because of the overhead involved in taking and releasing
> spinlocks.

I'm not entirely convinced.  The only other process that would be looking
at the information would be the statistics accumulator, which would only
be waking up every 100ms or so.  There would be no contention with other
backends reporting their info.

I was also thinking of reporting only basic information, like which of a
small number of states the backend was in and which node of the plan is
being executed.  If those are just integers then it might be possible to
get away without locking at all, storing them as sig_atomic_t.

I think there would still be a fair amount of cache coherency overhead,
like we see with the tas instruction on Xeons, but only if that backend's
monitor block is actually being touched by the statistics accumulator
process, which would presumably be only when running an EXPLAIN ANALYZE
or other tracing facility.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> I'm not entirely convinced.  The only other process that would be looking
> at the information would be the statistics accumulator, which would only
> be waking up every 100ms or so.  There would be no contention with other
> backends reporting their info.

The numbers I've been looking at lately say that heavy lock traffic is
expensive, particularly on SMP machines, even with zero contention.
Seems the cache coherency protocol costs a lot even when it's not doing
anything...

			regards, tom lane
On Wed, Jun 14, 2006 at 03:51:28PM -0400, Tom Lane wrote:
> The numbers I've been looking at lately say that heavy lock traffic is
> expensive, particularly on SMP machines, even with zero contention.
> Seems the cache coherency protocol costs a lot even when it's not doing
> anything...

Are there any ways we could avoid the locking?

One idea would be to keep something akin to a FIFO, where the backend
would write records instead of updating/over-writing them, and the reader
process would only read records where there was no risk that they were
still being written.  That would mean that the reader would need to stay
at least one record behind the backend, but that's probably manageable.

The downside is more memory usage, but I think we could limit that by
also keeping track of what record the reader had last read.  The backend
would check that, and if the reader was more than X records behind, the
backend would update the most recent record it had written instead of
writing a new one.  That would place an effective limit on how much
memory was consumed.

But... I have no idea how exactly shared memory works, so maybe this plan
is fundamentally broken.  But hopefully there's some way to handle the
locking problems, because a separate reader process does sound like an
interesting possibility.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
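A sketch of the FIFO idea, assuming a single writer (the backend) and a
single reader per ring; the reader stops one slot short of the writer's
position, so it never reads a half-written record.  Memory-ordering
barriers, which a real SMP implementation would need, are glossed over
here, and all names are invented:

    #define RING_SLOTS 256                  /* power of two, illustrative */

    typedef struct
    {
        int plan_id;
        int node_id;
    } StatusRecord;

    typedef struct                          /* one ring per backend, intended
                                             * to live in shared memory */
    {
        volatile unsigned head;             /* next slot the backend writes  */
        volatile unsigned tail;             /* next slot the reader consumes */
        StatusRecord slot[RING_SLOTS];
    } StatusRing;

    /* backend (writer): fill the slot first, then publish it by moving head */
    static void
    ring_put(StatusRing *r, StatusRecord rec)
    {
        unsigned next = (r->head + 1) % RING_SLOTS;

        if (next == r->tail)
            return;                         /* ring full: drop (or overwrite last) */
        r->slot[r->head] = rec;
        r->head = next;                     /* reader may now see this record */
    }

    /* reader process: only consume records strictly behind head */
    static int
    ring_get(StatusRing *r, StatusRecord *out)
    {
        if (r->tail == r->head)
            return 0;                       /* nothing safely readable yet */
        *out = r->slot[r->tail];
        r->tail = (r->tail + 1) % RING_SLOTS;
        return 1;
    }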
On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote:
> One idea would be to keep something akin to a FIFO, where the backend
> would write records instead of updating/over-writing them, and the reader
> process would only read records where there was no risk that they were
> still being written.  That would mean that the reader would need to stay
> at least one record behind the backend, but that's probably manageable.

The problem with a FIFO is that the pointers controlling where the
start/end are become the contention.

The only thing I can think of is to have the backends only write atomic
types and give each backend its own cache line.  The profiler would
simply wake up every N ms and read each value, accumulating them
somewhere.

No locking, and there would be a maximum of one cache-line bounce per
backend per N ms.  I'm not sure you can get better than that in this
situation.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
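The lock-free layout Martijn suggests might be declared along these lines;
the cache-line size, field names, and slot count are assumptions, not a
description of any existing PostgreSQL structure:

    #include <signal.h>

    #define CACHE_LINE_SIZE 64              /* assumed; varies by CPU */
    #define MAX_BACKENDS    128             /* illustrative */

    /* One padded entry per backend, so one backend's stores never share a
     * cache line with another's.  The profiler wakes every N ms and reads
     * each entry; the backend writes are plain, unlocked stores. */
    typedef struct
    {
        volatile sig_atomic_t state;        /* e.g. idle / running / in I/O */
        volatile sig_atomic_t node;         /* current plan node id */
        char pad[CACHE_LINE_SIZE - 2 * sizeof(sig_atomic_t)];
    } PerBackendCounters;

    /* array of MAX_BACKENDS entries, placed in shared memory at startup */
    static PerBackendCounters *counters;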
"Jim C. Nasby" <jnasby@pervasive.com> writes: > > The numbers I've been looking at lately say that heavy lock traffic is > > expensive, particularly on SMP machines, even with zero contention. > > Seems the cache coherency protocol costs a lot even when it's not doing > > anything... > > Are there any ways we could avoid the locking? Well if all we want to do is reproduce the current functionality of EXPLAIN ANALYZE, all you need is a single sig_atomic_t int that you store the address of the current node in. When you're done with the query you ask the central statistics accumulator for how the counts of how many times it saw each node address. But that's sort of skipping some steps. The reason tas is slow is because it needs to ensure cache coherency. Even just storing an int into shared memory when its cache line is owned by the processor running the stats accumulator would will have the same overhead. That said I don't think it would actually be as much. TAS has to ensure no other processor overwrites the data from this processor. Whereas merely storing into a sig_atomic_t just has to ensure that you don't get a partially written integer. And it seems sort of silly to be doing this much work just to reproduce the current functionality even if it's lower overhead. The big advantage of a separate process would be that it could do much more. But that would require pushing some representation of the plan out to the stats collector, storing more information about the state than just what node you're at currently such as whether you're in the midst of index lookups, sequential scans, operator and function execution, etc. -- greg
On Wed, Jun 14, 2006 at 10:37:59PM +0200, Martijn van Oosterhout wrote:
> The problem with a FIFO is that the pointers controlling where the
> start/end are become the contention.

Even if what's being written is a single byte/word and we're taking
intentional steps to make sure that reading a pointer that's off by 1
isn't an issue?  Is the issue that if CPU A writes a pointer and CPU B
immediately tries to read it, it could get old data from its cache?

> The only thing I can think of is to have the backends only write atomic
> types and give each backend its own cache line.  The profiler would
> simply wake up every N ms and read each value, accumulating them
> somewhere.

Would everything we need fit in a cache line?

> No locking, and there would be a maximum of one cache-line bounce per
> backend per N ms.  I'm not sure you can get better than that in this
> situation.

What scheduler timer resolution do modern OSes use?  It's likely that on
a system with more runnable processes than processors, each process will
only run for N ms anyway...

Speaking of which... doesn't the variability involved in scheduling mean
we couldn't use sampling to calculate cost in EXPLAIN ANALYZE?  Or is the
assumption just that it will affect all nodes evenly?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Greg Stark <gsstark@mit.edu> writes:
> Well, if all we want to do is reproduce the current functionality of
> EXPLAIN ANALYZE, all you need is a single sig_atomic_t int in which you
> store the address of the current node.

Do I need to point out that sig_atomic_t can't portably be assumed to be
wider than char?

We do currently assume that TransactionId can be read/written atomically,
but (a) that's an int not a pointer, and (b) the assumption is cruft that
we really ought to get rid of.

In any case, speculating about whether we can do something useful with
atomic types ignores the main real problem the thread is about.  Anybody
remember process titles and current command strings?

			regards, tom lane
"Jim C. Nasby" <jnasby@pervasive.com> writes: > Even if what's being written is a single byte/word and we're taking > intentional steps to make sure that reading a pointer that's off by 1 > isn't an issue? Is the issue that if CPU A writes a pointer and CPU B > immediately tries to read it that it could get old data from it's cache? Well for applications like this you don't care whether you get the old data or the new data. You just want to be sure you don't get half of each. However: Tom Lane <tgl@sss.pgh.pa.us> writes: > Do I need to point out that sig_atomic_t can't portably be assumed to be > wider than char? I didn't actually realize that. That would mean EXPLAIN ANALYZE would be limited to counting times for 256 plan nodes and wouldn't really be much of an improvement over the existing infrastructure. > In any case, speculating about whether we can do something useful with > atomic types ignores the main real problem the thread is about. Anybody > remember process titles and current command strings? Oops, I thought this was the ANALYZE patch thread. -- greg
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
> Unless supersmack has improved substantially, you're unlikely to find
> much interest.  Last I heard it was a pretty brain-dead benchmark.
> DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more realistic
> (based on TPC-C and TPC-H).

Thanks for the reference, I'll check it out.

Brain-dead or not though (and I've also seen evidence that it is, e.g. it
does a hell of a lot of 1-byte reads), it's a well-known benchmark that
seems to be fairly widely used, so there's still some value in making it
perform well (but not at the expense of other things).  However, my
interest is not so much measuring database performance as measuring
kernel performance and trying to optimize bottlenecks.

> FWIW, turning off stats_command_string substantially reduced this
> contention, so it appears the issue is somewhere in the stats code.
> This code sends stats messages to a different process via a socket (or
> is it UDP?), with the intention that if the system gets heavily loaded
> we'll lose some stats in the interest of not bogging down all the
> backends.  It seems that doesn't work so hot on FreeBSD. :(

It could be that it's making assumptions about what is cheap and what is
not which are not true universally - e.g. you might be adding a new
contention point that is even worse.  I didn't notice a change in
behaviour as I varied the load, so perhaps it was not visible on this
benchmark or I wasn't looking in the right place.

Kris
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
> Unless supersmack has improved substantially, you're unlikely to find
> much interest.  Last I heard it was a pretty brain-dead benchmark.
> DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more realistic
> (based on TPC-C and TPC-H).

Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
least) will need a moderate amount of hacking - there are some Linux
assumptions in the source, and the configure script makes assumptions
about where things are installed that cannot be overridden on the
command line.

Kris
On Tue, Jun 13, 2006 at 03:55:38PM -0500, Jim Nasby wrote:
> BTW, there's another FBSD performance oddity I've run across.  Running
>
> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>
> which dumps the email_contrib table to bzip2 then to disk, the OS won't
> use more than 1 CPU on an SMP system... unless the data is cached.
> According to both gstat and systat -v, the system isn't I/O bound; both
> are reporting the RAID10 with that table on it as only about 10% busy.

Hmm, odd... maybe something with the scheduler.  I'd need access to a
test case to be able to figure it out though.

Kris
On Tue, 2006-06-13 at 14:18 -0400, Kris Kennaway wrote:
> Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source, and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> command line.

I did have dbt2 pretty close to functional on FreeBSD a year ago but
it's probably gone back into linuxisms since then.

--
> > I did have dbt2 pretty close to functional on FreeBSD a year ago but
> > it's probably gone back into linuxisms since then.
>
> :(
>
> I won't have the chance to work on this further for another 2 months,
> but if you have patches I could see about picking up on them when I
> get back.

Everything has been applied to the dbt2 tree.

--
On Thu, Jun 15, 2006 at 11:34:52PM -0400, Rod Taylor wrote:
> I did have dbt2 pretty close to functional on FreeBSD a year ago but
> it's probably gone back into linuxisms since then.

:(

I won't have the chance to work on this further for another 2 months,
but if you have patches I could see about picking up on them when I
get back.

Kris
On Fri, Jun 16, 2006 at 07:56:30AM -0400, Rod Taylor wrote:
> Everything has been applied to the dbt2 tree.

Cool!

Kris
Moving to osdldbt-general and dropping Tom and Marc.

On Jun 13, 2006, at 1:18 PM, Kris Kennaway wrote:
> Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source, and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> command line.

Yeah, there's a number of issues that would need to be addressed.  The
biggest problem as I see it is that all the framework to run tests is
pretty tied together, without any modularity.  The run script itself
launches all the stats-collecting stuff, assumes it needs to happen on
the local machine, etc.  There is consensus that this kind of thing needs
to be done to improve dbt*, it's just a matter of doing it.

Also, it's desirable to improve the benchmarks so that they can store
results in a database, and set them up so that the monitoring stuff is
common to all the different tests.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Jun 13, 2006, at 9:42 PM, Kris Kennaway wrote:
> Hmm, odd... maybe something with the scheduler.  I'd need access to a
> test case to be able to figure it out though.

Well, a pg_dump of any sizeable database piped through bzip2 or gzip
should show this.  Try:

pg_dump large_database | bzip2 > databasedump.sql.bz2
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461