Thread: Impact of checkpoint_segments under continual load conditions
I have a unique scenario. My DB is under "continual load", meaning that I am constantly using COPY to insert new data into the DB. There is no "quiet period" for the database, at least not for hours on end. Normally, checkpoint_segments can help absorb some of that, but my experience is that if I crank the number up, it simply delays the impact, and when it occurs, it takes a VERY long time (minutes) to clear.

Thoughts?

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> I have a unique scenario. My DB is under "continual load", meaning
> that I am constantly using COPY to insert new data into the DB. There
> is no "quiet period" for the database, at least not for hours on end.
> Normally, checkpoint_segments can help absorb some of that, but my
> experience is that if I crank the number up, it simply delays the
> impact, and when it occurs, it takes a VERY long time (minutes) to
> clear.

If you are using 8.0, you can probably alleviate the problem by making the bgwriter more aggressive. I don't have any immediate recommendations for specific settings though. A small checkpoint_segments setting is definitely bad news for performance under heavy load.

			regards, tom lane
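[For reference, a minimal sketch of the settings under discussion as they exist in 8.0; the values are illustrative starting points, not recommendations for this workload:]

    # postgresql.conf -- 8.0-era checkpoint / background-writer knobs
    checkpoint_segments = 30     # more WAL segments = less frequent checkpoints
    checkpoint_timeout  = 300    # seconds between forced checkpoints
    bgwriter_delay      = 200    # ms between bgwriter rounds
    bgwriter_percent    = 1      # % of dirty buffers scanned per round
    bgwriter_maxpages   = 100    # cap on pages written per round

Making the bgwriter "more aggressive" generally means lowering bgwriter_delay and/or raising bgwriter_percent and bgwriter_maxpages, so that dirty buffers trickle out continuously instead of all at once at checkpoint time.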
On Jul 17, 2005, at 1:08 PM, Christopher Petrilli wrote:
> Normally, checkpoint_segments can help absorb some of that, but my
> experience is that if I crank the number up, it simply delays the
> impact, and when it occurs, it takes a VERY long time (minutes) to
> clear.

There comes a point where your only recourse is to throw hardware at the problem. I would suspect that getting faster disks and splitting the checkpoint log to its own RAID partition would help you here. Adding more RAM while you're at it always does wonders for me :-)

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On 7/18/05, Vivek Khera <vivek@khera.org> wrote:
> On Jul 17, 2005, at 1:08 PM, Christopher Petrilli wrote:
> > Normally, checkpoint_segments can help absorb some of that, but my
> > experience is that if I crank the number up, it simply delays the
> > impact, and when it occurs, it takes a VERY long time (minutes) to
> > clear.
>
> There comes a point where your only recourse is to throw hardware at
> the problem. I would suspect that getting faster disks and splitting
> the checkpoint log to its own RAID partition would help you here.
> Adding more RAM while you're at it always does wonders for me :-)

My concern is less with absolute performance than with the nosedive it goes into. I published some of my earlier findings and comparisons on my blog, but there's a graph here:

http://blog.amber.org/diagrams/comparison_mysql_pgsql.png

Notice the VERY steep drop off. I'm still trying to get rid of it, but honestly, am not smart enough to know where it's originating. I have no desire to ever use MySQL, but it is a reference point, and since I don't particularly need transactional integrity, a valid comparison.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> http://blog.amber.org/diagrams/comparison_mysql_pgsql.png
> Notice the VERY steep drop off.

Hmm. Whatever that is, it's not checkpoint's fault. I would interpret the regular upticks in the Postgres times (every several hundred iterations) as being the effects of checkpoints. You could probably smooth out those peaks some with appropriate hacking on bgwriter parameters, but that's not the issue at hand (is it?).

I have no idea at all what's causing the sudden falloff in performance after about 10000 iterations. COPY per se ought to be about a constant-time operation, since APPEND is (or should be) constant-time. What indexes, foreign keys, etc do you have on this table? What else was going on at the time?

			regards, tom lane
On 7/18/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > http://blog.amber.org/diagrams/comparison_mysql_pgsql.png
> > Notice the VERY steep drop off.
>
> Hmm. Whatever that is, it's not checkpoint's fault. I would interpret
> the regular upticks in the Postgres times (every several hundred
> iterations) as being the effects of checkpoints. You could probably
> smooth out those peaks some with appropriate hacking on bgwriter
> parameters, but that's not the issue at hand (is it?).

I tried hacking that, turning it up to be more aggressive, and it got worse. Turned it down, it got worse :-)

> I have no idea at all what's causing the sudden falloff in performance
> after about 10000 iterations. COPY per se ought to be about a
> constant-time operation, since APPEND is (or should be) constant-time.
> What indexes, foreign keys, etc do you have on this table? What else
> was going on at the time?

The table has 15 columns, 5 indexes (character, inet and timestamp). No foreign keys. The only other thing running on the machine was the application actually DOING the benchmarking, written in Python (psycopg), but it was, according to top, using less than 1% of the CPU. It was just talking through a pipe to a psql prompt to do the COPY.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
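[To make the shape of the workload concrete, here is a hypothetical sketch of the kind of table being described. The column names and types are invented; only the structure matches what was posted: 15 columns, 5 single-column indexes on character, inet and timestamp fields, no foreign keys.]

    -- hypothetical schema; the real column list was not posted
    CREATE TABLE test (
        event_time  timestamp,
        src_ip      inet,
        dst_ip      inet,
        hostname    varchar(64),
        message     varchar(255)
        -- ... plus ten more, unindexed, columns
    );
    CREATE INDEX test_event_time_idx ON test (event_time);
    CREATE INDEX test_src_ip_idx     ON test (src_ip);
    CREATE INDEX test_dst_ip_idx     ON test (dst_ip);
    CREATE INDEX test_hostname_idx   ON test (hostname);
    CREATE INDEX test_message_idx    ON test (message);

Every row loaded by COPY has to update all five indexes as well as the heap, which is where the random I/O in this benchmark comes from.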
Christopher Petrilli <petrilli@gmail.com> writes:
> On 7/18/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I have no idea at all what's causing the sudden falloff in performance
>> after about 10000 iterations. COPY per se ought to be about a
>> constant-time operation, since APPEND is (or should be) constant-time.
>> What indexes, foreign keys, etc do you have on this table? What else
>> was going on at the time?

> The table has 15 columns, 5 indexes (character, inet and timestamp).
> No foreign keys. The only other thing running on the machine was the
> application actually DOING the benchmarking, written in Python
> (psycopg), but it was, according to top, using less than 1% of the
> CPU. It was just talking through a pipe to a psql prompt to do the
> COPY.

Sounds pretty plain-vanilla all right.

Are you in a position to try the same benchmark against CVS tip? (The nightly snapshot tarball would be plenty close enough.) I'm just wondering if the old bgwriter behavior of locking down the bufmgr while it examined the ARC/2Q data structures is causing this...

			regards, tom lane
On 7/18/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > The table has 15 columns, 5 indexes (character, inet and timestamp).
> > No foreign keys. The only other thing running on the machine was the
> > application actually DOING the benchmarking, written in Python
> > (psycopg), but it was, according to top, using less than 1% of the
> > CPU. It was just talking through a pipe to a psql prompt to do the
> > COPY.
>
> Sounds pretty plain-vanilla all right.
>
> Are you in a position to try the same benchmark against CVS tip?
> (The nightly snapshot tarball would be plenty close enough.) I'm
> just wondering if the old bgwriter behavior of locking down the
> bufmgr while it examined the ARC/2Q data structures is causing this...

So here's something odd I noticed:

20735 pgsql     16   0 20640  11m  10m R 48.0  1.2   4:09.65 postmaster
20734 petrilli  25   0  8640 2108 1368 R 38.1  0.2   4:25.80 psql

The 48.0 and 38.1 are %CPU. Why would psql be burning so much CPU? I've got it attached, via a pipe, to another process that's driving it (until I implement the protocol for COPY later). I wouldn't think it should be using such a huge percentage of the CPU, no? The Python script that's actually driving it is about 10% of the CPU, which is just because it's generating the incoming data on the fly.

Thoughts? I will give the CVS head a spin soon, but I wanted to formalize my benchmarking more first.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
On 7/18/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The table has 15 columns, 5 indexes (character, inet and timestamp).
> > No foreign keys. The only other thing running on the machine was the
> > application actually DOING the benchmarking, written in Python
> > (psycopg), but it was, according to top, using less than 1% of the
> > CPU. It was just talking through a pipe to a psql prompt to do the
> > COPY.
>
> Sounds pretty plain-vanilla all right.
>
> Are you in a position to try the same benchmark against CVS tip?
> (The nightly snapshot tarball would be plenty close enough.) I'm
> just wondering if the old bgwriter behavior of locking down the
> bufmgr while it examined the ARC/2Q data structures is causing this...

Tom,

It looks like the CVS HEAD is definitely "better," but not by a huge amount. The only difference is I wasn't running autovacuum in the background (default settings), but I don't think this explains it. Here's a graph of the differences and density of behavior:

http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png

I can provide the raw data. Each COPY was 500 rows. Note that fsync is turned off here. Maybe it'd be more stable with it turned on?

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> Here's a graph of the differences and density of behavior:
> http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png
> I can provide the raw data.

How about the complete test case? There's something awfully odd going on there, and I'd like to find out what.

> Note that fsync is turned off here. Maybe it'd be more stable with it
> turned on?

Hard to say. I was about to ask if you'd experimented with altering configuration parameters such as shared_buffers or checkpoint_segments to see if you can move the point of onset of slowdown. I'm thinking the behavioral change might be associated with running out of free buffers or some such. (Are you running these tests under a freshly-started postmaster, or one that's been busy for awhile?)

			regards, tom lane
What happens if, say at iteration 6000 (a bit after the mess starts), you pause it for a few minutes and resume? Will it restart with a plateau like at the beginning of the test, or not?

What if, during this pause, you disconnect and reconnect, or restart the postmaster, or vacuum, or analyze?

> On 7/18/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > The table has 15 columns, 5 indexes (character, inet and timestamp).
>> > No foreign keys. The only other thing running on the machine was the
>> > application actually DOING the benchmarking, written in Python
>> > (psycopg), but it was, according to top, using less than 1% of the
>> > CPU. It was just talking through a pipe to a psql prompt to do the
>> > COPY.
>>
>> Sounds pretty plain-vanilla all right.
>>
>> Are you in a position to try the same benchmark against CVS tip?
>> (The nightly snapshot tarball would be plenty close enough.) I'm
>> just wondering if the old bgwriter behavior of locking down the
>> bufmgr while it examined the ARC/2Q data structures is causing this...
>
> Tom,
>
> It looks like the CVS HEAD is definitely "better," but not by a huge
> amount. The only difference is I wasn't running autovacuum in the
> background (default settings), but I don't think this explains it.
> Here's a graph of the differences and density of behavior:
>
> http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png
>
> I can provide the raw data. Each COPY was 500 rows. Note that fsync
> is turned off here. Maybe it'd be more stable with it turned on?
>
> Chris
On 7/19/05, PFC <lists@boutiquenumerique.com> wrote:
> What happens if, say at iteration 6000 (a bit after the mess starts), you
> pause it for a few minutes and resume? Will it restart with a plateau like
> at the beginning of the test, or not?

Not sure... my benchmark is designed to represent what the database will do under "typical" circumstances, and unfortunately these are typical for the application. However, I can see about adding some delays, though multiple minutes would be absurd in the application. Perhaps a 5-10 second delay? Would that still be interesting?

> What if, during this pause, you disconnect and reconnect, or restart the
> postmaster, or vacuum, or analyze?

Well, I don't have the numbers any more, but restarting the postmaster has no effect, other than the first few hundred COPYs are worse than anything (3-4x slower), but then it goes back to following the trend line. The data in the chart for v8.0.3 includes running pg_autovacuum (5 minutes).

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> On 7/19/05, PFC <lists@boutiquenumerique.com> wrote:
>> What happens if, say at iteration 6000 (a bit after the mess starts), you
>> pause it for a few minutes and resume? Will it restart with a plateau like
>> at the beginning of the test, or not?

> Not sure... my benchmark is designed to represent what the database
> will do under "typical" circumstances, and unfortunately these are
> typical for the application. However, I can see about adding some
> delays, though multiple minutes would be absurd in the application.
> Perhaps a 5-10 second delay? Would that still be interesting?

I think PFC's question was not directed towards modeling your application, but about helping us understand what is going wrong (so we can fix it). It seemed like a good idea to me.

> Well, I don't have the numbers any more, but restarting the postmaster
> has no effect, other than the first few hundred COPYs are worse than
> anything (3-4x slower), but then it goes back to following the trend
> line. The data in the chart for v8.0.3 includes running pg_autovacuum
> (5 minutes).

The startup transient probably corresponds to the extra I/O needed to repopulate shared buffers with a useful subset of your indexes. But just to be perfectly clear: you tried this, and after the startup transient it returned to the *original* trend line? In particular, the performance goes into the tank after about 5000 total iterations, and not 5000 iterations after the postmaster restart?

I'm suddenly wondering if the performance dropoff corresponds to the point where the indexes have grown large enough to not fit in shared buffers anymore. If I understand correctly, the 5000-iterations mark corresponds to 2.5 million total rows in the table; with 5 indexes you'd have 12.5 million index entries or probably a couple hundred MB total. If the insertion pattern is sufficiently random that the entire index ranges are "hot" then you might not have enough RAM.

Again, experimenting with different values of shared_buffers seems like a very worthwhile thing to do.

			regards, tom lane
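[The "couple hundred MB" figure works out roughly as below; the bytes-per-entry number is an assumption, since actual b-tree entry size depends on the key type and per-item overhead.]

    # back-of-envelope index size estimate
    rows_per_copy   = 500
    iterations      = 5000
    indexes         = 5
    bytes_per_entry = 20   # assumed average, including item overhead

    rows    = rows_per_copy * iterations       # 2,500,000 rows
    entries = rows * indexes                   # 12,500,000 index entries
    print(entries * bytes_per_entry / 2**20)   # ~238 MB of index entries

That is far more than the 8.0 default of 1000 shared buffers (about 8 MB at 8 kB per page) can hope to cache.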
> I think PFC's question was not directed towards modeling your
> application, but about helping us understand what is going wrong
> (so we can fix it).

Exactly. I was wondering if this delay would allow things to get flushed, for instance, which would give information about the problem (if giving it a few minutes of rest resumed normal operation, it would mean that some buffer somewhere is getting filled faster than it can be flushed). So go ahead with a few minutes even if it's unrealistic; that is not the point. You have to tweak it in various possible manners to understand the causes.

And instead of a pause, why not just set the duration of your test to 6000 iterations and run it two times without dropping the test table?

I'm going into wild guesses, but first you should want to know if the problem is because the table is big, or if it's something else. So you run the complete test, stopping a bit after it starts to make a mess, then instead of dumping the table and restarting the test anew, you leave it as it is, do something, then run a new test, but on this table which already has data.

'something' could be one of those:
	disconnect, reconnect (well, you'll have to do that if you run the test twice anyway)
	just wait
	restart postgres
	unmount and remount the volume with the logs/data on it
	reboot the machine
	analyze
	vacuum
	vacuum analyze
	cluster
	vacuum full
	reindex
	defrag your files on disk (stopping postgres and copying the database from your disk to another one and back will do)
	or even dump'n'reload the whole database

I think useful information can be extracted that way. If one of these fixes your problem it'll give hints.
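[Spelled out as commands, the SQL-level variants from the list above look like this; a sketch only, assuming the table name `test` from the COPY command quoted elsewhere in the thread, with the index name for CLUSTER a placeholder:]

    -- run ONE of these in psql between the two 6000-iteration halves
    ANALYZE test;
    VACUUM test;
    VACUUM ANALYZE test;
    VACUUM FULL test;
    REINDEX TABLE test;
    CLUSTER some_index_name ON test;   -- 8.0-era syntax; index name is a placeholder

The restart, remount and reboot variants are done at the OS level (e.g. pg_ctl restart), so they are not shown here.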
> total. If the insertion pattern is sufficiently random that the entire
> index ranges are "hot" then you might not have enough RAM.

Try doing the test dropping some of your indexes and see if it moves the number of iterations after which it becomes slow.
On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > Not sure... my benchmark is designed to represent what the database
> > will do under "typical" circumstances, and unfortunately these are
> > typical for the application. However, I can see about adding some
> > delays, though multiple minutes would be absurd in the application.
> > Perhaps a 5-10 second delay? Would that still be interesting?
>
> I think PFC's question was not directed towards modeling your
> application, but about helping us understand what is going wrong
> (so we can fix it). It seemed like a good idea to me.

OK, I can modify the code to do that, and I will post it on the web.

> The startup transient probably corresponds to the extra I/O needed to
> repopulate shared buffers with a useful subset of your indexes. But
> just to be perfectly clear: you tried this, and after the startup
> transient it returned to the *original* trend line? In particular,
> the performance goes into the tank after about 5000 total iterations,
> and not 5000 iterations after the postmaster restart?

This is correct: the TOTAL is what matters, not the specific instance count. I did an earlier run with larger batch sizes, and it hit at a similar row count, so it's definitely row-count/size related.

> I'm suddenly wondering if the performance dropoff corresponds to the
> point where the indexes have grown large enough to not fit in shared
> buffers anymore. If I understand correctly, the 5000-iterations mark
> corresponds to 2.5 million total rows in the table; with 5 indexes
> you'd have 12.5 million index entries or probably a couple hundred MB
> total. If the insertion pattern is sufficiently random that the entire
> index ranges are "hot" then you might not have enough RAM.

This is entirely possible. Currently:

shared_buffers = 1000
work_mem = 65535
maintenance_work_mem = 16384
max_stack_depth = 2048

> Again, experimenting with different values of shared_buffers seems like
> a very worthwhile thing to do.

I misunderstood shared_buffers then, as I thought work_mem was where indexes were kept. If this is where index manipulations happen, then I can up it quite a bit. The machine this is running on has 2GB of RAM.

My concern isn't absolute performance, as this is not representative hardware, but instead is the evenness of behavior.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
On 7/19/05, PFC <lists@boutiquenumerique.com> wrote:
>> I think PFC's question was not directed towards modeling your
>> application, but about helping us understand what is going wrong
>> (so we can fix it).
>
> Exactly. I was wondering if this delay would allow things to get flushed,
> for instance, which would give information about the problem (if giving it
> a few minutes of rest resumed normal operation, it would mean that some
> buffer somewhere is getting filled faster than it can be flushed).
>
> So go ahead with a few minutes even if it's unrealistic; that is not the
> point. You have to tweak it in various possible manners to understand the
> causes.

Totally understand, and I apologize if I sounded dismissive. I definitely appreciate the insight and input.

> And instead of a pause, why not just set the duration of your test to
> 6000 iterations and run it two times without dropping the test table?

This I can do. I'll probably set it for 5,000 for the first, and then start the second. In non-benchmark experience, however, this didn't seem to make much difference.

> I'm going into wild guesses, but first you should want to know if the
> problem is because the table is big, or if it's something else. So you run
> the complete test, stopping a bit after it starts to make a mess, then
> instead of dumping the table and restarting the test anew, you leave it as
> it is, do something, then run a new test, but on this table which already
> has data.
>
> 'something' could be one of those:
> 	disconnect, reconnect (well, you'll have to do that if you run the test twice anyway)
> 	just wait
> 	restart postgres
> 	unmount and remount the volume with the logs/data on it
> 	reboot the machine
> 	analyze
> 	vacuum
> 	vacuum analyze
> 	cluster
> 	vacuum full
> 	reindex
> 	defrag your files on disk (stopping postgres and copying the database
> 	from your disk to another one and back will do)
> 	or even dump'n'reload the whole database
>
> I think useful information can be extracted that way. If one of these
> fixes your problem it'll give hints.

This could take a while :-)

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm suddenly wondering if the performance dropoff corresponds to the
>> point where the indexes have grown large enough to not fit in shared
>> buffers anymore. If I understand correctly, the 5000-iterations mark
>> corresponds to 2.5 million total rows in the table; with 5 indexes
>> you'd have 12.5 million index entries or probably a couple hundred MB
>> total. If the insertion pattern is sufficiently random that the entire
>> index ranges are "hot" then you might not have enough RAM.

> This is entirely possible. Currently:
> shared_buffers = 1000

Ah-hah --- with that setting, you could be seeing shared-buffer thrashing even if only a fraction of the total index ranges need to be touched. I'd try some runs with shared_buffers at 10000, 50000, 100000.

You might also try strace'ing the backend and see if behavior changes noticeably when the performance tanks.

FWIW I have seen similar behavior while playing with MySQL's sql-bench test --- the default 1000 shared_buffers is not large enough to hold the "hot" part of the indexes in some of their insertion tests, and so performance tanks --- you can see this happening in strace because the kernel request mix goes from almost all writes to a significant part reads. On a pure data insertion benchmark you'd like to see nothing but writes.

			regards, tom lane
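[For anyone wanting to reproduce the strace check: attach to the backend doing the COPY and watch whether reads start showing up among the writes. A sketch, assuming the backend PID from the top output quoted earlier in the thread:]

    # summarize the syscall mix for a minute or so, then hit ^C
    strace -c -p 20735

    # or watch the read/write pattern live
    strace -p 20735 -e trace=read,write

On an insert-only workload whose indexes still fit in cache you would expect almost nothing but writes; a growing share of reads as the run progresses is the thrashing being described here.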
As I'm doing this, I'm noticing something *VERY* disturbing to me:

postmaster backend: 20.3% CPU
psql frontend: 61.2% CPU

WTF? The only thing going through the front end is the COPY command, and it's sent to the backend to read from a file?

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
> As I'm doing this, I'm noticing something *VERY* disturbing to me:
> postmaster backend: 20.3% CPU
> psql frontend: 61.2% CPU
> WTF? The only thing going through the front end is the COPY command,
> and it's sent to the backend to read from a file?

Are you sure the backend is reading directly from the file, and not through psql? (\copy, or COPY FROM STDIN, would go through psql.)

But even so that seems awfully high, considering how little work psql has to do compared to the backend. Has anyone ever profiled psql doing this sort of thing? I know I've spent all my time looking at the backend ...

			regards, tom lane
On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > As I'm doing this, I'm noticing something *VERY* disturbing to me:
> > postmaster backend: 20.3% CPU
> > psql frontend: 61.2% CPU
>
> > WTF? The only thing going through the front end is the COPY command,
> > and it's sent to the backend to read from a file?
>
> Are you sure the backend is reading directly from the file, and not
> through psql? (\copy, or COPY FROM STDIN, would go through psql.)

The exact command is:

COPY test (columnlist...) FROM '/tmp/loadfile';

> But even so that seems awfully high, considering how little work psql
> has to do compared to the backend. Has anyone ever profiled psql doing
> this sort of thing? I know I've spent all my time looking at the
> backend ...

Linux 2.6, ext3, data=writeback

It's flipped now (still running), and it's 48% postmaster, 36% psql, but anything more than 1-2% seems absurd.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
Christopher Petrilli <petrilli@gmail.com> writes:
>> Are you sure the backend is reading directly from the file, and not
>> through psql? (\copy, or COPY FROM STDIN, would go through psql.)

> The exact command is:
> COPY test (columnlist...) FROM '/tmp/loadfile';

I tried to replicate this by putting a ton of COPY commands like that into a file and doing "psql -f file ...". I don't see more than about 0.3% CPU going to psql. So there's something funny about your test conditions. How *exactly* are you invoking psql?

			regards, tom lane
On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> >> Are you sure the backend is reading directly from the file, and not
> >> through psql? (\copy, or COPY FROM STDIN, would go through psql.)
>
> > The exact command is:
> > COPY test (columnlist...) FROM '/tmp/loadfile';
>
> I tried to replicate this by putting a ton of COPY commands like that
> into a file and doing "psql -f file ...". I don't see more than about
> 0.3% CPU going to psql. So there's something funny about your test
> conditions. How *exactly* are you invoking psql?

It is a subprocess of a Python process, driven using a pexpect interchange. I send the COPY command, then wait for the '=#' to come back.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
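[In other words, roughly the following; a sketch of the interchange being described, not the actual benchmark code, with the database name and column list as placeholders:]

    import pexpect

    child = pexpect.spawn('psql benchdb')   # placeholder database name
    child.expect('=#')                      # wait for the psql prompt
    child.sendline("COPY test (columnlist...) FROM '/tmp/loadfile';")
    child.expect('=#')                      # returns once the COPY completes
    child.sendline('\\q')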
Christopher Petrilli <petrilli@gmail.com> writes:
> On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How *exactly* are you invoking psql?

> It is a subprocess of a Python process, driven using a pexpect
> interchange. I send the COPY command, then wait for the '=#' to come
> back.

Some weird interaction with pexpect maybe? Try adding "-n" (disable readline) to the psql command switches.

			regards, tom lane
> It is a subprocess of a Python process, driven using a pexpect
> interchange. I send the COPY command, then wait for the '=#' to come
> back.

Did you try sending the COPY as a normal query through psycopg?
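[Since the server reads the load file itself, the COPY statement can be issued through the existing psycopg connection like any other SQL, cutting psql and pexpect out of the loop entirely. A sketch, with the DSN and column list as placeholders:]

    import psycopg   # psycopg 1.x module name; psycopg2 would work the same way

    conn = psycopg.connect('dbname=benchdb')   # placeholder DSN
    cur = conn.cursor()
    cur.execute("COPY test (columnlist...) FROM '/tmp/loadfile'")
    conn.commit()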
On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Christopher Petrilli <petrilli@gmail.com> writes:
> > On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> How *exactly* are you invoking psql?
>
> > It is a subprocess of a Python process, driven using a pexpect
> > interchange. I send the COPY command, then wait for the '=#' to come
> > back.
>
> Some weird interaction with pexpect maybe? Try adding "-n" (disable
> readline) to the psql command switches.

Um... WOW!

==> pgsql_benchmark_803_bigbuffers10000_noreadline.txt <==
0 0.0319459438324 0.0263829231262
1 0.0303978919983 0.0263390541077
2 0.0306499004364 0.0273139476776
3 0.0306959152222 0.0270659923553
4 0.0307791233063 0.0278429985046
5 0.0306351184845 0.0278820991516
6 0.0307800769806 0.0335869789124
7 0.0408310890198 0.0370559692383
8 0.0371310710907 0.0344209671021
9 0.0372560024261 0.0334041118622

==> pgsql_benchmark_803_bigbuffers10000.txt <==
0 0.0352520942688 0.149132013321
1 0.0320160388947 0.146126031876
2 0.0307128429413 0.139330863953
3 0.0306718349457 0.139590978622
4 0.0307030677795 0.140225172043
5 0.0306420326233 0.140012979507
6 0.0307261943817 0.139672994614
7 0.0307750701904 0.140661001205
8 0.0307800769806 0.141661167145
9 0.0306720733643 0.141198158264

First column is the iteration, second is the "gen time" to generate the load file, and third is the "load time".

It doesn't stay QUITE that low, but it stays lower... quite a bit. We'll see what happens over time.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
On 7/19/05, Christopher Petrilli <petrilli@gmail.com> wrote:
> On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Christopher Petrilli <petrilli@gmail.com> writes:
> > > On 7/19/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> How *exactly* are you invoking psql?
>
> > > It is a subprocess of a Python process, driven using a pexpect
> > > interchange. I send the COPY command, then wait for the '=#' to come
> > > back.
>
> > Some weird interaction with pexpect maybe? Try adding "-n" (disable
> > readline) to the psql command switches.
>
> Um... WOW!
> It doesn't stay QUITE that low, but it stays lower... quite a bit.
> We'll see what happens over time.

Here's a look at the difference:

http://blog.amber.org/diagrams/pgsql_readline_impact.png

I'm running additional comparisons AFTER clustering and analyzing the tables...

Chris
--
| Christopher Petrilli
| petrilli@gmail.com
On 7/19/05, Christopher Petrilli <petrilli@gmail.com> wrote:
> It looks like the CVS HEAD is definitely "better," but not by a huge
> amount. The only difference is I wasn't running autovacuum in the
> background (default settings), but I don't think this explains it.
> Here's a graph of the differences and density of behavior:
>
> http://blog.amber.org/diagrams/pgsql_copy_803_cvs.png
>
> I can provide the raw data. Each COPY was 500 rows. Note that fsync
> is turned off here. Maybe it'd be more stable with it turned on?

I've updated this with trend lines.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com