Thread: PostgreSQL 8.4 performance tuning questions
Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me.

And now I have ~93% CPU load. Here are the changed values in the config:

default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80

My box is a Nehalem 2xQuad 2.8 with 32GB RAM, and there's only postgresql working on it.

For connection pooling I'm using pgbouncer's latest version with pool_size 20 (it was 30 before, but now lowered) and 10k connections.

Which parameters should I give more attention to?
> Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
> before and everything worked fine for me.
>
> And now i have ~93% cpu load. Here's changed values of config:
>
> default_statistics_target = 50
> maintenance_work_mem = 1GB
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> effective_cache_size = 22GB
> work_mem = 192MB
> wal_buffers = 8MB
> checkpoint_segments = 16
> shared_buffers = 7680MB
> max_connections = 80
>
> My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
> postgresql working on it.
>
> For connection pooling i'm using pgbouncer's latest version with
> pool_size 20 (used 30 before, but now lowered) and 10k connections.
>
> What parameters i should give more attention on?

All the values seem quite reasonable to me. What about the _cost variables?

I guess one or more queries are evaluated using a different execution plan, probably a sequential scan instead of an index scan, a hash join instead of a merge join, or something like that.

Try to log the "slow" statements - see "log_min_duration_statement". That might give you the slow queries (although not necessarily the ones causing the problems), and you can analyze them.

What is the general I/O activity? Is there a lot of data read/written to the disks? Is there a lot of I/O wait?

regards
Tomas

PS: Was the database analyzed recently?
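The logging and I/O checks Tomas suggests might look roughly like this (the 500 ms threshold and the data directory path are only examples):

  # postgresql.conf - log every statement slower than 500 ms
  log_min_duration_statement = 500

  # reload the config, then watch I/O while the load is high
  pg_ctl reload -D /path/to/data
  vmstat 1         # a high "wa" column means a lot of I/O wait
  iostat -x 1      # per-device utilization (from the sysstat package)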
> Unfortunately had to downgrade back to 8.3. Now having troubles with
> that and still solving them.
>
> For future upgrade, what is the basic steps?

1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are used, etc.)

You may even run (2) and (3) at once - use a pipe instead of a file.

>>> Was the database analyzed recently?
> Hm... there was smth like auto analyzer in serverlog when i started it
> first time, but i didn't mention that.
> Should I analyze whole db? How to do it?

Just execute 'ANALYZE' and the whole database will be analyzed, but when the autovacuum daemon is running this should be performed automatically (I guess - check pg_stat_user_tables, there's information about the last manual/automatic vacuuming and/or analysis).

> And how should I change _cost variables?

As far as I can see you haven't modified those variables, so don't change them.

> I/O was very high. at first memory usage grew up and then began to full
> swap.

OK, this seems to be the cause. What were the original values of the config variables? If you've lowered work_mem and you need to sort a lot of data, this may be a problem. What amounts of data are you working with? If the data were not analyzed recently, the execution plans will be inefficient and this may be the result.

regards
Tomas
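As a rough sketch, steps (2) and (3) combined into a single pipe could look like this, assuming the old 8.3 cluster listens on port 5432 and the new 8.4 cluster on 5433 (names and ports are only illustrative; use the 8.4 client tools against the old server):

  createdb -p 5433 mydb
  pg_dump -p 5432 mydb | psql -p 5433 mydb
  psql -p 5433 -d mydb -c "ANALYZE"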
Unfortunately had to downgrade back to 8.3. Now having troubles with that and still solving them. For future upgrade, what is the basic steps? >Was the database analyzed recently? Hm... there was smth like auto analyzer in serverlog when i started it first time, but i didn't mention that. Should I analyze whole db? How to do it? And how should I change _cost variables? I/O was very high. at first memory usage grew up and then began to full swap. 2009/7/30 <tv@fuzzy.cz>: >> Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune >> before and everything worked fine for me. >> >> And now i have ~93% cpu load. Here's changed values of config: >> >> default_statistics_target = 50 >> maintenance_work_mem = 1GB >> constraint_exclusion = on >> checkpoint_completion_target = 0.9 >> effective_cache_size = 22GB >> work_mem = 192MB >> wal_buffers = 8MB >> checkpoint_segments = 16 >> shared_buffers = 7680MB >> max_connections = 80 >> >> >> My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only >> postgresql working on it. >> >> For connection pooling i'm using pgbouncer's latest version with >> pool_size 20 (used 30 before, but now lowered) and 10k connections. >> >> What parameters i should give more attention on? >> > > All the values seem quite reasonable to me. What about the _cost variables? > > I guess one or more queries are evaluated using a different execution > plan, probably sequential scan instead of index scan, hash join instead of > merge join, or something like that. > > Try to log the "slow" statements - see "log_min_statement_duration". That > might give you slow queries (although not necessarily the ones causing > problems), and you can analyze them. > > What is the general I/O activity? Is there a lot of data read/written to > the disks, is there a lot of I/O wait? > > regards > Tomas > > PS: Was the database analyzed recently? > >
> OK, this seems to be the cause. What were the original values of the > config variables? If you've lowered the work_mem and you need to sort a > lot of data, this may be a problem. What amounts of data are you working > with? If the data were not analyzed recently, the execution plans will be > inefficient and this may be the result. The reason is that i'm using config that i used before and it worked perfect. work_mem is 192mb. I tried ANALYZE, but it didn't change anything. Amounts of data... at least, backup is over ~2.2Gb. I tried to use EXPLAIN ANALYZE for slow queries that i get from serverlog, but it also didn't change anything. 2009/7/30 <tv@fuzzy.cz>: >> Unfortunately had to downgrade back to 8.3. Now having troubles with >> that and still solving them. >> >> For future upgrade, what is the basic steps? > > 1. create database > 2. dump the data from the old database > 3. load the data into the new database > 4. analyze etc. (I prefer to do this manually at the beginning) > 5. check that everything is working (that the correct execution plans are > used, etc.) > > You may even run the (2) and (3) at once - use pipe instead of a file. > >> >>>Was the database analyzed recently? >> Hm... there was smth like auto analyzer in serverlog when i started it >> first time, but i didn't mention that. >> Should I analyze whole db? How to do it? > > Just execute 'ANALYZE' and the whole database will be analyzed, but when > the autovacuum daemon is running this should be performed automatically (I > guess - check the pg_stat_user_tables, there's information about last > manual/automatic vacuuming and/or analysis). > >> And how should I change _cost variables? > > I haven't noticed you've not modified those variables, so don't change them. > >> I/O was very high. at first memory usage grew up and then began to full >> swap. > > OK, this seems to be the cause. What were the original values of the > config variables? If you've lowered the work_mem and you need to sort a > lot of data, this may be a problem. What amounts of data are you working > with? If the data were not analyzed recently, the execution plans will be > inefficient and this may be the result. > > regards > Tomas > >
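To verify the statistics really are fresh, the pg_stat_user_tables check Tomas mentions could be something like this (database name is illustrative):

  psql -d mydb -c "SELECT relname, last_analyze, last_autoanalyze FROM pg_stat_user_tables ORDER BY relname"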
tv@fuzzy.cz wrote:
>> For future upgrade, what is the basic steps?

My additional comments:

0. Create test database - work out bugs and performance issues before going live....

> 1. create database

...cluster. You only need to create the individual database if the options you select for the dump do not create the database(s).

> 2. dump the data from the old database

...using the dump tools from the *new* version. With several cores, you might want to consider using the binary dump options in pg_dump if you want to use the new parallel restore feature in pg_restore with a possible dramatic increase in restore speed (benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup so it's worth the effort). The manual suggests that setting --jobs to the number of cores on the server is a good first approximation. See the -Fc options on pg_dump and the --jobs option in pg_restore for details.

Cheers,
Steve
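A sketch of the custom-format dump plus parallel restore Steve describes, assuming an 8-core server and illustrative database/file names:

  pg_dump -Fc mydb > mydb.dump            # custom-format dump, taken with the 8.4 pg_dump
  pg_restore -d mydb --jobs=8 mydb.dump   # parallel restore, roughly one job per core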
Steve Crawford <scrawford@pinpointresearch.com> wrote: > benchmarks I've seen suggest that with 8 cores you may even see an > almost 8x restore speedup I'm curious what sort of data in what environment showed that ratio. -Kevin
Kevin Grittner wrote:
> Steve Crawford <scrawford@pinpointresearch.com> wrote:
>> benchmarks I've seen suggest that with 8 cores you may even see an
>> almost 8x restore speedup
> I'm curious what sort of data in what environment showed that ratio.

Was going on memory from a presentation I watched. Reports on the web have shown anything from a 3x increase using 8 cores to other non-detailed reports of "up to" 8x improvement. If you have one big table, don't expect much if any improvement. If you have lots of smaller tables/indexes then parallel restore will probably benefit you. This is all based on the not-atypical assumption that your restore will be CPU bound. I don't think parallel restore will be much use beyond the point you hit IO limits.

Cheers,
Steve
On Thu, 30 Jul 2009, Kevin Grittner wrote:
> Steve Crawford <scrawford@pinpointresearch.com> wrote:
>> benchmarks I've seen suggest that with 8 cores you may even see an
>> almost 8x restore speedup
>
> I'm curious what sort of data in what environment showed that ratio.

It depends on a lot of things. However, certainly for index creation, tests on servers over here have indicated that running four "CREATE INDEX" statements at a time runs four times as fast, assuming the table fits in maintenance_work_mem.

Matthew

--
I have an inferiority complex. But it's not a very good one.
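A sketch of that approach - several index builds running in concurrent sessions (database, table and column names are made up; each session uses its own maintenance_work_mem):

  psql mydb -c "CREATE INDEX t1_col1_idx ON t1 (col1)" &
  psql mydb -c "CREATE INDEX t2_col2_idx ON t2 (col2)" &
  psql mydb -c "CREATE INDEX t3_col3_idx ON t3 (col3)" &
  psql mydb -c "CREATE INDEX t4_col4_idx ON t4 (col4)" &
  wait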
Matthew Wakeling <matthew@flymine.org> wrote: > tests on servers over here have indicated that running four "CREATE > INDEX" statements at the time runs four times as fast, assuming the > table fits in maintenance_work_mem. I'm benchmarking a patch to the parallel restore, and just out of curiosity I've been comparing the multi-job approach, with various numbers of jobs, to a restore within a single database transaction; and I'm seeing (on serious production-quality servers) the parallel restore run in 55% to 75% of the time of a restore running off the same dump file using the -1 switch. The 16 processor machine got the best results, running with anywhere from 12 to 20 jobs. The 2 processor machine got the lesser benefit, running with 2 to 4 jobs. (The exact number of jobs really didn't make a difference big enough to emerge from the noise.) I've got 431 user tables with 578 indexes in a database which, freshly restored, is 70GB. (That's 91GB with the fragmentation and reasonable dead space we have in production.) Real production data; nothing synthetic. Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. I'm sure there are cases where people don't have the option to pipe things through, or that there may sometime be a big enough savings in the multiple jobs to pay off, even without overlapping the dump and restore, and with the necessity to write and read the data an extra time; but there are clearly situations where the piped approach is faster. We may want to try to characterize the conditions under which each is a win, so we can better target our advice.... -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Since the dump to custom format ran longer than the full pg_dump > piped directly to psql would have taken, the overall time to use this > technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmmm ... AFAIR there isn't a good reason for dump to custom format > to take longer than plain text dump, except for applying > compression. Maybe -Z0 would be worth testing? Or is the problem > that you have to write the data to a disk file rather than just > piping it? I'm not sure without benchmarking that. I was writing to the same RAID as the database I was dumping, so contention was probably a significant issue. But it would be interesting to compare different permutations to see what impact each has alone and in combination. I'm OK with setting up a benchmark run each night for a while, to shake out what I can, on this and the artificial cases. -Kevin
Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Since the dump to custom format ran longer than the full pg_dump
>> piped directly to psql would have taken, the overall time to use this
>> technique is clearly longer for our databases on our hardware.
>
> Hmmm ... AFAIR there isn't a good reason for dump to custom format to
> take longer than plain text dump, except for applying compression.
> Maybe -Z0 would be worth testing? Or is the problem that you have to
> write the data to a disk file rather than just piping it?

I always dump with -Z0 (and compress afterwards, or even in a pipe to get two cores busy) because otherwise custom dump times are simply ridiculous. However, Kevin is onto something here - on the typical 4-8 core boxes I tested I managed to get around a cores/2 speedup for the restore, which means that for a pure upgrade, or for testing similar to what Kevin is doing, custom dumps + parallel restore might result in no win or even a loss.

On one of our datasets I did some benchmarking a while ago (for those who attended Bruce's pg_migrator talk @pgcon these are the same numbers):

* 150GB database (on-disk - ~100GB as a plain text dump)

time to dump (-Z0): 120min
time to restore (single threaded): 180min
time to restore (-j 16): 59min

However, the problem is that this does not actually mean that parallel restore shaves ~120min off the dump/restore time, because you get the following real runtimes:

plain text dump + single-threaded restore in a pipe: 188min
custom dump to file + parallel restore: 179min

This is without compression; with the default, custom dump + parallel restore is way slower than the simple approach on reasonable hardware.

Stefan
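A sketch of the -Z0-plus-external-compression approach Stefan describes (database and file names are illustrative). Note that pg_restore's parallel mode needs a seekable archive file, so the dump has to be decompressed again before a -j restore:

  pg_dump -Fc -Z0 mydb | gzip > mydb.dump.gz   # uncompressed custom format, gzip on a second core
  gunzip mydb.dump.gz                          # pg_restore -j cannot read from a pipe
  pg_restore -d mydb -j 16 mydb.dump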
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Since the dump to custom format ran longer than the full pg_dump >> piped directly to psql would have taken, the overall time to use >> this technique is clearly longer for our databases on our hardware. > > Hmmm ... AFAIR there isn't a good reason for dump to custom format > to take longer than plain text dump, except for applying > compression. Maybe -Z0 would be worth testing? Or is the problem > that you have to write the data to a disk file rather than just > piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. All of this is on a RAID 5 array with 5 drives using xfs with noatime,nobarrier and a 256MB BBU controller. -Kevin
Kevin Grittner wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Since the dump to custom format ran longer than the full pg_dump >>> piped directly to psql would have taken, the overall time to use >>> this technique is clearly longer for our databases on our hardware. >> Hmmm ... AFAIR there isn't a good reason for dump to custom format >> to take longer than plain text dump, except for applying >> compression. Maybe -Z0 would be worth testing? Or is the problem >> that you have to write the data to a disk file rather than just >> piping it? > > I did some checking with the DBA who normally copies these around for > development and test environments. He confirmed that when the source > and target are on the same machine, a pg_dump piped to psql takes > about two hours. If he pipes across the network, it runs more like > three hours. > > My pg_dump to custom format ran for six hours. The single-transaction > restore from that dump file took two hours, with both on the same > machine. I can confirm with benchmarks, but this guy generally knows > what he's talking about (and we do create a lot of development and > test databases this way). > > Either the compression is tripling the dump time, or there is > something inefficient about how pg_dump writes to the disk. seems about right - compression in pg_dump -Fc is a serious bottleneck and unless can significantly speed it up or make it use of multiple cores (either for the dump itself - which would be awsome - or for the compression) I would recommend to not use it at all. Stefan
On 7/30/09 11:14 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Since the dump to custom format ran longer than the full pg_dump >>> piped directly to psql would have taken, the overall time to use >>> this technique is clearly longer for our databases on our hardware. >> >> Hmmm ... AFAIR there isn't a good reason for dump to custom format >> to take longer than plain text dump, except for applying >> compression. Maybe -Z0 would be worth testing? Or is the problem >> that you have to write the data to a disk file rather than just >> piping it? > > I did some checking with the DBA who normally copies these around for > development and test environments. He confirmed that when the source > and target are on the same machine, a pg_dump piped to psql takes > about two hours. If he pipes across the network, it runs more like > three hours. > > My pg_dump to custom format ran for six hours. The single-transaction > restore from that dump file took two hours, with both on the same > machine. I can confirm with benchmarks, but this guy generally knows > what he's talking about (and we do create a lot of development and > test databases this way). > > Either the compression is tripling the dump time, or there is > something inefficient about how pg_dump writes to the disk. > > All of this is on a RAID 5 array with 5 drives using xfs with > noatime,nobarrier and a 256MB BBU controller. > Of course Compression has a HUGE effect if your I/O system is half-decent. Max GZIP compression speed with the newest Intel CPU's is something like 50MB/sec (it is data dependant, obviously -- it is usually closer to 30MB/sec). Max gzip decompression ranges from 50 to 150MB/sec (it can get really high only if the ratio is extremely large, like if you compress a repeating sequence of 256 bytes). The new parallel restore is nice and all, but we're still limited by the week it takes to dump the whole thing compressed. Parallel restore is a lot faster when restoring compressed dumps though, even without any indexes to make, since all that decompression is CPU hungry. > -Kevin > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Scott Carey <scott@richrelevance.com> wrote: > Max GZIP compression speed with the newest Intel CPU's is something > like 50MB/sec (it is data dependant, obviously -- it is usually > closer to 30MB/sec). Applying 30MB/sec to the 70GB accounts for 40 minutes. If those numbers are good, there's something else at play here. -Kevin
On 7/30/09 11:14 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote: > Tom Lane wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Since the dump to custom format ran longer than the full pg_dump >>> piped directly to psql would have taken, the overall time to use this >>> technique is clearly longer for our databases on our hardware. >> >> Hmmm ... AFAIR there isn't a good reason for dump to custom format to >> take longer than plain text dump, except for applying compression. >> Maybe -Z0 would be worth testing? Or is the problem that you have to >> write the data to a disk file rather than just piping it? > > I always dump with -Z0(and compress afterwards or even in a pipe to get > two cores busy) because otherwise custom dump times are simply ridiculous. > However Kevin is on something here - on the typical 4-8 core box I > tested I managed to an around cores/2 speedup for the restore which > means that for a pure upgrade or testing similiar to what kevin is doing > custom dumps + parallel restore might result in no win or even a loss. > > On on of our datasets I did some benchmarking a while ago (for those who > attended bruce pg_migrator talk @pgcon these are same numbers): > > > * 150GB Database (on-disk - ~100GB as a plain text dump) > > time to dump(-C0): 120min > time to restore(single threaded): 180min > time to restore(-j 16): 59min Note also that with ext3 and XFS (untuned) parallel restore = HORRIBLY FRAGMENTED tables, to the point of sequential scans being rather slow. At least, they're mostly just interleaved with each other so there is little seeking backwards, but still... Beware. XFS with allocsize=64m or so interleaves them in reasonably large chunks though and prevents significant fragmentation. > > however the problem is that this does not actually mean that parallel > restore shaves you ~120min in dump/restore time because you get the > following real runtimes: > > plain text dump + single threaded restore in a pipe: 188min > custom dump to file + parallel restore: 179min On the other hand, I find that the use case where one DB is dumped to a backup, and then this backup is restored on several others -- that parallel restore is extremely useful there. Dump needs to be parallelized or at least pipelined to use more cores. COPY on one thread, compression on another? One trick with a dump, that works only if you have tables or schemas that can safely dump in different transactions, is to dump concurrently on different slices of the DB manually. This makes a huge difference if that is possible. > > > this is without compression, with the default custom dump + parallel > restore is way slower than the simple approach on reasonable hardware. > > > Stefan > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
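The manual slicing Scott mentions might look like this - independent slices dumped in concurrent sessions (table names are made up; this is only safe if you can tolerate the slices being taken under separate snapshots):

  pg_dump -Fc -t big_table_1 mydb > slice1.dump &
  pg_dump -Fc -t big_table_2 mydb > slice2.dump &
  pg_dump -Fc -T big_table_1 -T big_table_2 mydb > rest.dump &
  wait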
On 7/30/09 11:24 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote: > Kevin Grittner wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>> Since the dump to custom format ran longer than the full pg_dump >>>> piped directly to psql would have taken, the overall time to use >>>> this technique is clearly longer for our databases on our hardware. >>> Hmmm ... AFAIR there isn't a good reason for dump to custom format >>> to take longer than plain text dump, except for applying >>> compression. Maybe -Z0 would be worth testing? Or is the problem >>> that you have to write the data to a disk file rather than just >>> piping it? >> >> I did some checking with the DBA who normally copies these around for >> development and test environments. He confirmed that when the source >> and target are on the same machine, a pg_dump piped to psql takes >> about two hours. If he pipes across the network, it runs more like >> three hours. >> >> My pg_dump to custom format ran for six hours. The single-transaction >> restore from that dump file took two hours, with both on the same >> machine. I can confirm with benchmarks, but this guy generally knows >> what he's talking about (and we do create a lot of development and >> test databases this way). >> >> Either the compression is tripling the dump time, or there is >> something inefficient about how pg_dump writes to the disk. > > seems about right - compression in pg_dump -Fc is a serious bottleneck > and unless can significantly speed it up or make it use of multiple > cores (either for the dump itself - which would be awsome - or for the > compression) I would recommend to not use it at all. > That's not an option when a dump compressed is 200GB and uncompressed is 1.3TB, for example. > > Stefan > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
On 30-7-2009 20:46 Scott Carey wrote: > Of course Compression has a HUGE effect if your I/O system is half-decent. > Max GZIP compression speed with the newest Intel CPU's is something like > 50MB/sec (it is data dependant, obviously -- it is usually closer to > 30MB/sec). Max gzip decompression ranges from 50 to 150MB/sec (it can get > really high only if the ratio is extremely large, like if you compress a > repeating sequence of 256 bytes). I just ran some quick numbers on our lightly loaded Nehalem X5570 (2.93+ Ghz depending on turbo-mode). I compressed a 192MB text file I had at hand using gzip -1, -2, -3, -6 and -9 and outputted its results to /dev/null. The file was in the kernels file cache all the time and I did the tests 3 times. Gzip -1 reached 54MB/s, -2 got 47MB/s, -3 got 32MB/s, -6 got 18MB/s and -9 got to 12MB/s. Just running cat on the file made it do 6400MB/s (i.e. it took 0.030 seconds to copy the file from memory to nowhere). Those files where respectively 69MB, 66MB, 64MB, 59MB and 58MB. Gunzip on the -1 file took 1.66 seconds, i.e. it read data at 41MB/s and outputted it to /dev/null at 115MB/s. The -9 file took 1.46s, so it read 40MB/s and wrote 131MB/s. Best regards, Arjen
Scott Carey <scott@richrelevance.com> writes: > Dump needs to be parallelized or at least pipelined to use more cores. COPY > on one thread, compression on another? We already do that (since compression happens on the pg_dump side). regards, tom lane
On 7/30/09 11:58 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Scott Carey <scott@richrelevance.com> wrote: > >> Max GZIP compression speed with the newest Intel CPU's is something >> like 50MB/sec (it is data dependant, obviously -- it is usually >> closer to 30MB/sec). > > Applying 30MB/sec to the 70GB accounts for 40 minutes. If those > numbers are good, there's something else at play here. It is rather data dependant, try gzip on command line as a test on some data. On a random tarball on my Nehalem system, I just got 23MB/sec compression rate on an uncompressable file. Decompression with gunzip was 145MB/sec. On a text file that I manually created with randommly placed repeating segments that compresses 200x to 1, compression was 115MB/sec (bytes in per sec), and decompression (bytes out per sec) was 265MB/sec. The array in this machine will do 800MB/sec reads/sec with 'dd' and 700MB/sec writes. One core has no chance. Now, what needs to be known with the pg_dump is not just how fast compression can go (assuming its gzip) but also what the duty cycle time of the compression is. If it is single threaded, there is all the network and disk time to cut out of this, as well as all the CPU time that pg_dump does without compression. > -Kevin >
On 7/30/09 1:15 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Scott Carey <scott@richrelevance.com> writes: >> Dump needs to be parallelized or at least pipelined to use more cores. COPY >> on one thread, compression on another? > > We already do that (since compression happens on the pg_dump side). > > regards, tom lane > Well, that isn't what I meant. pg_dump uses CPU outside of compression doing various things, If that Cpu is 10% as much as the compression, then splitting them up would yield ~10% gain when CPU bound.
Scott Carey <scott@richrelevance.com> wrote: > Now, what needs to be known with the pg_dump is not just how fast > compression can go (assuming its gzip) but also what the duty cycle > time of the compression is. If it is single threaded, there is all > the network and disk time to cut out of this, as well as all the CPU > time that pg_dump does without compression. Well, I established a couple messages back on this thread that pg_dump piped to psql to a database on the same machine writes the 70GB database to disk in two hours, while pg_dump to a custom format file at default compression on the same machine writes the 50GB file in six hours. No network involved, less disk space written. I'll try it tonight at -Z0. One thing I've been wondering about is what, exactly, is compressed in custom format. Is it like a .tar.gz file, where the compression is a layer over the top, or are individual entries compressed? If the latter, what's the overhead on setting up each compression stream? Is there some minimum size before that kicks in? (I know, I should go check the code myself. Maybe in a bit. Of course, if someone already knows, it would be quicker....) -Kevin
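The comparison Kevin plans could be as simple as this (database and file names are illustrative):

  time pg_dump -Fc     mydb > mydb.default.dump   # custom format, default compression
  time pg_dump -Fc -Z0 mydb > mydb.z0.dump        # custom format, compression disabled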
On 7/30/09 1:58 PM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Scott Carey <scott@richrelevance.com> wrote: > >> Now, what needs to be known with the pg_dump is not just how fast >> compression can go (assuming its gzip) but also what the duty cycle >> time of the compression is. If it is single threaded, there is all >> the network and disk time to cut out of this, as well as all the CPU >> time that pg_dump does without compression. > > Well, I established a couple messages back on this thread that pg_dump > piped to psql to a database on the same machine writes the 70GB > database to disk in two hours, while pg_dump to a custom format file > at default compression on the same machine writes the 50GB file in six > hours. No network involved, less disk space written. I'll try it > tonight at -Z0. So, I'm not sure what the pg_dump custom format overhead is minus the compression -- there is probably some non-compression overhead from that format other than the compression. -Z1 might be interesting too, but obviously it takes some time. Interesting that your uncompressed case is only 40% larger. For me, the compressed dump is in the range of 20% the size of the uncompressed one. > > One thing I've been wondering about is what, exactly, is compressed in > custom format. Is it like a .tar.gz file, where the compression is a > layer over the top, or are individual entries compressed? It is instructive to open up a compressed custom format file in 'less' or another text viewer. Basically, it is the same as the uncompressed dump with all the DDL uncompressed, but the binary chunks compressed. It would seem (educated guess, looking at the raw file, and not the code) that the table data is compressed and the DDL points to an index in the file where the compressed blob for the copy lives. > If the > latter, what's the overhead on setting up each compression stream? Is > there some minimum size before that kicks in? (I know, I should go > check the code myself. Maybe in a bit. Of course, if someone already > knows, it would be quicker....) Gzip does have some quirky performance behavior depending on the chunk size of data you stream into it. > > -Kevin >
Scott Carey <scott@richrelevance.com> wrote: > Gzip does have some quirky performance behavior depending on the > chunk size of data you stream into it. Yeah, I've run into that before. If we're sending each individual datum to a gzip function rather than waiting until we've got a decent-size buffer, that could explain it. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > One thing I've been wondering about is what, exactly, is compressed in > custom format. Is it like a .tar.gz file, where the compression is a > layer over the top, or are individual entries compressed? Individual entries. Eyeball examination of a dump file shows that we only compress table-data entries, and don't for example waste time firing up the compressor to process a function body. It's possible that it'd be worth trying to have some lower limit on the amount of data in a table before we bother to compress it, but I bet that it wouldn't make any difference on your databases ... regards, tom lane
Scott Carey <scott@richrelevance.com> writes: > Gzip does have some quirky performance behavior depending on the chunk size > of data you stream into it. Can you enlarge on that comment? I'm not sure that pg_dump is aware that there's anything to worry about there. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Scott Carey <scott@richrelevance.com> writes: >> Gzip does have some quirky performance behavior depending on the >> chunk size of data you stream into it. > > Can you enlarge on that comment? I'm not sure that pg_dump is aware > that there's anything to worry about there. If the library used here is anything like the native library used by Java, it'd be worth putting a buffer layer ahead of the calls to gzip, so it isn't dealing with each individual value as a separate call. I seem to remember running into that issue in Java, where throwing a BufferedOutputStream in there fixed the performance issue. -Kevin
Hey guyz, thanks for the help. I solved the problem: the reason was a bad query that I'd accidentally committed right after upgrading. PostgreSQL 8.4 is perfect! Analyze works like a charm, and MUCH better than in 8.3.

2009/7/31 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Carey <scott@richrelevance.com> writes:
>>> Gzip does have some quirky performance behavior depending on the
>>> chunk size of data you stream into it.
>>
>> Can you enlarge on that comment? I'm not sure that pg_dump is aware
>> that there's anything to worry about there.
>
> If the library used here is anything like the native library used by
> Java, it'd be worth putting a buffer layer ahead of the calls to gzip,
> so it isn't dealing with each individual value as a separate call. I
> seem to remember running into that issue in Java, where throwing a
> BufferedOutputStream in there fixed the performance issue.
>
> -Kevin
On 7/30/09 2:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Scott Carey <scott@richrelevance.com> writes: >> Gzip does have some quirky performance behavior depending on the chunk size >> of data you stream into it. > > Can you enlarge on that comment? I'm not sure that pg_dump is aware > that there's anything to worry about there. > > regards, tom lane > For example, one of the things that gzip does is calculate the crc of the item being compressed. Calculating that incrementally is less efficient than doing it in bulk. For whatever reason, some other internals of gzip tend to perform much better if submitting say, 4k or 8k or 16k chunks rather than little bits at a time. But I'm sure some of that also depends on what library you're using since they all vary somewhat.
Scott Carey <scott@richrelevance.com> writes: > On 7/30/09 2:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> Scott Carey <scott@richrelevance.com> writes: >>> Gzip does have some quirky performance behavior depending on the chunk size >>> of data you stream into it. >> >> Can you enlarge on that comment? I'm not sure that pg_dump is aware >> that there's anything to worry about there. > For whatever reason, some other internals of gzip tend to perform much > better if submitting say, 4k or 8k or 16k chunks rather than little bits at > a time. But I'm sure some of that also depends on what library you're using > since they all vary somewhat. AFAIK there is only one widely-used implementation of zlib, and it hasn't changed much in a long time. I did some tracing and verified that pg_dump passes data to deflate() one table row at a time. I'm not sure about the performance implications of that, but it does seem like it might be something to look into. regards, tom lane
On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > I did some tracing and verified that pg_dump passes data to deflate() > one table row at a time. I'm not sure about the performance > implications of that, but it does seem like it might be something to > look into. I suspect if this was a problem the zlib people would have added internal buffering ages ago. I find it hard to believe we're not the first application to use it this way. I suppose it wouldn't be the first time a problem like this went unfixed though. Is the zlib software actively maintained or was your earlier comment implying it's currently an orphaned codebase? -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> wrote: > I suspect if this was a problem the zlib people would have added > internal buffering ages ago. I find it hard to believe we're not the > first application to use it this way. I think that most uses of this library are on entire files or streams. They may have felt that adding another layer of buffering would just hurt performance for the typical use case, and anyone using it in some other way could always use their own buffering layer. In Java adding that layer took 30 characters of code, so it didn't make a very big impression on me -- it took a while to even remember I'd had to do it. -Kevin
On Thu, 30 Jul 2009, Rauan Maemirov wrote:
> maintenance_work_mem = 1GB
> work_mem = 192MB
> shared_buffers = 7680MB
> max_connections = 80
> My box is Nehalem 2xQuad 2.8 with RAM 32Gb

While it looks like you sorted out your issue downthread, I wanted to point out that your setting for work_mem could be dangerously high here and contribute to problems with running out of memory or using swap. If each of your 80 clients was doing a sort at the same time, you'd be using 80 * 192MB + 7680MB = 23040MB (roughly 22.5GB) of RAM just for the server. The problem is that each client could do multiple sorts, so usage might go even higher. Unless you have a big data warehouse setup, more common work_mem settings are in the 16-64MB range rather than going this high. Just something to keep an eye on if you find a lot of memory is being used by the database processes.

I really need to refine the pgtune model to more carefully account for this particular problem; it's a bit too aggressive here for people who aren't proactively watching the server's RAM after changing the settings.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith<gsmith@gregsmith.com> wrote: > On Thu, 30 Jul 2009, Rauan Maemirov wrote: > >> maintenance_work_mem = 1GB >> work_mem = 192MB >> shared_buffers = 7680MB >> max_connections = 80 >> My box is Nehalem 2xQuad 2.8 with RAM 32Gb > > While it looks like you sorted out your issue downthread, I wanted to point > out that your setting for work_mem could be dangerously high here and > contribute to problems The real danger here is that you can set up your pg server to fail ONLY under heavy load, when it runs out of memory and goes into a swap storm. So, without proper load testing and profiling, you may not know you're headed for danger until your server goes unresponsive midday at the most critical of times. And restarting it will just lead to the same failure again as the clients all reconnect and pummel your server. Meanwhile, going from 192 to 16MB might result in a total slowdown measured in a fraction of a percentage overall, and prevent this kind of failure. If there's one single request you can justify big work_mem for then set it for just that one query. It's not uncommon to have a reporting user limited to a few connections and with "alter user reportinguser set work_mem='512MB';" so that it can run fast but not deplete your server's resources on accident during heavy load.
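A sketch of both techniques Scott describes - a per-role setting for a dedicated reporting user, and a temporary per-session override for one big query (the role name comes from his example; the sizes are illustrative):

  # per-role: only the reporting user gets the large work_mem
  psql -d mydb -c "ALTER USER reportinguser SET work_mem = '512MB'"

  -- per-session, from within psql: raise it just around the one expensive query
  SET work_mem = '512MB';
  -- ... run the single big reporting query here ...
  RESET work_mem;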
Scott Carey wrote: > > > On 7/30/09 11:24 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote: > >> Kevin Grittner wrote: >>> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>>> Since the dump to custom format ran longer than the full pg_dump >>>>> piped directly to psql would have taken, the overall time to use >>>>> this technique is clearly longer for our databases on our hardware. >>>> Hmmm ... AFAIR there isn't a good reason for dump to custom format >>>> to take longer than plain text dump, except for applying >>>> compression. Maybe -Z0 would be worth testing? Or is the problem >>>> that you have to write the data to a disk file rather than just >>>> piping it? >>> I did some checking with the DBA who normally copies these around for >>> development and test environments. He confirmed that when the source >>> and target are on the same machine, a pg_dump piped to psql takes >>> about two hours. If he pipes across the network, it runs more like >>> three hours. >>> >>> My pg_dump to custom format ran for six hours. The single-transaction >>> restore from that dump file took two hours, with both on the same >>> machine. I can confirm with benchmarks, but this guy generally knows >>> what he's talking about (and we do create a lot of development and >>> test databases this way). >>> >>> Either the compression is tripling the dump time, or there is >>> something inefficient about how pg_dump writes to the disk. >> seems about right - compression in pg_dump -Fc is a serious bottleneck >> and unless can significantly speed it up or make it use of multiple >> cores (either for the dump itself - which would be awsome - or for the >> compression) I would recommend to not use it at all. >> > > That's not an option when a dump compressed is 200GB and uncompressed is > 1.3TB, for example. yeah that was not meant as "don't use compression at all" but rather as "use a different way to compress than what pg_dump provides internally". Stefan
That's true. I tried to lower work_mem from 192 to 64, and it caused total slowdown. By the way, is there any performance tips for tuning joins? I noticed, that my joins on 8.4 slowed down, on 8.3 it was faster a bit. 2009/7/31 Scott Marlowe <scott.marlowe@gmail.com>: > On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith<gsmith@gregsmith.com> wrote: >> On Thu, 30 Jul 2009, Rauan Maemirov wrote: >> >>> maintenance_work_mem = 1GB >>> work_mem = 192MB >>> shared_buffers = 7680MB >>> max_connections = 80 >>> My box is Nehalem 2xQuad 2.8 with RAM 32Gb >> >> While it looks like you sorted out your issue downthread, I wanted to point >> out that your setting for work_mem could be dangerously high here and >> contribute to problems > > The real danger here is that you can set up your pg server to fail > ONLY under heavy load, when it runs out of memory and goes into a swap > storm. So, without proper load testing and profiling, you may not > know you're headed for danger until your server goes unresponsive > midday at the most critical of times. And restarting it will just > lead to the same failure again as the clients all reconnect and pummel > your server. > > Meanwhile, going from 192 to 16MB might result in a total slowdown > measured in a fraction of a percentage overall, and prevent this kind > of failure. > > If there's one single request you can justify big work_mem for then > set it for just that one query. It's not uncommon to have a reporting > user limited to a few connections and with "alter user reportinguser > set work_mem='512MB';" so that it can run fast but not deplete your > server's resources on accident during heavy load. >
On Thu, Jul 30, 2009 at 10:07 AM, Rauan Maemirov<rauan@maemirov.com> wrote:
> Unfortunately had to downgrade back to 8.3. Now having troubles with
> that and still solving them.
>
> For future upgrade, what is the basic steps?
>
>> Was the database analyzed recently?
> Hm... there was smth like auto analyzer in serverlog when i started it
> first time, but i didn't mention that.
> Should I analyze whole db? How to do it?
>
> And how should I change _cost variables?
>
> I/O was very high. at first memory usage grew up and then began to full swap.

There is at least one known case of a memory leak in 8.4.0. Possibly you got hit by that or another early-adopter bug. I think in your case it's probably too soon to have upgraded... with 10k connections even minor annoyances can be real nasty. I'd wait a few months, and in the meantime stage your app on a testing database and double-check the important query plans to make sure there are no big performance regressions. Each version of pg has a couple, for various reasons.

merlin
Greg Stark <gsstark@mit.edu> writes:
> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I did some tracing and verified that pg_dump passes data to deflate()
>> one table row at a time. I'm not sure about the performance
>> implications of that, but it does seem like it might be something to
>> look into.

> I suspect if this was a problem the zlib people would have added
> internal buffering ages ago. I find it hard to believe we're not the
> first application to use it this way.

I dug into this a bit more. zlib *does* have internal buffering --- it has to, because it needs a minimum lookahead of several hundred bytes to ensure that compression works properly. The per-call overhead of deflate() looks a bit higher than one could wish when submitting short chunks, but oprofile shows that "pg_dump -Fc" breaks down about like this:

samples   %        image name     symbol name
1103922   74.7760  libz.so.1.2.3  longest_match
 215433   14.5927  libz.so.1.2.3  deflate_slow
  55368    3.7504  libz.so.1.2.3  compress_block
  41715    2.8256  libz.so.1.2.3  fill_window
  17535    1.1878  libc-2.9.so    memcpy
  13663    0.9255  libz.so.1.2.3  adler32
   4613    0.3125  libc-2.9.so    _int_malloc
   2942    0.1993  libc-2.9.so    free
   2552    0.1729  libc-2.9.so    malloc
   2155    0.1460  libz.so.1.2.3  pqdownheap
   2128    0.1441  libc-2.9.so    _int_free
   1702    0.1153  libz.so.1.2.3  deflate
   1648    0.1116  libc-2.9.so    mempcpy

longest_match is the core lookahead routine and is not going to be affected by submission sizes, because it isn't called unless adequate data (ie, the longest possible match length) is available in zlib's internal buffer. It's possible that doing more buffering on our end would reduce the deflate_slow component somewhat, but it looks like the most we could hope to get that way is in the range of 10% speedup. So I'm wondering if anyone can provide concrete evidence of large wins from buffering zlib's input.

regards, tom lane
On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> I did some tracing and verified that pg_dump passes data to deflate() >>> one table row at a time. I'm not sure about the performance >>> implications of that, but it does seem like it might be something to >>> look into. > >> I suspect if this was a problem the zlib people would have added >> internal buffering ages ago. I find it hard to believe we're not the >> first application to use it this way. > > I dug into this a bit more. zlib *does* have internal buffering --- it > has to, because it needs a minimum lookahead of several hundred bytes > to ensure that compression works properly. The per-call overhead of > deflate() looks a bit higher than one could wish when submitting short > chunks, but oprofile shows that "pg_dump -Fc" breaks down about like > this: During dump (size of dump is 2.6 GB), No Compression : - postgres at 70-100% CPU and pg_dump at something like 10-20% - dual core is useful (a bit...) - dump size 2.6G - dump time 2m25.288s Compression Level 1 : - postgres at 70-100% CPU and pg_dump at 20%-100% - dual core is definitely useful - dump size 544MB - dump time 2m33.337s Since this box is mostly idle right now, eating CPU for compression is no problem... Adding an option to use LZO instead of gzip could be useful... Compressing the uncompressed 2.6GB dump : - gzip -1 : - compressed size : 565 MB - compression throughput : 28.5 MB/s - decompression throughput : 74 MB/s - LZO -1 : - compressed size : 696M - compression throughput : 86 MB/s - decompression throughput : 247 MB/s Conclusion : LZO could help for fast disks (RAID) or slow disks on a CPU-starved server...
On 7/31/09 4:01 PM, "PFC" <lists@peufeu.com> wrote: > On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Greg Stark <gsstark@mit.edu> writes: >>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> I did some tracing and verified that pg_dump passes data to deflate() >>>> one table row at a time. I'm not sure about the performance >>>> implications of that, but it does seem like it might be something to >>>> look into. >> >>> I suspect if this was a problem the zlib people would have added >>> internal buffering ages ago. I find it hard to believe we're not the >>> first application to use it this way. >> >> I dug into this a bit more. zlib *does* have internal buffering --- it >> has to, because it needs a minimum lookahead of several hundred bytes >> to ensure that compression works properly. The per-call overhead of >> deflate() looks a bit higher than one could wish when submitting short >> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like >> this: > > During dump (size of dump is 2.6 GB), > > No Compression : > - postgres at 70-100% CPU and pg_dump at something like 10-20% > - dual core is useful (a bit...) > - dump size 2.6G > - dump time 2m25.288s > > Compression Level 1 : > - postgres at 70-100% CPU and pg_dump at 20%-100% > - dual core is definitely useful > - dump size 544MB > - dump time 2m33.337s > > Since this box is mostly idle right now, eating CPU for compression is no > problem... > I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2 8.3.6 No disk wait time during any test. One test beforehand was used to prime the disk cache. 100% CPU in the below means one core fully used. 800% means the system is fully loaded. pg_dump > file (on a subset of the DB with lots of tables with small tuples) 6m 27s, 4.9GB; 12.9MB/sec 50% CPU in postgres, 50% CPU in pg_dump pg_dump -Fc > file.gz 9m6s, output is 768M (6.53x compression); 9.18MB/sec 30% CPU in postgres, 70% CPU in pg_dump pg_dump | gzip > file.2.gz 6m22s, 13MB/sec. 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip The default (5) compression level was used. So, when using pg_dump alone, I could not get significantly more than one core of CPU (all on the same box). No matter how I tried, pg_dump plus the postgres process dumping data always totaled about 102% -- it would flulctuate in top, give or take 15% at times, but the two always were very close (within 3%) of this total. Piping the whole thing to gzip gets some speedup. This indicates that perhaps the implementation or use of gzip is inappropriate on pg_dump's side or the library version is older or slower. Alternatively, the use of gzip inside pg_dump fails to pipeline CPU useage as well as piping it does, as the above shows 50% more CPU utilization when piping. I can do the same test with a single table that is 10GB later (which does dump much faster than 13MB/sec and has rows that average about 500 bytes in size). But overall I have found pg_dump's performace sorely lacking, and this is a data risk in the big picture. Postgres is very good about not losing data, but that only goes up to the limits of the hardware and OS, which is not good enough. Because of long disaster recovery times and poor replication/contingency features, it is a fairly unsafe place for data once it gets beyond a certain size and a BC plan requires minimal downtime. > Adding an option to use LZO instead of gzip could be useful... 
> > Compressing the uncompressed 2.6GB dump : > > - gzip -1 : > > - compressed size : 565 MB > - compression throughput : 28.5 MB/s > - decompression throughput : 74 MB/s > > - LZO -1 : > - compressed size : 696M > - compression throughput : 86 MB/s > - decompression throughput : 247 MB/s > > Conclusion : LZO could help for fast disks (RAID) or slow disks on a > CPU-starved server... > LZO would be a great option, it is very fast, especially decompression. With gzip, one rarely gains by going below gzip -3 or above gzip -6. > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Scott Carey <scott@richrelevance.com> writes: > I get very different (contradictory) behavior. Server with fast RAID, 32GB > RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2 > 8.3.6 > No disk wait time during any test. One test beforehand was used to prime > the disk cache. > 100% CPU in the below means one core fully used. 800% means the system is > fully loaded. > pg_dump > file (on a subset of the DB with lots of tables with small > tuples) > 6m 27s, 4.9GB; 12.9MB/sec > 50% CPU in postgres, 50% CPU in pg_dump > pg_dump -Fc > file.gz > 9m6s, output is 768M (6.53x compression); 9.18MB/sec > 30% CPU in postgres, 70% CPU in pg_dump > pg_dump | gzip > file.2.gz > 6m22s, 13MB/sec. > 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip I don't see anything very contradictory here. What you're demonstrating is that it's nice to be able to throw a third CPU at the compression part of the problem. That's likely to remain true if we shift to a different compression algorithm. I suspect if you substituted lzo for gzip in the third case, the picture wouldn't change very much. regards, tom lane
On 8/3/09 11:56 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Scott Carey <scott@richrelevance.com> writes: >> I get very different (contradictory) behavior. Server with fast RAID, 32GB >> RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2 >> 8.3.6 >> No disk wait time during any test. One test beforehand was used to prime >> the disk cache. >> 100% CPU in the below means one core fully used. 800% means the system is >> fully loaded. > >> pg_dump > file (on a subset of the DB with lots of tables with small >> tuples) >> 6m 27s, 4.9GB; 12.9MB/sec >> 50% CPU in postgres, 50% CPU in pg_dump > >> pg_dump -Fc > file.gz >> 9m6s, output is 768M (6.53x compression); 9.18MB/sec >> 30% CPU in postgres, 70% CPU in pg_dump > >> pg_dump | gzip > file.2.gz >> 6m22s, 13MB/sec. >> 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip > > I don't see anything very contradictory here. The other poster got nearly 2 CPUs of work from just pg_dump + postgres. That contradicts my results (but could be due to data differences or postgres version differences). In the other use case, compression was not slower, but just used more CPU (also contradicting my results). > What you're demonstrating > is that it's nice to be able to throw a third CPU at the compression > part of the problem. No, 1.5 CPU. A full use of a second would even be great. I'm also demonstrating that there is some artificial bottleneck somewhere preventing postgres and pg_dump to operate concurrently. Instead, one waits while the other does work. Your claim earlier in this thread was that there was already pipelined work being done due to pg_dump + postgresql -- which seems to be true for the other test case but not mine. As a consequence, adding compression throttles the postgres process even though the compression hasn't caused 100% CPU (or close) on any task involved. > That's likely to remain true if we shift to a > different compression algorithm. I suspect if you substituted lzo for > gzip in the third case, the picture wouldn't change very much. > That is exactly the point. LZO would be nice (and help mitigate this problem), but it doesn't solve the real problem here. Pg_dump is slow and artificially throttles without even getting 100% CPU from itself or postgres. The problem still remains: dumping with -Fc can be significantly slower than raw piped to a compression utility, even if no task is CPU or I/O bound. Dumping and piping to gzip is faster. But parallel restore won't work without custom or raw format. > regards, tom lane >
> I get very different (contradictory) behavior. Server with fast RAID, 32GB RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs. CentOS 5.2
> 8.3.6

That's a very different setup from my (much less powerful) box, so that would explain it...

> No disk wait time during any test. One test beforehand was used to prime the disk cache.
> 100% CPU in the below means one core fully used. 800% means the system is fully loaded.
>
> pg_dump > file (on a subset of the DB with lots of tables with small tuples)
> 6m 27s, 4.9GB; 12.9MB/sec
> 50% CPU in postgres, 50% CPU in pg_dump

If there is no disk wait time, then why do you get 50/50 and not 100/100, or at least one core maxed out? That's interesting...

COPY annonces TO '/dev/null';
COPY 413526
Temps : 13871,093 ms

\copy annonces to '/dev/null'
Temps : 14037,946 ms

time pg_dump -Fc -t annonces -U annonces --compress=0 annonces >/dev/null
real 0m14.596s
user 0m0.700s
sys 0m0.372s

In all 3 cases postgres maxes out one core (I've repeated the test until all data was cached, so there is no disk access at all in vmstat). Size of dump is 312MB.
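One way to capture the per-process CPU split both posts are quoting is to sample CPU usage per process while a dump runs. A rough sketch, assuming the sysstat package (which provides pidstat) is installed and reusing the same table as above:

# Start the dump in the background, then sample per-process CPU once a second.
pg_dump -Fc -t annonces -U annonces --compress=0 annonces > /dev/null &
pidstat -u 1 | egrep 'pg_dump|postgres'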
On Mon, Aug 3, 2009 at 2:56 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I don't see anything very contradictory here. What you're demonstrating is that it's nice to be able to throw a third CPU at the compression part of the problem. That's likely to remain true if we shift to a different compression algorithm. I suspect if you substituted lzo for gzip in the third case, the picture wouldn't change very much.

lzo is much, much, (much) faster than zlib. Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response.

[root@devdb merlin]# time lzop -c dump.sql > /dev/null
real 0m16.683s
user 0m15.573s
sys 0m0.939s

[root@devdb merlin]# time gzip -c dump.sql > /dev/null
real 3m43.090s
user 3m41.471s
sys 0m1.036s

merlin
> lzo is much, much, (much) faster than zlib.

decompression speed is even more awesome...

> Note, I've tried several times to contact the author to get clarification on licensing terms and have been unable to get a response.

lzop and the LZO library are distributed under the terms of the GNU General Public License (GPL).
source : http://www.lzop.org/lzop_man.php
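The decompression side is easy to check the same way as the compression timings earlier in the thread (a rough sketch; it assumes compressed copies of the same dump already exist, and the file names are placeholders):

# Decompress to /dev/null so only the CPU cost is measured.
time gzip -dc dump.sql.gz > /dev/null
time lzop -dc dump.sql.lzo > /dev/null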
On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
>> lzo is much, much, (much) faster than zlib. Note, I've tried several
>
> decompression speed is even more awesome...
>
>> times to contact the author to get clarification on licensing terms
>> and have been unable to get a response.
>
> lzop and the LZO library are distributed under the terms of the GNU General
> Public License (GPL).
> source : http://www.lzop.org/lzop_man.php

yeah...I have another project I'm working on that is closed source, plus I was curious if something could be worked out for pg...lzo seems ideal for database usage. The author is MIA or too busy hacking to answer his email :-).

merlin
Merlin Moncure wrote:
> On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
> >> lzo is much, much, (much) faster than zlib. Note, I've tried several
> >
> > decompression speed is even more awesome...
> >
> >> times to contact the author to get clarification on licensing terms
> >> and have been unable to get a response.
> >
> > lzop and the LZO library are distributed under the terms of the GNU General
> > Public License (GPL).
> > source : http://www.lzop.org/lzop_man.php
>
> yeah...I have another project I'm working on that is closed source,
> plus I was curious if something could be worked out for pg...lzo seems
> ideal for database usage.

I think this was already discussed here. It turns out that a specific exception for PG wouldn't be acceptable because of the multiple commercial derivatives. LZO would have to become BSD, which presumably the author just doesn't want to do.

Maybe we could have a --enable-lzo switch similar to what we do with readline. Of course, a non-LZO-enabled build would not be able to read a dump from such a build. (We could also consider LZO for TOAST compression.)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 8/4/09 8:30 AM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:

> Merlin Moncure wrote:
>> On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
>>>> lzo is much, much, (much) faster than zlib. Note, I've tried several
>>>
>>> decompression speed is even more awesome...
>>>
>>>> times to contact the author to get clarification on licensing terms
>>>> and have been unable to get a response.
>>>
>>> lzop and the LZO library are distributed under the terms of the GNU General
>>> Public License (GPL).
>>> source : http://www.lzop.org/lzop_man.php
>>
>> yeah...I have another project I'm working on that is closed source,
>> plus I was curious if something could be worked out for pg...lzo seems
>> ideal for database usage.
>
> I think this was already discussed here. It turns out that a specific exception for PG wouldn't be acceptable because of the multiple commercial derivatives. LZO would have to become BSD, which presumably the author just doesn't want to do.
>
> Maybe we could have a --enable-lzo switch similar to what we do with readline. Of course, a non-LZO-enabled build would not be able to read a dump from such a build. (We could also consider LZO for TOAST compression.)

There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used.

http://www.fastlz.org/ (MIT)
http://www.quicklz.com/ (GPL again)
http://oldhome.schmorp.de/marc/liblzf.html (BSD-ish)

ZFS uses LZJB (CDDL); source code here: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c (a good read for one of the simplest LZ compression algorithms in terms of lines of code -- about 100 lines)

FastLZ, with its MIT license, is probably the most obvious choice.
Scott Carey <scott@richrelevance.com> writes:
> There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used.

And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems.

regards, tom lane
On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used.
>
> And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems.

I did a little bit of research. LZO and friends are variants of LZW. The main LZW patent died in 2003, and AFAIK there have been no patent enforcement cases brought against LZO or its cousins (LZO dates to 1996). OK, I'm no attorney, etc, but the internet seems to believe that the algorithms are patent free. LZO is quite widely used, in both open source and some relatively high profile commercial projects.

I downloaded the libraries and did some tests on a 2.5 G sql dump:

compression time:
zlib: 4m 1s
lzo: 17s
fastlz: 28.8s
liblzf: 26.7s

compression size (percentage = reduction vs. the uncompressed dump):
zlib: 609M 75%
lzo: 948M 62%
fastlz: 936M 62.5%
liblzf: 916M 63.5%

A couple of quick notes: liblzf produces (possibly) architecture-dependent archives according to its header, and fastlz is not declared 'stable' according to its website.

merlin
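Given how much of that cost is zlib at its default effort, the only knob available in pg_dump itself today is the zlib level it passes through; lowering it is a possible middle ground. A rough sketch, with placeholder database and file names (the actual timings and sizes would have to be measured, not assumed):

# Default custom-format dump (moderate zlib level) versus the cheapest level.
time pg_dump -Fc mydb > dump.default.custom
time pg_dump -Fc -Z1 mydb > dump.z1.custom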
On 8/5/09 7:12 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:

> On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Scott Carey <scott@richrelevance.com> writes:
>>> There are a handful of other compression algorithms very similar to LZO in performance / compression level under various licenses. LZO is just the best known and most widely used.
>>
>> And after we get done with the license question, we need to ask about patents. The compression area is just a minefield of patents. gzip is known to avoid all older patents (and would be pretty solid prior art against newer ones). I'm far less confident about lesser-known systems.
>
> I did a little bit of research. LZO and friends are variants of LZW. The main LZW patent died in 2003, and AFAIK there have been no patent enforcement cases brought against LZO or its cousins (LZO dates to 1996). OK, I'm no attorney, etc, but the internet seems to believe that the algorithms are patent free. LZO is quite widely used, in both open source and some relatively high profile commercial projects.

That doesn't sound right to me: LZW is patent-protected in a few ways, and is an LZ78 scheme. LZO, zlib, and the others here are LZ77 schemes, which avoid the LZW patents. There are some other patents in the territory with respect to how the hash lookups are done for the LZ77 'sliding window' approach. Most notably, using a tree is patented, as are a couple of other (obvious) tricks that are generally avoided anyway by algorithms trying to be fast rather than produce the highest compression.

http://en.wikipedia.org/wiki/Lossless_data_compression#Historical_legal_issues
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://www.faqs.org/faqs/compression-faq/part1/section-7.html
http://www.ross.net/compression/patents.html

Note, US patents run either 17 years after grant or 20 years after filing. A very large chunk of those in this space have expired, but a few were filed/granted in the early 90's -- though those are generally more specific and easy to avoid, or very obvious duplicates of previous patents. More notably, one of these (Gibson and Graybill), if interpreted broadly, would apply to zlib as well, but the patent mentions LZRW1, and any broader scope would have prior art conflicts with ones that are now long expired. It's 17 years after grant on that, but not 20 years after filing.

> I downloaded the libraries and did some tests.
> 2.5 G sql dump:
>
> compression time:
> zlib: 4m 1s
> lzo: 17s
> fastlz: 28.8s
> liblzf: 26.7s
>
> compression size:
> zlib: 609M 75%
> lzo: 948M 62%
> fastlz: 936M 62.5%
> liblzf: 916M 63.5%

Interesting how that conflicts with some other benchmarks out there (where LZO and the others are about the same). But they're all an order of magnitude faster than gzip/zlib.

> A couple of quick notes: liblzf produces (possibly) architecture dependent archives according to its header, and fastlz is not declared 'stable' according to its website.
>
> merlin