Thread: PostgreSQL 8.4 performance tuning questions

PostgreSQL 8.4 performance tuning questions

From
Rauan Maemirov
Date:
Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
before and everything worked fine for me.

And now I have ~93% CPU load. Here are the changed config values:

default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80


My box is a Nehalem 2xQuad 2.8 with 32GB of RAM, and only PostgreSQL is
running on it.

For connection pooling I'm using the latest version of pgbouncer with
pool_size 20 (it was 30 before, but I lowered it) and 10k connections.

Which parameters should I pay more attention to?

Re: PostgreSQL 8.4 performance tuning questions

From
tv@fuzzy.cz
Date:
> Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
> before and everything worked fine for me.
>
> And now i have ~93% cpu load. Here's changed values of config:
>
> default_statistics_target = 50
> maintenance_work_mem = 1GB
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> effective_cache_size = 22GB
> work_mem = 192MB
> wal_buffers = 8MB
> checkpoint_segments = 16
> shared_buffers = 7680MB
> max_connections = 80
>
>
> My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
> postgresql working on it.
>
> For connection pooling i'm using pgbouncer's latest version with
> pool_size 20 (used 30 before, but now lowered) and 10k connections.
>
> What parameters i should give more attention on?
>

All the values seem quite reasonable to me. What about the _cost variables?

I guess one or more queries are evaluated using a different execution
plan, probably sequential scan instead of index scan, hash join instead of
merge join, or something like that.

Try to log the "slow" statements - see "log_min_duration_statement". That
might give you the slow queries (although not necessarily the ones causing
the problems), and you can then analyze them.

What is the general I/O activity? Is there a lot of data read/written to
the disks, is there a lot of I/O wait?
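
For example, something along these lines (the 500 ms threshold is just an
illustration, adjust to taste):

# in postgresql.conf (takes effect on reload):
log_min_duration_statement = 500

# rough picture of disk activity, I/O wait and swapping while the load is high:
iostat -x 5
vmstat 5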

regards
Tomas

PS: Was the database analyzed recently?


Re: PostgreSQL 8.4 performance tuning questions

From
tv@fuzzy.cz
Date:
> Unfortunately had to downgrade back to 8.3. Now having troubles with
> that and still solving them.
>
> For future upgrade, what is the basic steps?

1. create database
2. dump the data from the old database
3. load the data into the new database
4. analyze etc. (I prefer to do this manually at the beginning)
5. check that everything is working (that the correct execution plans are
used, etc.)

You may even run steps (2) and (3) at once - use a pipe instead of a file, for example:
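
A minimal sketch of steps (2) and (3) combined (hypothetical host and
database names, assuming the target database already exists):

pg_dump -h oldhost olddb | psql -h newhost newdb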

>
>>Was the database analyzed recently?
> Hm... there was smth like auto analyzer in serverlog when i started it
> first time, but i didn't mention that.
> Should I analyze whole db? How to do it?

Just execute 'ANALYZE' and the whole database will be analyzed. When the
autovacuum daemon is running this should happen automatically (check
pg_stat_user_tables - it has information about the last manual/automatic
vacuum and analyze).
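
For example (hypothetical database name):

psql -d mydb -c "ANALYZE;"
psql -d mydb -c "SELECT relname, last_analyze, last_autoanalyze FROM pg_stat_user_tables ORDER BY relname;"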

> And how should I change _cost variables?

I hadn't noticed that you haven't modified those variables - in that case, don't change them.

> I/O was very high. at first memory usage grew up and then began to full
> swap.

OK, this seems to be the cause. What were the original values of the
config variables? If you've lowered the work_mem and you need to sort a
lot of data, this may be a problem. What amounts of data are you working
with? If the data were not analyzed recently, the execution plans will be
inefficient and this may be the result.

regards
Tomas


Re: PostgreSQL 8.4 performance tuning questions

From
Rauan Maemirov
Date:
Unfortunately I had to downgrade back to 8.3. Now I'm having troubles with
that and am still solving them.

For a future upgrade, what are the basic steps?

>Was the database analyzed recently?
Hm... there was something like an auto analyzer in the server log when I
started it the first time, but I didn't pay attention to it.
Should I analyze the whole db? How do I do it?

And how should I change the _cost variables?

I/O was very high. At first memory usage grew, and then the server began to swap heavily.

2009/7/30  <tv@fuzzy.cz>:
>> Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune
>> before and everything worked fine for me.
>>
>> And now i have ~93% cpu load. Here's changed values of config:
>>
>> default_statistics_target = 50
>> maintenance_work_mem = 1GB
>> constraint_exclusion = on
>> checkpoint_completion_target = 0.9
>> effective_cache_size = 22GB
>> work_mem = 192MB
>> wal_buffers = 8MB
>> checkpoint_segments = 16
>> shared_buffers = 7680MB
>> max_connections = 80
>>
>>
>> My box is Nehalem 2xQuad 2.8 with RAM 32Gb, and there's only
>> postgresql working on it.
>>
>> For connection pooling i'm using pgbouncer's latest version with
>> pool_size 20 (used 30 before, but now lowered) and 10k connections.
>>
>> What parameters i should give more attention on?
>>
>
> All the values seem quite reasonable to me. What about the _cost variables?
>
> I guess one or more queries are evaluated using a different execution
> plan, probably sequential scan instead of index scan, hash join instead of
> merge join, or something like that.
>
> Try to log the "slow" statements - see "log_min_statement_duration". That
> might give you slow queries (although not necessarily the ones causing
> problems), and you can analyze them.
>
> What is the general I/O activity? Is there a lot of data read/written to
> the disks, is there a lot of I/O wait?
>
> regards
> Tomas
>
> PS: Was the database analyzed recently?
>
>

Re: PostgreSQL 8.4 performance tuning questions

From
Rauan Maemirov
Date:
> OK, this seems to be the cause. What were the original values of the
> config variables? If you've lowered the work_mem and you need to sort a
> lot of data, this may be a problem. What amounts of data are you working
> with? If the data were not analyzed recently, the execution plans will be
> inefficient and this may be the result.

The reason is that I'm using the same config I used before, and it worked
perfectly. work_mem is 192MB.
I tried ANALYZE, but it didn't change anything.

Amounts of data... the backup alone is ~2.2GB.
I tried EXPLAIN ANALYZE on the slow queries I got from the server log, but
that didn't help either.


2009/7/30  <tv@fuzzy.cz>:
>> Unfortunately had to downgrade back to 8.3. Now having troubles with
>> that and still solving them.
>>
>> For future upgrade, what is the basic steps?
>
> 1. create database
> 2. dump the data from the old database
> 3. load the data into the new database
> 4. analyze etc. (I prefer to do this manually at the beginning)
> 5. check that everything is working (that the correct execution plans are
> used, etc.)
>
> You may even run the (2) and (3) at once - use pipe instead of a file.
>
>>
>>>Was the database analyzed recently?
>> Hm... there was smth like auto analyzer in serverlog when i started it
>> first time, but i didn't mention that.
>> Should I analyze whole db? How to do it?
>
> Just execute 'ANALYZE' and the whole database will be analyzed, but when
> the autovacuum daemon is running this should be performed automatically (I
> guess - check the pg_stat_user_tables, there's information about last
> manual/automatic vacuuming and/or analysis).
>
>> And how should I change _cost variables?
>
> I haven't noticed you've not modified those variables, so don't change them.
>
>> I/O was very high. at first memory usage grew up and then began to full
>> swap.
>
> OK, this seems to be the cause. What were the original values of the
> config variables? If you've lowered the work_mem and you need to sort a
> lot of data, this may be a problem. What amounts of data are you working
> with? If the data were not analyzed recently, the execution plans will be
> inefficient and this may be the result.
>
> regards
> Tomas
>
>

Re: PostgreSQL 8.4 performance tuning questions

From
Steve Crawford
Date:
My additional comments:

tv@fuzzy.cz wrote:
> ...
> For future upgrade, what is the basic steps?

0. Create a test database - work out bugs and performance issues before going live.

> 1. create database
...cluster. You only need to create the individual database if the options you select for the dump do not create the database(s).

> 2. dump the data from the old database
...using the dump tools from the *new* version. With several cores, you might want to consider using the custom (binary) dump format in pg_dump so you can use the new parallel restore feature in pg_restore, with a possibly dramatic increase in restore speed (benchmarks I've seen suggest that with 8 cores you may even see an almost 8x restore speedup, so it's worth the effort). The manual suggests that setting --jobs to the number of cores on the server is a good first approximation. See the -Fc option in pg_dump and the --jobs option in pg_restore for details.
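
A rough sketch (hypothetical database name, assuming 8 cores on the target):

pg_dump -Fc mydb > mydb.dump        # custom format, using the 8.4 pg_dump
pg_restore -j 8 -d mydb mydb.dump   # parallel restore with 8 jobs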

Cheers,
Steve

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Steve Crawford <scrawford@pinpointresearch.com> wrote:

> benchmarks I've seen suggest that with 8 cores you may even see an
> almost 8x restore speedup

I'm curious what sort of data in what environment showed that ratio.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Steve Crawford
Date:
Kevin Grittner wrote:
> Steve Crawford <scrawford@pinpointresearch.com> wrote:
>> benchmarks I've seen suggest that with 8 cores you may even see an
>> almost 8x restore speedup
>
> I'm curious what sort of data in what environment showed that ratio.

I was going on memory from a presentation I watched. Reports on the web have shown anything from a 3x increase using 8 cores to other non-detailed reports of "up to" 8x improvement.

If you have one big table, don't expect much if any improvement. If you have lots of smaller tables/indexes then parallel restore will probably benefit you. This is all based on the not-atypical assumption that your restore will be CPU bound. I don't think parallel restore will be much use beyond the point you hit IO limits.

Cheers,
Steve

Re: PostgreSQL 8.4 performance tuning questions

From
Matthew Wakeling
Date:
On Thu, 30 Jul 2009, Kevin Grittner wrote:
> Steve Crawford <scrawford@pinpointresearch.com> wrote:
>> benchmarks I've seen suggest that with 8 cores you may even see an
>> almost 8x restore speedup
>
> I'm curious what sort of data in what environment showed that ratio.

It depends on a lot of things. However, certainly for index creation,
tests on servers over here have indicated that running four "CREATE INDEX"
statements at the same time runs four times as fast, assuming the table
fits in maintenance_work_mem.
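
A sketch of the idea (hypothetical database, table and index names), one
session per index:

psql -d mydb -c "CREATE INDEX orders_customer_idx ON orders (customer_id);" &
psql -d mydb -c "CREATE INDEX orders_created_idx ON orders (created_at);" &
psql -d mydb -c "CREATE INDEX items_order_idx ON items (order_id);" &
psql -d mydb -c "CREATE INDEX items_sku_idx ON items (sku);" &
wait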

Matthew

--
 I have an inferiority complex. But it's not a very good one.

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Matthew Wakeling <matthew@flymine.org> wrote:

> tests on servers over here have indicated that running four "CREATE
> INDEX" statements at the time runs four times as fast, assuming the
> table fits in maintenance_work_mem.

I'm benchmarking a patch to the parallel restore, and just out of
curiosity I've been comparing the multi-job approach, with various
numbers of jobs, to a restore within a single database transaction;
and I'm seeing (on serious production-quality servers) the parallel
restore run in 55% to 75% of the time of a restore running off the
same dump file using the -1 switch.  The 16 processor machine got the
best results, running with anywhere from 12 to 20 jobs.  The 2
processor machine got the lesser benefit, running with 2 to 4 jobs.
(The exact number of jobs really didn't make a difference big enough
to emerge from the noise.)

I've got 431 user tables with 578 indexes in a database which, freshly
restored, is 70GB.  (That's 91GB with the fragmentation and reasonable
dead space we have in production.)  Real production data; nothing
synthetic.

Since the dump to custom format ran longer than the full pg_dump
piped directly to psql would have taken, the overall time to use this
technique is clearly longer for our databases on our hardware.  I'm
sure there are cases where people don't have the option to pipe things
through, or that there may sometimes be a big enough savings in the
multiple jobs to pay off, even without overlapping the dump and
restore, and with the necessity to write and read the data an extra
time; but there are clearly situations where the piped approach is
faster.

We may want to try to characterize the conditions under which each is
a win, so we can better target our advice....

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Since the dump to custom format ran longer than the full pg_dump
> piped directly to psql would have taken, the overall time to use this
> technique is clearly longer for our databases on our hardware.

Hmmm ... AFAIR there isn't a good reason for dump to custom format to
take longer than plain text dump, except for applying compression.
Maybe -Z0 would be worth testing?  Or is the problem that you have to
write the data to a disk file rather than just piping it?

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hmmm ... AFAIR there isn't a good reason for dump to custom format
> to take longer than plain text dump, except for applying
> compression.  Maybe -Z0 would be worth testing?  Or is the problem
> that you have to write the data to a disk file rather than just
> piping it?

I'm not sure without benchmarking that.  I was writing to the same
RAID as the database I was dumping, so contention was probably a
significant issue.  But it would be interesting to compare different
permutations to see what impact each has alone and in combination.

I'm OK with setting up a benchmark run each night for a while, to
shake out what I can, on this and the artificial cases.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Since the dump to custom format ran longer than the full pg_dump
>> piped directly to psql would have taken, the overall time to use this
>> technique is clearly longer for our databases on our hardware.
>
> Hmmm ... AFAIR there isn't a good reason for dump to custom format to
> take longer than plain text dump, except for applying compression.
> Maybe -Z0 would be worth testing?  Or is the problem that you have to
> write the data to a disk file rather than just piping it?

I always dump with -Z0 (and compress afterwards, or even in a pipe to get
two cores busy) because otherwise custom dump times are simply ridiculous.
However Kevin is on to something here - on the typical 4-8 core boxes I
tested I managed to get around a cores/2 speedup for the restore, which
means that for a pure upgrade, or for testing similar to what Kevin is
doing, custom dumps + parallel restore might result in no win or even a loss.
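
For example (hypothetical database name), something like:

pg_dump -Fc -Z0 mydb | gzip > mydb.dump.gz
# parallel restore needs a real file (it can't read the archive from a pipe):
gunzip mydb.dump.gz
pg_restore -j 8 -d mydb mydb.dump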

On one of our datasets I did some benchmarking a while ago (for those who
attended Bruce's pg_migrator talk at PGCon, these are the same numbers):


* 150GB Database (on-disk - ~100GB as a plain text dump)

time to dump (-Z0):                120min
time to restore (single threaded): 180min
time to restore (-j 16):            59min

however the problem is that this does not actually mean that parallel
restore saves you ~120min of dump/restore time, because you get the
following real runtimes:

plain text dump + single threaded restore in a pipe: 188min
custom dump to file + parallel restore:    179min


this is without compression; with the default compression level, custom
dump + parallel restore is way slower than the simple approach on
reasonable hardware.


Stefan

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Since the dump to custom format ran longer than the full pg_dump
>> piped directly to psql would have taken, the overall time to use
>> this technique is clearly longer for our databases on our hardware.
>
> Hmmm ... AFAIR there isn't a good reason for dump to custom format
> to take longer than plain text dump, except for applying
> compression.  Maybe -Z0 would be worth testing?  Or is the problem
> that you have to write the data to a disk file rather than just
> piping it?

I did some checking with the DBA who normally copies these around for
development and test environments.  He confirmed that when the source
and target are on the same machine, a pg_dump piped to psql takes
about two hours.  If he pipes across the network, it runs more like
three hours.

My pg_dump to custom format ran for six hours.  The single-transaction
restore from that dump file took two hours, with both on the same
machine.  I can confirm with benchmarks, but this guy generally knows
what he's talking about (and we do create a lot of development and
test databases this way).

Either the compression is tripling the dump time, or there is
something inefficient about how pg_dump writes to the disk.

All of this is on a RAID 5 array with 5 drives using xfs with
noatime,nobarrier and a 256MB BBU controller.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Stefan Kaltenbrunner
Date:
Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> Since the dump to custom format ran longer than the full pg_dump
>>> piped directly to psql would have taken, the overall time to use
>>> this technique is clearly longer for our databases on our hardware.
>> Hmmm ... AFAIR there isn't a good reason for dump to custom format
>> to take longer than plain text dump, except for applying
>> compression.  Maybe -Z0 would be worth testing?  Or is the problem
>> that you have to write the data to a disk file rather than just
>> piping it?
>
> I did some checking with the DBA who normally copies these around for
> development and test environments.  He confirmed that when the source
> and target are on the same machine, a pg_dump piped to psql takes
> about two hours.  If he pipes across the network, it runs more like
> three hours.
>
> My pg_dump to custom format ran for six hours.  The single-transaction
> restore from that dump file took two hours, with both on the same
> machine.  I can confirm with benchmarks, but this guy generally knows
> what he's talking about (and we do create a lot of development and
> test databases this way).
>
> Either the compression is tripling the dump time, or there is
> something inefficient about how pg_dump writes to the disk.

seems about right - compression in pg_dump -Fc is a serious bottleneck,
and unless we can significantly speed it up or make it use multiple
cores (either for the dump itself - which would be awesome - or for the
compression) I would recommend not using it at all.


Stefan

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:

On 7/30/09 11:14 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> Since the dump to custom format ran longer than the full pg_dump
>>> piped directly to psql would have taken, the overall time to use
>>> this technique is clearly longer for our databases on our hardware.
>>
>> Hmmm ... AFAIR there isn't a good reason for dump to custom format
>> to take longer than plain text dump, except for applying
>> compression.  Maybe -Z0 would be worth testing?  Or is the problem
>> that you have to write the data to a disk file rather than just
>> piping it?
>
> I did some checking with the DBA who normally copies these around for
> development and test environments.  He confirmed that when the source
> and target are on the same machine, a pg_dump piped to psql takes
> about two hours.  If he pipes across the network, it runs more like
> three hours.
>
> My pg_dump to custom format ran for six hours.  The single-transaction
> restore from that dump file took two hours, with both on the same
> machine.  I can confirm with benchmarks, but this guy generally knows
> what he's talking about (and we do create a lot of development and
> test databases this way).
>
> Either the compression is tripling the dump time, or there is
> something inefficient about how pg_dump writes to the disk.
>
> All of this is on a RAID 5 array with 5 drives using xfs with
> noatime,nobarrier and a 256MB BBU controller.
>

Of course compression has a HUGE effect if your I/O system is half-decent.
Max GZIP compression speed with the newest Intel CPUs is something like
50MB/sec (it is data dependent, obviously -- it is usually closer to
30MB/sec).  Max gzip decompression ranges from 50 to 150MB/sec (it can get
really high only if the ratio is extremely large, like if you compress a
repeating sequence of 256 bytes).

The new parallel restore is nice and all, but we're still limited by the
week it takes to dump the whole thing compressed.   Parallel restore is a
lot faster when restoring compressed dumps though, even without any indexes
to make, since all that decompression is CPU hungry.

> -Kevin
>


Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Scott Carey <scott@richrelevance.com> wrote:

> Max GZIP compression speed with the newest Intel CPU's is something
> like 50MB/sec (it is data dependant, obviously -- it is usually
> closer to 30MB/sec).

Applying 30MB/sec to the 70GB accounts for 40 minutes.  If those
numbers are good, there's something else at play here.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 7/30/09 11:14 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote:

> Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>> Since the dump to custom format ran longer than the full pg_dump
>>> piped directly to psql would have taken, the overall time to use this
>>> technique is clearly longer for our databases on our hardware.
>>
>> Hmmm ... AFAIR there isn't a good reason for dump to custom format to
>> take longer than plain text dump, except for applying compression.
>> Maybe -Z0 would be worth testing?  Or is the problem that you have to
>> write the data to a disk file rather than just piping it?
>
> I always dump with -Z0(and compress afterwards or even in a pipe to get
> two cores busy) because otherwise custom dump times are simply ridiculous.
> However Kevin is on something here - on the typical 4-8 core box I
> tested I managed to an around cores/2 speedup for the restore which
> means that for a pure upgrade or testing similiar to what kevin is doing
> custom dumps + parallel restore might result in no win or even a loss.
>
> On on of our datasets I did some benchmarking a while ago (for those who
> attended bruce pg_migrator talk @pgcon these are same numbers):
>
>
> * 150GB Database (on-disk - ~100GB as a plain text dump)
>
> time to dump(-C0):                              120min
> time to restore(single threaded):       180min
> time to restore(-j 16):                 59min


Note also that with ext3 and XFS (untuned) parallel restore = HORRIBLY
FRAGMENTED tables, to the point of sequential scans being rather slow.  At
least, they're mostly just interleaved with each other so there is little
seeking backwards, but still... Beware.

XFS with allocsize=64m or so interleaves them in reasonably large chunks
though and prevents significant fragmentation.

>
> however the problem is that this does not actually mean that parallel
> restore shaves you ~120min in dump/restore time because you get the
> following real runtimes:
>
> plain text dump + single threaded restore in a pipe: 188min
> custom dump to file + parallel restore: 179min

On the other hand, for the use case where one DB is dumped to a backup and
that backup is then restored on several other machines, parallel restore is
extremely useful.

Dump needs to be parallelized or at least pipelined to use more cores.  COPY
on one thread, compression on another?

One trick with a dump, which works only if you have tables or schemas that
can safely be dumped in different transactions, is to manually dump
different slices of the DB concurrently.  This makes a huge difference if
that is possible.
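
A rough sketch (hypothetical schema/table names; each pg_dump runs in its
own transaction, so this is only safe while the data is not changing
underneath):

pg_dump -n schema_a mydb > schema_a.sql &
pg_dump -n schema_b mydb > schema_b.sql &
pg_dump -t big_table mydb > big_table.sql &
wait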

>
>
> this is without compression, with the default custom dump + parallel
> restore is way slower than the simple approach on reasonable hardware.
>
>
> Stefan
>


Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:


On 7/30/09 11:24 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote:

> Kevin Grittner wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>> Since the dump to custom format ran longer than the full pg_dump
>>>> piped directly to psql would have taken, the overall time to use
>>>> this technique is clearly longer for our databases on our hardware.
>>> Hmmm ... AFAIR there isn't a good reason for dump to custom format
>>> to take longer than plain text dump, except for applying
>>> compression.  Maybe -Z0 would be worth testing?  Or is the problem
>>> that you have to write the data to a disk file rather than just
>>> piping it?
>>
>> I did some checking with the DBA who normally copies these around for
>> development and test environments.  He confirmed that when the source
>> and target are on the same machine, a pg_dump piped to psql takes
>> about two hours.  If he pipes across the network, it runs more like
>> three hours.
>>
>> My pg_dump to custom format ran for six hours.  The single-transaction
>> restore from that dump file took two hours, with both on the same
>> machine.  I can confirm with benchmarks, but this guy generally knows
>> what he's talking about (and we do create a lot of development and
>> test databases this way).
>>
>> Either the compression is tripling the dump time, or there is
>> something inefficient about how pg_dump writes to the disk.
>
> seems about right - compression in pg_dump -Fc is a serious bottleneck
> and unless can significantly speed it up or make it use of multiple
> cores (either for the dump itself - which would be awsome - or for the
> compression) I would recommend to not use it at all.
>

That's not an option when a compressed dump is 200GB and the uncompressed
one is 1.3TB, for example.


>
> Stefan
>


Re: PostgreSQL 8.4 performance tuning questions

From
Arjen van der Meijden
Date:
On 30-7-2009 20:46 Scott Carey wrote:
> Of course Compression has a HUGE effect if your I/O system is half-decent.
> Max GZIP compression speed with the newest Intel CPU's is something like
> 50MB/sec (it is data dependant, obviously -- it is usually closer to
> 30MB/sec).  Max gzip decompression ranges from 50 to 150MB/sec (it can get
> really high only if the ratio is extremely large, like if you compress a
> repeating sequence of 256 bytes).

I just ran some quick numbers on our lightly loaded Nehalem X5570 (2.93+
GHz depending on turbo mode). I compressed a 192MB text file I had at
hand using gzip -1, -2, -3, -6 and -9 and sent the output to
/dev/null. The file was in the kernel's file cache the whole time and I
ran the tests 3 times.

Gzip -1 reached 54MB/s, -2 got 47MB/s, -3 got 32MB/s, -6 got 18MB/s and
-9 got to 12MB/s. Just running cat on the file made it do 6400MB/s (i.e.
it took 0.030 seconds to copy the file from memory to nowhere).
The resulting files were respectively 69MB, 66MB, 64MB, 59MB and 58MB.

Gunzip on the -1 file took 1.66 seconds, i.e. it read data at 41MB/s and
wrote it to /dev/null at 115MB/s. The -9 file took 1.46s, so it read
40MB/s and wrote 131MB/s.

Best regards,

Arjen

Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> Dump needs to be parallelized or at least pipelined to use more cores.  COPY
> on one thread, compression on another?

We already do that (since compression happens on the pg_dump side).

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:


On 7/30/09 11:58 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> Scott Carey <scott@richrelevance.com> wrote:
>
>> Max GZIP compression speed with the newest Intel CPU's is something
>> like 50MB/sec (it is data dependant, obviously -- it is usually
>> closer to 30MB/sec).
>
> Applying 30MB/sec to the 70GB accounts for 40 minutes.  If those
> numbers are good, there's something else at play here.

It is rather data dependent - try gzip on the command line as a test on
some data.  On a random tarball on my Nehalem system, I just got a 23MB/sec
compression rate on an incompressible file.
Decompression with gunzip was 145MB/sec.

On a text file that I manually created with randomly placed repeating
segments that compresses 200x to 1,  compression was 115MB/sec (bytes in per
sec), and decompression (bytes out per sec) was 265MB/sec.

The array in this machine will do 800MB/sec reads with 'dd' and
700MB/sec writes.

One core has no chance.


Now, what needs to be known with pg_dump is not just how fast
compression can go (assuming it's gzip) but also what the duty cycle time of
the compression is.  If it is single threaded, there is all the network and
disk time to cut out of this, as well as all the CPU time that pg_dump does
without compression.

> -Kevin
>


Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 7/30/09 1:15 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Scott Carey <scott@richrelevance.com> writes:
>> Dump needs to be parallelized or at least pipelined to use more cores.  COPY
>> on one thread, compression on another?
>
> We already do that (since compression happens on the pg_dump side).
>
>                         regards, tom lane
>

Well, that isn't what I meant.  pg_dump uses CPU outside of compression
doing various things.  If that CPU time is 10% as much as the compression,
then splitting them up would yield a ~10% gain when CPU bound.


Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Scott Carey <scott@richrelevance.com> wrote:

> Now, what needs to be known with the pg_dump is not just how fast
> compression can go (assuming its gzip) but also what the duty cycle
> time of the compression is.  If it is single threaded, there is all
> the network and disk time to cut out of this, as well as all the CPU
> time that pg_dump does without compression.

Well, I established a couple messages back on this thread that pg_dump
piped to psql to a database on the same machine writes the 70GB
database to disk in two hours, while pg_dump to a custom format file
at default compression on the same machine writes the 50GB file in six
hours.  No network involved, less disk space written.  I'll try it
tonight at -Z0.

One thing I've been wondering about is what, exactly, is compressed in
custom format.  Is it like a .tar.gz file, where the compression is a
layer over the top, or are individual entries compressed?  If the
latter, what's the overhead on setting up each compression stream?  Is
there some minimum size before that kicks in?  (I know, I should go
check the code myself.  Maybe in a bit.  Of course, if someone already
knows, it would be quicker....)

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 7/30/09 1:58 PM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> Scott Carey <scott@richrelevance.com> wrote:
>
>> Now, what needs to be known with the pg_dump is not just how fast
>> compression can go (assuming its gzip) but also what the duty cycle
>> time of the compression is.  If it is single threaded, there is all
>> the network and disk time to cut out of this, as well as all the CPU
>> time that pg_dump does without compression.
>
> Well, I established a couple messages back on this thread that pg_dump
> piped to psql to a database on the same machine writes the 70GB
> database to disk in two hours, while pg_dump to a custom format file
> at default compression on the same machine writes the 50GB file in six
> hours.  No network involved, less disk space written.  I'll try it
> tonight at -Z0.

So, I'm not sure what the pg_dump custom format overhead is minus the
compression -- there is probably some overhead in that format beyond the
compression itself.

-Z1 might be interesting too, but obviously it takes some time.  Interesting
that your uncompressed case is only 40% larger.  For me, the compressed dump
is in the range of 20% the size of the uncompressed one.


>
> One thing I've been wondering about is what, exactly, is compressed in
> custom format.  Is it like a .tar.gz file, where the compression is a
> layer over the top, or are individual entries compressed?

It is instructive to open up a compressed custom format file in 'less' or
another text viewer.

Basically, it is the same as the uncompressed dump with all the DDL
uncompressed, but the binary chunks compressed.  It would seem (educated
guess, looking at the raw file, and not the code) that the table data is
compressed and the DDL points to an index in the file where the compressed
blob for the copy lives.


> If the
> latter, what's the overhead on setting up each compression stream?  Is
> there some minimum size before that kicks in?  (I know, I should go
> check the code myself.  Maybe in a bit.  Of course, if someone already
> knows, it would be quicker....)

Gzip does have some quirky performance behavior depending on the chunk size
of data you stream into it.

>
> -Kevin
>


Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Scott Carey <scott@richrelevance.com> wrote:

> Gzip does have some quirky performance behavior depending on the
> chunk size of data you stream into it.

Yeah, I've run into that before.  If we're sending each individual
datum to a gzip function rather than waiting until we've got a
decent-size buffer, that could explain it.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> One thing I've been wondering about is what, exactly, is compressed in
> custom format.  Is it like a .tar.gz file, where the compression is a
> layer over the top, or are individual entries compressed?

Individual entries.  Eyeball examination of a dump file shows that we
only compress table-data entries, and don't for example waste time
firing up the compressor to process a function body.  It's possible
that it'd be worth trying to have some lower limit on the amount of
data in a table before we bother to compress it, but I bet that it
wouldn't make any difference on your databases ...

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> Gzip does have some quirky performance behavior depending on the chunk size
> of data you stream into it.

Can you enlarge on that comment?  I'm not sure that pg_dump is aware
that there's anything to worry about there.

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> Gzip does have some quirky performance behavior depending on the
>> chunk size of data you stream into it.
>
> Can you enlarge on that comment?  I'm not sure that pg_dump is aware
> that there's anything to worry about there.

If the library used here is anything like the native library used by
Java, it'd be worth putting a buffer layer ahead of the calls to gzip,
so it isn't dealing with each individual value as a separate call.  I
seem to remember running into that issue in Java, where throwing a
BufferedOutputStream in there fixed the performance issue.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Rauan Maemirov
Date:
Hey guys, thanks for the help. I solved the problem. The reason was a
bad query that I accidentally committed right after upgrading.
PostgreSQL 8.4 is perfect! ANALYZE works like a charm, and MUCH better
than in 8.3.

2009/7/31 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Scott Carey <scott@richrelevance.com> writes:
>>> Gzip does have some quirky performance behavior depending on the
>>> chunk size of data you stream into it.
>>
>> Can you enlarge on that comment?  I'm not sure that pg_dump is aware
>> that there's anything to worry about there.
>
> If the library used here is anything like the native library used by
> Java, it'd be worth putting a buffer layer ahead of the calls to gzip,
> so it isn't dealing with each individual value as a separate call.  I
> seem to remember running into that issue in Java, where throwing a
> BufferedOutputStream in there fixed the performance issue.
>
> -Kevin
>

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 7/30/09 2:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Scott Carey <scott@richrelevance.com> writes:
>> Gzip does have some quirky performance behavior depending on the chunk size
>> of data you stream into it.
>
> Can you enlarge on that comment?  I'm not sure that pg_dump is aware
> that there's anything to worry about there.
>
>                         regards, tom lane
>

For example, one of the things that gzip does is calculate the CRC of the
item being compressed.  Calculating that incrementally is less efficient
than doing it in bulk.
For whatever reason, some other internals of gzip tend to perform much
better if you submit, say, 4k, 8k or 16k chunks rather than little bits at
a time.  But I'm sure some of that also depends on what library you're
using, since they all vary somewhat.


Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> On 7/30/09 2:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Scott Carey <scott@richrelevance.com> writes:
>>> Gzip does have some quirky performance behavior depending on the chunk size
>>> of data you stream into it.
>>
>> Can you enlarge on that comment?  I'm not sure that pg_dump is aware
>> that there's anything to worry about there.

> For whatever reason, some other internals of gzip tend to perform much
> better if submitting say, 4k or 8k or 16k chunks rather than little bits at
> a time.  But I'm sure some of that also depends on what library you're using
> since they all vary somewhat.

AFAIK there is only one widely-used implementation of zlib, and it
hasn't changed much in a long time.

I did some tracing and verified that pg_dump passes data to deflate()
one table row at a time.  I'm not sure about the performance
implications of that, but it does seem like it might be something to
look into.

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
Greg Stark
Date:
On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I did some tracing and verified that pg_dump passes data to deflate()
> one table row at a time.  I'm not sure about the performance
> implications of that, but it does seem like it might be something to
> look into.

I suspect if this was a problem the zlib people would have added
internal buffering ages ago. I find it hard to believe we're not the
first application to use it this way.

I suppose it wouldn't be the first time a problem like this went
unfixed though. Is the zlib software actively maintained or was your
earlier comment implying it's currently an orphaned codebase?


--
greg
http://mit.edu/~gsstark/resume.pdf

Re: PostgreSQL 8.4 performance tuning questions

From
"Kevin Grittner"
Date:
Greg Stark <gsstark@mit.edu> wrote:

> I suspect if this was a problem the zlib people would have added
> internal buffering ages ago. I find it hard to believe we're not the
> first application to use it this way.

I think that most uses of this library are on entire files or streams.
They may have felt that adding another layer of buffering would just
hurt performance for the typical use case, and anyone using it in some
other way could always use their own buffering layer.  In Java adding
that layer took 30 characters of code, so it didn't make a very big
impression on me -- it took a while to even remember I'd had to do it.

-Kevin

Re: PostgreSQL 8.4 performance tuning questions

From
Greg Smith
Date:
On Thu, 30 Jul 2009, Rauan Maemirov wrote:

> maintenance_work_mem = 1GB
> work_mem = 192MB
> shared_buffers = 7680MB
> max_connections = 80
> My box is Nehalem 2xQuad 2.8 with RAM 32Gb

While it looks like you sorted out your issue downthread, I wanted to
point out that your setting for work_mem could be dangerously high here
and contribute to problems with running out of memory or using swap.  If
each of your 80 clients were doing a sort at the same time, you'd be using
80 * 192MB + 7680MB = 23040MB (about 22.5GB) of RAM just for the server.
The problem is that each client could do multiple sorts, so usage might go
even higher.
Unless you have a big data warehouse setup, more common work_mem settings
are in the 16-64MB range rather than going this high.  Just something to
keep an eye on if you find a lot of memory is being used by the database
processes.  I really need to refine the pgtune model to more carefully
account for this particular problem, it's a bit too aggressive here for
people who aren't proactively watching the server's RAM after changing the
settings.
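
A quick way to keep an eye on it with plain OS tools, for example:

free -m
ps -C postgres -o pid,rss,cmd | sort -k2 -nr | head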

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Marlowe
Date:
On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith<gsmith@gregsmith.com> wrote:
> On Thu, 30 Jul 2009, Rauan Maemirov wrote:
>
>> maintenance_work_mem = 1GB
>> work_mem = 192MB
>> shared_buffers = 7680MB
>> max_connections = 80
>> My box is Nehalem 2xQuad 2.8 with RAM 32Gb
>
> While it looks like you sorted out your issue downthread, I wanted to point
> out that your setting for work_mem could be dangerously high here and
> contribute to problems

The real danger here is that you can set up your pg server to fail
ONLY under heavy load, when it runs out of memory and goes into a swap
storm.  So, without proper load testing and profiling, you may not
know you're headed for danger until your server goes unresponsive
midday at the most critical of times.  And restarting it will just
lead to the same failure again as the clients all reconnect and pummel
your server.

Meanwhile, going from 192 to 16MB might result in a total slowdown
measured in a fraction of a percentage overall, and prevent this kind
of failure.

If there's one single request you can justify big work_mem for, then
set it for just that one query.  It's not uncommon to have a reporting
user limited to a few connections and with "alter user reportinguser
set work_mem='512MB';" so that it can run fast but not deplete your
server's resources by accident during heavy load.
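
Another option is to raise it for just one transaction, for example
(placeholder query):

psql -d mydb <<'SQL'
BEGIN;
SET LOCAL work_mem = '512MB';   -- reverts automatically at COMMIT/ROLLBACK
-- run the single big reporting query here
COMMIT;
SQL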

Re: PostgreSQL 8.4 performance tuning questions

From
Stefan Kaltenbrunner
Date:
Scott Carey wrote:
>
>
> On 7/30/09 11:24 AM, "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> wrote:
>
>> Kevin Grittner wrote:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>>> Since the dump to custom format ran longer than the full pg_dump
>>>>> piped directly to psql would have taken, the overall time to use
>>>>> this technique is clearly longer for our databases on our hardware.
>>>> Hmmm ... AFAIR there isn't a good reason for dump to custom format
>>>> to take longer than plain text dump, except for applying
>>>> compression.  Maybe -Z0 would be worth testing?  Or is the problem
>>>> that you have to write the data to a disk file rather than just
>>>> piping it?
>>> I did some checking with the DBA who normally copies these around for
>>> development and test environments.  He confirmed that when the source
>>> and target are on the same machine, a pg_dump piped to psql takes
>>> about two hours.  If he pipes across the network, it runs more like
>>> three hours.
>>>
>>> My pg_dump to custom format ran for six hours.  The single-transaction
>>> restore from that dump file took two hours, with both on the same
>>> machine.  I can confirm with benchmarks, but this guy generally knows
>>> what he's talking about (and we do create a lot of development and
>>> test databases this way).
>>>
>>> Either the compression is tripling the dump time, or there is
>>> something inefficient about how pg_dump writes to the disk.
>> seems about right - compression in pg_dump -Fc is a serious bottleneck
>> and unless can significantly speed it up or make it use of multiple
>> cores (either for the dump itself - which would be awsome - or for the
>> compression) I would recommend to not use it at all.
>>
>
> That's not an option when a dump compressed is 200GB and uncompressed is
> 1.3TB, for example.

yeah that was not meant as "don't use compression at all" but rather as
"use a different way to compress than what pg_dump provides internally".


Stefan

Re: PostgreSQL 8.4 performance tuning questions

From
Rauan Maemirov
Date:
That's true. I tried to lower work_mem from 192 to 64, and it caused a
total slowdown.
By the way, are there any performance tips for tuning joins? I noticed
that my joins slowed down on 8.4; on 8.3 they were a bit faster.

2009/7/31 Scott Marlowe <scott.marlowe@gmail.com>:
> On Thu, Jul 30, 2009 at 10:10 PM, Greg Smith<gsmith@gregsmith.com> wrote:
>> On Thu, 30 Jul 2009, Rauan Maemirov wrote:
>>
>>> maintenance_work_mem = 1GB
>>> work_mem = 192MB
>>> shared_buffers = 7680MB
>>> max_connections = 80
>>> My box is Nehalem 2xQuad 2.8 with RAM 32Gb
>>
>> While it looks like you sorted out your issue downthread, I wanted to point
>> out that your setting for work_mem could be dangerously high here and
>> contribute to problems
>
> The real danger here is that you can set up your pg server to fail
> ONLY under heavy load, when it runs out of memory and goes into a swap
> storm.  So, without proper load testing and profiling, you may not
> know you're headed for danger until your server goes unresponsive
> midday at the most critical of times.  And restarting it will just
> lead to the same failure again as the clients all reconnect and pummel
> your server.
>
> Meanwhile, going from 192 to 16MB might result in a total slowdown
> measured in a fraction of a percentage overall, and prevent this kind
> of failure.
>
> If there's one single request you can justify big work_mem for then
> set it for just that one query.  It's not uncommon to have a reporting
> user limited to a few connections and with "alter user reportinguser
> set work_mem='512MB';" so that it can run fast but not deplete your
> server's resources on accident during heavy load.
>

Re: PostgreSQL 8.4 performance tuning questions

From
Merlin Moncure
Date:
On Thu, Jul 30, 2009 at 10:07 AM, Rauan Maemirov<rauan@maemirov.com> wrote:
> Unfortunately had to downgrade back to 8.3. Now having troubles with
> that and still solving them.
>
> For future upgrade, what is the basic steps?
>
>>Was the database analyzed recently?
> Hm... there was smth like auto analyzer in serverlog when i started it
> first time, but i didn't mention that.
> Should I analyze whole db? How to do it?
>
> And how should I change _cost variables?
>
> I/O was very high. at first memory usage grew up and then began to full swap.

There is at least one known case of a memory leak in 8.4.0.  Possibly you
got hit by that or another early adopter bug.  I think in your case
it's probably too soon to have upgraded...with 10k connections even
minor annoyances can be real nasty.

I'd wait a few months while in the meantime stage your app on a
testing database and double check the important query plans to make
sure there are no big performance regressions.  Each version of pg has
a couple for various reasons.

merlin

Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I did some tracing and verified that pg_dump passes data to deflate()
>> one table row at a time.  I'm not sure about the performance
>> implications of that, but it does seem like it might be something to
>> look into.

> I suspect if this was a problem the zlib people would have added
> internal buffering ages ago. I find it hard to believe we're not the
> first application to use it this way.

I dug into this a bit more.  zlib *does* have internal buffering --- it
has to, because it needs a minimum lookahead of several hundred bytes
to ensure that compression works properly.  The per-call overhead of
deflate() looks a bit higher than one could wish when submitting short
chunks, but oprofile shows that "pg_dump -Fc" breaks down about like
this:

samples  %        image name               symbol name
1103922  74.7760  libz.so.1.2.3            longest_match
215433   14.5927  libz.so.1.2.3            deflate_slow
55368     3.7504  libz.so.1.2.3            compress_block
41715     2.8256  libz.so.1.2.3            fill_window
17535     1.1878  libc-2.9.so              memcpy
13663     0.9255  libz.so.1.2.3            adler32
4613      0.3125  libc-2.9.so              _int_malloc
2942      0.1993  libc-2.9.so              free
2552      0.1729  libc-2.9.so              malloc
2155      0.1460  libz.so.1.2.3            pqdownheap
2128      0.1441  libc-2.9.so              _int_free
1702      0.1153  libz.so.1.2.3            deflate
1648      0.1116  libc-2.9.so              mempcpy

longest_match is the core lookahead routine and is not going to be
affected by submission sizes, because it isn't called unless adequate
data (ie, the longest possible match length) is available in zlib's
internal buffer.  It's possible that doing more buffering on our end
would reduce the deflate_slow component somewhat, but it looks like
the most we could hope to get that way is in the range of 10% speedup.
So I'm wondering if anyone can provide concrete evidence of large
wins from buffering zlib's input.

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
PFC
Date:
On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Greg Stark <gsstark@mit.edu> writes:
>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> I did some tracing and verified that pg_dump passes data to deflate()
>>> one table row at a time.  I'm not sure about the performance
>>> implications of that, but it does seem like it might be something to
>>> look into.
>
>> I suspect if this was a problem the zlib people would have added
>> internal buffering ages ago. I find it hard to believe we're not the
>> first application to use it this way.
>
> I dug into this a bit more.  zlib *does* have internal buffering --- it
> has to, because it needs a minimum lookahead of several hundred bytes
> to ensure that compression works properly.  The per-call overhead of
> deflate() looks a bit higher than one could wish when submitting short
> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like
> this:

    During dump (size of dump is 2.6 GB),

No Compression :
- postgres at 70-100% CPU and pg_dump at something like 10-20%
- dual core is useful (a bit...)
- dump size 2.6G
- dump time 2m25.288s

Compression Level 1 :
- postgres at 70-100% CPU and pg_dump at 20%-100%
- dual core is definitely useful
- dump size 544MB
- dump time 2m33.337s

Since this box is mostly idle right now, eating CPU for compression is no
problem...

Adding an option to use LZO instead of gzip could be useful...

Compressing the uncompressed 2.6GB dump :

- gzip -1 :

- compressed size : 565 MB
- compression throughput : 28.5 MB/s
- decompression throughput : 74 MB/s

- LZO -1 :
- compressed size : 696M
- compression throughput : 86 MB/s
- decompression throughput : 247 MB/s

Conclusion : LZO could help for fast disks (RAID) or slow disks on a
CPU-starved server...
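
A rough way to reproduce this kind of comparison on an existing plain dump
(lzop is the command-line LZO compressor; the file name is hypothetical):

time gzip -1 -c dump.sql > /dev/null
time lzop -1 -c dump.sql > /dev/null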

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 7/31/09 4:01 PM, "PFC" <lists@peufeu.com> wrote:

> On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Greg Stark <gsstark@mit.edu> writes:
>>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>> I did some tracing and verified that pg_dump passes data to deflate()
>>>> one table row at a time.  I'm not sure about the performance
>>>> implications of that, but it does seem like it might be something to
>>>> look into.
>>
>>> I suspect if this was a problem the zlib people would have added
>>> internal buffering ages ago. I find it hard to believe we're not the
>>> first application to use it this way.
>>
>> I dug into this a bit more.  zlib *does* have internal buffering --- it
>> has to, because it needs a minimum lookahead of several hundred bytes
>> to ensure that compression works properly.  The per-call overhead of
>> deflate() looks a bit higher than one could wish when submitting short
>> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like
>> this:
>
>         During dump (size of dump is 2.6 GB),
>
> No Compression :
> - postgres at 70-100% CPU and pg_dump at something like 10-20%
> - dual core is useful (a bit...)
> - dump size 2.6G
> - dump time 2m25.288s
>
> Compression Level 1 :
> - postgres at 70-100% CPU and pg_dump at 20%-100%
> - dual core is definitely useful
> - dump size 544MB
> - dump time 2m33.337s
>
> Since this box is mostly idle right now, eating CPU for compression is no
> problem...
>

I get very different (contradictory) behavior.  Server with fast RAID, 32GB
RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
8.3.6
No disk wait time during any test.  One test beforehand was used to prime
the disk cache.
100% CPU in the below means one core fully used.  800% means the system is
fully loaded.

pg_dump > file  (on a subset of the DB with lots of tables with small
tuples)
6m 27s, 4.9GB;  12.9MB/sec
50% CPU in postgres, 50% CPU in pg_dump

pg_dump -Fc > file.gz
9m6s, output is 768M  (6.53x compression); 9.18MB/sec
30% CPU in postgres, 70% CPU in pg_dump

pg_dump | gzip > file.2.gz
6m22s, 13MB/sec.
50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip

The default (5) compression level was used.

So, when using pg_dump alone, I could not get significantly more than one
core of CPU (all on the same box).  No matter how I tried, pg_dump plus the
postgres process dumping data always totaled about 102% -- it would
fluctuate in top, give or take 15% at times, but the two always stayed very
close to (within 3% of) this total.

Piping the whole thing to gzip gets some speedup.  This indicates that
perhaps the implementation or use of gzip is inappropriate on pg_dump's side
or the library version is older or slower.  Alternatively, the use of gzip
inside pg_dump fails to pipeline CPU usage as well as piping it does, as
the above shows 50% more CPU utilization when piping.

I can do the same test later with a single table that is 10GB (which does
dump much faster than 13MB/sec and has rows that average about 500 bytes in
size).  But overall I have found pg_dump's performance sorely lacking, and
this is a data risk in the big picture.  Postgres is very good about not
losing data, but that only goes up to the limits of the hardware and OS,
which is not good enough.  Because of long disaster recovery times and poor
replication/contingency features, it is a fairly unsafe place for data once
it gets beyond a certain size and a business-continuity plan requires
minimal downtime.

> Adding an option to use LZO instead of gzip could be useful...
>
> Compressing the uncompressed 2.6GB dump :
>
> - gzip -1 :
>
> - compressed size : 565 MB
> - compression throughput : 28.5 MB/s
> - decompression throughput : 74 MB/s
>
> - LZO -1 :
> - compressed size : 696M
> - compression throughput : 86 MB/s
> - decompression throughput : 247 MB/s
>
> Conclusion : LZO could help for fast disks (RAID) or slow disks on a
> CPU-starved server...
>

LZO would be a great option; it is very fast, especially at decompression.
With gzip, one rarely gains by going below gzip -3 or above gzip -6.




Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> I get very different (contradictory) behavior.  Server with fast RAID, 32GB
> RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
> 8.3.6
> No disk wait time during any test.  One test beforehand was used to prime
> the disk cache.
> 100% CPU in the below means one core fully used.  800% means the system is
> fully loaded.

> pg_dump > file  (on a subset of the DB with lots of tables with small
> tuples)
> 6m 27s, 4.9GB;  12.9MB/sec
> 50% CPU in postgres, 50% CPU in pg_dump

> pg_dump -Fc > file.gz
> 9m6s, output is 768M  (6.53x compression); 9.18MB/sec
> 30% CPU in postgres, 70% CPU in pg_dump

> pg_dump | gzip > file.2.gz
> 6m22s, 13MB/sec.
> 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip

I don't see anything very contradictory here.  What you're demonstrating
is that it's nice to be able to throw a third CPU at the compression
part of the problem.  That's likely to remain true if we shift to a
different compression algorithm.  I suspect if you substituted lzo for
gzip in the third case, the picture wouldn't change very much.

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:


On 8/3/09 11:56 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Scott Carey <scott@richrelevance.com> writes:
>> I get very different (contradictory) behavior.  Server with fast RAID, 32GB
>> RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
>> 8.3.6
>> No disk wait time during any test.  One test beforehand was used to prime
>> the disk cache.
>> 100% CPU in the below means one core fully used.  800% means the system is
>> fully loaded.
>
>> pg_dump > file  (on a subset of the DB with lots of tables with small
>> tuples)
>> 6m 27s, 4.9GB;  12.9MB/sec
>> 50% CPU in postgres, 50% CPU in pg_dump
>
>> pg_dump -Fc > file.gz
>> 9m6s, output is 768M  (6.53x compression); 9.18MB/sec
>> 30% CPU in postgres, 70% CPU in pg_dump
>
>> pg_dump | gzip > file.2.gz
>> 6m22s, 13MB/sec.
>> 50% CPU in postgres, 50% Cpu in pg_dump, 50% cpu in gzip
>
> I don't see anything very contradictory here.

The other poster got nearly 2 CPUs of work from just pg_dump + postgres.
That contradicts my results (but could be due to data differences or
postgres version differences).
In the other use case, compression was not slower, but just used more CPU
(also contradicting my results).


> What you're demonstrating
> is that it's nice to be able to throw a third CPU at the compression
> part of the problem.

No, 1.5 CPUs.  Even getting full use of a second core would be great.

I'm also demonstrating that there is some artificial bottleneck somewhere
preventing postgres and pg_dump from operating concurrently.  Instead, one
waits while the other does work.

Your claim earlier in this thread was that there was already pipelined work
being done due to pg_dump + postgresql -- which seems to be true for the
other test case but not mine.

As a consequence, adding compression throttles the postgres process even
though the compression hasn't caused 100% CPU (or close) on any task
involved.

>  That's likely to remain true if we shift to a
> different compression algorithm.  I suspect if you substituted lzo for
> gzip in the third case, the picture wouldn't change very much.
>

That is exactly the point.  LZO would be nice (and would help mitigate this
problem), but it doesn't solve the real problem here: pg_dump is slow and
artificially throttled without either it or postgres ever reaching 100%
CPU.

The problem still remains: dumping with -Fc can be significantly slower
than a raw dump piped to a compression utility, even if no task is CPU or
I/O bound.  Dumping and piping to gzip is faster, but parallel restore only
works with the custom format.



>                         regards, tom lane
>


Re: PostgreSQL 8.4 performance tuning questions

From
PFC
Date:
> I get very different (contradictory) behavior.  Server with fast RAID,
> 32GB
> RAM, 2 x 4 core 3.16Ghz Xeon 54xx CPUs.  CentOS 5.2
> 8.3.6

    That's a very different setup from my (much less powerful) box, so that
would explain it...

> No disk wait time during any test.  One test beforehand was used to prime
> the disk cache.
> 100% CPU in the below means one core fully used.  800% means the system
> is
> fully loaded.
>
> pg_dump > file  (on a subset of the DB with lots of tables with small
> tuples)
> 6m 27s, 4.9GB;  12.9MB/sec
> 50% CPU in postgres, 50% CPU in pg_dump

    If there is no disk wait time, then why do you get 50/50 and not 100/100,
or at least one core maxed out?  That's interesting...

COPY annonces TO '/dev/null';
COPY 413526
Time: 13871.093 ms

\copy annonces to '/dev/null'
Time: 14037.946 ms

time pg_dump -Fc -t annonces -U annonces --compress=0 annonces >/dev/null
real    0m14.596s
user    0m0.700s
sys     0m0.372s

    In all 3 cases postgres maxes out one core (I've repeated the test until
all data was cached, so there is no disk access at all in vmstat).
    Size of dump is 312MB.




Re: PostgreSQL 8.4 performance tuning questions

From
Merlin Moncure
Date:
On Mon, Aug 3, 2009 at 2:56 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I don't see anything very contradictory here.  What you're demonstrating
> is that it's nice to be able to throw a third CPU at the compression
> part of the problem.  That's likely to remain true if we shift to a
> different compression algorithm.  I suspect if you substituted lzo for
> gzip in the third case, the picture wouldn't change very much.

lzo is much, much, (much) faster than zlib.  Note, I've tried several
times to contact the author to get clarification on licensing terms
and have been unable to get a response.

[root@devdb merlin]# time lzop -c dump.sql > /dev/null

real    0m16.683s
user    0m15.573s
sys    0m0.939s
[root@devdb merlin]# time gzip -c dump.sql > /dev/null

real    3m43.090s
user    3m41.471s
sys    0m1.036s
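
For reference, calling the LZO library in-process looks roughly like the
sketch below.  This assumes the stock lzo1x interface and the output-buffer
sizing rule from the LZO documentation; treat it as a sketch, not a drop-in
for pg_dump:

/* lzo_demo.c -- compress one buffer with LZO1X-1 (sketch only).
 * Build: cc -O2 lzo_demo.c -llzo2 -o lzo_demo   (needs liblzo2 installed)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lzo/lzo1x.h>

int main(void)
{
    const lzo_uint in_len = 1024 * 1024;          /* stand-in for a dump chunk */
    lzo_bytep in  = malloc(in_len);
    /* LZO can expand incompressible input slightly; size the output buffer
     * with the margin recommended in the LZO documentation. */
    lzo_bytep out = malloc(in_len + in_len / 16 + 64 + 3);
    lzo_bytep wrk = malloc(LZO1X_1_MEM_COMPRESS);  /* scratch memory for LZO1X-1 */
    lzo_uint  out_len;

    if (lzo_init() != LZO_E_OK) {
        fprintf(stderr, "lzo_init failed\n");
        return 1;
    }

    memset(in, 'x', in_len);                       /* fake, highly compressible data */

    if (lzo1x_1_compress(in, in_len, out, &out_len, wrk) != LZO_E_OK) {
        fprintf(stderr, "compression failed\n");
        return 1;
    }

    printf("%lu bytes -> %lu bytes\n",
           (unsigned long) in_len, (unsigned long) out_len);
    return 0;
}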

merlin

Re: PostgreSQL 8.4 performance tuning questions

From
PFC
Date:
> lzo is much, much, (much) faster than zlib.  Note, I've tried several

decompression speed is even more awesome...

> times to contact the author to get clarification on licensing terms
> and have been unable to get a response.

lzop and the LZO library are distributed under the terms of the GNU
General Public License (GPL).
Source: http://www.lzop.org/lzop_man.php

Re: PostgreSQL 8.4 performance tuning questions

From
Merlin Moncure
Date:
On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
>
>> lzo is much, much, (much) faster than zlib.  Note, I've tried several
>
> decompression speed is even more awesome...
>
>> times to contact the author to get clarification on licensing terms
>> and have been unable to get a response.
>
> lzop and the LZO library are distributed under the terms of the GNU General
> Public License (GPL).
> source : http://www.lzop.org/lzop_man.php

yeah...I have another project I'm working on that is closed source,
plus I was curious if something could be worked out for pg...lzo seems
ideal for database usage. The author is MIA or too busy hacking to
answer his email :-).

merlin

Re: PostgreSQL 8.4 performance tuning questions

From
Alvaro Herrera
Date:
Merlin Moncure escribió:
> On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
> >
> >> lzo is much, much, (much) faster than zlib.  Note, I've tried several
> >
> > decompression speed is even more awesome...
> >
> >> times to contact the author to get clarification on licensing terms
> >> and have been unable to get a response.
> >
> > lzop and the LZO library are distributed under the terms of the GNU General
> > Public License (GPL).
> > source : http://www.lzop.org/lzop_man.php
>
> yeah...I have another project I'm working on that is closed source,
> plus I was curious if something could be worked out for pg...lzo seems
> ideal for database usage.

I think this was already discussed here.  It turns out that a specific
exception for PG wouldn't be acceptable because of the multiple
commercial derivatives.  LZO would have to become BSD, which presumably
the author just doesn't want to do.

Maybe we could have a --enable-lzo switch similar to what we do with
readline.  Of course, a non-LZO-enabled build would not be able to read
a compressed dump produced by an LZO-enabled one.  (We could also consider
LZO for TOAST compression.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:
On 8/4/09 8:30 AM, "Alvaro Herrera" <alvherre@commandprompt.com> wrote:

> Merlin Moncure escribió:
>> On Mon, Aug 3, 2009 at 5:30 PM, PFC<lists@peufeu.com> wrote:
>>>
>>>> lzo is much, much, (much) faster than zlib.  Note, I've tried several
>>>
>>> decompression speed is even more awesome...
>>>
>>>> times to contact the author to get clarification on licensing terms
>>>> and have been unable to get a response.
>>>
>>> lzop and the LZO library are distributed under the terms of the GNU General
>>> Public License (GPL).
>>> source : http://www.lzop.org/lzop_man.php
>>
>> yeah...I have another project I'm working on that is closed source,
>> plus I was curious if something could be worked out for pg...lzo seems
>> ideal for database usage.
>
> I think this was already discussed here.  It turns out that a specific
> exception for PG wouldn't be acceptable because of the multiple
> commercial derivates.  LZO would have to become BSD, which presumably
> the author just doesn't want to do.
>
> Maybe we could have a --enable-lzo switch similar to what we do with
> readline.  Of course, a non-LZO-enabled build would not be able to read
> a dump from such a build.  (We could also consider LZO for TOAST
> compression).
>

There are a handful of other compression algorithms very similar to LZO in
performance / compression level under various licenses.

LZO is just the best known and most widely used.

http://www.fastlz.org/  (MIT)
http://www.quicklz.com/  (GPL again)
http://oldhome.schmorp.de/marc/liblzf.html  (BSD -ish)

ZFS uses LZJB (CDDL); source code here:
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/compress.c
(a good read for one of the simplest LZ compression algorithms in terms
of lines of code -- about 100 lines)

Fastlz, with its MIT license, is probably the most obvious choice.



Re: PostgreSQL 8.4 performance tuning questions

From
Tom Lane
Date:
Scott Carey <scott@richrelevance.com> writes:
> There are a handful of other compression algorithms very similar to LZO in
> performance / compression level under various licenses.
> LZO is just the best known and most widely used.

And after we get done with the license question, we need to ask about
patents.  The compression area is just a minefield of patents.  gzip is
known to avoid all older patents (and would be pretty solid prior art
against newer ones).  I'm far less confident about lesser-known systems.

            regards, tom lane

Re: PostgreSQL 8.4 performance tuning questions

From
Merlin Moncure
Date:
On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> There are a handful of other compression algorithms very similar to LZO in
>> performance / compression level under various licenses.
>> LZO is just the best known and most widely used.
>
> And after we get done with the license question, we need to ask about
> patents.  The compression area is just a minefield of patents.  gzip is
> known to avoid all older patents (and would be pretty solid prior art
> against newer ones).  I'm far less confident about lesser-known systems.

I did a little bit of research.  LZO and friends are variants of LZW.
The main LZW patent died in 2003, and AFAIK there have been no patent
enforcement cases brought against LZO or its cousins (LZO dates to
1996).  OK, I'm no attorney, etc., but the internet seems to believe
that the algorithms are patent free.  LZO is quite widely used, in
both open source and some relatively high profile commercial projects.

I downloaded the libraries and did some tests.
2.5 G sql dump:

compression time:
zlib: 4m 1s
lzo: 17s
fastlz: 28.8s
liblzf: 26.7s

compression size:
zlib: 609M 75%
lzo: 948M 62%
fastlz: 936M 62.5%
liblzf: 916M 63.5%

A couple of quick notes: liblzf produces (possibly) architecture-dependent
archives according to its header, and fastlz is not declared
'stable' according to its website.
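
For anyone who wants to reproduce this kind of comparison, a minimal timing
harness looks roughly like the sketch below.  It only covers zlib (via
compress2()), since the other libraries each have their own entry points;
it is not the harness used for the numbers above, and it assumes the input
file fits comfortably in RAM.  The file name is just an example:

/* ztime.c -- time zlib compression of a file already in the page cache.
 * Build: cc -O2 ztime.c -lz -o ztime   (add -lrt on older systems)
 * Usage: ./ztime dump.sql
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

int main(int argc, char **argv)
{
    FILE *f;
    unsigned char *in, *out;
    long    in_len;
    uLongf  out_len;
    struct timespec t0, t1;
    double  secs;

    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    in_len = ftell(f);
    rewind(f);

    in = malloc(in_len);
    out_len = compressBound(in_len);            /* worst-case output size */
    out = malloc(out_len);
    if (!in || !out) { fprintf(stderr, "out of memory\n"); return 1; }
    if (fread(in, 1, in_len, f) != (size_t) in_len) { perror("fread"); return 1; }
    fclose(f);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (compress2(out, &out_len, in, in_len, 6) != Z_OK) {  /* zlib level 6 */
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld -> %lu bytes (%.1f%% reduction) in %.1f s, %.1f MB/s\n",
           in_len, (unsigned long) out_len,
           100.0 * (1.0 - (double) out_len / in_len),
           secs, in_len / secs / (1024 * 1024));
    return 0;
}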

merlin

Re: PostgreSQL 8.4 performance tuning questions

From
Scott Carey
Date:


On 8/5/09 7:12 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:

> On Tue, Aug 4, 2009 at 4:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Scott Carey <scott@richrelevance.com> writes:
>>> There are a handful of other compression algorithms very similar to LZO in
>>> performance / compression level under various licenses.
>>> LZO is just the best known and most widely used.
>>
>> And after we get done with the license question, we need to ask about
>> patents.  The compression area is just a minefield of patents.  gzip is
>> known to avoid all older patents (and would be pretty solid prior art
>> against newer ones).  I'm far less confident about lesser-known systems.
>
> I did a little bit of research.  LZO and friends are variants of LZW.
> The main LZW patent died in 2003, and AFAIK there has been no patent
> enforcement cases brought against LZO or it's cousins (LZO dates to
> 1996).  OK, I'm no attorney, etc, but the internet seems to believe
> that the algorithms are patent free.  LZO is quite widely used, in
> both open source and some relatively high profile commercial projects.
>

That doesn't sound right to me.  LZW is patent-protected in a few ways, and
is an LZ78 scheme.

LZO, zlib, and the others here are LZ77 schemes, which avoid the LZW patents.
There are some other patents in this territory covering how the hash
lookups are done for the LZ77 'sliding window' approach.  Most notably,
using a tree for the lookup is patented, as are a couple of other (obvious)
tricks, which are generally avoided anyway by algorithms that are trying to
be fast rather than produce the highest compression.

http://en.wikipedia.org/wiki/Lossless_data_compression#Historical_legal_issues
http://en.wikipedia.org/wiki/LZ77_and_LZ78
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://www.faqs.org/faqs/compression-faq/part1/section-7.html
http://www.ross.net/compression/patents.html

Note: US patents run either 17 years after grant or 20 years after filing.
A very large chunk of those in this space have expired, but a few were
filed/granted in the early 90's -- though those are generally more specific
and easy to avoid, or very obvious duplicates of previous patents.

More notably, one of these (Gibson and Graybill), if interpreted broadly,
would apply to zlib as well, but the patent mentions LZRW1, and any broader
scope would have prior-art conflicts with ones that are now long expired.
It's past 17 years after grant on that one, but not yet 20 years after filing.




> I downloaded the libraries and did some tests.
> 2.5 G sql dump:
>
> compression time:
> zlib: 4m 1s
> lzo: 17s
> fastlz: 28.8s
> liblzf: 26.7s
>
> compression size:
> zlib: 609M 75%
> lzo: 948M 62%
> fastlz: 936M 62.5%
> liblzf: 916M 63.5%
>

Interesting how that conflicts with some other benchmarks out there (where
LZO and the others are about the same).  But they're all an order of
magnitude faster than gzip/zlib.


> A couple of quick notes: liblzf produces (possibly) architecture
> dependent archives according to its header, and fastlz is not declared
> 'stable' according to its website.
>


> merlin
>