Thread: Slow dump with pg_dump/pg_restore ? How to improve ?

Slow dump with pg_dump/pg_restore ? How to improve ?

From
Soeren Gerlach
Date:
Hi,

some weeks ago I started to develop an application using Postgresql the
first time. I'm running 7.4.3 on a Linux box with a plain 2.6.7 kernel, the
storage is handled by 5 SATA disks, managed by a 3ware controller and using
a xfs filesystem. The DB server is a 3 Ghz P4 with 4 Gig of Ram, so the
machine is quite fast for most purposes I need it; the DB server is a pure
DB server, theres no application running on it.

I dump and restore it like this:

pg_dump -v ziptrader -o -F t > foo
pg_restore -v -c -F t -d ziptrader_expost foo

Dumping needs to be done at least once a day (for copying as well as
archiving it), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB which I consider as very small (the
expected growth of the database will be 20-30 MB a day later). Currently
the database has just 6 tables whith 2 tables beeing responsible for 95% of
the database size.

So...the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow as it means something like 100K/sec for
dumping and restoring. The drive is cappable of 40 Meg/seconds, so thats
not the bottleneck ,-) Anyhow postmaster and pg_dump seem to max out the
CPU cycles as it's running at nearly 100% while dumping and restoring. But
I frighten the day when the database grows >1Gig as it would take then
hours to complete which is not acceptable to me.

My main memory settings are:

shared_buffers = 30000
sort_mem = 1000
vacuum_mem = 16384


What is the best starting point to look at? I'm sure I'm doing something
badly wrong but neither the docs nor some older mails from this group seem
to offer a hint to me.



Thanks a lot,
Soeren Gerlach



Re: Slow dump with pg_dump/pg_restore ? How to improve

From
Richard Huxton
Date:
Soeren Gerlach wrote:
>
> Dumping needs to be done at least once a day (for copying as well as
> archiving it), while restoring to two other databases will be done twice a
> day. The dump is currently only 80 MB which I consider as very small (the
> expected growth of the database will be 20-30 MB a day later). Currently
> the database has just 6 tables whith 2 tables beeing responsible for 95% of
> the database size.
>
> So...the dump in the above format needs some 14 minutes, the restore 10
> minutes. This seems to be very slow as it means something like 100K/sec for
> dumping and restoring. The drive is cappable of 40 Meg/seconds, so thats
> not the bottleneck ,-) Anyhow postmaster and pg_dump seem to max out the
> CPU cycles as it's running at nearly 100% while dumping and restoring. But
> I frighten the day when the database grows >1Gig as it would take then
> hours to complete which is not acceptable to me.

This is never going to work for you. Increasing the sort_mem during the
restore will probably help, but if it takes too long now then it
certainly won't cope when you have 10 times as much data.

The CPU is probably maxed as pg_restore rebuilds your indexes etc.
You'll find the disk-space occupied by the live system will be much more
than your current 80MB.

I think you want to look at some form of replication. There are a number
of options, the newest being Slony (http://www.slony.org/).

--
   Richard Huxton
   Archonet Ltd

Re: Slow dump with pg_dump/pg_restore ? How to improve ?

From
Tom Lane
Date:
Soeren Gerlach <soeren@all-about-shift.com> writes:
> So...the dump in the above format needs some 14 minutes, the restore 10
> minutes. This seems to be very slow as it means something like 100K/sec for
> dumping and restoring. The drive is cappable of 40 Meg/seconds, so thats
> not the bottleneck ,-) Anyhow postmaster and pg_dump seem to max out the
> CPU cycles as it's running at nearly 100% while dumping and restoring.

What datatypes have you got in the large tables?  Also, what character
set encoding are you using?

The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink.  This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).

Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?

            regards, tom lane

Re: Slow dump with pg_dump/pg_restore ? How to improve ?

From
Soeren Gerlach
Date:
> > So...the dump in the above format needs some 14 minutes, the restore 10
> > minutes. This seems to be very slow as it means something like 100K/sec
> > for dumping and restoring. The drive is cappable of 40 Meg/seconds, so
> > thats not the bottleneck ,-) Anyhow postmaster and pg_dump seem to max
> > out the CPU cycles as it's running at nearly 100% while dumping and
> > restoring.
>
> What datatypes have you got in the large tables?  Also, what character
> set encoding are you using?

datestyle = 'ISO,European' ?

> The only reason I can think of for dump to be that slow is if conversion
> of the data to text is a big time sink.  This would involve the
> datatype's own output routine plus possibly a character set conversion.
> You should at least make sure that no character set conversion needs to
> happen (offhand I think this would only be an issue if pg_dump is
> invoked with PGCLIENTENCODING set in its environment).

This are the schemas of the two big tables mentioned:

-------------------------------------------------------------------------
CREATE TABLE public.agent_action_history
(
  aah_tag_id bigserial NOT NULL,
  aah_action_type int2 NOT NULL DEFAULT 0,
  aah_quantity float4 NOT NULL DEFAULT 0,
  aah_price float4 NOT NULL DEFAULT 0,
  aah_sim_flag bool NOT NULL DEFAULT false,
  aah_timestamp timestamp NOT NULL,
  aah_action_pl float4 NOT NULL DEFAULT 0
) WITHOUT OIDS;

CREATE TABLE public.tick_history
(
  tkh_id serial NOT NULL,
  tkh_cdt_id int8 NOT NULL,
  tkh_price float4 NOT NULL,
  tkh_price_type int2 NOT NULL,
  tkh_volume float4 NOT NULL,
  tkh_system_time timestamp(6) NOT NULL,
  tkh_exchange_time timestamp(6) NOT NULL,
  CONSTRAINT "PK_tick_history" PRIMARY KEY (tkh_id)
) WITHOUT OIDS;
-------------------------------------------------------------------------

> Also I trust you are using dump with the default COPY-style output,
> not dump-as-INSERTs?

Yes. I'm dumping using "-c -Fc -v" as options. I just rerun the whole:

 * Total rows: 904,000
 * Dumping plain format: 21min == 1260 sec == 717 rows/sec
 * Dumping "-c -Fc" format: 26min == 1560 sec == 580 rows/sec
 * pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
   dumps
 * the resulting "-Fc" dump is 1/10 of the size of the plain dump

It would be interesting to know from people with DBs > 1 Gig how fast they
can dump in terms of rows/second.

Another info: I just noticed that I've 7.4.1 running not 7.4.3. So I'll
retest this tomorrow again and provide you with numbers for 7.4.3 too.


I'm likely to install Sybase as a reference to Postgres next week. But I can
remember to got there something like 5-10,000 rows/sec some one year ago
when I tested a 11.5 version for this issue too.



Thanks,
Soeren Gerlach


Re: Slow dump with pg_dump/pg_restore ? How to improve ?

From
Tom Lane
Date:
Soeren Gerlach <soeren@all-about-shift.com> writes:
>  * pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
>    dumps

Really!?  Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller).  But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see.  There may be some simple performance
glitch involved there.  Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?

            regards, tom lane

Re: Slow dump with pg_dump/pg_restore ? How to improve ?

From
Soeren Gerlach
Date:
> Soeren Gerlach <soeren@all-about-shift.com> writes:
> >  * pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
> > both dumps
>
> Really!?  Hmm, that seems fairly backwards ... thinks ...
>
> In the -Fc case this makes some amount of sense because pg_dump runs
> gzip-style compression on the data (which is why the output file is
> so much smaller).  But for plain text dump, pg_dump should be just
> pushing the data straight through to stdout; it really ought not take
> much CPU as far as I can see.  There may be some simple performance
> glitch involved there.  Are you interested in recompiling with -pg
> and getting a gprof profile of pg_dump?

Yes I'am but I'm a little short on time ,-)) In fact I'm glad to d'l a
ready-to-run archive for Debian Woody. In two weeks I'll have some time to
check this issue with my own compiled versions, until then I'm just
interested ,-))
Today I upgraded to 7.4.3 from 7.4.1 but this did not change anything real.
Do you have numbers in respect to speed (rows per second) for comparison
available. I.e. dump on a single CPU machine which quite fast drives?


Regards,
Soeren


Re: [real] Re: Slow dump with pg_dump/pg_restore ? How

From
Don Isgitt
Date:

Soeren Gerlach wrote:

>>Soeren Gerlach <soeren@all-about-shift.com> writes:
>>
>>
>>> * pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
>>>both dumps
>>>
>Do you have numbers in respect to speed (rows per second) for comparison
>available. I.e. dump on a single CPU machine which quite fast drives?
>
>
>Regards,
>Soeren
>
>
>
>
Hi Soeren,

I just used pg_dump on my database with your settings to give you a
reference point; however, my machine is a quad-Xeon, otherwise
comparable to yours. (2G ram, 15K SCSI) Top showed nominal CPU usage of
~20-25% during the run, so that agrees with your high cpu utilization.
But, the dump took 140 secs (elapsed time) for a 2.5GB (on disk) db. The
.dmp file was 650MB, and total rows was ~3M. So, that makes ~20000
rows/sec, which certainly suggests something is amiss with your hardware
or software or configuration or ...

Don

p.s. I am running PgSQL 7.4 on Red Hat 3.2.3 (Linux kernel 2.4.21-9 SMP)