Thread: restore time: sort_mem vs. checkpoing_segments

restore time: sort_mem vs. checkpoing_segments

From
Vivek Khera
Date:
And the winner is... checkpoint_segments.

Restore of a significanly big database (~19.8GB restored) shows nearly
no time difference depending on sort_mem when checkpoint_segments is
large.  There are quite a number of tables and indexes.  The restore
was done from a pg_dump -Fc dump of one database.

All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
7.4b2 on FreeBSD 4.8.

3 checkpoint_segments restore time: 14983 seconds
50 checkpoint_segments restore time: 11537 seconds
50 checkpoint_segments, sort_mem 131702 restore time: 11262 seconds

There's an initdb between each test.

For reference, the restore with 8k page size, 60k buffers, 8192
sort_mem and 3 checkpoint buffers was 14777 seconds.

It seems for restore that a larger number of checkpoint buffers is the
key, especially when dealing with large numbers of rows in a table.

I notice during the restore that the disk throughput triples during
the checkpoint.

The postgres data partition is on a 14-spindle hardware RAID5 on U320
SCSI disks.

Re: restore time: sort_mem vs. checkpoing_segments

From
Josh Berkus
Date:
Vivek,

> And the winner is... checkpoint_segments.
>
> Restore of a significanly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large.  There are quite a number of tables and indexes.  The restore
> was done from a pg_dump -Fc dump of one database.

Cool!   Thank you for posting this.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: restore time: sort_mem vs. checkpoing_segments

From
Tom Lane
Date:
Vivek Khera <khera@kcilink.com> writes:
> Restore of a significanly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large.  There are quite a number of tables and indexes.  The restore
> was done from a pg_dump -Fc dump of one database.

I was just bugging Marc for some useful data, so I'll ask you too:
could you provide a trace of the pg_restore execution?  log_statement
plus log_duration output would do it.  I am curious to understand
exactly which steps in the restore are significant time sinks.

> I notice during the restore that the disk throughput triples during
> the checkpoint.

Hm, better make sure the log includes some indication of when
checkpoints happen.

            regards, tom lane

Re: restore time: sort_mem vs. checkpoing_segments

From
Vivek Khera
Date:
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> I was just bugging Marc for some useful data, so I'll ask you too:
TL> could you provide a trace of the pg_restore execution?  log_statement
TL> plus log_duration output would do it.  I am curious to understand
TL> exactly which steps in the restore are significant time sinks.

Sure... machine isn't gonna do much of anything until 7.4 is released
(or I hear a promise of no more dump/reload).

>> I notice during the restore that the disk throughput triples during
>> the checkpoint.

TL> Hm, better make sure the log includes some indication of when
TL> checkpoints happen.

That it does.

I'll post the results in the next couple of days, as each run takes
about 4 hours ;-)

Re: restore time: sort_mem vs. checkpoing_segments

From
Robert Treat
Date:
On Mon, 2003-09-15 at 15:15, Vivek Khera wrote:
> And the winner is... checkpoint_segments.
>
> Restore of a significanly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large.  There are quite a number of tables and indexes.  The restore
> was done from a pg_dump -Fc dump of one database.
>
> All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
> 7.4b2 on FreeBSD 4.8.

hmm... i wonder what would happen if you pushed your sort_mem higher...
on some of our development boxes and upgrade scripts, i push the
sort_mem to 102400 and sometimes even higher depending on the box. this
really speeds up my restores quit a bit (and is generally safe as i make
sure there isn't any other activity going on at the time)

another thing i like to do is turn of fsync, as if the system crashes in
the middle of reload i'm pretty sure i'd be starting all over anyway...

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: restore time: sort_mem vs. checkpoing_segments

From
Vivek Khera
Date:
>>>>> "RT" == Robert Treat <xzilla@users.sourceforge.net> writes:

RT> hmm... i wonder what would happen if you pushed your sort_mem higher...
RT> on some of our development boxes and upgrade scripts, i push the
RT> sort_mem to 102400 and sometimes even higher depending on the box. this
RT> really speeds up my restores quit a bit (and is generally safe as i make
RT> sure there isn't any other activity going on at the time)

Ok... just two more tests to run, no big deal ;-)


RT> another thing i like to do is turn of fsync, as if the system crashes in
RT> the middle of reload i'm pretty sure i'd be starting all over anyway...

I'll test it and see what happens.  I suspect not a big improvement on
a hardware RAID card with 128Mb backed up cache, though.  But never
say never!

Re: restore time: sort_mem vs. checkpoing_segments

From
Vivek Khera
Date:
>>>>> "RT" == Robert Treat <xzilla@users.sourceforge.net> writes:

RT> hmm... i wonder what would happen if you pushed your sort_mem higher...
RT> on some of our development boxes and upgrade scripts, i push the
RT> sort_mem to 102400 and sometimes even higher depending on the box. this
RT> really speeds up my restores quit a bit (and is generally safe as i make
RT> sure there isn't any other activity going on at the time)

I was just checking, and I already ran test with larger sort_mem.  the
checkpoint segments made more of a difference...

Re: restore time: sort_mem vs. checkpoing_segments

From
Bruce Momjian
Date:
Vivek Khera wrote:
> And the winner is... checkpoint_segments.
>
> Restore of a significanly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large.  There are quite a number of tables and indexes.  The restore
> was done from a pg_dump -Fc dump of one database.
>
> All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
> 7.4b2 on FreeBSD 4.8.
>
> 3 checkpoint_segments restore time: 14983 seconds
> 50 checkpoint_segments restore time: 11537 seconds
> 50 checkpoint_segments, sort_mem 131702 restore time: 11262 seconds

With the new warning about too-frequent checkpoints, people have actual
feedback to encourage them to increase checkpoint_segments.  One issue
is that it is likely to recommend increasing checkpoint_segments during
restore, even if there is no value to it being large during normal
server operation.  Should that be decumented?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: restore time: sort_mem vs. checkpoing_segments

From
Vivek Khera
Date:
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:

BM> restore, even if there is no value to it being large during normal
BM> server operation.  Should that be decumented?


Yes, right alongside the recommendation to bump sort_mem, even though
in my tests sort_mem made no significant difference in restore time
going from 8m to 128m.

Re: restore time: sort_mem vs. checkpoing_segments

From
Dennis Bjorklund
Date:
On Tue, 23 Sep 2003, Bruce Momjian wrote:

> With the new warning about too-frequent checkpoints, people have actual
> feedback to encourage them to increase checkpoint_segments.  One issue
> is that it is likely to recommend increasing checkpoint_segments during
> restore, even if there is no value to it being large during normal
> server operation.  Should that be decumented?

One could have a variable that turns off that warning, and have pg_dump
insert a statement to turn it off. That is, if one never want these
warnings from a restore (from a new dump).

In any case, documentation is good and still needed.

--
/Dennis