Thread: restore time: sort_mem vs. checkpoint_segments
And the winner is... checkpoint_segments.

Restore of a significantly big database (~19.8GB restored) shows nearly
no time difference depending on sort_mem when checkpoint_segments is
large. There are quite a number of tables and indexes. The restore
was done from a pg_dump -Fc dump of one database.

All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
7.4b2 on FreeBSD 4.8.

 3 checkpoint_segments                    restore time: 14983 seconds
50 checkpoint_segments                    restore time: 11537 seconds
50 checkpoint_segments, sort_mem 131072   restore time: 11262 seconds

There's an initdb between each test.

For reference, the restore with 8k page size, 60k buffers, 8192
sort_mem and 3 checkpoint_segments was 14777 seconds.

It seems for restore that a larger checkpoint_segments setting is the
key, especially when dealing with large numbers of rows in a table. I
notice during the restore that the disk throughput triples during the
checkpoint. The postgres data partition is on a 14-spindle hardware
RAID5 on U320 SCSI disks.
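For anyone who wants to reproduce this kind of comparison, here is a
minimal sketch of the settings and commands involved, in PostgreSQL
7.4-era syntax. The database name and file names are placeholders; on
modern servers, sort_mem is called work_mem and checkpoint_segments has
been superseded by max_wal_size.

    # postgresql.conf -- values from the fastest run above
    shared_buffers      = 30000     # the "30k shared buffers" setting
    sort_mem            = 8192      # KB available per sort operation
    checkpoint_segments = 50        # WAL segments between checkpoints

    # dump and restore (names are illustrative)
    pg_dump -Fc mydb > mydb.dump
    createdb mydb
    pg_restore -d mydb mydb.dump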
Vivek,

> And the winner is... checkpoint_segments.
>
> Restore of a significantly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large. There are quite a number of tables and indexes. The restore
> was done from a pg_dump -Fc dump of one database.

Cool! Thank you for posting this.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Vivek Khera <khera@kcilink.com> writes:
> Restore of a significantly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large. There are quite a number of tables and indexes. The restore
> was done from a pg_dump -Fc dump of one database.

I was just bugging Marc for some useful data, so I'll ask you too:
could you provide a trace of the pg_restore execution? log_statement
plus log_duration output would do it. I am curious to understand
exactly which steps in the restore are significant time sinks.

> I notice during the restore that the disk throughput triples during
> the checkpoint.

Hm, better make sure the log includes some indication of when
checkpoints happen.

			regards, tom lane
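The trace Tom asks for could be produced with the 7.4-era logging
booleans; a sketch, assuming PostgreSQL 7.4 (from 8.0 onward the
equivalents are log_statement = 'all' and log_min_duration_statement = 0):

    # postgresql.conf -- statement tracing for the restore run
    log_statement = true    # log each statement as it is executed
    log_duration  = true    # log the elapsed time of each statement
    log_timestamp = true    # timestamp every log line, so checkpoint
                            # activity can be correlated with statements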
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: TL> I was just bugging Marc for some useful data, so I'll ask you too: TL> could you provide a trace of the pg_restore execution? log_statement TL> plus log_duration output would do it. I am curious to understand TL> exactly which steps in the restore are significant time sinks. Sure... machine isn't gonna do much of anything until 7.4 is released (or I hear a promise of no more dump/reload). >> I notice during the restore that the disk throughput triples during >> the checkpoint. TL> Hm, better make sure the log includes some indication of when TL> checkpoints happen. That it does. I'll post the results in the next couple of days, as each run takes about 4 hours ;-)
On Mon, 2003-09-15 at 15:15, Vivek Khera wrote:
> And the winner is... checkpoint_segments.
>
> Restore of a significantly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large. There are quite a number of tables and indexes. The restore
> was done from a pg_dump -Fc dump of one database.
>
> All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
> 7.4b2 on FreeBSD 4.8.

hmm... i wonder what would happen if you pushed your sort_mem higher...
on some of our development boxes and upgrade scripts, i push the
sort_mem to 102400 and sometimes even higher depending on the box. this
really speeds up my restores quite a bit (and is generally safe as i
make sure there isn't any other activity going on at the time)

another thing i like to do is turn off fsync, as if the system crashes
in the middle of reload i'm pretty sure i'd be starting all over anyway...

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
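As a concrete sketch of Robert's suggestion, assuming a dedicated
restore window and that both settings are reverted afterwards (turning
fsync off trades crash safety for speed, which is acceptable here only
because a crashed restore would be rerun from scratch anyway):

    # postgresql.conf -- temporary settings for the restore only
    sort_mem = 102400    # KB; larger sorts speed up index rebuilds
    fsync    = false     # skip WAL flushes; unsafe for normal operation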
>>>>> "RT" == Robert Treat <xzilla@users.sourceforge.net> writes: RT> hmm... i wonder what would happen if you pushed your sort_mem higher... RT> on some of our development boxes and upgrade scripts, i push the RT> sort_mem to 102400 and sometimes even higher depending on the box. this RT> really speeds up my restores quit a bit (and is generally safe as i make RT> sure there isn't any other activity going on at the time) Ok... just two more tests to run, no big deal ;-) RT> another thing i like to do is turn of fsync, as if the system crashes in RT> the middle of reload i'm pretty sure i'd be starting all over anyway... I'll test it and see what happens. I suspect not a big improvement on a hardware RAID card with 128Mb backed up cache, though. But never say never!
>>>>> "RT" == Robert Treat <xzilla@users.sourceforge.net> writes: RT> hmm... i wonder what would happen if you pushed your sort_mem higher... RT> on some of our development boxes and upgrade scripts, i push the RT> sort_mem to 102400 and sometimes even higher depending on the box. this RT> really speeds up my restores quit a bit (and is generally safe as i make RT> sure there isn't any other activity going on at the time) I was just checking, and I already ran test with larger sort_mem. the checkpoint segments made more of a difference...
Vivek Khera wrote:
> And the winner is... checkpoint_segments.
>
> Restore of a significantly big database (~19.8GB restored) shows nearly
> no time difference depending on sort_mem when checkpoint_segments is
> large. There are quite a number of tables and indexes. The restore
> was done from a pg_dump -Fc dump of one database.
>
> All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG
> 7.4b2 on FreeBSD 4.8.
>
>  3 checkpoint_segments                    restore time: 14983 seconds
> 50 checkpoint_segments                    restore time: 11537 seconds
> 50 checkpoint_segments, sort_mem 131072   restore time: 11262 seconds

With the new warning about too-frequent checkpoints, people have actual
feedback to encourage them to increase checkpoint_segments. One issue
is that it is likely to recommend increasing checkpoint_segments during
restore, even if there is no value to it being large during normal
server operation. Should that be documented?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
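The warning Bruce mentions is new in 7.4. During a restore with a small
checkpoint_segments it shows up in the server log along these lines
(paraphrased from memory; the exact wording and hint vary by version):

    LOG:  checkpoints are occurring too frequently (14 seconds apart)
    HINT:  Consider increasing the configuration parameter
           "checkpoint_segments".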
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes: BM> restore, even if there is no value to it being large during normal BM> server operation. Should that be decumented? Yes, right alongside the recommendation to bump sort_mem, even though in my tests sort_mem made no significant difference in restore time going from 8m to 128m.
On Tue, 23 Sep 2003, Bruce Momjian wrote:
> With the new warning about too-frequent checkpoints, people have actual
> feedback to encourage them to increase checkpoint_segments. One issue
> is that it is likely to recommend increasing checkpoint_segments during
> restore, even if there is no value to it being large during normal
> server operation. Should that be documented?

One could have a variable that turns off that warning, and have pg_dump
insert a statement to turn it off. That is, if one never wants these
warnings from a restore (from a new dump). In any case, documentation
is good and still needed.

--
/Dennis
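A knob along the lines Dennis describes does ship with 7.4: the
checkpoint_warning parameter controls how close together checkpoints
must be before the warning fires, and setting it to zero disables the
message entirely. A sketch (it is a server-wide setting, so it would
have to be flipped back after the restore):

    # postgresql.conf -- silence the warning during a bulk reload
    checkpoint_warning = 0    # 0 disables the too-frequent-checkpoint warning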