Thread: pg 7.4.x - pg_restore impossibly slow
Greetings, I have 395M pg_dump from a PostgreSQL 7.4.2 database. This dump is from one of our customer's servers. There is a web-based administration UI which has been reported to be extremely slow and unusable. To see what's going on with their data I have grabbed a copy of their nightly pg_dump output and attempting to restore it on my development box, running PostgreSQL 7.4.12. My dev box is much slower hardware than the customer's server. Even with that difference I expected to be able to pg_restore the database within one day. But no. After leaving pg_restore running for about 2 days, I ctrl-C'ed out of it (see copy/paste below along with other info). I must say, that data was being restored, as I could do select count(*) on tables which had their data restored and I would get valid counts back. The database contains 34 tables. The pg_restore seems to restore the first 13 tables pretty quickly, but they do not have many records. The largest amongst them with ~ 17,000 rows. Then restore gets stuck on a table with 2,175,050 rows. Following this table another table exists with 2,160,616 rows. One thing worth mentioning is that the PostgreSQL package that got deployed lacked compression, as in: $ pg_dump -Fc dbname > dbname.DUMP pg_dump: [archiver] WARNING: requested compression not available in this installation -- archive will be uncompressed Any suggestions as to what may be the problem here? I doubt that the minor version mis-match is what's causing this problem. (I am try this test on another machine with the same version of PostgreSQL installed on it, and right now, it is stuck on the first of the two huge tables, and it has already been going for more than 2 hrs). I'm open to any ideas and/or suggestions (within reason) :) Best regards, --patrick me@devbox:/tmp$ date Mon Apr 10 15:13:19 PDT 2006 me@devbox:/tmp$ pg_restore -ad dbname customer_db.DUMP ; date ^C me@devbox:/tmp$ date Wed Apr 12 10:40:19 PDT 2006 me@devbox:/tmp$ uname -a Linux devbox 2.4.31 #6 Sun Jun 5 19:04:47 PDT 2005 i686 unknown unknown GNU/Linux me@devbox:/tmp$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 8 model name : Pentium III (Coppermine) stepping : 6 cpu MHz : 731.477 cache size : 256 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse bogomips : 1461.45 me@devbox:/tmp/$ cat /proc/meminfo total: used: free: shared: buffers: cached: Mem: 527499264 523030528 4468736 0 10301440 384454656 Swap: 1579204608 552960 1578651648 MemTotal: 515136 kB MemFree: 4364 kB MemShared: 0 kB Buffers: 10060 kB Cached: 374984 kB SwapCached: 460 kB Active: 79004 kB Inactive: 306560 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 515136 kB LowFree: 4364 kB SwapTotal: 1542192 kB SwapFree: 1541652 kB postgresql.conf changes on devbox: checkpoint_segments = 10 log_pid = true log_timestamp = true The checkpoint_segments was changed to 10 after seeing many "HINT"s in PostgreSQL log file about it. Doesn't seem to have affected pg_restore performance.
"patrick keshishian" <pkeshish@gmail.com> writes: > My dev box is much slower hardware than the customer's > server. Even with that difference I expected to be able to > pg_restore the database within one day. But no. Seems a bit odd. Can you narrow down more closely which step of the restore is taking the time? (Try enabling log_statements.) One thought is that kicking up work_mem and vacuum_mem is likely to help for some steps (esp. CREATE INDEX and foreign-key checking). And be sure you've done the usual tuning for write-intensive activity, such as bumping up checkpoint_segments. Turning off fsync wouldn't be a bad idea either. regards, tom lane
Hi Tom, et.al., So I changed the following settings in postgresql.conf file and restarted PostgreSQL and then proceeded with pg_restore: # new changes for this test-run log_statement = true sort_mem = 10240 # default 1024 vacuum_mem = 20480 # default 8192 # from before checkpoint_segments = 10 log_pid = true log_timestamp = true With these settings and running: pg_restore -vaOd dbname dbname.DUMP Things seem to progress better. The first of the large tables got COPY'ed within 1 hr 40 min: start: 2006-04-13 11:44:19 finish: 2006-04-13 13:25:36 I ended up ctrl-C'ing out of the pg_restore as the second large table was taking over 3 hours and the last PostgreSQL log entry was from over 2.5hrs ago, with message: 2006-04-13 14:09:29 [3049] LOG: recycled transaction log file "000000060000006B" Time for something different. Before attempting the same procedure with fsync off, I ran the following sequence of commands: $ dropdb dbname $ createdb dbname $ pg_restore -vsOd dbname dbname.DUMP $ date > db.restore ; pg_restore -vcOd dbname \ dbname.DUMP ; date >> db.restore $ cat db.restore Thu Apr 13 18:02:51 PDT 2006 Thu Apr 13 18:17:16 PDT 2006 That's just over 14 minutes! Ideas? Is this because the -c option drops all foreign keys and so the restore goes faster? Should this be the preferred, recommended and documented method to run pg_restore? Any drawbacks to this method? Thanks, --patrick On 4/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "patrick keshishian" <pkeshish@gmail.com> writes: > > My dev box is much slower hardware than the customer's > > server. Even with that difference I expected to be able to > > pg_restore the database within one day. But no. > > Seems a bit odd. Can you narrow down more closely which step of the > restore is taking the time? (Try enabling log_statements.) > > One thought is that kicking up work_mem and vacuum_mem is likely to > help for some steps (esp. CREATE INDEX and foreign-key checking). > And be sure you've done the usual tuning for write-intensive activity, > such as bumping up checkpoint_segments. Turning off fsync wouldn't > be a bad idea either. > > regards, tom lane
"patrick keshishian" <pkeshish@gmail.com> writes: > With these settings and running: > pg_restore -vaOd dbname dbname.DUMP If you had mentioned you were using random nondefault switches, we'd have told you not to. -a in particular is a horrid idea performancewise --- a standard schema-plus-data restore goes way faster because it's doing index builds and foreign key checks wholesale instead of incrementally. > Is this because the -c option drops all foreign keys and > so the restore goes faster? Should this be the preferred, > recommended and documented method to run pg_restore? It is documented in recent versions of the documentation: see http://www.postgresql.org/docs/8.1/static/populate.html particularly the last section. regards, tom lane
On Thu, Apr 13, 2006 at 06:26:00PM -0700, patrick keshishian wrote: > $ dropdb dbname > $ createdb dbname > $ pg_restore -vsOd dbname dbname.DUMP That step is pointless, because the next pg_restore will create the schema for you anyway. > $ date > db.restore ; pg_restore -vcOd dbname \ > dbname.DUMP ; date >> db.restore -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On 4/13/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "patrick keshishian" <pkeshish@gmail.com> writes: > > With these settings and running: > > pg_restore -vaOd dbname dbname.DUMP > > If you had mentioned you were using random nondefault switches, we'd Random? With all due respect, I did. I specified the PostgreSQL version of the pg_dump source server. I specified the version of my dev PostgreSQL server. I provided specific information about which postgresql.conf entries I had changed and to what specific values they were changed to. I pasted the _exact_ command used (including so called "random nondefault switches") to do the dump and the exact command used (again, with said "random nondefault switches") to restore from the dump'ed data. I believe I tried my best to be as thorough as possible with my post(s). archived at: http://archives.postgresql.org/pgsql-performance/2006-04/msg00287.php > have told you not to. -a in particular is a horrid idea performancewise > --- a standard schema-plus-data restore goes way faster because it's > doing index builds and foreign key checks wholesale instead of > incrementally. Duly noted. Option "-a" bad. > > Is this because the -c option drops all foreign keys and > > so the restore goes faster? Should this be the preferred, > > recommended and documented method to run pg_restore? > > It is documented in recent versions of the documentation: see > http://www.postgresql.org/docs/8.1/static/populate.html > particularly the last section. As a general rule of thumb, I have always assumed, documentation for any software, from one major version to another, would not necessarily apply cross revisions (e.g., version 7.4 vs 8.1). But thanks for your time and help, --patrick
On 4/13/06, Jim C. Nasby <jnasby@pervasive.com> wrote: > On Thu, Apr 13, 2006 at 06:26:00PM -0700, patrick keshishian wrote: > > $ dropdb dbname > > $ createdb dbname > > $ pg_restore -vsOd dbname dbname.DUMP > > That step is pointless, because the next pg_restore will create the > schema for you anyway. Yes, I noticed this with the verbose output (random non-standard option "-v"). I was providing all information (read: exact steps taken) which may have been relevant to my post/question, so that, I would avoid being guilty of omitting any possibly significant, yet random information. Thanks for the insight, --patrick