Thread: Ways to speed up dump&reload
Hello! Finally the time has come for us to upgrade our PostgreSQL 7.4.5-servers to 8.0.1 - and though I'm very much looking forwardto some of the new features, the dump&reload process is worrying me a bit this time. I've got one cluster in particularwhich is roughly 9GB in size and features some dreadfully large indices of the GiST-type for Tsearch2. I have already scheduled a nightshift for this upgrade, but I'm not all too certain that I'll be up and running again inthe morning, so I wondered if there might be some preparations that would allow for some speed-up during the reload process?What kind of tuning could be done in postgresql.conf in respect to just this particular workload (COPY and finallythe CREATE INDEX stuff) with no other concurrent access going on? The machine in question features a RAID10 disk array which hosts the DB-cluster, some 2GB RAM and four processors and I'vetuned postgresql.conf for our ordinary everyday workload with lots and lots of concurrent reads and writes, but I'vegot the feeling that this configuration might not be the optimum for dump&reload, so it might make some sense to usesome sort of "maintenance"-configuration. What kind of tuning would you recommend? Thank you very much for your advice and lots of cheers to the developers for 8.0! Kind regards Markus
Hi Markus, Have you considered installing slony and replicate from the old version to the new one ? It can do that, and when the replica is up to date, the switchover will take minutes I guess. Note that I have never done that so please ask others about that too, or research for yourself ;-) Cheers, Csaba. On Mon, 2005-02-21 at 07:11, Markus Wollny wrote: > Hello! > > Finally the time has come for us to upgrade our PostgreSQL 7.4.5-servers to 8.0.1 - and though I'm very much looking forwardto some of the new features, the dump&reload process is worrying me a bit this time. I've got one cluster in particularwhich is roughly 9GB in size and features some dreadfully large indices of the GiST-type for Tsearch2. > > I have already scheduled a nightshift for this upgrade, but I'm not all too certain that I'll be up and running again inthe morning, so I wondered if there might be some preparations that would allow for some speed-up during the reload process?What kind of tuning could be done in postgresql.conf in respect to just this particular workload (COPY and finallythe CREATE INDEX stuff) with no other concurrent access going on? > > The machine in question features a RAID10 disk array which hosts the DB-cluster, some 2GB RAM and four processors and I'vetuned postgresql.conf for our ordinary everyday workload with lots and lots of concurrent reads and writes, but I'vegot the feeling that this configuration might not be the optimum for dump&reload, so it might make some sense to usesome sort of "maintenance"-configuration. What kind of tuning would you recommend? > > Thank you very much for your advice and lots of cheers to the developers for 8.0! > > Kind regards > > Markus > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq
Markus, have you read updated http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html about regprocedure_7.4.patch.gz and regprocedure_update.sql ? Oleg On Mon, 21 Feb 2005, Markus Wollny wrote: > Hello! > > Finally the time has come for us to upgrade our PostgreSQL 7.4.5-servers to 8.0.1 - and though I'm very much looking forwardto some of the new features, the dump&reload process is worrying me a bit this time. I've got one cluster in particularwhich is roughly 9GB in size and features some dreadfully large indices of the GiST-type for Tsearch2. > > I have already scheduled a nightshift for this upgrade, but I'm not all too certain that I'll be up and running again inthe morning, so I wondered if there might be some preparations that would allow for some speed-up during the reload process?What kind of tuning could be done in postgresql.conf in respect to just this particular workload (COPY and finallythe CREATE INDEX stuff) with no other concurrent access going on? > > The machine in question features a RAID10 disk array which hosts the DB-cluster, some 2GB RAM and four processors and I'vetuned postgresql.conf for our ordinary everyday workload with lots and lots of concurrent reads and writes, but I'vegot the feeling that this configuration might not be the optimum for dump&reload, so it might make some sense to usesome sort of "maintenance"-configuration. What kind of tuning would you recommend? > > Thank you very much for your advice and lots of cheers to the developers for 8.0! > > Kind regards > > Markus > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
> Hello! > > Finally the time has come for us to upgrade our PostgreSQL > 7.4.5-servers to 8.0.1 - and though I'm very much looking > forward to some of the new features, the dump&reload process > is worrying me a bit this time. I've got one cluster in > particular which is roughly 9GB in size and features some > dreadfully large indices of the GiST-type for Tsearch2. > > I have already scheduled a nightshift for this upgrade, but > I'm not all too certain that I'll be up and running again in > the morning, so I wondered if there might be some > preparations that would allow for some speed-up during the > reload process? What kind of tuning could be done in > postgresql.conf in respect to just this particular workload > (COPY and finally the CREATE INDEX stuff) with no other > concurrent access going on? If you'll be around to babysit the system during the reload, turn off fsync during the load. If you have the RAM, consider temporarily moving the WAL logs to a ramdrive. Increase sort_mem and maintenance_sort_mem, since you will only have a single connection. If you're on hyperthreading CPUs, disable hyperthreading. Since you load in a single connectino, only one CPU will be used. Be very careful to remember to turn this back on after the load! Not sure if there is something specific you can do for tsearch, but this should help with the general stuff //Magnus
Hello Oleg, Yes, thanks - I have installed this patch as soon as it became available. Concerning my largest database, I haven't yet migratedthat to tsearch2 - it's still tsearch1-based, so I intend to drop ts1 before the last dump, reload the database withPostgreSQL 8.0.1 and deploy tsearch2 after that. I was thinking more in terms of what would be the optimum settings forthings like workmem during reload and creation of the indices - as there's not much else going on at this point of time,it might be worthwhile to use some settings which allocate all available resources to just this task. Kind regards Markus -----Original Message----- From: Oleg Bartunov [mailto:oleg@sai.msu.su] Sent: Mon 2/21/2005 11:15 To: Markus Wollny Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Ways to speed up dump&reload Markus, have you read updated http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html about regprocedure_7.4.patch.gz and regprocedure_update.sql ? Oleg
Hi! Thanks very much, this was exactly the kind of advice I was hoping for! I'll give the WAL to ramdisk thing a try; fsync isoff by default anyways - the data in this cluster is not so valuable as not to risk one day of rollback for the performancegain of having fsync turned off and I'm doing nightly dumps anyway AND the machine as both a dedicated UPS andredundant PSUs, so the likelyhood of things turning snafu is considerably small. But having WAL in RAM might actuallyspeed up the whole lot quite considerably. Thank you! Kind regards Markus -----Original Message----- From: Magnus Hagander [mailto:mha@sollentuna.net] Sent: Mon 2/21/2005 13:04 To: Markus Wollny; pgsql-general@postgresql.org Cc: Subject: RE: [GENERAL] Ways to speed up dump&reload > Hello! > > Finally the time has come for us to upgrade our PostgreSQL > 7.4.5-servers to 8.0.1 - and though I'm very much looking > forward to some of the new features, the dump&reload process > is worrying me a bit this time. I've got one cluster in > particular which is roughly 9GB in size and features some > dreadfully large indices of the GiST-type for Tsearch2. > > I have already scheduled a nightshift for this upgrade, but > I'm not all too certain that I'll be up and running again in > the morning, so I wondered if there might be some > preparations that would allow for some speed-up during the > reload process? What kind of tuning could be done in > postgresql.conf in respect to just this particular workload > (COPY and finally the CREATE INDEX stuff) with no other > concurrent access going on? If you'll be around to babysit the system during the reload, turn off fsync during the load. If you have the RAM, consider temporarily moving the WAL logs to a ramdrive. Increase sort_mem and maintenance_sort_mem, since you will only have a single connection. If you're on hyperthreading CPUs, disable hyperthreading. Since you load in a single connectino, only one CPU will be used. Be very careful to remember to turn this back on after the load! Not sure if there is something specific you can do for tsearch, but this should help with the general stuff //Magnus
Quoth Markus.Wollny@computec.de ("Markus Wollny"): > Finally the time has come for us to upgrade our PostgreSQL > 7.4.5-servers to 8.0.1 - and though I'm very much looking forward to > some of the new features, the dump&reload process is worrying me a > bit this time. I've got one cluster in particular which is roughly > 9GB in size and features some dreadfully large indices of the > GiST-type for Tsearch2. > > I have already scheduled a nightshift for this upgrade, but I'm not > all too certain that I'll be up and running again in the morning, so > I wondered if there might be some preparations that would allow for > some speed-up during the reload process? What kind of tuning could > be done in postgresql.conf in respect to just this particular > workload (COPY and finally the CREATE INDEX stuff) with no other > concurrent access going on? You might consider using Slony-I to minimize the downtime. I haven't done an upgrade using, but know that others have, and I've done the same using eRServer... The idea is that you set up an 8.0.1 backend, and set up replication well ahead of time. Replicate from the 7.4 system to the 8.0 one. It might take several days to get replication up to date if the databases are large enough, but once they are relatively in sync, they should stay there pretty easily. Switching versions is then as easy as using MOVE SET to switch the origin from the 7.4 system to the 8.0 one. That ought to take mere seconds, so you wouldn't need a long time to do the upgrade. See the URL below for more information... -- output = reverse("moc.liamg" "@" "enworbbc") http://linuxdatabases.info/info/versionupgrade.html "Did you ever walk in a room and forget why you walked in? I think that's how dogs spend their lives." -- Sue Murphy
You might consider using Slony-I to minimize the downtime. I haven't done an upgrade using, but know that others have, and I've done the same using eRServer... The idea is that you set up an 8.0.1 backend, and set up replication well ahead of time. Replicate from the 7.4 system to the 8.0 one. It might take several days to get replication up to date if the databases are large enough, but once they are relatively in sync, they should stay there pretty easily. Switching versions is then as easy as using MOVE SET to switch the origin from the 7.4 system to the 8.0 one. That ou -- output = reverse("moc.liamg" "@" "enworbbc") http://linuxdatabases.info/info/versionupgrade.html The quickest way to a man's heart is through his chest, with an axe.
Hi! Thanks, I'll try Slony-I next time - I currently lack the time to test it beforehand on my non-production system; I'd bea bit worried about the additional load a second PG-instance as replication slave would impose on the machine, so at themoment I'm more confident going along the same tracks I've gone before during the previous switches since 7.1. Kind regards Markus
max out your checkpoint_segments. 128 or 256 is good if you have plenty of spare space for the pg_xlog directory. you will also want to increase checkpoint_timeout to something large. I like 900 seconds personally. if you can put pg_xlog on a separate physical RAID you're best off. increase the amount of ram your index operations can use (sort_mem or work_mem depending on PG version) to a very large number and then reset that to a normal value when you're done. then just let pg_dump + pg_restore do its work. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/