Thread: How to increase nightly backup speed
I need to create nightly backups to a separate computer over the internet from an 8.1 server.

I tried to run pg_dump to make the backup using the command

"..\pg_dump\pg_dump.exe" -i -Z9 -b -v -f "C:\061127 mybackup.backup" -F c -h 81.50.12.18 -U myuser -p 5431 mydb

but this takes 4.5 hours to complete.

How can I increase the backup speed?

Andrus.
On Nov 27, 2006, at 18:13, Andrus wrote:

> I need to create nightly backups to a separate computer over the internet
> from an 8.1 server.
>
> I tried to run pg_dump to make the backup using the command
>
> "..\pg_dump\pg_dump.exe" -i -Z9 -b -v -f "C:\061127 mybackup.backup" -F c -h
> 81.50.12.18 -U myuser -p 5431 mydb

Reducing compression (change -Z9 to something like -Z0) might help a little, depending on the processing power of the server and the available bandwidth. If you really want compression, you could do the dump over an SSH connection and gzip on the client end:

$ ssh 81.50.12.18 "pg_dump -Z0 -Fc -ibv -U myuser mydb" | gzip -9 >mybackup.backup

Alexander.
On Mon, 2006-11-27 at 19:13 +0200, Andrus wrote:

> I need to create nightly backups to a separate computer over the internet
> from an 8.1 server.
>
> I tried to run pg_dump to make the backup using the command
>
> "..\pg_dump\pg_dump.exe" -i -Z9 -b -v -f "C:\061127 mybackup.backup" -F c -h
> 81.50.12.18 -U myuser -p 5431 mydb
>
> but this takes 4.5 hours to complete.
>
> How can I increase the backup speed?

You might try using online backups. By following the steps in this document:

http://www.postgresql.org/docs/8.1/static/backup-online.html

you can back up the data at the speed of your filesystem. There's no way to make it faster than that.

Hope this helps,
Jeff Davis
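For reference, the heart of the setup that document describes is a WAL archiving setting plus a periodic base backup. A rough sketch, with placeholder paths and Unix-style commands (a Windows server would use copy/xcopy instead of cp):

# postgresql.conf -- turn on WAL archiving
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'

-- then, around each base backup, from psql:
SELECT pg_start_backup('nightly');
-- copy the data directory with any file-level tool while the server keeps running
SELECT pg_stop_backup();

Restoring later requires a recovery.conf whose restore_command pulls the files back out of the same archive directory.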
> You might try using online backups. By following the steps in this
> document:
>
> http://www.postgresql.org/docs/8.1/static/backup-online.html
>
> you can back up the data at the speed of your filesystem. There's no way
> to make it faster than that.

Note however that you will want to apply the archived WAL to the copied version of the PostgreSQL data directory regularly, e.g. every night. Otherwise you are likely looking at a very long restore process should you need to bring it up.

Joshua D. Drake
Jeff Davis wrote:
> On Mon, 2006-11-27 at 19:13 +0200, Andrus wrote:
>> I tried to run pg_dump to make the backup using the command
>>
>> "..\pg_dump\pg_dump.exe" -i -Z9 -b -v -f "C:\061127 mybackup.backup" -F c -h
>> 81.50.12.18 -U myuser -p 5431 mydb
>>
>> but this takes 4.5 hours to complete.
>>
>> How can I increase the backup speed?

I agree with using online backups. But I bet the majority of your backup time is spent compressing your data (-Z9).

The solution I'm currently using (both with online backups and pg_dump) is to copy all the data to an alternate volume uncompressed, then compress it (nice gzip -1). This allows the backup itself to happen as quickly as possible, and the compression can then run for a while without having a large I/O impact on the main data volumes. I worry about how long the initial copy/dump takes, but not so much after that.

-Glen
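As an illustration of Glen's two-step approach, assuming a Unix-like server and placeholder paths:

# step 1: dump as fast as possible, no compression, onto a separate volume
pg_dump -Fc -Z0 -U myuser mydb > /backup_volume/mydb.dump

# step 2: compress afterwards at low priority, so the CPU and I/O load
# doesn't compete with the database
nice gzip -1 /backup_volume/mydb.dump

The resulting mydb.dump.gz is what then gets copied off the machine.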
On 11/27/06 19:26, Glen Parker wrote:
> I agree with using online backups. But I bet the majority of your backup
> time is spent compressing your data (-Z9).
>
> The solution I'm currently using (both with online backups and pg_dump)
> is to copy all the data to an alternate volume uncompressed, then
> compress it (nice gzip -1). This allows the backup itself to happen as
> quickly as possible, and the compression can then run for a while without
> having a large I/O impact on the main data volumes. I worry about how
> long the initial copy/dump takes, but not so much after that.

If you've got excess CPU capacity at night, I wonder if -Z1 or -Z2 would speed the backup up, since it reduces the amount of data written to disk.

--
Ron Johnson, Jr.
Jefferson LA USA
> You might try using online backups. By following the steps in this
> document:
>
> http://www.postgresql.org/docs/8.1/static/backup-online.html
>
> you can back up the data at the speed of your filesystem. There's no way
> to make it faster than that.

PITR configuration is complicated. A lot of manual steps are required: changing the postgresql.conf file, restarting the server, creating a PITR directory and granting the postgres user privileges on it, copying files. PITR recovery is also complicated.

My goal is to create an ERP system which creates backups without any attendance. I don't know how to automate this 100% and haven't found any such sample. So I'm waiting until this is automated (the way 8.2 automates dropping of user privileges) and am using pg_dump in current installations.

Andrus.
> If you've got excess CPU capacity at night, I wonder if -Z1 or -Z2
> would speed the backup since it reduces the amount of data written
> to disk.

Where can I find a study which compares pg_dump backup speed and backup size for the various -Z options?

Info-Zip's zip.exe -9 option does not noticeably decrease compression speed, and I expected pg_dump to use the same compression algorithm as zip.exe. I'm really wondering why -Z9 decreases backup speed so significantly.

Andrus.
On 11/28/06, Andrus <eetasoft@online.ee> wrote:
> My goal is to create an ERP system which creates backups without any
> attendance. I don't know how to automate this 100% and haven't found any
> such sample.

Depending on what you plan to do with the backups (like creating a failover server), I don't know that you'll find a fully automated solution without going to very expensive high-end products like Oracle, and quite possibly not even then.
I have a three part approach to backups, all implemented via cron (on a Linux server):
1. On a daily basis I back up the key tables using pg_dump.
2. On a weekly basis I also back up the two 'production' databases using pg_dump.
3. Also on a weekly basis, I do a full backup (pg_dumpall) of the entire PostgreSQL installation.
The weekly backup of the larger of the two databases produces a file that is about 20GB and takes about an hour and 15 minutes. I then compress it down to about 4 GB, which takes another hour. However, because that's a separate task, it doesn't impact the database server as much. (I suspect all that I/O slows things down a bit, but I haven't noticed any significant effect in my transaction time reports. That task is run during the slowest 4 hour period of the week, though).
A 'restore' of that database on a different server takes somewhere between 4 and 6 hours.
BTW, if you've never actually tested your recovery capabilities, can you be sure they work?
I did a full-blown test in February or March and found a few loose ends. And when we had to do the real thing in May (due to a power supply failure), there were STILL a few loose ends, but we were back online within 12 hours of when I started the recovery process, and half of that time was spent completing the setup of the 'backup' server, which I had been rebuilding. I'm working to lower that downtime and will be doing another full-blown test in January or February.
> The weekly backup of the larger of the two databases produces a file that
> is about 20GB and takes about an hour and 15 minutes. I then compress it
> down to about 4 GB, which takes another hour. However, because that's a
> separate task, it doesn't impact the database server as much.

My environment is a bit different. For safety, I need to create backups on a separate computer over the internet.

1. The backup computer has a consumer-grade internet connection (ADSL, 600 kbit/s download speed).
2. The query "SELECT sum(relpages * 8/1000) FROM pg_class" returns 1302 for a database restored from backup, so my data size seems to be approx 1 GB only.
3. The backup file size is 70 MB.
4. The backup client has all ports closed.
5. The server has *only* port 5432 open.

So I think the 4.5 hours required to create the backup is because pg_dump downloads the whole database (1 GB) in uncompressed format over a slow internet connection. The compression level has almost no effect on this.

I think I can create the backup copy quickly on the server computer, but how do I send it to the backup computer?

pg_read_file() can read only text files and is restricted to superusers.

How could one add a function pg_read_backup() to Postgres which creates and returns the backup file at full download speed? This probably requires implementing some file download protocol.

> BTW, if you've never actually tested your recovery capabilities, can you
> be sure they work?
> I did a full-blown test in February or March and found a few loose ends.

I expect that a full database backup created using pg_dump never has any problems on restore.

Andrus.
> $ ssh 81.50.12.18 "pg_dump -Z0 -Fc -ibv -U myuser mydb" | gzip -9

Alexander,

1. My database size seems to be approx 1 GB and the download speed is approx 600 kbit/s. Your solution still requires 4.5 hours of download time, since 1 GB of data must be downloaded.

2. I have only port 5432 open to the public internet on the server used by Postgres. Your solution requires the ssh port to be opened as well, but the sysadmins refuse to configure the routers for a second port.

How can I use your solution in those cases?

Andrus.
On Tue, Nov 28, 2006 at 06:01:43PM +0200, Andrus wrote:
> 5. The server has *only* port 5432 open.
>
> pg_read_file() can read only text files and is restricted to superusers.
>
> How could one add a function pg_read_backup() to Postgres which creates and
> returns the backup file at full download speed?

You could use an *un*trusted procedural language to create a function that binary-reads the backup from disk and returns it as a bytea field. Not sure how efficient that is, though. You could then simply do

select get_backup();

If you allow for parameters you could make it return certain backups based on, perhaps, timestamp of creation.

select list_available_backups();

might complete the suite of tools. One could then always use some hashing tools (mhash with PG bindings comes to mind) to verify whether a backup has arrived safely:

on the local machine:   ripemd160(backupfile)
on the server:          select yhash.ripemd160(get_backup()) = <local hash>;

Karsten
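To make the idea concrete, here is a rough sketch of such a function using PL/PythonU (assuming that language is installed and that a scheduled job has already written the dump into a fixed directory; the path and file naming are invented for illustration). It returns the file base64-encoded as text, which side-steps the bytea escaping rules of the old PL/Python bindings:

CREATE FUNCTION get_backup(text) RETURNS text AS $$
    fname = args[0]
    # refuse anything that tries to escape the backup directory
    if '/' in fname or '\\' in fname or '..' in fname:
        plpy.error('bad file name')
    f = open('/var/backups/pgdumps/' + fname, 'rb')
    data = f.read()
    f.close()
    # base64 keeps the return value plain text, so no bytea escaping is needed
    return data.encode('base64')
$$ LANGUAGE plpythonu;

A Unix-style client could then pull the file over the normal 5432 connection with something like

psql -h 81.50.12.18 -p 5431 -U myuser -At -c "SELECT get_backup('mydb.dump')" mydb | base64 -d > mydb.dump

Note that creating functions in an untrusted language requires superuser rights.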
Andrus wrote:
> So I think the 4.5 hours required to create the backup is because pg_dump
> downloads the whole database (1 GB) in uncompressed format over a slow
> internet connection. The compression level has almost no effect on this.

Might well be latency issues too.

> I think I can create the backup copy quickly on the server computer, but
> how do I send it to the backup computer?

Use scp. Open port 22 and allow only connections from the backup machine with a specified user (e.g. "pgbackup"). Alternatively, you might try dumping in a text format and using rsync to transfer changes.

> pg_read_file() can read only text files and is restricted to superusers.
>
> How could one add a function pg_read_backup() to Postgres which creates and
> returns the backup file at full download speed? This probably requires
> implementing some file download protocol.

Just don't try to do it within PG - use the tools the system provides.

> I expect that a full database backup created using pg_dump never has any
> problems on restore.

It's entirely possible to cause problems by restoring as the wrong user or with a different character set / locale.

--
Richard Huxton
Archonet Ltd
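The scp variant is about as small as a backup script gets. A sketch, with placeholder host and user names, run from cron on the database server:

# dump locally (fast, no network in the way), compress, then push the file out
pg_dump -Fc -Z0 -U myuser mydb > /backup/mydb.dump
gzip -f /backup/mydb.dump
scp /backup/mydb.dump.gz pgbackup@backup.example.com:/backups/

The rsync variant would instead dump with pg_dump -Fp to a plain SQL file and run rsync -z against the previous night's copy, so only the changed parts cross the wire.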
On Tue, Nov 28, 2006 at 03:29:44PM +0200, Andrus wrote:
> Where can I find a study which compares pg_dump backup speed and backup
> size for the various -Z options?

pg_dump uses zlib; you should be able to find statistics on that.

> Info-Zip's zip.exe -9 option does not noticeably decrease compression speed

I don't believe that. The whole point of increasing the compression level is that it spends longer on the data to compress it better. If you could compress it better in the same time, you'd just do it and not make it optional.

> I'm really wondering why -Z9 decreases backup speed so significantly.

Level 9 is the absolute maximum compression level; it tries really hard to make the data small, but it's usually not that much better than level 6. If you're transferring over a 100Mb network, I wouldn't recommend going higher than level 3.

It is not unheard of that increasing the compression makes the process take longer, because you're saturating the CPU while your network is idle. Also, if you have a multicore system, it may be worth having pg_dump not compress but piping the output through gzip; then you can use both processors simultaneously.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
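That last suggestion is a one-liner; a sketch (the compression level is picked arbitrarily):

# pg_dump does no compression itself, gzip runs as a separate process on the
# second CPU; gunzip the file before feeding it to pg_restore later
pg_dump -Fc -Z0 -U myuser mydb | gzip -3 > mydb.dump.gz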
Richard,

> Use scp. Open port 22 and allow only connections from the backup machine
> with a specified user (e.g. "pgbackup").
>
> Alternatively, you might try dumping in a text format and using rsync to
> transfer changes.

I really don't want to open a separate port for backup only. Please, can you recommend a solution which uses port 5432, owned by Postgres?

>> pg_read_file() can read only text files and is restricted to superusers.
>>
>> How could one add a function pg_read_backup() to Postgres which creates and
>> returns the backup file at full download speed?
>
> Just don't try to do it within PG - use the tools the system provides.

How can I force the postmaster to run the tool which the system provides when it receives a backup request, instead of a postgres child process?

Andrus.
> You could use an *un*trusted procedural language to create a
> function to binary-read the backup from disk and return it
> as a bytea field. Not sure how efficient that is, though.
>
> You could then simply do
>
> select get_backup();

Karsten,

This is a good idea, but it forces me to use the Postgres protocol for downloading. This protocol has some timeouts which are too small for a large file download. The Postgres protocol also adds a lot of overhead to the downloadable data, and it requires that the whole downloadable file fit into memory.

I used ODBC. I tried this, but was forced to store big files in 1 MB chunks in bytea fields and create the file from the downloaded blocks.

So I'm looking for a more efficient protocol to download the backup file. Or should I really write code which divides the backup file into 1 MB chunks and stores them in bytea fields?

Andrus.
On 11/28/06, Andrus <eetasoft@online.ee> wrote:
> My environment is a bit different. For safety, I need to create backups on a
> separate computer over the internet.

I don't see where doing the backup directly to another computer increases your safety margin; it may even lower it, due to the increased potential for network issues messing up the backup cycle. Do it locally, then SCP the completed (and compressed) file to another computer, which is what I do. (In fact I send it to THREE different computers in two physical locations, including one 900 miles away.)

--
Mike Nolan
On Tue, 2006-11-28 at 19:34 +0200, Andrus wrote:
> This is a good idea, but it forces me to use the Postgres protocol for
> downloading. This protocol has some timeouts which are too small for a
> large file download. The Postgres protocol also adds a lot of overhead to
> the downloadable data, and it requires that the whole downloadable file
> fit into memory.

You could back up to a PostgreSQL large object, and then transmit that. Would that be more efficient? It would be kind of strange, but it might work if you can't open any other ports.

Regards,
Jeff Davis
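In practice that would mean something along these lines (a sketch only; the paths, the OID and the Windows target file are made up). On the server, once the dump file has been written:

-- run as superuser; reads a file on the *server* filesystem into a large object
SELECT lo_import('/backup/mydb.dump');   -- returns an OID, say 16409

Then the backup client pulls it over the ordinary 5432 connection and drops it again:

psql -h 81.50.12.18 -p 5431 -U myuser mydb -c "\lo_export 16409 'C:/mybackup.backup'"
psql -h 81.50.12.18 -p 5431 -U myuser mydb -c "SELECT lo_unlink(16409)"

Large objects are read and written in chunks, so this avoids holding the whole dump in memory the way a single bytea value would.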
On Tue, Nov 28, 2006 at 07:23:44PM +0200, Andrus wrote:
> Please, can you recommend a solution which uses port 5432, owned by Postgres?

If you think you know your usage pattern:

Have cron stop PostgreSQL at, say, 2am.
Have cron start ssh on port 5432 at 2:05am if PG is down.
Have cron shut down ssh on port 5432 at 2:55am.
Have cron KILL ssh on port 5432 if need be.
Have cron start PostgreSQL at 3am if ssh is down.

> How can I force the postmaster to run the tool which the system provides
> when it receives a backup request, instead of a postgres child process?

There is no (builtin/obvious/easy) way, for good reason.

Karsten
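Spelled out as a crontab, and leaving aside the "only if it is really down" checks, that schedule might look roughly like this (Unix-style paths and init script names are assumptions):

# stop the database, run a second sshd on 5432 for the backup window, then swap back
0  2 * * *  /etc/init.d/postgresql stop
5  2 * * *  /usr/sbin/sshd -p 5432 -o PidFile=/var/run/sshd-backup.pid
55 2 * * *  kill $(cat /var/run/sshd-backup.pid)
0  3 * * *  /etc/init.d/postgresql start

The backup machine would then scp the dump off the server during that window, connecting to port 5432 (scp -P 5432).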
On Tue, Nov 28, 2006 at 07:34:56PM +0200, Andrus wrote:
> This is a good idea, but it forces me to use the Postgres protocol for
> downloading.

Why, of course.

> This protocol has some timeouts which are too small for a large file download.

For "sane" values of "large" I doubt this is true. A field in PG can store about 1 GB of data (says the FAQ), and the protocol had better be able to hand out as much. It may be that you need to increase statement_timeout - which can be done on a per-session basis.

> The Postgres protocol also adds a lot of overhead to the downloadable data.

Yes. But you wanted to use port 5432 on a machine already running PG. Not sure, but using a binary cursor might improve things. Using a client library capable of the v3 (?) protocol should significantly lower the overhead, too.

> It also requires that the whole downloadable file fit into memory.

My PG knowledge isn't up to this task, but I have a sneaking suspicion this isn't really enforced by PG itself.

> I used ODBC. I tried this, but was forced to store big files in 1 MB chunks
> in bytea fields and create the file from the downloaded blocks.

Other client libraries may do better here.

> Or should I really write code which divides the backup file into 1 MB chunks
> and stores them in bytea fields?

No. I would not even store them in the database at all. I would use the untrusted-language function to read the file from disk and return a (virtual) bytea field (which doesn't exist in the database).

Karsten
Andrus wrote:
> Richard,
> I really don't want to open a separate port for backup only.
> Please, can you recommend a solution which uses port 5432, owned by Postgres?

No. You seem dead set on making your life harder than it needs to be. If you really don't want to have the ssh port open then set the backup to run from a cron job on the main server and have it scp the result over to the backup server. That way only the backup server needs ssh open.

> How can I force the postmaster to run the tool which the system provides
> when it receives a backup request, instead of a postgres child process?

Don't. You're just making life difficult for yourself. Use the standard Unix solution for such things - ssh.

How are you installing updates without ssh access to the server?

--
Richard Huxton
Archonet Ltd
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Not sure but using a binary cursor might improve things.

Why not use the COPY protocol?

regards, tom lane
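COPY streams a table's rows over the existing client connection, and psql exposes it as \copy. A per-table sketch (the table and file names are invented), run from the backup machine over port 5432:

# stream one table out over the normal database connection and compress it locally
psql -h 81.50.12.18 -p 5431 -U myuser -c "\copy mytable to stdout" mydb | gzip > mytable.copy.gz

This is not a drop-in replacement for pg_dump (no schema, and no consistent multi-table snapshot), but it shows the data path being referred to.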
On Nov 28, 2006, at 11:11 AM, Andrus wrote:
> 1. My database size seems to be approx 1 GB and the download speed is
> approx 600 kbit/s. Your solution still requires 4.5 hours of download time,
> since 1 GB of data must be downloaded.

If you're running pg_dump on a remote host, you're transferring the data over the pipe and compressing locally, since the pg wire protocol is not compressed. The compression time is probably not causing any slowness unless your local CPU is incredibly slow and can't keep up with the data streaming in at that low speed.

I don't see how you can improve your download speed without doing compression at the other end to reduce the number of bits you have to push through your network. SSH seems to be a reasonable solution to this (run dump + compress on the remote host, then copy the data over), but if you rule out anything that doesn't go over port 5432 then I think you're out of luck...

Well, one thing... is there another host on the remote LAN to which you can ssh? If so, then use SSH port forwarding and enable compression on the ssh connection to that host, then connect to postgres via the forwarded port to do your dump locally. The data will be compressed on the wire.
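As a sketch of that port-forwarding idea (the gateway host name is a placeholder; the database itself listens on 5431 per the original command), run on the backup machine:

# compressed ssh tunnel: local port 5433 -> 81.50.12.18:5431, via a host on the remote LAN
ssh -C -N -L 5433:81.50.12.18:5431 backupuser@gateway.example.com &

# dump through the tunnel; the data crosses the internet compressed by ssh
pg_dump -h localhost -p 5433 -U myuser -Fc -Z0 mydb > mybackup.backup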
On þri, 2006-11-28 at 19:23 +0200, Andrus wrote:
> Richard,
> I really don't want to open a separate port for backup only.
> Please, can you recommend a solution which uses port 5432, owned by Postgres?

I do not want to advise you to do things that might be counter to your company's security policies, but you could set up a port forwarder on your database machine to pass incoming port 5432 requests from the backup machine to sshd, and let all other source IPs go to postgres.

Alternatively, if you have control of cron, and if the firewall restrictions are for incoming connections only, and if you have an open ssh port on some other machine, such as the backup server, you can have cron do the compressed backup and send it via scp to the backup server.

> How can I force the postmaster to run the tool which the system provides
> when it receives a backup request, instead of a postgres child process?

If you cannot control cron, but the firewall restrictions are only for incoming requests, you might try using some untrusted procedural language to start a backup script locally that finishes by copying the compressed backup to the backup server.

Whatever you choose to do, you should still get permission to do it. Security admins are not amused when users use tricks to get around restrictions. Maybe they would consider opening the ssh port if you make it clear that they may restrict it to requests from the backup machine?

gnari
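The source-based port forwarding trick, sketched with Linux iptables purely to show the idea (the server in this thread runs Windows, and the backup machine's IP is a placeholder):

# connections from the backup machine to port 5432 get redirected to the local sshd;
# every other source address still reaches postgres on 5432
iptables -t nat -A PREROUTING -p tcp -s 203.0.113.7 --dport 5432 -j REDIRECT --to-ports 22

# the backup machine then connects with: ssh -p 5432 dbserver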
> You seem dead set on making your life harder than it needs to be. If you
> really don't want to have the ssh port open then set the backup to run
> from a cron job on the main server and have it scp the result over to the
> backup server. That way only the backup server needs ssh open.

My backup clients use cheap ADSL internet connections. They have all ports closed by the ISP.

> How are you installing updates without ssh access to the server?

My servers are usually Windows computers. I have installed LogMeIn (www.logmein.com), which allows me to access those computers when ports are closed at both ends.

I haven't installed any updates. I started with 8.1 and installed the latest version on each new server, so some servers have 8.1.1, some 8.1.2, etc., up to 8.1.5. I haven't touched the old servers after the system was set up.

Andrus.
>> Not sure but using a binary cursor might improve things.
>
> Why not use the COPY protocol?

I did a full-text search in the Postgres 8.1 help file for "COPY protocol" but haven't found any matches. What is the COPY protocol and how do I use it?

Andrus.
Andrus wrote:
> My backup clients use cheap ADSL internet connections.
> They have all ports closed by the ISP.

In that case, if you really can't have incoming ports enabled at either end, you'll need to have a separate machine available to both servers and clients. Then you can use that as a switch and have:

1. servers connect and upload backups to it
2. clients connect and download the last backup.

> My servers are usually Windows computers. I have installed LogMeIn
> (www.logmein.com), which allows me to access those computers when ports
> are closed at both ends.

Hmm - looks like they're doing something similar to the process I describe above, then running ftp/vnc etc. over the link. OK, so do something with the logmein client instead of ssh then. Assuming you can script it. It seems to suggest that you can do file synchronisation across their connection.

> I haven't installed any updates. I started with 8.1 and installed the
> latest version on each new server, so some servers have 8.1.1, some 8.1.2,
> etc., up to 8.1.5. I haven't touched the old servers after the system was
> set up.

You should try and come up with a way of upgrading the servers to the latest (8.1.5 at the moment) on a regular basis. Some of those upgrades are very important. Use your logmein setup to keep them up to date. If you have many servers you'll want to automate the process, but I'm afraid I don't do enough Windows work to make any useful suggestions there.

--
Richard Huxton
Archonet Ltd
vivek@khera.org (Vivek Khera) writes:
> I don't see how you can improve your download speed without doing
> compression at the other end to reduce the number of bits you have to
> push through your network.

... And if the network is pretty fast, the amount of CPU that compression eats is likely to worsen the speed at which data gets transferred.

> Well, one thing... is there another host on the remote LAN to which
> you can ssh? If so, then use SSH port forwarding and enable
> compression on the ssh connection to that host, then connect to
> postgres via the forwarded port to do your dump locally. The data
> will be compressed on the wire.

We were finding that some of our bigger backups were, due to the bzip2 cost, taking ~4h. Dumping to a file turned this into "dump for 1h, compress for 3", which cut down the length of the transaction. Our case did involve actually needing bzip2, as there's a later step where the data has to be transferred across a much slower network connection.

Some testing with varying bzip2 and gzip options showed that compression was pretty sure to be really expensive if used in the initial "processing pipeline."

There are other options out there that could conceivably change the price of compression, such as:

http://www.lzop.org/
http://www.quicklz.com/

Of course, those are not as well-known compression systems, and so are not as well trusted. Maybe worth looking into, though.

--
Chris Browne  http://cbbrowne.com/info/rdbms.html
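If LZO looks interesting, the change to the dump-then-compress step is small. A sketch, assuming the lzop command-line tool is installed (paths are placeholders):

# much faster than gzip/bzip2 at the cost of a somewhat larger file
pg_dump -Fc -Z0 -U myuser mydb > /backup/mydb.dump
nice lzop -1 /backup/mydb.dump          # produces /backup/mydb.dump.lzo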
Chris Browne schrieb:
> There are other options out there that could conceivably change the
> price of compression, such as:
>
> http://www.lzop.org/
> http://www.quicklz.com/
>
> Of course, those are not as well-known compression systems, and so are
> not as well trusted. Maybe worth looking into, though.

LZO is pretty much rock solid. It is used in OpenVPN and supposedly was used for the communication with NASA's Mars rovers Spirit and Opportunity, if that counts as trusted.

regards,
bkw
On Wed, Nov 29, 2006 at 11:21:41PM +0100, Bernhard Weisshuhn wrote:
> LZO is pretty much rock solid. It is used in OpenVPN and supposedly was
> used for the communication with NASA's Mars rovers Spirit and
> Opportunity, if that counts as trusted.

It's also GPL, which makes it a hard sell.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/