VS: Backup/dump of huge tables and performance - Mailing list pgsql-general
From: Janne Blomqvist
Subject: VS: Backup/dump of huge tables and performance
Date:
Msg-id: 2A80FDF6CF54D211A97D00A0C992F5906CE81B@hanselx.hansel.fi
List: pgsql-general
>At 10:11 28/07/00 GMT, brianb-pggeneral@edsamail.com wrote:
>>
>>Philip Warner writes:
>>> At 02:26 28/07/00 GMT, brianb-pggeneral@edsamail.com wrote:
>>> >1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
>>> >considering the size of the tables, but the impact on the overall
>>> >production system's performance is not acceptable.
>>>
>>> Firstly, you are not using '--inserts', I hope. It is very slow for both
>>> backup & restore. Also, do you know if pg_dump is the IO bottleneck, or the
>>> backend?
>>
>>Nope, I don't use --inserts. The bottleneck seems to be at the backend,
>>since it also happens with COPY.
>
>I'm not sure you can conclude that, but you are probably right. If possible
>can you use whatever tools you have to see which process *is* doing the
>I/O? Just to be sure.
>
>>> Other than that, I'm not sure what can be done about it - the I/O has to be
>>> done some time. Possibly causing pg_dump to (optionally) pause between
>>> records, but that seems like a bad idea, especially with 10s of millions of
>>> records.
>>
>>> You could always do a file-based backup of the database, and restore it
>>> somewhere else, and drop the tables you don't need. Not very elegant, I
>>> realize.
>>
>>What file-based backup are you referring to? From reading previous posts,
>>I'd inferred that I couldn't just copy my /var/lib/pgsql over to another
>>server with Postgres and start postmaster.
>
>You probably can't/shouldn't. To make it work, I suspect you'd have to kick
>all the users off the DB, which is not going to help.
>
>>> I'm not sure what should be changed in pg_dump; delaying between records
>>> seems like a bad idea since it does the dump in a single TX, and besides,
>>> sleeping while a TX is open seems evil to me.
>>
>>I suppose it's a special case, then. I just need to move the table, and I'd
>>rather have the I/O load spread over a longer period of time to soften the
>>impact on the rest of the system.
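As the exchange above notes, pg_dump's default COPY-based output is much faster than --inserts. A minimal sketch of a single-table dump in that default format, with the client side niced down; "bigtable" and "mydb" are placeholder names, and note that nice only lowers the CPU priority of pg_dump and gzip — it does nothing for backend I/O, which the thread identifies as the actual bottleneck:

```shell
# Dump one large table in the default COPY format (i.e. without
# --inserts), compress on the fly, and run the client at the
# lowest CPU priority. Table and database names are placeholders.
nice -n 19 pg_dump -t bigtable mydb | gzip > bigtable.dump.gz
```

Piping through gzip keeps the on-disk dump small, but the reads the backend must perform to produce the COPY stream are unchanged.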
>It might bear thinking about...especially if anyone else has an opinion on
>the options.

One option which is sometimes recommended for larger Oracle installations is to use RAID 1 (i.e. mirroring, which you should do anyway for performance and data security reasons). Then, to take a backup:

1. Split the mirror. Let's call the volumes primary and secondary (they can of course span multiple disks). The database continues to run unaffected on the primary volume (except for a performance hit). The secondary volume thus contains a snapshot of the database at the moment of the split.

2. Back up the files on the secondary volume.

3. Recombine the mirror. The RAID software will take care of updating the secondary volume.

Of course this is less secure than a real online backup (e.g. if one of the primary disks fails during the backup process: oops), but it's *a lot* faster than an online backup and you can still be up and running all the time.

Trading House Hansel
Janne Blomqvist
janne.blomqvist@hansel.fi
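The three steps above could be sketched with Linux software RAID. This is a hypothetical illustration only, assuming an md RAID-1 array /dev/md0 with members /dev/sda1 and /dev/sdb1 mounted at /var/lib/pgsql; all device and path names are placeholders, the mdadm tool postdates this post, and a real setup would need to verify that the detached member is mountable and consistent:

```shell
# 1. Split the mirror: mark one half failed and detach it.
#    The database keeps running on the remaining member.
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# 2. Mount the detached half read-only and back up its files.
#    (Device and backup paths are placeholders.)
mount -o ro /dev/sdb1 /mnt/snapshot
tar czf /backup/pgsql-snapshot.tar.gz -C /mnt/snapshot .
umount /mnt/snapshot

# 3. Recombine the mirror: re-add the member and let the RAID
#    software resynchronize it in the background.
mdadm /dev/md0 --add /dev/sdb1
```

As the post says, the snapshot reflects the database files at the instant of the split, so it carries the same caveats as any file-level copy of a running cluster.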