Thread: LARGE db dump/restore for upgrade question
Hi,

I have a very large database of seismic data. It is about 27 GB now, and growing at a rate of about 1 GB every 3-4 days. I am running postgres 7.1.2. I might try to upgrade to 7.2 when it comes out, but I don't know if it will be possible for me to do 7.3 because of the pg_dump/pg_restore problem. In a little over a year the database will probably pass the halfway point on my RAID, and then it will be physically impossible to upgrade. Most of the space is probably taken up by large objects, which I am hoping will make a solution at least a little bit easier.

I am trying a pg_dump right now, and in the first 25 minutes it dumped 54 MB, which means that a full dump will take about 200 hours! I would guess the restore would take about the same amount of time, so I would be looking at 17 DAYS of downtime to upgrade! Maybe it will speed up later in the dump, I don't know. And in about 3 months or so it will take me twice that amount of time. Also, this is on a 4-processor Sun E450 with an A1000 hardware RAID, so it's not that I am using old, slow hardware. Just for comparison, a file system dump to tape took 6 hours, and that was back when I only had a software RAID!

So, my question is: is it likely that one day postgres will no longer require dump/restores for upgrades? I would assume that there will always be a need to tweak the internal layout of files, but I wonder if there isn't a way to do this "in place", or at least to allow a file system move of the large objects without requiring them to be dumped as well? Even better would be if each new version of postgres could read the immediately previous version's tables and convert them in the background. Maybe I'm just dreaming here. :)

Could something related to making the upgrade less painful for very large databases be added to the ToDo list, even if it isn't a high priority? Not that I am complaining; postgres seems to handle this data volume quite well, and it is certainly worth every dollar I didn't pay for it. :)

Any suggestions on how to prepare for the next upgrade would be appreciated.

thanks,
Philip
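(A hedged aside on the dump itself: the plain-text dump format does not include large objects at all, so a database where most of the space is blobs needs the custom archive format. A sketch of a dump/restore that covers them, assuming 7.1's pg_dump with the -b/--blobs option; the archive file name is arbitrary and the database name is taken from the shell transcript later in the thread:)

    # Dump the database, including large objects, into a custom-format archive.
    pg_dump -Fc -b sceppdata > sceppdata.dump

    # Restore into a freshly created database on the upgraded server.
    createdb sceppdata
    pg_restore -d sceppdata sceppdata.dump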
Philip Crotwell wrote:
> Hi
>
> I have a very large database of seismic data. It is about 27 GB now, and
> growing at about the rate of 1 GB every 3-4 days. I am running
<snip>

Out of curiosity, how long does it take you to vacuum that?

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
> Any suggestion on how to prepare for the next upgrade would be
> appreciated.

I think it has to be said that if you want decent performance on excessively large (50GB+) databases, you're going to need excessively good hardware to operate them on. Buy a 3ware IDE RAID controller (www.hypermicro.com) and stripe a couple of Seagate 80GB IDE drives (the new ones have an awesome transfer rate). Turn off fsync, and use softupdates or even async I/O. I am guessing, though, that you've already done all of the above.

You might have to go to beefier hardware; for PCs that means DDR SDRAM/Rambus and a 64-bit PCI 3ware card with 4 or 8 hard drives.

Hope that helps

Andrew
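(A hedged footnote on the fsync suggestion: in 7.1 this is a postgresql.conf setting, and running without fsync trades crash safety for speed, since a crash can lose or corrupt committed data. Sketch only:)

    # postgresql.conf -- sketch; disabling fsync risks data loss on a crash
    fsync = false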
Hi

Just did a vacuum; it took almost 4 hours. The interesting thing about this is that there are only two small tables that have updates; everything else has been just inserts. I would have thought that a vacuum of a database shouldn't take very long if there aren't a lot of "deleted" rows.

Another problem is that the size of the database is now 52 GB!!! I assume that it is WAL files growing out of control? I thought this was fixed in 7.1.2? As soon as I start a second vacuum the size drops back down to 27 GB.

thanks,
Philip

pooh 19>date ; time vacuumdb sceppdata ; date
Wed Aug 15 09:09:19 EDT 2001
VACUUM
0.03u 0.08s 3:49:07.53 0.0%
Wed Aug 15 12:58:29 EDT 2001
pooh 20>

On Tue, 14 Aug 2001, Joseph Shraibman wrote:

> Philip Crotwell wrote:
> > Hi
> >
> > I have a very large database of seismic data. It is about 27 GB now, and
> > growing at about the rate of 1 GB every 3-4 days. I am running
> <snip>
>
> Out of curiosity, how long does it take you to vacuum that?
>
> --
> Joseph Shraibman
> jks@selectacast.net
> Increase signal to noise ratio.  http://www.targabot.com
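(A hedged sketch related to the 4-hour vacuum: since only two small tables ever get updates, vacuuming just those tables should be far quicker than a whole-database vacuum. vacuumdb takes a -t/--table option; the table names below are made up for illustration:)

    # Vacuum only the two frequently updated tables (hypothetical names).
    vacuumdb -t small_table_one sceppdata
    vacuumdb -t small_table_two sceppdata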
My understanding of pgSQL is that, when a row is updated or deleted, the old row is marked as such but maintained in the DB until I vacuum. (This certainly would ease the implementation of transactions.) Please correct me if I am mistaken. Assuming for now this is true...

Is there some way to view the old rows, or even restore one? And doesn't this mean a timestamp must be automagically associated with each row? How do I access these timestamps?
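(A hedged answer in brief: PostgreSQL stores transaction IDs, not timestamps, with each row version, and the expired versions are invisible to ordinary queries; they simply occupy space until VACUUM reclaims them, so there is no supported way to view or restore an old row. The system columns holding the transaction IDs can be inspected like this, with a made-up table name:)

    -- xmin/xmax are the inserting/deleting transaction IDs; ctid is the row's
    -- physical location. "mytable" is a hypothetical table name.
    SELECT xmin, xmax, ctid, * FROM mytable LIMIT 5;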
If there were lots of inserts, I would guess it would need to re-analyze the tables to update its statistics on them, so the query optimizer can make good choices.

--Wes

Philip Crotwell <crotwell%seis.sc.edu@interlock.lexmark.com> wrote on 08/15/2001 01:08:19 PM:

> Hi
>
> Just did a vacuum, took almost 4 hours. The interesting thing about this
> is that there are only two small tables that have updates, everything
> else has been just inserts. I would have thought that a vacuum of a
> database shouldn't take very long if there aren't a lot of "deleted" rows.
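(A hedged sketch of the statistics update Wes is alluding to — in 7.1 it is done with VACUUM ANALYZE, either database-wide or per table; the table name below is hypothetical:)

    -- Refresh planner statistics for one (hypothetical) table.
    VACUUM ANALYZE seismic_events;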
> I am trying a pg_dump right now, and in the first 25 minutes it dumped
> 54 MB, which means that a full dump will take about 200 hours! I would
> guess the restore would take about the same amount of time, so I would be
> looking at 17 DAYS of downtime to upgrade! Maybe it will speed up later in

Thinking out loud here ...

Perhaps a slightly modified (or even custom-written) pg_dump that dumps your large tables incrementally. Since you said that the large tables only have inserts, you should be able to run most of this while the DB is up. Have it keep track of the timestamp of the last row dumped for each table. Then it could continue from there, appending to the same file, the next time you run it. (A rough sketch of this idea follows below.)

If you keep the dump, then you won't have to do another large pg_dump next time you upgrade, assuming you never do updates on those tables. Of course, you'll still have to wait a while for it to restore after every upgrade. The other suggestion about fast hardware should help there.

If this doesn't make any sense, sorry, it's 3 in the morning....

> Not that I am complaining, postgres seems to handle this data volume quite
> well, and it is certainly worth every dollar I didn't pay for it. :)

What a ringing endorsement....

--
Like to travel?  http://TravTalk.org
Micah Yoder Internet Development  http://yoderdev.com
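(A minimal sketch of the incremental-dump idea, assuming an insert-only table with a timestamp column; the database name is Philip's, but the table, column, and file names are made up, the output is plain delimited text rather than COPY format, and large objects would still need separate handling:)

    #!/bin/sh
    # Sketch: append rows inserted since the last run to a running dump file.
    DB=sceppdata
    TABLE=waveforms              # hypothetical table name
    TSCOL=insert_time            # hypothetical timestamp column
    STATE=$TABLE.last_dumped

    LAST=`cat $STATE 2>/dev/null`
    [ -z "$LAST" ] && LAST='1970-01-01 00:00:00'

    NOW=`psql -A -t -c "SELECT now()" $DB`

    # Unaligned, tuples-only output, appended to the cumulative dump file.
    SQL="SELECT * FROM $TABLE WHERE $TSCOL > '$LAST' AND $TSCOL <= '$NOW'"
    psql -A -t -c "$SQL" $DB >> $TABLE.dump

    # Remember where this run stopped, for next time.
    echo "$NOW" > $STATE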