Thread: > 16TB worth of data question
I have a system that will store about 2TB+ of images per year in a PG database. Linux unfortunately has the 16TB limit for 32bit systems. Not really sure what should be done here. Would it be better to not store the images as BLOBs, and instead come up with some complicated way to only store the location in the database, or is there some way to have postgres handle this somehow? What are other people out there doing about this sort of thing? thanx, -jj- -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
Jeremiah Jahn <jeremiah@cs.earlham.edu> writes: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > really sure what should be done here. Would life better to not store the > images as BLOBS, and instead come up with some complicated way to only > store the location in the database, or is there someway to have postgres > handle this somehow? What are other people out there doing about this > sort of thing? Sounds to me as if you can ignore it for now. 16TB will last you at least four years, at which time you'll be replacing hardware anyway and can just buy 64-bit systems. :) -Doug
On Mon, 2003-04-21 at 14:23, Jeremiah Jahn wrote: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. I'm not sure why the 16TB limit you refer to would be a problem in the short to medium term, if you're only storing 2TB/year. > Would life better to not store the > images as BLOBS, and instead come up with some complicated way to only > store the location in the database FWIW, storing the FS location in the database is by no means complicated. I'm personally not very fond of doing that, but not because it's overly complex... Cheers, Neil
Notice the plus sign in that 2TB+. I have reason to believe that I could hit that mark in a little over 2 years. What do you think the chances are of Linux 2.6 stabilizing in that time frame.. ;) I'm just questioning the use of BLOBs really. It would be nice if they could be spread over multiple file systems. -jj- On Mon, 2003-04-21 at 13:28, Doug McNaught wrote: > Jeremiah Jahn <jeremiah@cs.earlham.edu> writes: > > > I have a system that will store about 2TB+ of images per year in a PG > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > really sure what should be done here. Would life better to not store the > > images as BLOBS, and instead come up with some complicated way to only > > store the location in the database, or is there someway to have postgres > > handle this somehow? What are other people out there doing about this > > sort of thing? > > Sounds to me as if you can ignore it for now. 16TB will last you at > least four years, at which time you'll be replacing hardware anyway > and can just buy 64-bit systems. > > :) > > -Doug > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
Neil Conway wrote: > >>Would life better to not store the >>images as BLOBS, and instead come up with some complicated way to only >>store the location in the database > > FWIW, storing the FS location in the database is by no means > complicated. I'm personally not very fond of doing that, but not because > it's overly complex... Heck, is the Pg BLOB interface anything more than a cute hook into filesystem storage? I've often wondered what the point of BLOBs is, beyond providing a single API to all the data (maybe that *is* the point). Is there a performance advantage? -- __ / | Paul Ramsey | Refractions Research | Email: pramsey@refractions.net \_
Sorry, forgot a little info. A few years from now, I'd still like to be able to use my old servers for a bit of load balancing.. -jj- On Mon, 2003-04-21 at 13:38, Neil Conway wrote: > On Mon, 2003-04-21 at 14:23, Jeremiah Jahn wrote: > > I have a system that will store about 2TB+ of images per year in a PG > > database. Linux unfortunatly has the 16TB limit for 32bit systems. > > I'm not sure why the 16TB limit you refer to would be a problem in the > short to medium term, if you're only storing 2TB/year. > > > Would life better to not store the > > images as BLOBS, and instead come up with some complicated way to only > > store the location in the database > > FWIW, storing the FS location in the database is by no means > complicated. I'm personally not very fond of doing that, but not because > it's overly complex... > > Cheers, > > Neil > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
Paul Ramsey <pramsey@refractions.net> writes: > Heck, is the Pg BLOB interface anything more than a cute hook into > filesystem storage? Yes; it's organized as a table and storage-managed the same way as other tables. > I've often wondered what the point of BLOBs is, > beyond providing a single API to all the data (maybe that *is* the > point). That, plus when you back up the database you're backing up everything you need, plus it's harder for the database and the 'filesystem' to get out of sync (you can use triggers etc. to make sure, and easily run queries to make sure you don't have any dangling references). > Is there a performance advantage? Slight disadvantage, I should think (unless the individual-file approach stupidly puts them all in one directory, which can be very slow on some systems). -Doug
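That "run queries to make sure you don't have any dangling references" check is easy to script. A minimal sketch, assuming a hypothetical application table images(id, img oid) whose img column holds large-object OIDs, and a Python driver such as psycopg2 (table, column, and database names are illustrative, not anything from the thread; reading pg_largeobject directly may also require superuser rights):

    # Sketch: spot dangling or orphaned large-object references.
    # Assumes a hypothetical table images(id, img oid) and the psycopg2 driver.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    # Rows whose img OID points at no large object at all.
    cur.execute("""
        SELECT i.id, i.img FROM images i
        WHERE NOT EXISTS (SELECT 1 FROM pg_largeobject lo WHERE lo.loid = i.img)
    """)
    for row in cur.fetchall():
        print("dangling reference:", row)

    # Large objects that no row references any more.
    cur.execute("""
        SELECT DISTINCT lo.loid FROM pg_largeobject lo
        WHERE NOT EXISTS (SELECT 1 FROM images i WHERE i.img = lo.loid)
    """)
    for row in cur.fetchall():
        print("orphaned large object:", row[0])

    cur.close()
    conn.close()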
Jeremiah Jahn <jeremiah@cs.earlham.edu> wrote: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > really sure what should be done here. Would life better to not store > the images as BLOBS, and instead come up with some complicated way to > only store the location in the database, or is there someway to have > postgres handle this somehow? What are other people out there doing > about this sort of thing? > I have some experience in this matter. We've made a system that contains about 5 million images today, and it's about 2.5 TB. Our system works by having the metadata in a postgresql database to be able to do complex searches, and it only has a filename in _relative_ form like this: 11/2/33 and then the application server handling requests maps it into a real file: /storage/links/11/2/33.jpg The links directory is just an indirection from the real storage location, so "11" in this case can be /storage/nfs/server1/11. Flexibility is the big win here. I can split partitions up any way I want and I don't have to care about any limits except maybe that the "links" directory is getting too big. But since this is our application we know that when the database is complete (20 million images+) it will still have only 4000 directory entries in "links". The blob stuff also sucks from other points of view. It makes the database less manageable and also messes with lazy (we use rsync) mirroring backups of the images, and also database dumps must be huge? (I don't know how blobs are handled in pg_dump) If it's necessary to be able to handle this via postgresql, I would do a server-side function that fetches the image from the filesystem and sends it over, maybe using the bytea encoding? I wouldn't go for the all-in-database approach ever in anything big. Cheers Magnus
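To make the layout concrete, here is a minimal sketch of the relative-name scheme described above, assuming the metadata table stores names like "11/2/33" and the application resolves them under /storage/links (the paths and the .jpg suffix are illustrative assumptions):

    # Sketch of the relative-name -> real-file mapping described above.
    # /storage/links/11 is assumed to be a symlink to the real storage
    # location, e.g. /storage/nfs/server1/11; all names are illustrative.
    import os

    LINKS_ROOT = "/storage/links"

    def image_path(relname):
        """Map a stored relative name such as '11/2/33' to the real file."""
        return os.path.join(LINKS_ROOT, relname + ".jpg")

    def read_image(relname):
        """Return the raw image bytes for a stored relative name."""
        with open(image_path(relname), "rb") as f:
            return f.read()

    if __name__ == "__main__":
        print(image_path("11/2/33"))   # -> /storage/links/11/2/33.jpg

The indirection through the links directory is what buys the flexibility: partitions can be re-split by repointing a handful of symlinks, without touching the rows in the metadata table.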
On 21 Apr 2003, Jeremiah Jahn wrote: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > really sure what should be done here. Would life better to not store the > images as BLOBS, and instead come up with some complicated way to only > store the location in the database, or is there someway to have postgres > handle this somehow? What are other people out there doing about this > sort of thing? Then why not start right out on 64 bit systems? Low end 64 bit Sparcs (Ultra 60 type stuff) aren't too expensive, and debian and a few other flavors seem to run quite quickly on Sparc hardware. There's also 64 bit mainframe Linux, and a couple of other 64 bit platforms that are fairly mature; IBM's PowerPC based systems run Linux as well. If you're gonna play with big datasets, that's the one time that 64 bit really starts to have advantages, and let's face it, you're gonna go there eventually anyway, might as well get a head start now.
The only issue with this is that it is difficult to recommend to our clients who depend on Bob and cuz'n Joe to support their hardware. We are kinda in the business of recommending the HW our clients need, and not getting into the support side of it. Although this might be a decent option. thanx, -jj- PS. My office already has this whole black and silver motif going on, and purplish blue would kinda clash. On Mon, 2003-04-21 at 15:30, scott.marlowe wrote: > On 21 Apr 2003, Jeremiah Jahn wrote: > > > I have a system that will store about 2TB+ of images per year in a PG > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > really sure what should be done here. Would life better to not store the > > images as BLOBS, and instead come up with some complicated way to only > > store the location in the database, or is there someway to have postgres > > handle this somehow? What are other people out there doing about this > > sort of thing? > > Then why not start right out on 64 bit systems? Low end 64 bit sparcs > Ultra 60 type stuff) aren't too expensive, and debian and a few other > flavors seem to run quite quickly on Sparc hardware. > > There's also 64 mainframe linux, and a couple of other 64 bit platforms > that are fairly mature, IBM's Power Pc based systems run Linux as well. > > If you're gonna play with big datasets, that's the one time that 64 bit > really starts to have advantages, and let's face it, you're gonna go there > eventually anyway, might as well get a head start now. > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
Shame, there's an E10k with 64 CPUs selling for $24,000 with no opening bid on ebay... Ends in 10. http://listings.ebay.com/pool2/plistings/endtoday/all/category41080/index.html?from=R11 Ummmmmmm 64 CPUs... On 21 Apr 2003, Jeremiah Jahn wrote: > The only issue with this is that it is difficult to recomend to our > clients who depend on bob and cuz'n joe to support their hardware. We > are kinda in the business of recomending the HW our clients need, and > not getting into the support side of it. Althoguh this might be a decent > option. > > > thanx, > -jj- > > PS. My office already this whole black and silver motif going on, and > purplely blue would kinda clash. > > > > On Mon, 2003-04-21 at 15:30, scott.marlowe wrote: > > On 21 Apr 2003, Jeremiah Jahn wrote: > > > > > I have a system that will store about 2TB+ of images per year in a PG > > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > > really sure what should be done here. Would life better to not store the > > > images as BLOBS, and instead come up with some complicated way to only > > > store the location in the database, or is there someway to have postgres > > > handle this somehow? What are other people out there doing about this > > > sort of thing? > > > > Then why not start right out on 64 bit systems? Low end 64 bit sparcs > > Ultra 60 type stuff) aren't too expensive, and debian and a few other > > flavors seem to run quite quickly on Sparc hardware. > > > > There's also 64 mainframe linux, and a couple of other 64 bit platforms > > that are fairly mature, IBM's Power Pc based systems run Linux as well. > > > > If you're gonna play with big datasets, that's the one time that 64 bit > > really starts to have advantages, and let's face it, you're gonna go there > > eventually anyway, might as well get a head start now. > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly >
Wrong URL, here's the right one: http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=3413528727&category=11216
Jeremiah Jahn <jeremiah@cs.earlham.edu> writes: > The only issue with this is that it is difficult to recomend to our > clients who depend on bob and cuz'n joe to support their hardware. And you expect them to be successful running a database that acquires 2TB+ of data per year? I think you need to recalibrate your expectations. Consumer-grade junk PCs do not have the reliability to make such a project even worth starting. Run the database on decent made-to-be-a-server hardware, or you'll regret it. I think I've spent more time chasing various people's hardware failures lately than I have in investigating real Postgres bugs. I keep volunteering to look at failures because I figure there are still some data-loss bugs to be found, but I am coming to have a *real* low opinion of off-the-shelf PC hardware. regards, tom lane
FWIW, FreeBSD's UFS filesystem doesn't suffer from the 2G limit. Its limit is something like 4TB iirc. I'll also second the opinion that if you're really going to put that much data in your database, x86 hardware might not be a good idea. However, I'd recommend RS/6000 over Sun if you can afford it; for Oracle, the standard is that you only need 1/2 the CPUs in RS/6000 as you would in a Sun, because RS/6000 puts heavy emphasis on memory bandwidth. On Mon, Apr 21, 2003 at 01:23:55PM -0500, Jeremiah Jahn wrote: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > really sure what should be done here. Would life better to not store the > images as BLOBS, and instead come up with some complicated way to only > store the location in the database, or is there someway to have postgres > handle this somehow? What are other people out there doing about this > sort of thing? > > thanx, > -jj- > > -- > Jeremiah Jahn <jeremiah@cs.earlham.edu> > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Mon, 2003-04-21 at 20:43, Tom Lane wrote: > Jeremiah Jahn <jeremiah@cs.earlham.edu> writes: > > The only issue with this is that it is difficult to recomend to our > > clients who depend on bob and cuz'n joe to support their hardware. > > And you expect them to be successful running a database that acquires > 2TB+ of data per year? I think you need to recalibrate your > expectations. Consumer-grade junk PCs do not have the reliability > to make such a project even worth starting. Run the database on decent > made-to-be-a-server hardware, or you'll regret it. That's the question... That 2 TB of data is nothing but documents and images. I'm under the impression that if that gets parked on a fibre channel disk array/SAN the data will be pretty safe, and the server mostly replaceable at that time. Storage is my worry more than processing power. I don't think I'm on crack here...? > > I think I've spent more time chasing various people's hardware failures > lately than I have in investigating real Postgres bugs. I keep > volunteering to look at failures because I figure there are still some > data-loss bugs to be found, but I am coming to have a *real* low opinion > of off-the-shelf PC hardware. > > regards, tom lane > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
http://www.bayarea.com/mld/mercurynews/news/5676110.htm -- Lyndon Tiu
At 08:13 AM 4/22/2003 -0500, Jeremiah Jahn wrote: >That's the question...That 2 TB of data is nothing but documents and >images. I'm under the perception that if that gets parked on a fibre >channel disk array/ SAN the data will be pretty safe, and the server >mostly replaceable at that time. Storage is my worry more than >processing power. I don't think I'm on crack here...? How about backups? Backing up 2-16TB needs a bit more planning and design. > > I think I've spent more time chasing various people's hardware failures > > lately than I have in investigating real Postgres bugs. I keep > > volunteering to look at failures because I figure there are still some > > data-loss bugs to be found, but I am coming to have a *real* low opinion > > of off-the-shelf PC hardware. > > > > regards, tom lane Well there's off-the-shelf x86 desktop PC hardware, and there's off-the-shelf x86 server hardware. In my experience for name brands it's about 1/7 DOA for the former, and 0 for the latter. There's a big diff in dependability. The latter tend to keep running for many years, typically being retired for no fault of their own. Then there are users who stick no name memory into their servers or "servers". Regards, Link.
> > And you expect them to be successful running a database that acquires > > 2TB+ of data per year? I think you need to recalibrate your > > expectations. Consumer-grade junk PCs do not have the reliability > > to make such a project even worth starting. Run the database on decent > > made-to-be-a-server hardware, or you'll regret it. > That's the question...That 2 TB of data is nothing but documents and > images. I'm under the perception that if that gets parked on a fibre > channel disk array/ SAN the data will be pretty safe, and the server > mostly replaceable at that time. Storage is my worry more than > processing power. I don't think I'm on crack here...? Actually, true server-grade machines usually have _less_ processing power than PCs. Why? Because their attention is paid to reliability, not speed. Server class (ie. name-brand, ECC) memory is a must. Previous-generation processors are a must, and someone who knows something about server hardware putting it together is a must. If you don't have these things, you are bound for server lockups, memory corruption, and all sorts of other evils. Microsoft gets a lot of blame for the failures of the PC, but in reality, a lot of them are hardware-related. Jon > > > > > > I think I've spent more time chasing various people's hardware failures > > lately than I have in investigating real Postgres bugs. I keep > > volunteering to look at failures because I figure there are still some > > data-loss bugs to be found, but I am coming to have a *real* low opinion > > of off-the-shelf PC hardware. > > > > regards, tom lane > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Don't 'kill -9' the postmaster > -- > Jeremiah Jahn <jeremiah@cs.earlham.edu> > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
> FWIW, FreeBSD's UFS filesystem doesn't suffer from the 2G limit. It's > limit is something like 4TB iirc. If the data itself is large (i.e. - large images not just large # of images), I'd go with an XFS filesystem. It has a 5 petabyte limit, and for large sequential reads, it has almost full-disk speed. Jon > > I'll also second the opinion that if you're really going to put that > much data in your database, x86 hardware might not be a good idea. > However, I'd recommend RS/6000 over Sun if you can afford it; for > Oracle, the standard is that you only need 1/2 the CPUs in RS/6000 as > you would in a Sun, because RS/6000 puts heavy emphasis on memory > bandwidth. > > On Mon, Apr 21, 2003 at 01:23:55PM -0500, Jeremiah Jahn wrote: > > I have a system that will store about 2TB+ of images per year in a PG > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > really sure what should be done here. Would life better to not store the > > images as BLOBS, and instead come up with some complicated way to only > > store the location in the database, or is there someway to have postgres > > handle this somehow? What are other people out there doing about this > > sort of thing? > > > > thanx, > > -jj- > > > > -- > > Jeremiah Jahn <jeremiah@cs.earlham.edu> > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > -- > Jim C. Nasby (aka Decibel!) jim@nasby.net > Member: Triangle Fraternity, Sports Car Club of America > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > At 08:13 AM 4/22/2003 -0500, Jeremiah Jahn wrote: > >That's the question...That 2 TB of data is nothing but documents and > >images. I'm under the perception that if that gets parked on a fibre > >channel disk array/ SAN the data will be pretty safe, and the server > >mostly replaceable at that time. Storage is my worry more than > >processing power. I don't think I'm on crack here...? > > How about backups? Backing up 2-16TB needs a bit more planning and design. The proposed solution here is to have the raid controller mirror across the street to a similar system. At what point do offline backups become pointless? > > > > I think I've spent more time chasing various people's hardware failures > > > lately than I have in investigating real Postgres bugs. I keep > > > volunteering to look at failures because I figure there are still some > > > data-loss bugs to be found, but I am coming to have a *real* low opinion > > > of off-the-shelf PC hardware. > > > > > > regards, tom lane > > Well there's off-the-shelf x86 desktop PC hardware, and there's > off-the-shelf x86 server hardware. > > In my experience for name brands it's about 1/7 DOA for the former, and 0 > for the latter. There's a big diff in dependability. The latter tend to > keep running for many years, typically being retired for no fault of their own. > > Then there are users who stick no name memory into their servers or "servers". > > Regards, > Link. > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
JFS is the same; however, all Linux filesystems are limited to 16TB on 32bit HW, due to the page caching code. That's with the 2.5 kernel. The 2.4 limit is 4TB iirc. On Tue, 2003-04-22 at 11:07, Jonathan Bartlett wrote: > > FWIW, FreeBSD's UFS filesystem doesn't suffer from the 2G limit. It's > > limit is something like 4TB iirc. > > If the data itself is large (i.e. - large images not just large # of > images), I'd go with an XFS filesystem. It has a 5 petabyte limit, and > for large sequential reads, it has almost full-disk speed. > > Jon > > > > > I'll also second the opinion that if you're really going to put that > > much data in your database, x86 hardware might not be a good idea. > > However, I'd recommend RS/6000 over Sun if you can afford it; for > > Oracle, the standard is that you only need 1/2 the CPUs in RS/6000 as > > you would in a Sun, because RS/6000 puts heavy emphasis on memory > > bandwidth. > > > > On Mon, Apr 21, 2003 at 01:23:55PM -0500, Jeremiah Jahn wrote: > > > I have a system that will store about 2TB+ of images per year in a PG > > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > > really sure what should be done here. Would life better to not store the > > > images as BLOBS, and instead come up with some complicated way to only > > > store the location in the database, or is there someway to have postgres > > > handle this somehow? What are other people out there doing about this > > > sort of thing? > > > > > > thanx, > > > -jj- > > > > > > -- > > > Jeremiah Jahn <jeremiah@cs.earlham.edu> > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > > -- > > Jim C. Nasby (aka Decibel!) jim@nasby.net > > Member: Triangle Fraternity, Sports Car Club of America > > Give your computer some brain candy! www.distributed.net Team #1828 > > > > Windows: "Where do you want to go today?" > > Linux: "Where do you want to go tomorrow?" > > FreeBSD: "Are you guys coming, or what?" > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
> On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > At 08:13 AM 4/22/2003 -0500, Jeremiah Jahn wrote: > > >That's the question...That 2 TB of data is nothing but documents and > > >images. I'm under the perception that if that gets parked on a fibre > > >channel disk array/ SAN the data will be pretty safe, and the server > > >mostly replaceable at that time. Storage is my worry more than > > >processing power. I don't think I'm on crack here...? > > > > How about backups? Backing up 2-16TB needs a bit more planning and design. > The proposed solution here is to have the raid controller mirror accross > the street to a similar system. At what point do off line backups become > pointless? I've always been told to ask the question -- what's your data worth to you or how much $ will you lose if you lose your data? That should answer the question on offline backups. -philip
On Tue, 2003-04-22 at 11:47, Philip Hallstrom wrote: > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > > At 08:13 AM 4/22/2003 -0500, Jeremiah Jahn wrote: > > > >That's the question...That 2 TB of data is nothing but documents and > > > >images. I'm under the perception that if that gets parked on a fibre > > > >channel disk array/ SAN the data will be pretty safe, and the server > > > >mostly replaceable at that time. Storage is my worry more than > > > >processing power. I don't think I'm on crack here...? > > > > > > How about backups? Backing up 2-16TB needs a bit more planning and design. > > The proposed solution here is to have the raid controller mirror accross > > the street to a similar system. At what point do off line backups become > > pointless? > > I've always been told to ask the question -- what's your data worth to > you or how much $ will you lose if you lose your data? That should answer > the question on offline backups. > > -philip It's for the court system, so not much really. It's basically a convenience only. -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
One thing to think of is: how much is your REPUTATION worth? If everything you touch turns to gold and keeps working, raises and better equipment are a LOT easier to get. OTOH................ Jeremiah Jahn wrote: > On Tue, 2003-04-22 at 11:47, Philip Hallstrom wrote: > >>>On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: >>> >>>>At 08:13 AM 4/22/2003 -0500, Jeremiah Jahn wrote: >>>> >>>>>That's the question...That 2 TB of data is nothing but documents and >>>>>images. I'm under the perception that if that gets parked on a fibre >>>>>channel disk array/ SAN the data will be pretty safe, and the server >>>>>mostly replaceable at that time. Storage is my worry more than >>>>>processing power. I don't think I'm on crack here...? >>>> >>>>How about backups? Backing up 2-16TB needs a bit more planning and design. >>> >>>The proposed solution here is to have the raid controller mirror accross >>>the street to a similar system. At what point do off line backups become >>>pointless? >> >>I've always been told to ask the question -- what's your data worth to >>you or how much $ will you lose if you lose your data? That should answer >>the question on offline backups. >> >>-philip > > > It's for the court system, so not much really. It basically a convience > only. >
> FWIW, FreeBSD's UFS filesystem doesn't suffer from the 2G > limit. It's limit is something like 4TB iirc. Actually, on FreeBSD >5.X, you have UFS2 which shatters the 4TB limit. Now you're limited to 16,777,216TB per file system. Sorry it's not more. In truth, I wish that limit was higher as I seem to run into it on a daily basis. ::grumps:: 8-] > I'll also second the opinion that if you're really going to put that > much data in your database, x86 hardware might not be a good idea. > However, I'd recommend RS/6000 over Sun if you can afford it; There really is something to be said for the architecture of PowerPC's memory access routines and its scalability... though quirky to build open source stuff on, AIX is still the grand daddy of *NIXes for this very reason. -sc -- Sean Chittenden
> > How about backups? Backing up 2-16TB needs a bit more planning and design. > The proposed solution here is to have the raid controller mirror accross > the street to a similar system. At what point do off line backups become > pointless? This does not help if someone rm -R's the whole thing. If you don't think that's possible, it happened to Travelocity one day. The DBA thought that a tablespace was no longer in use and rm'd it. Ooops. Entire database down and the tablespace had to be restored from backup. > > > > > > > I think I've spent more time chasing various people's hardware failures > > > > lately than I have in investigating real Postgres bugs. I keep > > > > volunteering to look at failures because I figure there are still some > > > > data-loss bugs to be found, but I am coming to have a *real* low opinion > > > > of off-the-shelf PC hardware. > > > > > > > > regards, tom lane > > > > Well there's off-the-shelf x86 desktop PC hardware, and there's > > off-the-shelf x86 server hardware. > > > > In my experience for name brands it's about 1/7 DOA for the former, and 0 > > for the latter. There's a big diff in dependability. The latter tend to > > keep running for many years, typically being retired for no fault of their own. > > > > Then there are users who stick no name memory into their servers or "servers". > > > > Regards, > > Link. > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://archives.postgresql.org > -- > Jeremiah Jahn <jeremiah@cs.earlham.edu> > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Mon, 2003-04-21 at 13:23, Jeremiah Jahn wrote: > I have a system that will store about 2TB+ of images per year in a PG > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > really sure what should be done here. Would life better to not store the > images as BLOBS, and instead come up with some complicated way to only > store the location in the database, or is there someway to have postgres > handle this somehow? What are other people out there doing about this > sort of thing? Now that the hard disk and file system issues have been hashed around, have you thought about how you are going to back up this much data? You'll need multiple modern tape drives (like SuperDLT2, which can store 160GB *raw* per tape), and one SCSI controller per drive, so you don't starve each drive. You might also think about daisy chaining a set of mini-libraries. Then, for performance, you'd also need multiple 66MHz x 64 bit PCI bridges so each set of 6 PCI slots doesn't get bandwidth constrained. If PCI-X is shipping, I'd look at that. The bottom line is that this is going to be a very big box, or a lot of mid-sized boxes. -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+
On Tue, Apr 22, 2003 at 08:13:50AM -0500, Jeremiah Jahn wrote: > That's the question...That 2 TB of data is nothing but documents and > images. I'm under the perception that if that gets parked on a fibre > channel disk array/ SAN the data will be pretty safe, and the server > mostly replaceable at that time. Storage is my worry more than > processing power. I don't think I'm on crack here...? "Processing power" isn't automatically what server-class hardware will get you. You need good ECC memory. You need good disk controllers. You need to make sure that random flakey garbage hardware will not write random flakey garbage all over your high quality disk array. Speed may not be an issue, but reliability really is. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Jeremiah Jahn wrote: > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > How about backups? Backing up 2-16TB needs a bit more planning and design. > The proposed solution here is to have the raid controller mirror accross > the street to a similar system. At what point do off line backups become > pointless? Fine, so you have a 10 MBit link across the "street" in the budget but nothing to buy decent server hardware? 4TB per year (just to be around where you expect to be after 6 months or so) means an average of about a megabit per second ... on second thought, the 10 MBit will probably not be enough to satisfy peak times. Who made that mirror proposal and where is your PITR capable backup? I once saved a company by doing PITR. They ran an Oracle database on two software mirrored Raid5 systems ... and the application was misconfigured and deleted too much data (way too much). Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Ron Johnson wrote: > > On Mon, 2003-04-21 at 13:23, Jeremiah Jahn wrote: > > I have a system that will store about 2TB+ of images per year in a PG > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > really sure what should be done here. Would life better to not store the > > images as BLOBS, and instead come up with some complicated way to only > > store the location in the database, or is there someway to have postgres > > handle this somehow? What are other people out there doing about this > > sort of thing? > > Now that the hard disk and file system issues have been hashed around, > have you thought about how you are going to back up this much data? Legato had shown a couple years ago already that Networker can backup more than a Terabyte per hour. They used an RS6000 with over 100 disks and 36 DLT 7000 drives on 16 controllers if I recall correctly ... not your average backup solution but it's possible. But I doubt one can configure something like this with x86 hardware. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Fri, 25 Apr 2003, Jan Wieck wrote: > Ron Johnson wrote: > > > > On Mon, 2003-04-21 at 13:23, Jeremiah Jahn wrote: > > > I have a system that will store about 2TB+ of images per year in a PG > > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > > really sure what should be done here. Would life better to not store the > > > images as BLOBS, and instead come up with some complicated way to only > > > store the location in the database, or is there someway to have postgres > > > handle this somehow? What are other people out there doing about this > > > sort of thing? > > > > Now that the hard disk and file system issues have been hashed around, > > have you thought about how you are going to back up this much data? > > Legato had shown a couple years ago already that Networker can backup > more than a Terabyte per hour. They used an RS6000 with over 100 disks > and 36 DLT 7000 drives on 16 controllers if I recall correctly ... not > your average backup solution but it's possible. But I doubt one can > configure something like this with x86 hardware. I'm sure you could, but it might well involve 12 PII-350's running a trio of DLTs each, with a RAID array for caching. :-)
"scott.marlowe" wrote: > > On Fri, 25 Apr 2003, Jan Wieck wrote: > > > Ron Johnson wrote: > > > > > > On Mon, 2003-04-21 at 13:23, Jeremiah Jahn wrote: > > > > I have a system that will store about 2TB+ of images per year in a PG > > > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > > > really sure what should be done here. Would life better to not store the > > > > images as BLOBS, and instead come up with some complicated way to only > > > > store the location in the database, or is there someway to have postgres > > > > handle this somehow? What are other people out there doing about this > > > > sort of thing? > > > > > > Now that the hard disk and file system issues have been hashed around, > > > have you thought about how you are going to back up this much data? > > > > Legato had shown a couple years ago already that Networker can backup > > more than a Terabyte per hour. They used an RS6000 with over 100 disks > > and 36 DLT 7000 drives on 16 controllers if I recall correctly ... not > > your average backup solution but it's possible. But I doubt one can > > configure something like this with x86 hardware. > > I'm sure you could, but it might well involve 12 PII-350's running a trio > of DLTs each, with a RAID array for caching. :-) How many bits per second is a Terabyte per hour? One TB is 8TBit (plus protocol overhead and so on and so forth but let's ignore that for this discussion), we're talking about at least 8,000 GBit ... and one hour has 3.6 Kiloseconds ... making more than 2 GBit per second ... so there are 3 of these PC's per Gigabit segment and the DB server has a 4x Gigabit card just for that, cool! Of course, assuming we want to backup the total 24 Terabyte he has in 2-3 years in less than a day, if we have a month to take a backup we can save some money on the backup solution. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Fri, 2003-04-25 at 17:20, Jan Wieck wrote: > Ron Johnson wrote: > > > > On Mon, 2003-04-21 at 13:23, Jeremiah Jahn wrote: > > > I have a system that will store about 2TB+ of images per year in a PG > > > database. Linux unfortunatly has the 16TB limit for 32bit systems. Not > > > really sure what should be done here. Would life better to not store the > > > images as BLOBS, and instead come up with some complicated way to only > > > store the location in the database, or is there someway to have postgres > > > handle this somehow? What are other people out there doing about this > > > sort of thing? > > > > Now that the hard disk and file system issues have been hashed around, > > have you thought about how you are going to back up this much data? > > Legato had shown a couple years ago already that Networker can backup > more than a Terabyte per hour. They used an RS6000 with over 100 disks > and 36 DLT 7000 drives on 16 controllers if I recall correctly ... not > your average backup solution but it's possible. But I doubt one can > configure something like this with x86 hardware. And we're running something similar, though not quite as extravagant, on an AlphaServer GS320. SuperDLT 320 is *really* fast... (But you gotta keep them fed, and that's where x86 might choke.) -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+
At 07:32 PM 4/25/2003 -0400, Jan Wieck wrote: >Of course, assuming we want to backup the total 24 Terabyte he has in >2-3 years in less than a day, if we have a month to take a backup we can >save some money on the backup solution. If you take the month approach, the filesystem/db snapshot has to stay in place for a full month whilst the backup is occurring. Do-able, but can be stressful e.g. if something goes wrong after trying for a full month... But anyway he said off-line backups aren't that important - I gather that recreating the data is not totally impractical. Still 16TB, ouch. Regards, Link.
Lincoln Yeoh wrote: > > At 07:32 PM 4/25/2003 -0400, Jan Wieck wrote: > >Of course, assuming we want to backup the total 24 Terabyte he has in > >2-3 years in less than a day, if we have a month to take a backup we can > >save some money on the backup solution. > > If you take the month approach, the filesystem/db snapshot has to stay in > place for a full month whilst the backup is occuring. Do-able, but can be > stressful e.g. if something goes wrong after trying for a full month... > > But anyway he said off-line backups aren't that important - I gather that > recreating the data is not totally impractical. Still 16TB, ouch. I think a scenario like that, where one has a relatively small percentage of really updated data and a huge portion of constantly growing data, is a good example of when it might be appropriate NOT to store everything in one database. Depending on the rest of the requirements, it might need 2PC and using 2 databases. The updated database then could be backed up by normal means while for the constantly growing one you just archive the redo logs. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Sat, 2003-04-26 at 19:18, Jan Wieck wrote: > Lincoln Yeoh wrote: > > > > At 07:32 PM 4/25/2003 -0400, Jan Wieck wrote: > > >Of course, assuming we want to backup the total 24 Terabyte he has in > > >2-3 years in less than a day, if we have a month to take a backup we can > > >save some money on the backup solution. > > > > If you take the month approach, the filesystem/db snapshot has to stay in > > place for a full month whilst the backup is occuring. Do-able, but can be > > stressful e.g. if something goes wrong after trying for a full month... > > > > But anyway he said off-line backups aren't that important - I gather that > > recreating the data is not totally impractical. Still 16TB, ouch. > > I think a scenario like that, where one has a relatively small > percentage of really updated data and a huge portion of constantly > growing, is a good example for when it might be appropriate NOT to store > everything in one database. > > Depending on the rest of the requirements, it might need 2PC and using 2 > databases. The updated database then could be backed up by normal means > while for the constantly growing one you just archive the redo logs. Here's another thought: does all 16TB of data *really* have to be in the database all the time?? Maybe there's a business rule that anything older than 6 months isn't in the database itself, but is "just" sitting out somewhere on a filesystem, and if the old data is requested, then, either programmatically or thru operator intervention, the old data is copied into the database. Yes, it would take longer to access "old" data, but, hey, that's reality (unless you want to spend *really large* amounts of money). And it's not an "OSS vs. Proprietary" issue, either. -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+
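A minimal sketch of that "age out the old rows" idea, assuming a hypothetical table documents(id, created timestamp, image bytea), an archive filesystem mounted at /archive, and the psycopg2 driver (none of these names come from the thread):

    # Sketch: move payloads older than N months out of the database onto
    # an archive filesystem, keeping only the metadata row.
    # Table, column, and path names are illustrative assumptions.
    import os
    import psycopg2

    ARCHIVE_ROOT = "/archive"

    def archive_old_documents(conn, months=6):
        cur = conn.cursor()
        cur.execute("""
            SELECT id, image FROM documents
            WHERE image IS NOT NULL
              AND created < now() - %s * interval '1 month'
        """, (months,))
        for doc_id, image in cur.fetchall():  # fine as a sketch; batch in real life
            path = os.path.join(ARCHIVE_ROOT, "%d.img" % doc_id)
            with open(path, "wb") as f:
                f.write(image)
            # Keep the row (and its metadata), drop the heavy payload.
            cur.execute("UPDATE documents SET image = NULL WHERE id = %s", (doc_id,))
        conn.commit()
        cur.close()

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=mydb")
        archive_old_documents(conn)
        conn.close()

Re-loading on request would be the mirror image: read the archived file back and UPDATE the row with its payload.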
Here's a 70 TB Backup RAID at the University of Tübingen that should be able to handle the needs adequately. http://www.tomshardware.com/storage/20030425/index.html > On Sat, 2003-04-26 at 19:18, Jan Wieck wrote: >> Lincoln Yeoh wrote: >> > >> > At 07:32 PM 4/25/2003 -0400, Jan Wieck wrote: >> > >Of course, assuming we want to backup the total 24 Terabyte he has >> in 2-3 years in less than a day, if we have a month to take a >> backup we can save some money on the backup solution. >> > >> > If you take the month approach, the filesystem/db snapshot has to >> stay in place for a full month whilst the backup is occuring. >> Do-able, but can be stressful e.g. if something goes wrong after >> trying for a full month... >> > >> > But anyway he said off-line backups aren't that important - I gather >> that recreating the data is not totally impractical. Still 16TB, >> ouch. >> >> I think a scenario like that, where one has a relatively small >> percentage of really updated data and a huge portion of constantly >> growing, is a good example for when it might be appropriate NOT to >> store everything in one database. >> >> Depending on the rest of the requirements, it might need 2PC and using >> 2 databases. The updated database then could be backed up by normal >> means while for the constantly growing one you just archive the redo >> logs. > > Here's another thought: does all 16TB of data *really* have to be > in the database all the time?? > > Maybe there's a business rule that anything older than 6 months isn't in > the database itself, but is "just" sitting out on somewhere on a > filesystem, and if the old data is requested, then, either > programmatically or thru operator intervention, the old data is copied > into the database. > > Yes, it would take longer to access "old" data, but, hey, that's > reality (unless you want to spend *really*large* amounts of money). And > it's not an "OSS vs.Proprietary" either.
On Fri, 2003-04-25 at 16:46, Jan Wieck wrote: > Jeremiah Jahn wrote: > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > > How about backups? Backing up 2-16TB needs a bit more planning and design. > > The proposed solution here is to have the raid controller mirror accross > > the street to a similar system. At what point do off line backups become > > pointless? > > Fine, so you have a 10 MBit link across the "street" in the budget but > nothing to buy decent server hardware? 4TB (just to be around where you > expect to be after 6 months or so) per year means an average MBit per > second ... on a second thought, the 10 MBit will probably not be enough > to satisfy peak times. actually they have fiber across the street. So the SAN can just move stuff right across at 2Gbps. > > Who made that mirror proposal and where is your PITR capable backup? I > once saved a company by doing PITR. They ran an Oracle database on two > software mirrored Raid5 systems ... and the application was > misconfigured and deleted too much data (way too much). I don't believe that postgres does PITR yet. However, this is a system that doesn't need to be 24/7, so we can shut down the db and dump the whole thing to tape every day with no problem. > > > Jan -- Jeremiah Jahn <jeremiah@cs.earlham.edu>
On 28 Apr 2003, Jeremiah Jahn wrote: > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote: > > Jeremiah Jahn wrote: > > > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > > > How about backups? Backing up 2-16TB needs a bit more planning and design. > > > The proposed solution here is to have the raid controller mirror accross > > > the street to a similar system. At what point do off line backups become > > > pointless? > > > > Fine, so you have a 10 MBit link across the "street" in the budget but > > nothing to buy decent server hardware? 4TB (just to be around where you > > expect to be after 6 months or so) per year means an average MBit per > > second ... on a second thought, the 10 MBit will probably not be enough > > to satisfy peak times. > actually they have fiber accross the street. So the SAN can just move > stuff right across at 2Gbps. > > > > > > Who made that mirror proposal and where is your PITR capable backup? I > > once saved a company by doing PITR. They ran an Oracle database on two > > software mirrored Raid5 systems ... and the application was > > misconfigured and deleted too much data (way too much). > > I don't belive that postgres does PITR yet. However, this is a system > that doesn't need to be 24/7, so we can shut down the db and dump the > whole thing to tape every day with no problem. Don't shut it down and backup at file system level, leave it up, restrict access via pg_hba.conf if need be, and use pg_dump. File system level backups are not the best way to go, although for quick recovery they can be added to full pg_dumps as an aid, but don't leave out the pg_dump, it's the way you're supposed to backup postgresql, and it can do so when the database is "hot and in use" and provide a consistent backup snapshot.
On Mon, 2003-04-28 at 10:42, scott.marlowe wrote: > On 28 Apr 2003, Jeremiah Jahn wrote: > > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote: > > > Jeremiah Jahn wrote: > > > > > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: [snip] > Don't shut it down and backup at file system level, leave it up, restrict > access via pg_hba.conf if need be, and use pg_dump. File system level > backups are not the best way to go, although for quick recovery they can > be added to full pg_dumps as an aid, but don't leave out the pg_dump, > it's the way you're supposed to backup postgresql, and it can do so when > the database is "hot and in use" and provide a consistent backup > snapshot. What's the problem with doing a file-level backup of a *cold* database? The problem with pg_dump is that it's single-threaded, and it would take a whole lotta time to back up 16TB using 1 tape drive... If a pg database is spread across multiple devices (using symlinks), then a cold database can be backed up, at the file level, using multiple tape drives. (Of course, all symlinks would have to be recreated when/if the database files had to be restored.) -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+
On 28 Apr 2003, Ron Johnson wrote: > On Mon, 2003-04-28 at 10:42, scott.marlowe wrote: > > On 28 Apr 2003, Jeremiah Jahn wrote: > > > > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote: > > > > Jeremiah Jahn wrote: > > > > > > > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > [snip] > > Don't shut it down and backup at file system level, leave it up, restrict > > access via pg_hba.conf if need be, and use pg_dump. File system level > > backups are not the best way to go, although for quick recovery they can > > be added to full pg_dumps as an aid, but don't leave out the pg_dump, > > it's the way you're supposed to backup postgresql, and it can do so when > > the database is "hot and in use" and provide a consistent backup > > snapshot. > > What's the problem with doing a file-level backup of a *cold* database? There's no problem with doing it, the problem is that in order to get anything back you pretty much have to have all of it to make it work right, and any subtle problems of a partial copy might not be so obvious. Plus it sticks you to one major rev of the database. Pulling out five year old copies of the base directory can involve a fair bit of work getting an older flavor of postgresql to run on a newer os. It's not that it's wrong, it's that it should be considered carefully before being done. Note that a month or so ago someone had veritas backing up their postgresql database while it was still live and it was corrupting data. Admittedly, you're gonna bring down the database first, so it's not an issue for you. > The problem with pg_dump is that it's single-threaded, and it would take > a whole lotta time to back up 16TB using 1 tape drive... But, you can run pg_dump against individual databases or tables on the same postmaster, so you could theoretically write a script around pg_dump to dump the databases or large tables to different drives. We back up our main server to our backup server that way, albeit with only one backup process at a time, since we can back up about a gig a minute, it's plenty fast for us. If we needed to parallelize it that would be pretty easy. > If a pg database is spread across multiple devices (using symlinks), > then a cold database can be backed up, at the file level, using > multiple tape drives. (Of course, all symlinks would have to be > recreated when/if the database files had to be restored.) The same thing can be done with multiple pg_dumps running against the databases / tables you select. While the database is still up. It's a toss up, but know that the pg_dump format is the "supported" method, so to speak, for backing up. Hopefully before it becomes a huge issue PITR will be done, eh?
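A rough sketch of that "script around pg_dump" idea, dumping several large tables in parallel with one output file each (the database and table names are illustrative assumptions). As the next message points out, separate pg_dump runs are separate snapshots, so this is only reasonable if the tables are quiescent or each table can tolerate being dumped at a slightly different point in time:

    # Sketch: run one pg_dump per big table, in parallel.
    # Database and table names are illustrative assumptions.
    import subprocess

    DBNAME = "mydb"
    TABLES = ["images_2001", "images_2002", "images_2003"]

    procs = []
    for t in TABLES:
        out = open("%s.dump" % t, "wb")
        p = subprocess.Popen(["pg_dump", "-t", t, DBNAME], stdout=out)
        procs.append((p, out))

    for p, out in procs:
        p.wait()
        out.close()
        if p.returncode != 0:
            print("pg_dump failed for", out.name)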
On Mon, 2003-04-28 at 16:59, scott.marlowe wrote: > On 28 Apr 2003, Ron Johnson wrote: > [snip] > > On Mon, 2003-04-28 at 10:42, scott.marlowe wrote: > > > On 28 Apr 2003, Jeremiah Jahn wrote: > > > > > > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote: > > > > > Jeremiah Jahn wrote: > > > > > > > > > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote: > > [snip] > > > Don't shut it down and backup at file system level, leave it up, restrict > > > access via pg_hba.conf if need be, and use pg_dump. File system level > > > backups are not the best way to go, although for quick recovery they can > > > be added to full pg_dumps as an aid, but don't leave out the pg_dump, > > > it's the way you're supposed to backup postgresql, and it can do so when > > > the database is "hot and in use" and provide a consistent backup > > > snapshot. > > > > What's the problem with doing a file-level backup of a *cold* database? > > There's no problem with doing it, the problem is that in order to get > anything back you pretty much have to have all of it to make it work > right, and any subtle problems of a partial copy might not be so obvious. > > Plus it sticks you to one major rev of the database. Pulling out five > year old copies of the base directory can involve a fair bit of work > getting an older flavor of postgresql to run on a newer os. Good point... [snip] > > The problem with pg_dump is that it's single-threaded, and it would take > > a whole lotta time to back up 16TB using 1 tape drive... > > But, you can run pg_dump against individual databases or tables on the > same postmaster, so you could theoretically write a script around pg_dump > to dump the databases or large tables to different drives. We backup our > main server to our backup server that way, albeit with only one backup > process at a time, since we can backup about a gig a minute, it's plenty > fast for us. If we needed to parallelize it that would be pretty easy. But pg doesn't guarantee internal consistency unless you pg_dump the database in one command "pg_dump db_name > db_yyyymmdd.dmp". Thus, no parallelism unless there are multiple databases, but if there's only 1 database... -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+
On Tue, Apr 29, 2003 at 01:01:39AM -0500, Ron Johnson wrote: > But pg doesn't guarantee internal consistency unless you pg_dump > the database in one command "pg_dump db_name > db_yyyymmdd.dmp". > > Thus, no parallelism unless there are multiple databases, but if there's > only 1 database... It would probably be useful if you could feed a specific transaction id/timestamp to pg_dump, so that it will take the snapshot as of that time. Of course, it'd probably be easiest if you could just tell pg_dump to use X number of threads or dump devices. -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
On Tue, 2003-04-29 at 16:05, Jim C. Nasby wrote: > On Tue, Apr 29, 2003 at 01:01:39AM -0500, Ron Johnson wrote: > > But pg doesn't guarantee internal consistency unless you pg_dump > > the database in one command "pg_dump db_name > db_yyyymmdd.dmp". > > > > Thus, no parallelism unless there are multiple databases, but if there's > > only 1 database... > > It would probably be useful if you could feed a specific transaction > id/timestamp to pg_dump, so that it will take the snapshot as of that > time. Of course, it'd probably be easiest if you could just tell pg_dump > to use X number of threads or dump devices. Something like: # pg_dump -f /dev/sg0 -f /dev/sg1 -f /dev/sg2 -f /dev/sg3 <dbname> OR # pg_dump -f \(/dev/sg0,/dev/sg1,/dev/sg2,/dev/sg3\) <dbname> Then, pg_dump would decide what data to send to each device. -- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | An ad currently being run by the NEA (the US's biggest | | public school TEACHERS UNION) asks a teenager if he can | | find sodium and *chloride* in the periodic table of the | | elements. | | And they wonder why people think public schools suck... | +-----------------------------------------------------------+