Thread: Re: [HACKERS] tables > 1 gig
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> Then we'd better fix the underlying problem.  We can't change
> >> RELSEG_SIZE for a minor release, unless you want to give up the
> >> principle of not forcing initdb at minor releases.
>
> > Why can't we increase it?
>
> Consider a 1.5-gig table.  6.5 will store it as one gig in file "table",
> one-half gig in file "table.1".  Now recompile with larger RELSEG_SIZE.
> The file manager will now expect to find all blocks of the relation in
> file "table", and will never go to "table.1" at all.  Presto, you lost
> a bunch of data.
>
> Bottom line is just as it says in the config.h comments: you can't
> change either BLCKSZ or RELSEG_SIZE without doing initdb.

OK.  I will reverse it out.  I never thought that far ahead.  Not sure
how we can fix this easily, nor do I understand why more people aren't
complaining about not being able to vacuum tables that are 1.5 gigs that
they used to be able to vacuum.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

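To make the arithmetic behind Tom's point concrete, here is a minimal
standalone sketch of the block-to-segment mapping he is describing,
assuming the default 8K BLCKSZ and 1 GB segments.  It is illustrative
only, not the actual md.c code:

#include <stdio.h>

/*
 * Sketch of the segment arithmetic: block N of a relation lives in
 * file "table.<N / RELSEG_SIZE>" (the first segment has no suffix)
 * at offset (N % RELSEG_SIZE) * BLCKSZ.  Recompiling with a larger
 * RELSEG_SIZE makes the file manager compute a different file for
 * blocks already stored in "table.1", so they are never found again.
 */
#define BLCKSZ		8192
#define RELSEG_SIZE	131072		/* blocks per segment file = 1 GB */

static void
locate_block(const char *relname, long blocknum)
{
	long		segno = blocknum / RELSEG_SIZE;
	long		offset = (blocknum % RELSEG_SIZE) * (long) BLCKSZ;

	if (segno == 0)
		printf("block %ld -> file \"%s\", offset %ld\n",
			   blocknum, relname, offset);
	else
		printf("block %ld -> file \"%s.%ld\", offset %ld\n",
			   blocknum, relname, segno, offset);
}

int
main(void)
{
	/* a block that falls in the second segment of a 1.5-gig table */
	locate_block("table", 150000);
	return 0;
}
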
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> ... nor do I understand why more people aren't
> complaining about not being able to vacuum tables that are 1.5 gigs that
> they used to be able to vacuum.

Most likely, not very many people with tables that big have adopted 6.5
yet ... if I were running a big site, I'd probably wait for 6.5.1 on
general principles ;-)

I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it.  That's the right
long-term answer anyway.

			regards, tom lane

> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > ... nor do I understand why more people aren't
> > complaining about not being able to vacuum tables that are 1.5 gigs that
> > they used to be able to vacuum.
>
> Most likely, not very many people with tables that big have adopted 6.5
> yet ... if I were running a big site, I'd probably wait for 6.5.1 on
> general principles ;-)
>
> I think what we ought to do is finish working out how to make mdtruncate
> safe for concurrent backends, and then do it.  That's the right
> long-term answer anyway.

Problem is, no one knows how right now.  I liked unlinking every
segment, but was told by Hiroshi that causes a problem with concurrent
access and vacuum because the old backends still think it is there.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> I think what we ought to do is finish working out how to make mdtruncate
>> safe for concurrent backends, and then do it.  That's the right
>> long-term answer anyway.

> Problem is, no one knows how right now.  I liked unlinking every
> segment, but was told by Hiroshi that causes a problem with concurrent
> access and vacuum because the old backends still think it is there.

I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?

If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.

You could truncate to zero length *and* then unlink the files if you
had a mind to do that, but I can see why unlink without truncate would
not work reliably.

			regards, tom lane

> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> I think what we ought to do is finish working out how to make mdtruncate
> >> safe for concurrent backends, and then do it.  That's the right
> >> long-term answer anyway.
>
> > Problem is, no one knows how right now.  I liked unlinking every
> > segment, but was told by Hiroshi that causes a problem with concurrent
> > access and vacuum because the old backends still think it is there.
>
> I haven't been paying much attention, but I imagine that what's really
> going on here is that once vacuum has collected all the still-good
> tuples at the front of the relation, it doesn't bother to go through
> the remaining blocks of the relation and mark everything dead therein?
> It just truncates the file after the last block that it put tuples into,
> right?
>
> If this procedure works correctly for vacuuming a simple one-segment
> table, then it would seem that truncation of all the later segments to
> zero length should work correctly.
>
> You could truncate to zero length *and* then unlink the files if you
> had a mind to do that, but I can see why unlink without truncate would
> not work reliably.

That seems like the issue.  The more complex problem is that when the
relation loses a segment via vacuum, things go strange on the other
backends.  Hiroshi seems to have a good testbed for this, and I thought
it was fixed, so I didn't notice.

Unlinking allows other backends to keep their open segments of the
tables, but that causes problems when backends try to open segments they
think still exist and the opens fail.  Truncating segments causes
problems because backends are still accessing their own copies of the
tables, and truncation changes what is seen through their open file
descriptors.

We basically have two methods, and both have problems under certain
circumstances.  I wonder if we could unlink the files, but then create
zero-length segments for the ones we unlink.  If people think that may
fix the problems, it is easy to do, and we can do it atomically using
the rename() system call.  Create the zero-length file under a temp
name, then rename it to the segment file name.  That may do the trick of
allowing existing file descriptors to stay active, while having segments
in place for those that need to see them.

Comments?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> I haven't been paying much attention, but I imagine that what's really
> going on here is that once vacuum has collected all the still-good
> tuples at the front of the relation, it doesn't bother to go through
> the remaining blocks of the relation and mark everything dead therein?
> It just truncates the file after the last block that it put tuples into,
> right?
>
> If this procedure works correctly for vacuuming a simple one-segment
> table, then it would seem that truncation of all the later segments to
> zero length should work correctly.

Not sure about that.  When we truncate a single-segment file, the table
is being destroyed, so we invalidate it in the catalog cache and tell
the other backends.  Also, we have a problem with DROP TABLE in a
transaction while others are using it, as described by a bug report a
few days ago, so I don't think we have that 100% either.

> You could truncate to zero length *and* then unlink the files if you
> had a mind to do that, but I can see why unlink without truncate would
> not work reliably.

That is interesting.  I never thought of that.  Hiroshi, can you test
that idea?  If it is the non-existence of the file that other backends
are checking for, my earlier idea of rename() with a truncated file kept
in place may be better.

Also, I see why we are not getting more bug reports.  They only get this
when the table loses a segment, so it is OK to vacuum large tables as
long as the table doesn't lose a segment during the vacuum.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> > I haven't been paying much attention, but I imagine that what's really
> > going on here is that once vacuum has collected all the still-good
> > tuples at the front of the relation, it doesn't bother to go through
> > the remaining blocks of the relation and mark everything dead therein?
> > It just truncates the file after the last block that it put tuples into,
> > right?
> >
> > If this procedure works correctly for vacuuming a simple one-segment
> > table, then it would seem that truncation of all the later segments to
> > zero length should work correctly.
>
> Not sure about that.  When we truncate a single-segment file, the table
> is being destroyed, so we invalidate it in the catalog cache and tell
> the other backends.  Also, we have a problem with DROP TABLE in a
> transaction while others are using it, as described by a bug report a
> few days ago, so I don't think we have that 100% either.
>
> That is interesting.  I never thought of that.  Hiroshi, can you test
> that idea?  If it is the non-existence of the file that other backends
> are checking for, my earlier idea of rename() with a truncated file kept
> in place may be better.
>
> Also, I see why we are not getting more bug reports.  They only get this
> when the table loses a segment, so it is OK to vacuum large tables as
> long as the table doesn't lose a segment during the vacuum.

OK, this is 100% wrong.  We truncate from vacuum any time the table size
changes, and vacuum of large tables will fail even if it is not removing
a segment.  I forgot vacuum does this to reduce on-disk table size.

I wonder if truncating a file to reduce its size will cause other table
readers to have problems.  I thought vacuum held an exclusive lock on the
table during vacuum, and if so, why are other backends having trouble?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, June 18, 1999 12:54 AM
> To: Bruce Momjian
> Cc: PostgreSQL-development; Inoue@tpf.co.jp
> Subject: Re: [HACKERS] tables > 1 gig
>
>
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> I think what we ought to do is finish working out how to make mdtruncate
> >> safe for concurrent backends, and then do it.  That's the right
> >> long-term answer anyway.
>
> > Problem is, no one knows how right now.  I liked unlinking every
> > segment, but was told by Hiroshi that causes a problem with concurrent
> > access and vacuum because the old backends still think it is there.
>
> I haven't been paying much attention, but I imagine that what's really
> going on here is that once vacuum has collected all the still-good
> tuples at the front of the relation, it doesn't bother to go through
> the remaining blocks of the relation and mark everything dead therein?
> It just truncates the file after the last block that it put tuples into,
> right?
>
> If this procedure works correctly for vacuuming a simple one-segment
> table, then it would seem that truncation of all the later segments to
> zero length should work correctly.
>
> You could truncate to zero length *and* then unlink the files if you
> had a mind to do that, but I can see why unlink without truncate would
> not work reliably.
>

Unlinking unused segments after truncating to zero length may cause
a result such as:

  Existing backends write to the truncated file to extend the relation,
  while new backends create a new segment file to extend the relation.

Comments?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

> > > I haven't been paying much attention, but I imagine that what's really
> > > going on here is that once vacuum has collected all the still-good
> > > tuples at the front of the relation, it doesn't bother to go through
> > > the remaining blocks of the relation and mark everything dead therein?
> > > It just truncates the file after the last block that it put tuples into,
> > > right?
> > >
> > > If this procedure works correctly for vacuuming a simple one-segment
> > > table, then it would seem that truncation of all the later segments to
> > > zero length should work correctly.
> >
> > Not sure about that.  When we truncate a single-segment file, the table
> > is being destroyed, so we invalidate it in the catalog cache and tell
> > the other backends.  Also, we have a problem with DROP TABLE in a
> > transaction while others are using it, as described by a bug report a
> > few days ago, so I don't think we have that 100% either.

The problem is that (virtual) file descriptors, relcache entries, etc.
are local to each process.  I don't know of a reliable way to tell other
processes, just in time, that the target resources should be invalidated.

> > That is interesting.  I never thought of that.  Hiroshi, can you test
> > that idea?  If it is the non-existence of the file that other backends
> > are checking for, my earlier idea of rename() with a truncated file kept
> > in place may be better.
> >
> > Also, I see why we are not getting more bug reports.  They only get this
> > when the table loses a segment, so it is OK to vacuum large tables as
> > long as the table doesn't lose a segment during the vacuum.
>
> OK, this is 100% wrong.  We truncate from vacuum any time the table size
> changes, and vacuum of large tables will fail even if it is not removing
> a segment.  I forgot vacuum does this to reduce on-disk table size.
>
> I wonder if truncating a file to reduce its size will cause other table
> readers to have problems.

The current implementation has a hidden bug: once the size of a segment
reached RELSEG_SIZE, mdnblocks() wouldn't check the real size of the
segment any more.  I'm not sure other such bugs don't exist.  It's one of
the reasons why I don't recommend applying my trial patch to mdtruncate().

> I thought vacuum held an exclusive lock on the
> table during vacuum, and if so, why are other backends having trouble?

We could not see any errors from unlinking segmented relations when
commands are executed sequentially.  Vacuum calls
RelationInvalidateHeapTuple() for the pg_class tuple, and other backends
could recognize that the relcache entry must be invalidated while
executing StartTransaction() or CommandCounterIncrement().  Even though
the target relation is locked exclusively by vacuum, other backends could
StartTransaction(), CommandCounterIncrement(), parse, analyze, rewrite,
optimize, start the executor stage, and open relations.  We cannot rely
on the exclusive lock so much.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

> Unlinking unused segments after truncating to zero length may cause
> a result such as:
>
>   Existing backends write to the truncated file to extend the relation,
>   while new backends create a new segment file to extend the relation.

How about my idea of creating a truncated file, then renaming it to the
table file?  That keeps the table open for other backends' file
descriptors, but puts a zero-length file in place in an atomic manner.

Fact is, the current code is really bad, so I request that you do your
best, and let's get it in there for people to review and improve if
necessary.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Friday, June 18, 1999 11:31 AM
> To: Hiroshi Inoue
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] tables > 1 gig
>
>
> > Unlinking unused segments after truncating to zero length may cause
> > a result such as:
> >
> >   Existing backends write to the truncated file to extend the relation,
> >   while new backends create a new segment file to extend the relation.
>
> How about my idea of creating a truncated file, then renaming it to the
> table file?  That keeps the table open for other backends' file
> descriptors, but puts a zero-length file in place in an atomic manner.
>

Sorry, I couldn't understand what you mean.
How is that different from truncating the existing files to zero length?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

> > > Unlinking unused segments after truncating to zero length may cause
> > > a result such as:
> > >
> > >   Existing backends write to the truncated file to extend the relation,
> > >   while new backends create a new segment file to extend the relation.
> >
> > How about my idea of creating a truncated file, then renaming it to the
> > table file?  That keeps the table open for other backends' file
> > descriptors, but puts a zero-length file in place in an atomic manner.
>
> Sorry, I couldn't understand what you mean.
> How is that different from truncating the existing files to zero length?

Glad to explain.  Here is the pseudocode:

	create temp file, make it zero length, call it 'zz'
	rename(zz, tablename)

What this does is create a zero-length file, and the rename unlinks the
tablename file and puts the zero-length file in its place.  rename() is
atomic, so there is no time at which the table file does not exist.  It
allows backends that have the table open via a descriptor to keep seeing
the table unchanged, while new backends see a zero-length file.

Does this help?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

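A minimal standalone sketch of the rename() idea described above, written
against plain POSIX calls rather than PostgreSQL's storage-manager API;
the function and file names here are made up for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Build a zero-length file under a temporary name, then rename() it
 * over the segment.  rename() is atomic, so the segment name always
 * refers to some file; backends that already hold an open descriptor
 * keep seeing the old (now unlinked) inode.
 */
static int
swap_in_empty_segment(const char *segpath)
{
	char		tmppath[1024];
	int			fd;

	snprintf(tmppath, sizeof(tmppath), "%s.zz", segpath);

	/* create the zero-length replacement file */
	fd = open(tmppath, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	/* atomically replace the old segment with the empty one */
	if (rename(tmppath, segpath) < 0)
	{
		unlink(tmppath);
		return -1;
	}
	return 0;
}

int
main(void)
{
	/* hypothetical name for the second segment of "table" */
	if (swap_in_empty_segment("table.1") < 0)
		perror("swap_in_empty_segment");
	return 0;
}
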
> > > How about my idea of creating a truncated file, then renaming it to the
> > > table file?  That keeps the table open for other backends' file
> > > descriptors, but puts a zero-length file in place in an atomic manner.
> >
> > Sorry, I couldn't understand what you mean.
> > How is that different from truncating the existing files to zero length?
>
> Glad to explain.  Here is the pseudocode:
>
> 	create temp file, make it zero length, call it 'zz'
> 	rename(zz, tablename)
>
> What this does is create a zero-length file, and the rename unlinks the
> tablename file and puts the zero-length file in its place.  rename() is
> atomic, so there is no time at which the table file does not exist.

Let
	i1 be the inode of zz
	i2 be the inode of tablename
before rename().

Does this mean

	new backends read/write the i1 inode, and
	backends that already have the table open read/write the i2 inode?

If so, it seems wrong.  All backends should see the same data.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

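The inode point here can be demonstrated with a small standalone program
(plain POSIX calls and made-up file names, not PostgreSQL code): a
descriptor opened before the rename() keeps referring to the old inode,
while a fresh open of the same path sees the new one.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * After rename("zz", "tablename"), a process that already had
 * "tablename" open still reads/writes the old inode (i2), while a
 * freshly opened "tablename" refers to the new inode (i1).
 */
int
main(void)
{
	struct stat old_st,
				new_st;
	int			oldfd,
				zzfd;

	/* simulate the existing table file and an already-open descriptor */
	oldfd = open("tablename", O_RDWR | O_CREAT, 0600);

	/* simulate the zero-length replacement and the atomic swap */
	zzfd = open("zz", O_RDWR | O_CREAT | O_TRUNC, 0600);
	close(zzfd);
	rename("zz", "tablename");

	fstat(oldfd, &old_st);		/* inode i2: what old backends still see */
	stat("tablename", &new_st); /* inode i1: what new backends will open */

	printf("old descriptor -> inode %ld, new path -> inode %ld\n",
		   (long) old_st.st_ino, (long) new_st.st_ino);

	close(oldfd);
	unlink("tablename");
	return 0;
}
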
> > What this does is create a zero-length file, and the rename unlinks the
> > tablename file and puts the zero-length file in its place.  rename() is
> > atomic, so there is no time at which the table file does not exist.
>
> Let
> 	i1 be the inode of zz
> 	i2 be the inode of tablename
> before rename().
>
> Does this mean
>
> 	new backends read/write the i1 inode, and
> 	backends that already have the table open read/write the i2 inode?
>
> If so, it seems wrong.  All backends should see the same data.

Yes, I can see your point.  It would show them different views of the
table.

So, as you were saying, we have no way of invalidating the file
descriptors other backends hold for secondary segments.  Why does
truncating the file not work?  Any ideas?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> > > What this does is create a zero-length file, and the rename unlinks the
> > > tablename file and puts the zero-length file in its place.  rename() is
> > > atomic, so there is no time at which the table file does not exist.
> >
> > Let
> > 	i1 be the inode of zz
> > 	i2 be the inode of tablename
> > before rename().
> >
> > Does this mean
> >
> > 	new backends read/write the i1 inode, and
> > 	backends that already have the table open read/write the i2 inode?
> >
> > If so, it seems wrong.  All backends should see the same data.
>
> Yes, I can see your point.  It would show them different views of the
> table.
>
> So, as you were saying, we have no way of invalidating the file
> descriptors other backends hold for secondary segments.

It seems DROP TABLE has a similar problem.
Has it already been solved?

> Why does truncating the file
> not work?  Any ideas?

I have gotten no bug reports for my trial implementation.
AFAIK, only Ole Gjerde has tested my patch.
Is that sufficient?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

> > Yes, I can see your point.  It would show them different views of the
> > table.
> >
> > So, as you were saying, we have no way of invalidating the file
> > descriptors other backends hold for secondary segments.
>
> It seems DROP TABLE has a similar problem.
> Has it already been solved?

Not solved.  Someone reported it recently.

> > Why does truncating the file
> > not work?  Any ideas?
>
> I have gotten no bug reports for my trial implementation.
> AFAIK, only Ole Gjerde has tested my patch.
> Is that sufficient?

Yes.  We need something, and maybe after we add it, people can do
testing and find any problems.  It is better to apply it than to leave
it as it currently exists, no?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> > > Yes, I can see your point.  It would show them different views of the
> > > table.
> > >
> > > So, as you were saying, we have no way of invalidating the file
> > > descriptors other backends hold for secondary segments.
> >
> > > Why does truncating the file
> > > not work?  Any ideas?
> >
> > I have gotten no bug reports for my trial implementation.
> > AFAIK, only Ole Gjerde has tested my patch.
> > Is that sufficient?
>
> Yes.  We need something, and maybe after we add it, people can do
> testing and find any problems.  It is better to apply it than to leave
> it as it currently exists, no?
>

OK, here is my patch for the PostgreSQL 6.5 release.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** storage/smgr/md.c.orig	Fri Jun 11 12:20:06 1999
--- storage/smgr/md.c	Fri Jun 18 15:10:54 1999
***************
*** 674,684 ****
  	segno = 0;
  	for (;;)
  	{
! 		if (v->mdfd_lstbcnt == RELSEG_SIZE
! 			|| (nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ)) == RELSEG_SIZE)
  		{
- 
- 			v->mdfd_lstbcnt = RELSEG_SIZE;
  			segno++;
  
  			if (v->mdfd_chain == (MdfdVec *) NULL)
--- 674,685 ----
  	segno = 0;
  	for (;;)
  	{
! 		nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ);
! 		if (nblocks > RELSEG_SIZE)
! 			elog(FATAL, "segment too big in mdnblocks!");
! 		v->mdfd_lstbcnt = nblocks;
! 		if (nblocks == RELSEG_SIZE)
  		{
  			segno++;
  
  			if (v->mdfd_chain == (MdfdVec *) NULL)
***************
*** 711,732 ****
  	MdfdVec    *v;
  
  #ifndef LET_OS_MANAGE_FILESIZE
! 	int			curnblk;
  
  	curnblk = mdnblocks(reln);
! 	if (curnblk / RELSEG_SIZE > 0)
! 	{
! 		elog(NOTICE, "Can't truncate multi-segments relation %s",
! 			 reln->rd_rel->relname.data);
! 		return curnblk;
! 	}
  #endif
  
  	fd = RelationGetFile(reln);
  	v = &Md_fdvec[fd];
  
  	if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
  		return -1;
  
  	return nblocks;
  
--- 712,766 ----
  	MdfdVec    *v;
  
  #ifndef LET_OS_MANAGE_FILESIZE
! 	int			curnblk,
! 				i,
! 				oldsegno,
! 				newsegno,
! 				lastsegblocks;
! 	MdfdVec    **varray;
  
  	curnblk = mdnblocks(reln);
! 	if (nblocks > curnblk)
! 		return -1;
! 	oldsegno = curnblk / RELSEG_SIZE;
! 	newsegno = nblocks / RELSEG_SIZE;
! 
  #endif
  
  	fd = RelationGetFile(reln);
  	v = &Md_fdvec[fd];
  
+ #ifndef LET_OS_MANAGE_FILESIZE
+ 	varray = (MdfdVec **) palloc((oldsegno + 1) * sizeof(MdfdVec *));
+ 	for (i = 0; i <= oldsegno; i++)
+ 	{
+ 		if (!v)
+ 			elog(ERROR, "segment isn't open in mdtruncate!");
+ 		varray[i] = v;
+ 		v = v->mdfd_chain;
+ 	}
+ 	for (i = oldsegno; i > newsegno; i--)
+ 	{
+ 		v = varray[i];
+ 		if (FileTruncate(v->mdfd_vfd, 0) < 0)
+ 		{
+ 			pfree(varray);
+ 			return -1;
+ 		}
+ 		v->mdfd_lstbcnt = 0;
+ 	}
+ 	/* Calculate the # of blocks in the last segment */
+ 	lastsegblocks = nblocks - (newsegno * RELSEG_SIZE);
+ 	v = varray[i];
+ 	pfree(varray);
+ 	if (FileTruncate(v->mdfd_vfd, lastsegblocks * BLCKSZ) < 0)
+ 		return -1;
+ 	v->mdfd_lstbcnt = lastsegblocks;
+ #else
  	if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
  		return -1;
+ 	v->mdfd_lstbcnt = nblocks;
+ #endif
  
  	return nblocks;

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Unlinking unused segments after truncating to zero length may cause > the result such as > Existent backends write to the truncated file to extend the relation > while new backends create a new segment file to extend the relation. Ooh, good point. So, unless we want to invent some way for the process that's running vacuum to force other backends to close their FDs for segment files, the *only* correct solution is to truncate to zero length but leave the files in place. I still don't quite see why there is such a big problem, however, unless you're asserting that vacuum is broken for single-segment tables too. Surely vacuum acquires a lock over the whole table, not just a segment? regards, tom lane
Thank you.  Applied.

> OK, here is my patch for the PostgreSQL 6.5 release.
>
> Regards.
>
> Hiroshi Inoue
> Inoue@tpf.co.jp

[snip - mdtruncate patch quoted in full; see the previous message for the patch itself]

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

On Fri, 18 Jun 1999, Bruce Momjian wrote:
[snip - mdtruncate patch]

While talking about this whole issue, there is one piece missing.
Currently there is no way to dump a database/table over 2 GB.  When it
hits the 2GB OS limit, it just silently stops and gives no indication
that it didn't finish.

It's not a problem for me yet, but I'm getting very close.  I have one
database with 3 tables over 2GB (in postgres space), but they still come
out under 2GB after a dump.  I can't do a pg_dump on the whole database
however, which would be very nice.

I suppose it wouldn't be overly hard to have pg_dump/pg_dumpall do
something similar to what postgres does with segments.  I haven't looked
at it yet however, so I can't say for sure.

Comments?

Ole Gjerde

Ole Gjerde wrote:
>
> On Fri, 18 Jun 1999, Bruce Momjian wrote:
> [snip - mdtruncate patch]
>
> While talking about this whole issue, there is one piece missing.
> Currently there is no way to dump a database/table over 2 GB.
> When it hits the 2GB OS limit, it just silently stops and gives no
> indication that it didn't finish.
>
> It's not a problem for me yet, but I'm getting very close.  I have one
> database with 3 tables over 2GB (in postgres space), but they still come
> out under 2GB after a dump.  I can't do a pg_dump on the whole database
> however, which would be very nice.
>
> I suppose it wouldn't be overly hard to have pg_dump/pg_dumpall do
> something similar to what postgres does with segments.  I haven't looked
> at it yet however, so I can't say for sure.
>
> Comments?

As pg_dump writes to stdout, you can just use standard *nix tools:

1. use compressed dumps

	pg_dump really_big_db | gzip > really_big_db.dump.gz

   reload with

	gunzip -c really_big_db.dump.gz | psql newdb

   or

	cat really_big_db.dump.gz | gunzip | psql newdb

2. use split

	pg_dump really_big_db | split -b 1m - really_big_db.dump.

   reload with

	cat really_big_db.dump.* | psql newdb

-----------------------
Hannu
