RE: [HACKERS] Recovery on incomplete write - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject RE: [HACKERS] Recovery on incomplete write
Date
Msg-id 000701bf0f13$9c0790c0$2801007e@cadzone.tpf.co.jp
List pgsql-hackers
>
> > -----Original Message-----
> > From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> > Sent: Tuesday, September 28, 1999 11:54 PM
> > To: Tom Lane
> > Cc: Hiroshi Inoue; pgsql-hackers
> > Subject: Re: [HACKERS] Recovery on incomplete write
> >
> >
> > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > > I have wondered that md.c handles incomplete block(page)s
> > > > correctly.
> > > > Am I mistaken ?
> > >
> > > I think you are right, and there may be some other trouble spots in that
> > > file too.  I remember thinking that the code depended heavily on never
> > > having a partial block at the end of the file.
> > >
> > > But is it worth fixing?  The only way I can see for the file length
> > > to become funny is if we run out of disk space part way through writing
> > > a page, which seems unlikely...
> > >
> >
> > That is how he got started, the TODO item about running out of disk
> > space causing corrupted databases.  I think it needs a fix, if we can.
> >
>
> Maybe it isn't so difficult to fix.
> I would provide a patch.
>

Here is a patch.

1) mdnblocks() ignores a partial block at the end of a relation file.
2) mdread() treats a partial block at block number 0 as empty: it
   returns a zero-filled buffer instead of failing.
3) mdextend() moves its write position back to a multiple of BLCKSZ
   before writing.
4) mdextend() truncates the extra bytes if a write is incomplete.
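
Items 3 and 4 can be sketched outside PostgreSQL roughly as follows. This is only an illustration, not the patch itself: it assumes plain POSIX lseek/write/ftruncate in place of FileSeek/FileWrite/FileTruncate, and BLOCK_SIZE and extend_one_block() are hypothetical names that do not exist in md.c.

```c
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* stand-in for BLCKSZ; the value is an assumption */

/*
 * Hypothetical helper (not part of md.c): append one block to fd.
 * Returns 0 on success, -1 on failure.
 */
static int
extend_one_block(int fd, const char *buffer)
{
    off_t   pos = lseek(fd, 0, SEEK_END);
    ssize_t nbytes;

    if (pos < 0)
        return -1;

    /* Item 3: if the file ends in a partial block, back up to the
     * start of that block so the new block overwrites it. */
    if (pos % BLOCK_SIZE != 0)
    {
        pos -= pos % BLOCK_SIZE;
        if (lseek(fd, pos, SEEK_SET) < 0)
            return -1;
    }

    nbytes = write(fd, buffer, BLOCK_SIZE);
    if (nbytes != BLOCK_SIZE)
    {
        /* Item 4: a short write left extra bytes behind; cut the file
         * back to the last block boundary and restore the position. */
        if (nbytes > 0)
        {
            if (ftruncate(fd, pos) == 0)
                (void) lseek(fd, pos, SEEK_SET);
        }
        return -1;
    }
    return 0;
}
```

With this shape, a failed extend leaves the file length at a multiple of the block size, so a later extend starts from a clean boundary.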

If there are no objections, I will commit this change to the current
tree.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** storage/smgr/md.c.orig    Thu Sep 30 10:50:58 1999
--- storage/smgr/md.c    Tue Oct  5 13:30:55 1999
***************
*** 233,239 ****
  int
  mdextend(Relation reln, char *buffer)
  {
!     long        pos;
      int            nblocks;
      MdfdVec    *v;

--- 233,239 ----
  int
  mdextend(Relation reln, char *buffer)
  {
!     long        pos, nbytes;
      int            nblocks;
      MdfdVec    *v;

***************
*** 243,250 ****
      if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
          return SM_FAIL;

!     if (FileWrite(v->mdfd_vfd, buffer, BLCKSZ) != BLCKSZ)
          return SM_FAIL;

      /* remember that we did a write, so we can sync at xact commit */
      v->mdfd_flags |= MDFD_DIRTY;
--- 243,264 ----
      if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
          return SM_FAIL;

!     if (pos % BLCKSZ != 0) /* the last block is incomplete */
!     {
!         pos = BLCKSZ * (long)(pos / BLCKSZ);
!         if (FileSeek(v->mdfd_vfd, pos, SEEK_SET) < 0)
!             return SM_FAIL;
!     }
!
!     if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)
!     {
!         if (nbytes > 0)
!         {
!             FileTruncate(v->mdfd_vfd, pos);
!             FileSeek(v->mdfd_vfd, pos, SEEK_SET);
!         }
          return SM_FAIL;
+     }

      /* remember that we did a write, so we can sync at xact commit */
      v->mdfd_flags |= MDFD_DIRTY;
***************
*** 432,437 ****
--- 446,453 ----
      {
          if (nbytes == 0)
              MemSet(buffer, 0, BLCKSZ);
+         else if (blocknum == 0 && nbytes > 0 && mdnblocks(reln) == 0)
+             MemSet(buffer, 0, BLCKSZ);
          else
              status = SM_FAIL;
      }
***************
*** 1067,1072 ****
  {
      long        len;

!     len = FileSeek(file, 0L, SEEK_END) - 1;
!     return (BlockNumber) ((len < 0) ? 0 : 1 + len / blcksz);
  }
--- 1083,1088 ----
  {
      long        len;

!     len = FileSeek(file, 0L, SEEK_END);
!     return (BlockNumber) (len / blcksz);
  }
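
The effect of the last hunk on the block count can be checked with a small standalone sketch. The two functions below copy only the arithmetic from the old and new lines of the diff; the names nblocks_old() and nblocks_new() are hypothetical, and the file length is passed in directly instead of being obtained with FileSeek().

```c
/* Old formula: rounds up, so a trailing partial block is counted
 * as if it were a complete block. */
static long
nblocks_old(long file_len, long blcksz)
{
    long    len = file_len - 1;

    return (len < 0) ? 0 : 1 + len / blcksz;
}

/* New formula: rounds down, so a trailing partial block is ignored. */
static long
nblocks_new(long file_len, long blcksz)
{
    return file_len / blcksz;
}
```

Assuming a block size of 8192, a file of 8292 bytes (one full block plus 100 stray bytes) counts as 2 blocks under the old formula and 1 under the new one. A file of exactly 8192 bytes counts as 1 block under both.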



