Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date
Msg-id 20120725170113.GA22286@momjian.us
Whole thread Raw
Responses Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Re: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
List pgsql-hackers
On Thu, Jul 19, 2012 at 02:24:36AM -0400, Bruce Momjian wrote:
> On Wed, Jul 18, 2012 at 01:30:34AM -0400, Tom Lane wrote:
> > I wrote:
> > > ... I'm wondering if maybe
> > > Solaris has a weird definition of struct dirent that breaks the
> > > calculation used here:
> >
> > >         entrysize = sizeof(struct dirent) - sizeof(direntry->d_name) +
> > >             strlen(direntry->d_name) + 1;
> >
> > Some googling suggests that on Solaris, this calculation would actually
> > give a result that's a little too *large* not too small, because of
> > padding in the declaration of struct dirent.  That would not be a
> > problem so far as callers were concerned, but it's conceivable that the
> > memcpy in load_directory would run off the end of memory and crash.
> > Could that explain Mike's problem?  You'd have to assume that the stack
> > trace he showed us had omitted to mention load_directory, but I don't
> > have a huge problem with making such an assumption.  Anyway the offsetof
> > formulation should avoid this hazard, so I'm still interested to know
> > if that helps.
>
> Great to hear this fixed the problem.  The only way I can see this
> causing a crash is:
>
>     * the padding of direntry->d_name is less than the padding of
>       struct dirent
>     * there is a non-mapped memory range after the struct
>     * the file name is near the full length of d_name
>
> Those are pretty rare odds.  The only other issue is that this code
> assumes direntry->d_name is the last element of the struct, and I am not
> sure how we can assume that is true.
>
> Also, this code tries to be tricky by reducing the allocated memory
> instead of allocating the maximum path.  Shouldn't there a comment
> about the propose of this code?

This bug report was fixed mostly via private email because a private
schema dump was sent to Tom and me.  Tom fixed the problem by changing
the way we interact with struct dirent, but neither of us is sure
exactly why the fix worked.  It is something about Solaris.

Tom suggested that load_directory() return a (char *) array, rather than
a struct dirent array, greatly simplifying the code.

I have done this in the attached patch, and because of the uncertainty
of ths fix, I would like to apply this to 9.2 as well.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: Notify system doesn't recover from "No space" error
Next
From: Alvaro Herrera
Date:
Subject: filenames in pg_basebackup