Thread: make check hangs in alpha5

make check hangs in alpha5

From

Hitoshi Harada

Date:

04 April 2010, 04:59:43

* Environment:
CentOS 5.2 on VirtualBox 3.0.6 of WindowsXP SP3

$ uname -a
Linux localhost.localdomain 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16
12:03:43 EST 2008 i686 i686 i386 GNU/Linux

$ gcc -v
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)

* Fact:
In Alpha4, Alpha5 and some snapshots from about the middle of January,
"make check" hangs with an error:

./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII
--temp-install=./tmp_check --top-builddir=../../..
--schedule=./parallel_schedule
============== creating temporary installation        ==============
============== initializing database system           ==============
============== starting postmaster                    ==============

pg_regress: postmaster did not respond within 60 seconds
Examine /home/forcia/work/postgresql/src/test/regress/log/postmaster.log
for the reason
make[2]: *** [check] Error 2
make[2]: Leaving directory `/home/forcia/work/postgresql/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/forcia/work/postgresql/src/test'
make: *** [check] Error 2

$ cat src/test/regress/log/postmaster.log
LOG:  database system was shut down at 2010-04-04 16:35:37 JST
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
== end of log ==

Attached is another log with debug5. I run "make maintainer-clean"
every time before "make check".

$ ./configure CFLAGS= --enable-debug --enable-cassert

It seems postgres starts up successfully with its aux processes but
accepts connections on neither the UNIX domain socket nor the TCP
socket. When I run postgres from the console with the same options that
pg_regress passes, it accepts connections. This occurs only when
pg_regress spawns the postgres process.
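
The 60-second limit in the pg_regress output above amounts to polling for a listening postmaster until a deadline passes. A minimal Python sketch of that kind of check follows. It is not pg_regress's actual code, and the function name `wait_for_listener`, its parameters, and the socket path in the usage note are assumptions for illustration.

```python
import socket
import time


def wait_for_listener(address, timeout=60.0, interval=1.0,
                      family=socket.AF_INET):
    """Poll until something accepts connections at `address`, or give up.

    `address` is a (host, port) tuple for AF_INET or a filesystem path
    for AF_UNIX. Returns True once a connection succeeds and False if
    the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(interval)
                sock.connect(address)
                return True
        except OSError:
            # Refused, missing socket file, or timed out: retry until
            # the overall deadline is reached.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_listener("/tmp/.s.PGSQL.5432", family=socket.AF_UNIX)` would probe a UNIX domain socket for port 5432; substitute whatever port the temporary installation really uses.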

* My investigation so far:
My bisect-ish (manual, actually :) research found that the failure
begins with the commit:
<Introduce two new libpq connection functions, PQconnectdbParams and>
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=cc6c533b43b5244cf4af58709340bb7b9af26cce

But this commit is OK:
<Fix bug in wasender's xlogid boundary handling, reported by Erik Rijkers.>
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=c46f3447f1efbc9fbc30ce5d5b9c65bd75e10b89

The problem isn't in libpq, since it is that the server doesn't listen
on startup as above. No /tmp/.s.PGSQL.5432, nor LISTENING entry in
netstat. And not only the target version psql but 8.4's psql cannot
connect to the database.

I cannot figure out at all what is wrong. Have any idea?

Regards,


--
Hitoshi Harada

Attachment

postmaster.log.bad

Re: make check hangs in alpha5

From

Tom Lane

Date:

04 April 2010, 13:19:50

Hitoshi Harada <umi.tanuki@gmail.com> writes:
> I cannot figure out at all what is wrong. Have any idea?

Since nobody else is reporting this, it seems like it must be either
something messed up about your system, or something wrong with your
copy of the PG sources.  In the latter connection I confess to having
little confidence in the git mirror setup.  Have you tried diffing
your source tree against a CVS checkout or nightly snapshot tarball?
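
One way to carry out the suggested comparison is a recursive tree diff. The sketch below uses Python's standard `filecmp` module. The function name `diff_trees` and the ignored directory names are illustrative assumptions, and a plain `diff -r` would serve equally well.

```python
import filecmp
import os


def diff_trees(left, right, ignore=(".git", "CVS")):
    """Return sorted relative paths that differ between two source trees.

    Each entry is prefixed with "only-left:", "only-right:" or "differs:".
    """
    results = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            results.append("only-left:" + os.path.join(prefix, name))
        for name in cmp.right_only:
            results.append("only-right:" + os.path.join(prefix, name))
        # dircmp compares files shallowly (by stat data), so re-check
        # the common files byte for byte.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in mismatch + errors:
            results.append("differs:" + os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(results)
```

An empty result means the two trees match, apart from the ignored directories.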

A different line of attack is to insert some debugging logging into
the postmaster's port-opening code to see if you can find why it's
apparently not doing anything.
        regards, tom lane

Re: make check hangs in alpha5

From

Andrew Dunstan

Date:

04 April 2010, 13:52:40

Hitoshi Harada wrote:
> The problem isn't in libpq, since it is that the server doesn't listen
> on startup as above. No /tmp/.s.PGSQL.5432, nor LISTENING entry in
> netstat.

I find this somewhat implausible. The postmaster has this code that 
makes it die if it can't open a listening socket:
    if (ListenSocket[0] == PGINVALID_SOCKET)
        ereport(FATAL,
                (errmsg("no socket created for listening")));

Perhaps you could strace the execution.

Of course, pg_regress doesn't usually run on port 5432, IIRC, so maybe 
you're looking for the wrong thing.
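
Since the test postmaster may be listening on a port other than 5432, it can help to list whatever PostgreSQL socket files exist rather than look for one fixed name. A small sketch, assuming the sockets live in /tmp and follow the `.s.PGSQL.<port>` naming quoted above; `find_postgres_sockets` is a hypothetical helper, not part of PostgreSQL.

```python
import glob
import os
import stat


def find_postgres_sockets(socket_dir="/tmp"):
    """Return {port: path} for every .s.PGSQL.<port> socket in socket_dir."""
    found = {}
    for path in glob.glob(os.path.join(socket_dir, ".s.PGSQL.*")):
        suffix = os.path.basename(path).rsplit(".", 1)[1]
        # Skip companion ".lock" files and any other non-numeric suffix.
        if not suffix.isdigit():
            continue
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue  # the file vanished between glob and stat
        if stat.S_ISSOCK(mode):
            found[int(suffix)] = path
    return found
```

An empty dictionary would be consistent with a postmaster that never created its UNIX domain socket in that directory.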

Another question worth asking is whether or not either SELinux or 
firewall settings on your CentOS box are interfering with connectivity.

cheers

andrew

Re: make check hangs in alpha5

From

Josh Berkus

Date:

04 April 2010, 14:13:31

On 4/4/10 9:19 AM, Tom Lane wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> I cannot figure out at all what is wrong. Have any idea?

We ran Make Check on 4 linux laptops and 3 macs yesterday, and did not
see a hang.

Could this be an issue with VirtualBox?  Have you used this VM for
testing before?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Re: make check hangs in alpha5

From

Hitoshi Harada

Date:

04 April 2010, 21:54:40

2010/4/5 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> I cannot figure out at all what is wrong. Have any idea?
>
> Since nobody else is reporting this, it seems like it must be either
> something messed up about your system, or something wrong with your
> copy of the PG sources.  In the latter connection I confess to having
> little confidence in the git mirror setup.  Have you tried diffing
> your source tree against a CVS checkout or nightly snapshot tarball?

I've tried clean tarballs of alpha4 and alpha5, so a messed-up source
tree seems unlikely. I'll try CVS HEAD. I've checked various real (not
VM) Linux boxes and didn't hit the same situation, so I know it's some
kind of environment issue.

>
> A different line of attack is to insert some debugging logging into
> the postmaster's port-opening code to see if you can find why it's
> apparently not doing anything.

I'll try it later. But debugging with gdb shows nothing unusual: bind()
and the other initializations look good, and the postmaster is in
ServerLoop().

Regards,

--
Hitoshi Harada

Re: make check hangs in alpha5

From

Hitoshi Harada

Date:

04 April 2010, 22:29:12

2010/4/5 Josh Berkus <josh@agliodbs.com>:
> On 4/4/10 9:19 AM, Tom Lane wrote:
>> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>>> I cannot figure out at all what is wrong. Have any idea?
>
> We ran Make Check on 4 linux laptops and 3 macs yesterday, and did not
> see a hang.
>
> Could this be an issue with VirtualBox?  Have you used this VM for
> testing before?
>

Yeah, this VM has mostly been used to develop my window function
patches, so this is critical for me.

Regards,

--
Hitoshi Harada

Re: make check hangs in alpha5

From

Giles Lean

Date:

05 April 2010, 01:39:14

Josh Berkus <josh@agliodbs.com> wrote:

> Could this be an issue with VirtualBox?  Have you used this VM for
> testing before?

As I've hit a few bugs in VirtualBox, this is a definite
possibility.  (So is Tom's suggestion of inconsistent
sources.)

Because I could, I just installed a new CentOS 5.4 (no
updates, 64 bit) VM on VirtualBox 3.1.6 hosted on OS X 10.5.8
(Leopard) with bridged networking.

I installed postgresql-9.0alpha5-revised.tar.bz2, compiled it
with the default options, ran 'make check' and everything
passed:
 =======================  All 122 tests passed.  =======================

I'm not sure that this helps much as it doesn't rule out
VirtualBox, but my experience with it running Linux has
been positive.  I do reboot the host at the first hint of
trouble and that fixes many ills; VirtualBox is not what
I would consider production ready, but it's competitive
and I switched to it from Parallels which I'd paid for.

I'd trial VMware but it doesn't claim to support the *BSDs
which is a feature I need, and I've seen at least one report
where it didn't work well with FreeBSD.  That's hearsay, of
course, but it makes me cautious, particularly when I've got
something that works "well enough".

Regards,

Giles

Re: make check hangs in alpha5

From

Devrim GÜNDÜZ

Date:

05 April 2010, 08:29:05

On Sun, 2010-04-04 at 12:19 -0400, Tom Lane wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
> > I cannot figure out at all what is wrong. Have any idea?
>
> Since nobody else is reporting this, it seems like it must be either
> something messed up about your system, or something wrong with your
> copy of the PG sources.

The only times I have seen this issue over the last few years are when I
build PostgreSQL RPMs in Xen-based virtual machines. It is rare, and it
happens when there are 10+ parallel builds.

...I'm pretty sure that it is related to the available resources during
build time...
--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: make check hangs in alpha5

From

Hitoshi Harada

Date:

05 April 2010, 12:49:27

2010/4/5 Giles Lean <giles.lean@pobox.com>:
>
> Josh Berkus <josh@agliodbs.com> wrote:
>
>> Could this be an issue with VirtualBox?  Have you used this VM for
>> testing before?
>
> I'm not sure that this helps much as it doesn't rule out
> VirtualBox, but my experience with it running Linux has
> been positive.  I do reboot the host at the first hint of
> trouble and that fixes many ills; VirtualBox is not what
> I would consider production ready, but it's competitive
> and I switched to it from Parallels which I'd paid for.

Just note, I rebooted the guest VM today and retried but things are as
before. The host reboot doesn't affect either.

I also tried another CentOS5.4 VM on the same VirtualBox and succeeded
to build. Another RHEL Server 5.2 (Tikanga) x86_64 real machine can
also build alpha5. Seems quite hopeless but it's surely reproducible.

Regards,

--
Hitoshi Harada

Re: make check hangs in alpha5

From

Giles Lean

Date:

05 April 2010, 22:15:14

Hitoshi Harada <umi.tanuki@gmail.com> wrote:

> Just note, I rebooted the guest VM today and retried but things are as
> before. The host reboot doesn't affect either.

Bad luck. :-(

> I also tried another CentOS5.4 VM on the same VirtualBox and succeeded
> to build. Another RHEL Server 5.2 (Tikanga) x86_64 real machine can
> also build alpha5.

Well, that sounds like a workaround: at least you can continue work,
although it doesn't leave a lot of confidence that you won't have to
reinstall _another_ virtual machine from time to time. :-(

> Seems quite hopeless but it's surely reproducible.

They always are, but it can take a while.  Back when I used to work in
kernel support we had one problem take a year.  We were Not Pleased, to
put it mildly. (Nor was the main customer seeing it.)  We got to the
answer in the end, but oh boy, it was ugly on the way, and not our
finest moment.

(Answer was that someone left 'volatile' off a variable declaration, so
it was a race condition.  A race condition that came and went as code
around the incorrect line of code was changed, so only some patch levels
were affected.  Plus, like any race condition, customers either tended
never to see it or tended to hit it semi-regularly.  It was
... difficult.  It wasn't me who solved it; fortunately someone else
"owned" it and eventually thought to disassemble all the different patch
levels.  (He was looking for a compiler bug, unsurprisingly; when the
problem first occurred he had only one patch version to deal with as
well, so that approach didn't suggest itself first off, and everyone
(where "everyone" was quite a lot of people) overlooked the missing
'volatile' during code reviews.))

Back to VirtualBox: it blew up on me _again_ yesterday, refusing to boot
FreeBSD as anything but 32 bit, no matter how I coaxed it and no matter
that it had been running a 64bit installation for weeks before that. As
booting a FreeBSD 64 bit kernel in 32 bit mode doesn't work (surprise!)
I have exorcised VirtualBox from my system (like Parallels before it,
which was even worse), and am deciding today what real hardware I shall
buy to replace VirtualBox.  (Before anyone suggests VMware Fusion:
they're honest enough to admit to not supporting one of the operating
systems I care about, so that's not a solution, unfortunately.)

So I can't test with VirtualBox anymore to help out, sorry.

Giles