Re: PostgreSQL for VAX on NetBSD/OpenBSD - Mailing list pgsql-hackers

From Greg Stark
Subject Re: PostgreSQL for VAX on NetBSD/OpenBSD
Date
Msg-id CAM-w4HNNOU90k6uAuDyOArp11mBcYVb9xR_kDZXQE9A3VFvnJg@mail.gmail.com
In response to Re: PostgreSQL for VAX on NetBSD/OpenBSD  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PostgreSQL for VAX on NetBSD/OpenBSD  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
[- the vax lists since they cause majordomo confirmation emails for
anyone responding]

On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> There are some planner tests that fail with floating point exceptions
>> -- that's probably a bug on our part. And I've seen at least one
>> server crash (maybe two) apparently caused by one as well which I
>> don't believe is expected.
>
> That seems worth poking into.

Mea culpa. Not a planner crash but rather an overflow from exp(). It
turns out exp() and other math library functions on VAX do not signal
FPE but instead have a curious API that lets us catch the overflow by
defining a function "infnan()" for them to call when it happens. If we
don't define that function then it executes an illegal instruction, which
generates SIGILL with errno set to EDOM (iirc). For the moment I've
just attached our FPE handler to SIGILL and that's letting me run the
tests without crashes. It's probably just silly make-work, but it would
be pretty easy to add a simple infnan() that calls our FPE handler
directly. That would avoid needing a SIGILL handler, which seems like a
bad idea in general.

>> 4) One of the tablesample tests seems to freeze indefinitely. I
>> haven't looked into why yet. That might indeed indicate that the
>> spinlock code isn't working?
>
> The tablesample tests seem like a not-very-likely first place for such a
> thing to manifest.  What I'm thinking is that there are places in there
> where we loop till we get an expected result.  Offhand I thought they were
> all integer math; but if one was float and the VAX code wasn't doing what
> was expected, maybe we could blame this on float discrepancies as well.

The hang is actually in the groupingsets tests, in
bipartite_match.c:hk_breadth_search().

Looking at that function it's not surprising that it doesn't work
without IEEE floats, given that its first line is

    distance[0] = get_float4_infinity();

and its return value is

    !isinf(distance[0]);
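
To illustrate why that pattern can hang, here is a small sketch. It is not the code in bipartite_match.c, and it assumes that on a non-IEEE platform the stand-in for infinity ends up as a large finite value such as FLT_MAX (an assumption, not something verified here). The function found_path() and its parameters are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/*
 * Illustration only, not the code in bipartite_match.c.  "unreached" is
 * the value used to mean "no path found yet"; "best" is the shortest
 * distance a search discovered, or a negative number if it found nothing.
 */
static bool
found_path(float unreached, float best)
{
    float distance0 = unreached;

    if (best >= 0 && best < distance0)
        distance0 = best;

    /* Same shape as the return test quoted above. */
    return !isinf(distance0);
}
```

With a true infinity, found_path(INFINITY, -1.0f) is false, so a caller that loops "while a path was found" stops. With a finite stand-in, found_path(FLT_MAX, -1.0f) is true even though nothing was found, so such a loop would never terminate.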


The other place where non-IEEE floats are causing problems internal to
postgres appears to be inside spgist -- even when planning queries
using spgist:
 EXPLAIN (COSTS OFF) SELECT count(*) FROM radix_text_tbl WHERE t <    'Aztec          Ct  ';
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, such as
division by zero.

Other than these two places I think all the other failures are
user-visible arithmetic producing different results or getting SIGFPE
instead of displaying Inf/-Inf/NaN values. Some of the results look
rather suspect, but I assume there's some numerically sensitive
arithmetic that's producing them.


-- 
greg
