Re: Intermittent "cache lookup failed for type" buildfarm failures - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Intermittent "cache lookup failed for type" buildfarm failures
Date
Msg-id CA+TgmoaZ6hTA--KLtzfy43KAiuS9PL-p9jHfy-BzHmEt+CR+Gg@mail.gmail.com
In response to Intermittent "cache lookup failed for type" buildfarm failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Intermittent "cache lookup failed for type" buildfarm failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Aug 16, 2016 at 2:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There is something rotten in the state of Denmark.  Here are four recent
> runs that failed with unexpected "cache lookup failed for type nnnn"
> errors:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grouse&dt=2016-08-16%2008%3A39%3A03
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nudibranch&dt=2016-08-13%2009%3A55%3A09
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2016-08-09%2001%3A46%3A17
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2016-08-09%2000%3A44%3A18
>
> The first two are on HEAD, the second two on 9.5, which seems to rule out
> my first thought that this has something to do with parallel query.  It's
> notable though that all the failing machines are PPC or S/390 ... maybe
> big-endian related?
>
> I grepped through the buildfarm logs and determined that there are exactly
> zero similar failures going back as far as 2016-04-01.  Now that we've had
> four in a week, it seems certain that this indicates a bug introduced at
> most a few days before Aug 9.  A quick trawl through the git logs finds
> no obvious candidates, though.

Well, it would have to be something that was back-patched to 9.5,
right?  That doesn't leave too many candidates.

[rhaas pgsql]$ git log --format=oneline --before='Aug 10' --after='Aug 6' REL9_5_STABLE src/backend/
04cee8f835bcf95ff80b734c335927aaf6551d2d Fix several one-byte buffer over-reads in to_number
4da812fa8adb22874a937f1b000253fecf526cb0 Translation updates
98b0c6280667ce1efae763340fb2c13c81e4d706 Fix two errors with nested CASE/WHEN constructs.
cb5c14984ad327e52dfb470fde466a5aca7d50a1 Fix misestimation of n_distinct for a nearly-unique column with many nulls.
71dca408c0030ad76044c6b17367c9fbeac511ec Don't propagate a null subtransaction snapshot up to parent transaction.

Obviously, the third and fourth of those seem like the most likely
candidates, but I don't have any theory on how either of them could be
causing this.

It would sure be nice if those cache lookup failure messages printed
the file and line number.  I wonder if we could teach psql to always
treat the VERBOSITY as verbose when the error code is XX000.
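
To make that idea concrete, here is a minimal sketch of the decision rule in Python. It is only an illustration, not psql code: the function name effective_verbosity and the plain-string representation of the setting are hypothetical. The only things taken from PostgreSQL are SQLSTATE XX000 (internal_error, which is what a plain elog(ERROR) such as "cache lookup failed" reports) and the psql VERBOSITY values default, verbose, and terse.

```python
# SQLSTATE for internal_error; plain elog(ERROR) calls report this code.
INTERNAL_ERROR = "XX000"


def effective_verbosity(sqlstate, configured):
    """Return the verbosity to use for a single error report.

    Hypothetical helper: 'configured' stands in for the user's VERBOSITY
    setting ("default", "verbose", or "terse"). Internal errors are always
    promoted to "verbose" so that the source file and line get printed;
    every other error keeps the configured setting.
    """
    if sqlstate == INTERNAL_ERROR:
        return "verbose"
    return configured
```

Under this rule an ordinary error such as 42P01 (undefined_table) would still be reported tersely if the user asked for that, while any XX000 error would carry its LOCATION line regardless of the setting.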

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
