Re: Various intermittent bugs/instability - how to debug? - Mailing list pgsql-general

From Mark Cave-Ayland
Subject Re: Various intermittent bugs/instability - how to debug?
Msg-id 48C7889E.9010905@siriusit.co.uk
In response to Various intermittent bugs/instability - how to debug?  (Frederik Ramm <frederik.ramm@geofabrik.de>)
List pgsql-general
Frederik Ramm wrote:

> Dear PostgreSQL community,
>
> I hope you can help me with a problem I'm having - I'm stuck and
> don't know how to debug this further.
>
> I have a rather large nightly process that imports a lot of data from
> the OpenStreetMap project into a PostGIS database, then proceeds to do
> all sorts of things - creating spatial indexes, computing bounding
> boxes, simplifying geometries, that kind of stuff. The whole
> job usually takes about five hours.
>
> I'm running this on a Quad-Core Linux (Ubuntu, PostgreSQL 8.3) machine
> with 8 GB RAM.
>
> Every other night, the process aborts with some strange error message,
> and never at the same position:
>
> ERROR:  invalid page header in block 166406 of relation "node_tags"
>
> ERROR:  could not open segment 2 of relation 1663/24253056/24253895
> (target block 1421295656): No such file or directory
>
> ERROR:  Unknown geometry type: 10
>
> When I continue the process after the failure, it will usually work.
>
> I know you all think "hardware problem" now. Of course this was my first
> guess as well. I ran a memory test for a night, no results; I downgraded
> to "failsafe defaults" for all BIOS timings, again no change. Ran
> "cpuburn" and all sorts of other things to grill the hardware - nothing.
>
> Then I bought an entirely new machine; similar setup, but using a
> Gigabyte instead of Asus mainboard, different chipset, slightly faster
> Quad-Core processor, and again 8 GB RAM and Ubuntu "Hardy" with
> PostgreSQL 8.3 and matching PostGIS.
>
> Believe it or not, this machine shows the *same* problems. It is not
> 100% reproducible; sometimes the job works fully, but every other day it
> just breaks down with one of the funny messages like those above. No memtest
> errors here either.
>
> Both machines are "consumer" quality, i.e. normal Intel processors and
> not the "server" (Xeon) stock.
>
> I am at a loss - how can I proceed? This looks like a hardware problem
> alright, but such similar problems on two such different machines? Is there
> something wrong with Intel's Quad-Core CPUs?
>
> What could I do to have a better chance of reproducing the error and
> ultimately identifying the component responsible? Is there some kind of
> "PostgreSQL load test", something like "cpuburn" for PostgreSQL?
>
> Have there been other reports of intermittent problems like mine, and
> does anybody have any blind guesses...?
>
> Thanks
> Frederik


Hi Frederik,

We did find a memory clobber in the PostGIS ANALYZE routine a while
back, but the fix hasn't yet made it into a release.

If you are building from source, please can you try applying the patch
here: http://code.google.com/p/postgis/issues/detail?id=43 and reporting
back whether it helps or not?


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
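One way to see why the quoted errors point at corrupted memory rather than a genuinely damaged table is to work out where the block numbers would live on disk. PostgreSQL stores each relation in segment files of RELSEG_SIZE blocks; the sketch below assumes the default build settings (BLCKSZ = 8192 bytes, RELSEG_SIZE = 131072 blocks, i.e. 1 GB per segment), which a custom build could change. Under that assumption, target block 1421295656 falls in segment 10843, so a table whose segment 2 does not even exist cannot plausibly contain it; a garbage block number like this is consistent with the memory-clobber explanation. The helper names are hypothetical, for illustration only.

```python
# Minimal sketch: map a PostgreSQL block number to its segment file.
# ASSUMPTION: default build settings (BLCKSZ = 8192 bytes per page,
# RELSEG_SIZE = 131072 pages per segment, i.e. 1 GB segment files).

BLCKSZ = 8192          # bytes per page (assumed default)
RELSEG_SIZE = 131072   # pages per segment file (assumed default)


def locate_block(block: int) -> tuple[int, int]:
    """Hypothetical helper: return (segment_number, offset_within_segment)."""
    return divmod(block, RELSEG_SIZE)


def relation_size_needed(block: int) -> int:
    """Hypothetical helper: bytes a relation must hold to contain `block`."""
    return (block + 1) * BLCKSZ


if __name__ == "__main__":
    # Block numbers taken from the error messages quoted in the thread.
    for blk in (166406, 1421295656):
        seg, off = locate_block(blk)
        gib = relation_size_needed(blk) / 2**30
        print(f"block {blk}: segment {seg}, offset {off}, "
              f"relation >= {gib:.1f} GiB")
```

Block 166406 from the "invalid page header" error lands in segment 1, well within a normal table, so that error alone says nothing about the block number; it is the second message whose target block is implausible.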
