Thread: Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
"Oliver Elphick"
Date:
[postgresql lists added to Cc in hope of elucidation]

Adam Di Carlo wrote:
>
>[Background: PostgreSQL is causing extremely hard crashes on my Sun4u
>(Ultra5) Debian SPARC system. Anyone should be able to reproduce this
>by installing the postgresql-test environment, and running:
>
>  # cd /usr/lib/postgresql/test/regress
>  # chown -R postgres .
>  # su - postgres
>  $ cd /usr/lib/postgresql/test/regress
>  $ make runtest
>
>BEWARE -- this hard crashes my system. You may crash hard; you may
>lose data.
>
>Note: I am running a mostly up-to-date 2.2.9 kernel (stock image from
>potato) with the newest postgresql package (6.5.1-3 I believe).
>]
>
>>That is very nasty -- and unexpected; I would like to report whatever
>>information is available to pgsql-ports@postgresql.org. However, they
>>will need to know exactly what was going on - logfile output, if available,
>>progress through the test, test output file, if it survived. It doesn't
>>seem at all like the problem that I thought I was asking you to look at.
>>We should investigate whether there is some entirely separate cause.
>
>Yes. On followup, I am getting intermittent hard crashes when running
>regress.sh or doing any operation with postgresql. Obviously, this is
>more on the level of a sparc64 kernel problem, even, than a purely
>postgres problem -- after all, no user process should be able to take
>out the system this way.

I regret that I have no experience with kernel debugging.

>
>My most recent crash has this output to 'make runtest':
>
>path .. ok
>polygon .. ok
>circle .. ok
>geometry .. failed
>timespan ..
>
>And in the postgres.log, with debugging at 4:
>
>plan:
>
>{ SEQSCAN :cost 43 :size 334 :width 16 :state <> :qptargetlist
>({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 705 :restypmod -1
>:resname "one" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { CONST :consttype 705 :constlen -1 :constisnull false
>:constvalue 4 [ 0 0 0 4 ] :constbyval false }} { TARGETENTRY
>:resdom { RESDOM :resno 2 :restype 600 :restypmod -1 :resname "f1"
>:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR
>:varno 1 :varattno 1 :vartype 600 :vartypmod -1 :varlevelsup 0
>:varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16 :opType func
>:oper { FUNC :funcid 1532 :functype 16 :funcisindex false :funcsize 0
>:func_fcache @ 0x0 :func_tlist ({ TARGETENTRY :resdom { RESDOM :resno
>1 :restype 16 :restypmod -1 :resname "<noname>" :reskey 0 :reskeyop 0
>:resgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1
>:vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold -1 :varoattno 1}})
>:func_planlist <>} :args ({ VAR :varno 1 :varattno 1 :vartype 600
>:vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST
>:consttype 600 :constlen 16 :constisnull false :constvalue 16 [ 64
>20 102 102 102 102 102 102 64 65 64 0 0 0 0 0 ]
>:constbyval false })}) :lefttree <> :righttree <> :extprm () :locprm
>() :initplan <> :nprm 0 :scanrelid 1 }
>
>ProcessQuery
>CommitTransactionCommand
>StartTransactionCommand
>query: SELECT '' AS one, p1.f1
>   FROM POINT_TBL p1
>   WHERE p1.f1 ?| '(5.1,34.5)'::point;
>parser outputs:
>
>{ QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal
>false :isBinary false :isTemp false :unionall false :unique <>
>:sortClause <> :rtable ({ RTE :relname point_tbl :refname p1 :relid
>20864 :inh false :inFromCl true :skipAcl false}) :targetlist
>({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 705 :restypmod -1
>:resname "one" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { CONST :consttype 705 :constlen -1 :constisnull false
>:constvalue 4 [ 0 0 0 4 ] :constbyval false }} { TARGETENTRY
>:resdom { RESDOM :resno 2 :restype 600 :restypmod -1
>:resname "f1" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { VAR :varno 1 :varattno 1 :vartype 600 :vartypmod -1
>:varlevelsup 0 :varnoold 1 :varoattno 1}}) :qual { EXPR :typeOid 16
>:opType op :oper { OPER :opno 809
>:opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype
>
>------
>
>Output just stops there, with a hard crash to the system.

not even a kernel oops output?

>
>--
>.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>

Can postgresql developers tell from this what routine we are in when the
crash occurs? I suppose that log output is buffered; where can we turn
off buffering so that all possible output is saved to disk before the
crash?

--
Vote against SPAM: http://www.politik-digital.de/spam/
========================================
Oliver Elphick                  Oliver.Elphick@lfix.co.uk
Isle of Wight                   http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"And why call ye me, Lord, Lord, and do not the things
 which I say?"   Luke 6:46
Re: [HACKERS] Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
Tom Lane
Date:
"Oliver Elphick" <olly@lfix.co.uk> writes: >> Yes. On followup, I am getting intermittant hard crashes when running >> regress.sh or doing any operation with postgresql. Obviously, this is >> more on the level of a sparc64 kernel problem, even, than a purely >> postgres problem -- after all, no user process should be able to take >> out the system this way. Yipes... > Can postgresql developers tell from this what routine we are in when the > crash occurs? I suppose that log output is buffered; where can we turn > off buffering so that all possible output is saved to disk before the > crash? The log is not nearly detailed enough to tell what routine we're in, even if there weren't the buffering problem. Also, given that this is a kernel crash, I'm not sure I'd assume that even fsync() after every line of output would ensure that the last line made it to disk. What you really want is a truss or strace log of kernel calls, anyhow, but there's still the problem of getting it out to disk before the crash. Better find a kernel-debugging expert to ask for advice... regards, tom lane
Re: [HACKERS] Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
Michael Alan Dorman
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> What you really want is a truss or strace log of kernel calls, anyhow,
> but there's still the problem of getting it out to disk before the
> crash. Better find a kernel-debugging expert to ask for advice...

Serial terminal, or printer or some such hooked up to a serial port.

Mike.
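
One way the serial-port suggestion might look in practice is to copy each line of output to a device node as it is produced, so that it leaves the machine instead of waiting in the disk cache. The sketch below is an assumption-laden illustration: the name mirror_to_device is invented, the device path (something like /dev/ttyS0) depends on the machine, and the port's speed and framing are assumed to have been set up beforehand, for example with stty.

```python
import os


def mirror_to_device(lines, device_path):
    """Write each line to an existing device node, one line at a time.

    Hypothetical helper; `device_path` is an assumption (e.g. a serial
    port). Returns the number of lines written.
    """
    # No O_CREAT: if the device node is missing we want an error, not a
    # regular file silently created in its place. O_NOCTTY (where the
    # platform has it) keeps the tty from becoming our controlling terminal.
    flags = os.O_WRONLY | getattr(os, "O_NOCTTY", 0)
    fd = os.open(device_path, flags)
    count = 0
    try:
        for line in lines:
            # Terminals and printers usually expect CR LF line endings.
            data = (line.rstrip("\n") + "\r\n").encode("ascii", "replace")
            while data:
                written = os.write(fd, data)
                data = data[written:]
            count += 1
    finally:
        os.close(fd)
    return count
```

This only captures what the user process prints; catching a kernel oops would additionally need the kernel's own console directed at the serial port.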
Re: [HACKERS] Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
Adam Di Carlo
Date:
>> Can postgresql developers tell from this what routine we are in when the
>> crash occurs? I suppose that log output is buffered; where can we turn
>> off buffering so that all possible output is saved to disk before the
>> crash?
>
>The log is not nearly detailed enough to tell what routine we're in,
>even if there weren't the buffering problem. Also, given that this is
>a kernel crash, I'm not sure I'd assume that even fsync() after every
>line of output would ensure that the last line made it to disk.
>
>What you really want is a truss or strace log of kernel calls, anyhow,
>but there's still the problem of getting it out to disk before the
>crash. Better find a kernel-debugging expert to ask for advice...

Hopefully someone from the sparc or sparc64 team at Debian can look
into this. I am going on business travel for 4 days so will be away
from any Debian/SPARC machines for a while.

These are the questions which need to be answered:

 * Are other people running Debian SPARC seeing the problem, using the
   recipe I mentioned in my previous email?
 * Is it 2.2.9 specific? Sun4u specific?
 * Get strace output, as Tom suggests.
 * Shouldn't we notify the Sparc/Linux folks?

--
.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>
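
For whoever picks up the strace item, a small sketch of how the regression run could be wrapped is given below. The helper name strace_argv and the trace file path are invented for illustration; the strace options used are -f (follow child processes), -tt (timestamp each call) and -o (write the trace to a file). Where the trace file should live so that it survives the crash is still the open question from earlier in the thread.

```python
import shlex


def strace_argv(command, trace_path):
    """Build an argv list that runs `command` under strace.

    Hypothetical helper; `trace_path` is wherever the trace should go.
    """
    return ["strace", "-f", "-tt", "-o", trace_path] + list(command)


def as_shell_line(argv):
    """Render an argv list as a single, safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `as_shell_line(strace_argv(["make", "runtest"], "/tmp/runtest.trace"))` gives a command line that could be run as the postgres user from the regress directory.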