Re: BUG #15420: Server crash. Segmentation fault when parsing xml file - Mailing list pgsql-bugs

From Sergey Mirvoda
Subject Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Msg-id CALkWAriUN-6GsYyURvAB5f5+HsDbb_bx1YgsXMjs0xsMvCd-xQ@mail.gmail.com
In response to Re: BUG #15420: Server crash. Segmentation fault when parsing xml file  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: BUG #15420: Server crash. Segmentation fault when parsing xml file  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-bugs

On Fri, Oct 5, 2018 at 10:08 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:

 >> You're sure about that libxml2 version? I can reproduce a crash on
 >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
 >> message instead)

 Andrey> You are right, there was default 2.9.4 from OS, and 2.9.4 from
 Andrey> brew was not used.

 Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
 Andrey> xmllint: using libxml version 20904

I have a complete diagnosis of why it crashes on 2.9.4, and I can see
why it does not crash the same way on 2.9.7, but I would not bet
anything on 2.9.7 not having some comparable issue.

What happens on 2.9.4 is this (this is all inside libxml2):

 - at some point when parsing an element tag, the code decides to raise
   a fatal error and call xmlHaltParser

 - xmlHaltParser works by resetting the input buffer's "base" and "cur"
   pointers to point to a literal "" in the code (thus, a null byte)

 - xmlParseStartTag2 detects that input->base has changed, and assumes
   that this is because the buffer got reallocated; in the process of
   dealing with this, it resets input->cur to input->base + cur where
   "cur" is a local variable holding the previous offset in the buffer
   (which is now of course nonsense, so input->cur points into the
   weeds)

 - something later tries to access the byte at *input->cur and likely
   crashes (depending on many random factors, including load addresses
   of shared libraries and where in the buffer the original error was
   detected)
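
The sequence above can be modelled in a few lines of C. This is only a toy sketch under stated assumptions, not libxml2's actual code: `ToyInput`, `toy_halt`, `toy_offset_is_valid` and `toy_parse_start_tag` are hypothetical names invented for illustration. It shows why an offset saved before the halt cannot be reapplied to the new base. To stay well-defined, the sketch only checks the stale offset rather than forming the out-of-range pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a parser input; NOT libxml2's real struct. */
typedef struct {
    const char *base;  /* start of the current buffer */
    const char *cur;   /* current read position inside the buffer */
    size_t      len;   /* number of readable bytes starting at base */
} ToyInput;

/* Models the empty string that the halt routine points the input at. */
static const char toy_empty[] = "";

/* Models the halt: base and cur are redirected to an empty string. */
static void toy_halt(ToyInput *in)
{
    in->base = toy_empty;
    in->cur  = toy_empty;
    in->len  = 0;
}

/* An offset is only meaningful if it still fits in the current buffer. */
static int toy_offset_is_valid(const ToyInput *in, size_t saved_off)
{
    return saved_off <= in->len;
}

/*
 * Models the caller. It remembers the offset of cur, then something deeper
 * in the parse may halt. Returns 0 if the input is still consistent and -1
 * if the saved offset has become stale.
 */
static int toy_parse_start_tag(ToyInput *in, int hit_fatal_error)
{
    const char *old_base  = in->base;
    size_t      saved_off = (size_t)(in->cur - in->base);

    if (hit_fatal_error)
        toy_halt(in);

    if (in->base != old_base) {
        /*
         * The failing pattern assumes "the buffer was reallocated" here and
         * does: in->cur = in->base + saved_off;
         * After a halt that lands far past the one-byte empty string, so
         * this sketch refuses instead of computing the pointer.
         */
        if (!toy_offset_is_valid(in, saved_off))
            return -1;
        in->cur = in->base + saved_off;
    }
    return 0;
}
```

In this model, a call with `hit_fatal_error` set to 1 and `cur` five bytes into a buffer returns -1, because the saved offset 5 no longer fits in the zero-length replacement buffer.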

Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
reallocations differently so it doesn't fail the same way (it no longer
tries to modify input->cur). But there are so many ways that this error
path can screw itself up that I honestly would not trust it for one
second.

--
Andrew (irc:RhodiumToad)


Sorry for the top-posting and spelling; T9 and mobile Gmail are not very usable.

Some notes:

If I set xmloption to document, this code works as expected:
postgres=# select d::xml from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
....
postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
 xml_is_well_formed
--------------------
 t
(1 row)

But all other XML functions still crash the server.

For example:
postgres=# select  xpath_exists('//СвЮЛ'::text,d::xml) from convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);

--
--Regards, Sergey Mirvoda
