>> You're sure about that libxml2 version? I can reproduce a crash on
>> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
>> message instead)
Andrey> You are right, there was default 2.9.4 from OS, and 2.9.4 from
Andrey> brew was not used.

Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
Andrey> xmllint: using libxml version 20904
I have a complete diagnosis of why it crashes on 2.9.4, and I can see why it does not crash the same way on 2.9.7, but I would not bet anything on 2.9.7 not having some comparable issue.
What happens on 2.9.4 is this (this is all inside libxml2):
- at some point when parsing an element tag, the code decides to raise a fatal error and call xmlHaltParser
- xmlHaltParser works by resetting the input buffer's "base" and "cur" pointers to point to a literal "" in the code (thus, a null byte)
- xmlParseStartTag2 detects that input->base has changed, and assumes that this is because the buffer got reallocated; in the process of dealing with this, it resets input->cur to input->base + cur where "cur" is a local variable holding the previous offset in the buffer (which is now of course nonsense, so input->cur points into the weeds)
- something later tries to access the byte at *input->cur and likely crashes (depending on many random factors, including load addresses of shared libraries and where in the buffer the original error was detected)
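The sequence above can be modelled in a few lines of C. This is only a sketch under stated assumptions, not libxml2's actual code: the struct `sketch_input`, its `len` field, and every `sketch_*` function are hypothetical names invented for illustration, and only the two pointers mentioned above are modelled. Instead of forming the bad pointer (which would itself be undefined behaviour), the sketch just reports whether the "base + saved offset" fix-up would land inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the parser's input state. */
typedef struct {
    const char *base;   /* start of the input buffer */
    const char *cur;    /* current read position */
    size_t      len;    /* valid bytes starting at base */
} sketch_input;

static const char sketch_empty[] = "";

/* Models the halt step: both pointers are aimed at a literal "". */
static void sketch_halt(sketch_input *in)
{
    in->base = sketch_empty;
    in->cur  = sketch_empty;
    in->len  = 0;
}

/* Models the "buffer must have been reallocated" fix-up.
 * Returns 1 if cur = base + saved_offset would stay inside
 * [base, base + len], 0 if it would point outside the buffer. */
static int sketch_fixup_in_bounds(const sketch_input *in,
                                  const char *old_base,
                                  size_t saved_offset)
{
    if (in->base == old_base)
        return 1;                   /* base unchanged, no fix-up applied */
    return saved_offset <= in->len;
}

/* Parse a little, halt, then attempt the fix-up with the stale offset. */
static int sketch_halt_then_fixup(void)
{
    static const char doc[] = "<a b='1' c=";
    sketch_input in = { doc, doc + 5, strlen(doc) };

    assert(in.cur >= in.base);
    const char *old_base = in.base;
    size_t saved_offset = (size_t)(in.cur - in.base);

    sketch_halt(&in);               /* fatal error raised mid-tag */

    /* base changed, but not because of a reallocation */
    return sketch_fixup_in_bounds(&in, old_base, saved_offset);
}
```

In this model `sketch_halt_then_fixup()` returns 0: the saved offset is larger than the one-byte "" buffer, so a later read through `cur` would be out of bounds.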
Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer reallocations differently so it doesn't fail the same way (it no longer tries to modify input->cur). But there are so many ways that this error path can screw itself up that I honestly would not trust it for one second.
-- Andrew (irc:RhodiumToad)
Sorry for the top-posting and the spelling; T9 and mobile Gmail are not very usable.
Some notes.
If I set xmloption to document, this code works as expected:
postgres=# select d::xml from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
....
postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
 xml_is_well_formed
--------------------
 t
(1 row)
But all other XML functions still crash the server,
for example:
postgres=# select xpath_exists('//СвЮЛ'::text,d::xml) from convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);