Thread: BUG #15420: Server crash. Segmentation fault when parsing xml file

BUG #15420: Server crash. Segmentation fault when parsing xml file

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15420
Logged by:          Sergey Mirvoda
Email address:      sergey@mirvoda.com
PostgreSQL version: 10.5
Operating system:   Ubuntu 16.04.3 LTS
Description:

Tested on
9.6.2 (Windows Server 2012 R2), 10.5 (Ubuntu 16.04.3 LTS), 12devel (Ubuntu).
9.4.1 (Windows Server 2012 R2) works fine.

Steps to reproduce
1. Place this file into PG_DATA directory
https://www.dropbox.com/s/upteflaye9r3fz7/EGRUL_FULL_2018-01-01_X.XML?dl=1

2. Run this query in psql 
select d::xml from
convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
g(d);

3. Notice that the connection crashes and is not restored
4. Error log 
Sergey Mirvoda, [04.10.18 12:24]
2018-10-04 07:23:39.946 UTC [17114] LOG:  server process (PID 26155) was
terminated by signal 11: Segmentation fault
2018-10-04 07:23:39.946 UTC [17114] DETAIL:  Failed process was running:
select d::xml from
convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
g(d);
2018-10-04 07:23:39.946 UTC [17114] LOG:  terminating any other active
server processes
2018-10-04 07:23:39.946 UTC [26143] WARNING:  terminating connection because
of crash of another server process
2018-10-04 07:23:39.946 UTC [26143] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2018-10-04 07:23:39.946 UTC [26143] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2018-10-04 07:23:39.947 UTC [26146] postgres@egrul WARNING:  terminating
connection because of crash of another server process
2018-10-04 07:23:39.947 UTC [26146] postgres@egrul DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2018-10-04 07:23:39.947 UTC [26146] postgres@egrul HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2018-10-04 07:23:39.949 UTC [26157] postgres@postgres FATAL:  the database
system is in recovery mode
2018-10-04 07:23:39.969 UTC [17114] LOG:  all server processes terminated;
reinitializing
2018-10-04 07:23:40.011 UTC [26158] LOG:  database system was interrupted;
last known up at 2018-10-04 07:23:20 UTC
2018-10-04 07:23:40.942 UTC [26158] LOG:  database system was not properly
shut down; automatic recovery in progress
2018-10-04 07:23:40.947 UTC [26158] LOG:  redo starts at 18/32CAACA0
2018-10-04 07:23:40.947 UTC [26158] LOG:  invalid record length at
18/32CAACD8: wanted 24, got 0
2018-10-04 07:23:40.947 UTC [26158] LOG:  redo done at 18/32CAACA0
2018-10-04 07:23:40.976 UTC [17114] LOG:  database system is ready to accept
connections


We did some quick research and believe the error is somewhere in the
code that handles xmlParseBalancedChunkMemory.


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Michael Paquier
Date:
On Thu, Oct 04, 2018 at 08:57:34AM +0000, PG Bug reporting form wrote:
> Steps to reproduce
> 1. Place this file into PG_DATA directory
> https://www.dropbox.com/s/upteflaye9r3fz7/EGRUL_FULL_2018-01-01_X.XML?dl=1

If you can, could you please attach this file to this thread?  This is
important for the archives.
--
Michael

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:
Also this query works just fine against 9.4.1 on Windows

And xml_is_well_formed function returns true

postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);

 xml_is_well_formed
--------------------
 t

On Thu, Oct 4, 2018 at 2:11 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Oct 04, 2018 at 08:57:34AM +0000, PG Bug reporting form wrote:
> Steps to reproduce
> 1. Place this file into PG_DATA directory
> https://www.dropbox.com/s/upteflaye9r3fz7/EGRUL_FULL_2018-01-01_X.XML?dl=1

If you can, could you please attach this file to this thread?  This is
important for the archives.
--
Michael


--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


On Thu, Oct 4, 2018 at 2:11 PM Michael Paquier <michael@paquier.xyz> wrote:
> If you can, could you please attach this file to this thread?  This is
> important for the archives.

It looks like the file is too big to send uncompressed, so here it is as a zip archive.

--
--Regards, Sergey Mirvoda
Attachment

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:
Hi

On Thu, Oct 4, 2018 at 12:12, Sergey Mirvoda <sergey@mirvoda.com> wrote:
> Looks like it is too big to send uncompressed, here it is in zip archive

I tried to import this XML into Postgres with pgimportdoc

https://github.com/okbob/pgimportdoc

and it looks like a libxml2 issue.

pgimportdoc: Unexpected result status: PGRES_FATAL_ERROR
pgimportdoc: Error: ERROR:  invalid XML content
DETAIL:  line 178950: internal error: Huge input lookup
� органе Пенсионного фонда Российской Федер
                                                                               ^
line 178950: attributes construct error





Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Michael Paquier
Date:
On Thu, Oct 04, 2018 at 12:18:05PM +0200, Pavel Stehule wrote:
> On Thu, Oct 4, 2018 at 12:12, Sergey Mirvoda <sergey@mirvoda.com> wrote:
> I am try to import this xml to Postgres with pgimportdoc
>
> https://github.com/okbob/pgimportdoc
>
> and looks like some libxml2 issue.
>
> pgimportdoc: Unexpected result status: PGRES_FATAL_ERROR
> pgimportdoc: Error: ERROR:  invalid XML content
> DETAIL:  line 178950: internal error: Huge input lookup
> � органе Пенсионного фонда Российской Федер
>
> ^
> line 178950: attributes construct error

Sergey, what is the version of libxml2 bundled with the Windows
installer and Ubuntu?  You should normally be able to see which version
it is on Windows by looking at the version number of the libxml2 dll...
However libxml2 upstream is not smart enough to produce a version file
if I recall correctly (please note last time I looked at that I had to
add to the upstream tree a custom patch to associate a correct version
to the produced dll, making sure that minor MSI upgrades were able to
update the version of libxml2 bundled in a Postgres MSI installer I
lately maintain).
--
Michael

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:



I checked Sergey's example, and it doesn't crash on Linux; the error is displayed correctly. It looks like an MS Windows issue in libxml2.

postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('error.xml'),'windows-1251') g(d);
┌────────────────────┐
│ xml_is_well_formed │
╞════════════════════╡
│ f                  │
└────────────────────┘
(1 row)

This issue may be caused by the relatively new libxml2 limits


Unfortunately, the default configuration uses xmlParseBalancedChunkMemory to parse content, and this function cannot be given an option such as XML_PARSE_HUGE, so it is hard to fix.

Regards

Pavel





Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


Actually, we found this error on a very fresh installation of Ubuntu 16.04 and postgres 10.5.
After that we checked every configuration we have. 
And only postgres 9.4 works as expected. 

Additionally Andrey just reproduced this on his dev box. 

--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Thu, Oct 4, 2018 at 13:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Unfortunately, the default configuration uses xmlParseBalancedChunkMemory to parse content, and this function cannot be given an option such as XML_PARSE_HUGE, so it is hard to fix.

It would probably require refactoring the XML parsing along the lines of http://xmlsoft.org/examples/parse4.c

Regards

Pavel




Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrey Borodin
Date:
Hi!

> On Oct 4, 2018, at 16:35, Sergey Mirvoda <sergey@mirvoda.com> wrote:
>
>
> Additionally Andrey just reproduced this on his dev box.

Yep, crashes on my laptop with MacOS and libxml2/2.9.7 from brew on current master.

Best regards, Andrey Borodin.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:



> Actually, we found this error on a very fresh installation of Ubuntu 16.04 and postgres 10.5.
> After that we checked every configuration we have.
> And only postgres 9.4 works as expected.

This issue is related to the libxml2 limits, and it cannot work with modern libxml2 libraries.

Pavel

> Additionally Andrey just reproduced this on his dev box.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Thu, Oct 4, 2018 at 12:18, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> pgimportdoc: Unexpected result status: PGRES_FATAL_ERROR
> pgimportdoc: Error: ERROR:  invalid XML content
> DETAIL:  line 178950: internal error: Huge input lookup
> � органе Пенсионного фонда Российской Федер
>                                                                                ^
> line 178950: attributes construct error


probably it is same issue like


Regards

Pavel




Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrey Borodin
Date:


On Oct 4, 2018, at 16:38, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> This issue is related to the libxml2 limits, and it cannot work with modern libxml2 libraries.

Yes, the root cause is inside the libxml2 code.

Can we protect the postmaster from crashing on a libxml2 error? There is a bunch of PG_TRY blocks there, but they do not help.

Best regards, Andrey Borodin.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Thu, Oct 4, 2018 at 13:43, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Can we protect postmaster from crashing from libxml2 error? There is a bunch of PG_TRY there, but it does not help.

Unfortunately, no. You cannot handle a crash. PostgreSQL doesn't start a separate process for libxml2 calls, so a fault there is fatal.
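
To illustrate the point, here is a minimal Python sketch (not PostgreSQL code, and POSIX only). A child process sends itself SIGSEGV, which stands in for a hypothetical crashing library call. The `try/except` inside the child plays the role of PG_TRY and never gets a chance to run; only the parent, which lives in a separate process, can observe the failure, much like the postmaster logging "terminated by signal 11". The names `CHILD_CODE` and `run_crashing_child` are made up for this example.

```python
import signal
import subprocess
import sys

# Code run in the child process. The except clause stands in for
# PG_TRY-style error handling; a real SIGSEGV terminates the process
# before any handler of this kind can run.
CHILD_CODE = """
import os, signal
try:
    os.kill(os.getpid(), signal.SIGSEGV)  # stand-in for a crashing library call
except Exception:
    print("handled")  # never reached
"""


def run_crashing_child():
    """Run the crashing snippet in a separate process and return the result."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_crashing_child()
    # On POSIX, a return code of -N means "terminated by signal N".
    print("child return code:", result.returncode)
    print("child stdout:", repr(result.stdout))
```

The parent survives only because the faulting code ran in another process, which is exactly the isolation a backend does not have around its libxml2 calls.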

Regards

Pavel



Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:



I played with it, and it looks like a problem between libxml2 and your specific document (maybe too many multibyte characters; I don't know).

I imported a 200 MB XML document with 1M items, so it makes no sense to limit the XML size on the PostgreSQL side.

It looks like your XML document hits some corner case in libxml2 where parsing is extremely memory-expensive. From what I can see, there is a lot of long content inside attributes.

Regards

Pavel




Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:



Pavel, thank you for your interest.
It is definitely something inside this document.

Actually, we loaded about 10k different documents like this one, about 10 GB of content, and the crash occurs only on this one.

But every other parser we tried (.NET, Java, Python) handled this document just fine.

For now we have ended up with a custom PL/Python function for parsing XML, and it is slow as hell.
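
For readers looking for a similar stopgap, here is a minimal sketch of the kind of check such a function might perform, using only Python's standard library. It is not the function used in this thread (that code was not posted); the helper name `has_element` and the sample document are made up for illustration. It decodes windows-1251 bytes first, mirroring convert_from(..., 'windows-1251'), and then looks for an element by name, roughly what xpath_exists('//СвЮЛ', ...) is asked to do.

```python
import xml.etree.ElementTree as ET


def has_element(raw: bytes, tag: str, encoding: str = "windows-1251") -> bool:
    """Return True if any element named `tag` occurs in the document."""
    # Decode explicitly, as convert_from() does on the SQL side.
    text = raw.decode(encoding)
    root = ET.fromstring(text)
    # iter() walks the whole tree, including the root element itself.
    return next(root.iter(tag), None) is not None


if __name__ == "__main__":
    # Tiny made-up document; the real file is far larger.
    sample = '<root><СвЮЛ ИНН="0000000000"/></root>'.encode("windows-1251")
    print(has_element(sample, "СвЮЛ"))
    print(has_element(sample, "missing"))
```

This loads the whole document into memory, so it is a correctness fallback rather than a fast path.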

This looks like a regression: pg 9.4 loads this document without any problem.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Alvaro Herrera
Date:
On 2018-Oct-04, Sergey Mirvoda wrote:

> On Thu, Oct 4, 2018, 19:03, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> Pavel, thank you for your interest.
> It is definitely something inside this document.
> 
> Actually we loaded about 10k different documents like this one. About 10Gb
> of content and crash is only on this one.

It's probably a good idea to report this to libxml2 then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


On Thu, Oct 4, 2018, 19:31, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> It's probably a good idea to report this to libxml2 then.

Sure, but the bug is mostly about the unhandled server crash. Is that normal?

Also, as far as I understand Pavel, the libxml2 interface somehow changed for handling 'huge' documents, but postgres doesn't handle this correctly.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Thu, Oct 4, 2018 at 16:42, Sergey Mirvoda <sergey@mirvoda.com> wrote:
> Sure, but bug is mostly about unhandled server crash. Is it normal?

Nobody can handle a process crash. This should be fixed on the libxml2 side.

> Also, as far as I understand Pavel, libxml2 interface somehow changed for handling 'huge' documents, but postgres don't handle this correctly.

This can be fixed only partially: the authors of libxml2 introduced a new limit and a new option, but it is not possible to apply the new option across the whole API.

If I understand this behavior correctly, libxml2 implemented the new limits as safeguards against maliciously crafted documents. It is questionable whether disabling these safeguards by default is a good idea. The second issue is that this option cannot be set for the libxml2 functionality that we use, and there are no alternatives. I have no idea how these issues could be fixed on the PostgreSQL side.

The limit in question is 10,000,000 bytes; if you generate documents smaller than this, you will probably not hit this issue.
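
As a rough pre-flight check based on that figure, a loader could flag inputs at or above the limit before handing them to the server. This is only a sketch: the constant name and the helper `may_hit_limit` are made up, the 10,000,000-byte value is the one quoted in this message, and this thread does not establish whether the limit applies to the whole document or to individual lookups inside it.

```python
# Value quoted in this thread; the constant name is hypothetical.
LOOKUP_LIMIT_BYTES = 10_000_000


def may_hit_limit(payload: bytes, limit: int = LOOKUP_LIMIT_BYTES) -> bool:
    """Return True if the payload is at least as large as the quoted limit."""
    return len(payload) >= limit


if __name__ == "__main__":
    small = b"<root/>"
    large = b"x" * LOOKUP_LIMIT_BYTES
    print(may_hit_limit(small))
    print(may_hit_limit(large))
```

A document flagged this way could then be routed to a slower fallback parser instead of the xml cast.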

Regards

Pavel


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Michael Paquier
Date:
On Thu, Oct 04, 2018 at 05:02:05PM +0200, Pavel Stehule wrote:
> On Thu, Oct 4, 2018 at 16:42, Sergey Mirvoda <sergey@mirvoda.com>
> wrote:
>> Sure, but bug is mostly about unhandled server crash. Is it normal?
>
> Nobody can handle process crash. This should be fixed on libxml2 side.

If some code crashes in glibc or within a system call, impacting
Postgres backends, there is usually little to do, and the correct call
is to fix the place where the problem happens.  If libxml2 has
integrated with a new API which is considered safer to use, then we
could consider working on improving Postgres regarding that aspect.
--
Michael

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:

 Andrey> Yep, crashes on my laptop with MacOS and libxml2/2.9.7 from
 Andrey> brew on current master.

You're sure about that libxml2 version? I can reproduce a crash on
2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
message instead)

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrey Borodin
Date:

> On Oct 5, 2018, at 8:40, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
>>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:
>
> Andrey> Yep, crashes on my laptop with MacOS and libxml2/2.9.7 from
> Andrey> brew on current master.
>
> You're sure about that libxml2 version? I can reproduce a crash on
> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
> message instead)

You are right, the default 2.9.4 from the OS was in use, and the 2.9.7 from brew was not used.
x4mmm-osx:pgsql x4mmm$ xmllint --version
xmllint: using libxml version 20904

Sorry.

Best regards, Andrey Borodin.
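
For reference, the number printed by xmllint packs the version as major * 10000 + minor * 100 + micro, so 20904 corresponds to 2.9.4. A small sketch of the decoding follows; the helper name `decode_libxml_version` is made up for this example, and any non-numeric suffix after the digits is simply ignored.

```python
import re


def decode_libxml_version(reported: str) -> str:
    """Convert xmllint's numeric version (e.g. '20904') to dotted form."""
    match = re.match(r"\d+", reported.strip())
    if match is None:
        raise ValueError(f"no numeric version in {reported!r}")
    number = int(match.group())
    # The number is major * 10000 + minor * 100 + micro.
    major, rest = divmod(number, 10000)
    minor, micro = divmod(rest, 100)
    return f"{major}.{minor}.{micro}"


if __name__ == "__main__":
    print(decode_libxml_version("20904"))
```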

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:

 >> You're sure about that libxml2 version? I can reproduce a crash on
 >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
 >> message instead)

 Andrey> You are right, there was default 2.9.4 from OS, and 2.9.7 from
 Andrey> brew was not used.

 Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
 Andrey> xmllint: using libxml version 20904

I have a complete diagnosis of why it crashes on 2.9.4, and I can see
why it does not crash the same way on 2.9.7, but I would not bet
anything on 2.9.7 not having some comparable issue.

What happens on 2.9.4 is this (this is all inside libxml2):

 - at some point when parsing an element tag, the code decides to raise
   a fatal error and call xmlHaltParser

 - xmlHaltParser works by resetting the input buffer's "base" and "cur"
   pointers to point to a literal "" in the code (thus, a null byte)

 - xmlParseStartTag2 detects that input->base has changed, and assumes
   that this is because the buffer got reallocated; in the process of
   dealing with this, it resets input->cur to input->base + cur where
   "cur" is a local variable holding the previous offset in the buffer
   (which is now of course nonsense, so input->cur points into the
   weeds)

 - something later tries to access the byte at *input->cur and likely
   crashes (depending on many random factors, including load addresses
   of shared libraries and where in the buffer the original error was
   detected)

Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
reallocations differently so it doesn't fail the same way (it no longer
tries to modify input->cur). But there are so many ways that this error
path can screw itself up that I honestly would not trust it for one
second.
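
The sequence above can be modelled in a few lines of Python. This is a deliberately simplified model, not libxml2 code: pointers are plain integers, all addresses and sizes are made up, and the `end` field exists only so the model can say whether `cur` still points at a valid byte.

```python
from dataclasses import dataclass

EMPTY_LITERAL_ADDR = 0x1000  # hypothetical address of the "" literal


@dataclass
class Input:
    base: int  # address of the start of the buffer
    cur: int   # address of the current read position
    end: int   # address one past the last valid byte (model-only field)


def halt_parser(inp: Input) -> None:
    """Model of xmlHaltParser: point base and cur at an empty string."""
    inp.base = EMPTY_LITERAL_ADDR
    inp.cur = EMPTY_LITERAL_ADDR
    inp.end = EMPTY_LITERAL_ADDR + 1  # only the null byte is readable


def restore_cur_as_in_2_9_4(inp: Input, saved_base: int, saved_offset: int) -> None:
    """Model of the 2.9.4 logic: any change of base is taken for a realloc."""
    if inp.base != saved_base:
        inp.cur = inp.base + saved_offset


def cur_is_valid(inp: Input) -> bool:
    """True if cur points at a readable byte of the current buffer."""
    return inp.base <= inp.cur < inp.end


def simulate() -> Input:
    """Replay the failure sequence and return the final input state."""
    inp = Input(base=0x500000, cur=0x500000 + 12_345_678, end=0x500000 + 20_000_000)
    saved_base = inp.base
    saved_offset = inp.cur - inp.base  # the local "cur" offset
    halt_parser(inp)                   # fatal error raised mid-tag
    restore_cur_as_in_2_9_4(inp, saved_base, saved_offset)
    return inp


if __name__ == "__main__":
    final = simulate()
    print(hex(final.cur), "valid:", cur_is_valid(final))
```

In the model, `cur` ends up millions of bytes past a one-byte buffer, which is the "points into the weeds" state; whether a real read there faults depends on what happens to be mapped at that address.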

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:
It is hard to find the version of libxml2.dll on Windows, but the libxml2 bundled with PostgreSQL 9.4 (works well)
is definitely different from the later ones (2146 KB vs 2212 KB (9.6) vs 2212 KB (10.1))

Ubuntu Version is  2.9.8 (server crashes hard)
administrator@capybara:~$ xmllint --version
xmllint: using libxml version 20908-GITv2.9.8




--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:



Sorry for top-posting and spelling; T9 and mobile Gmail are not very usable.

Some notes.

If I set xmloption to document, this code works as expected:
postgres=# select d::xml from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
....
postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
 xml_is_well_formed
--------------------
 t
(1 row)

But all other XML functions still crash the server,

for example:
postgres=# select  xpath_exists('//СвЮЛ'::text,d::xml) from convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);

--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Sergey" == Sergey Mirvoda <sergey@mirvoda.com> writes:

 Sergey> Ubuntu Version is  2.9.8 (server crashes hard)

Any chance you can get a backtrace of that?

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


On Fri, Oct 5, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> Sergey> Ubuntu Version is  2.9.8 (server crashes hard)
>
> Any chance you can get a backtrace of that?


Sure, but I don't know how to do it properly.

Can I send you a private message with connection info for the test machine?
It is a clean virtual server used only for development, so you can install, drop, or truncate anything.



--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Fri, Oct 5, 2018 at 14:09, Sergey Mirvoda <sergey@mirvoda.com> wrote:

On Fri, Oct 5, 2018 at 10:08 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:

 >> You're sure about that libxml2 version? I can reproduce a crash on
 >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
 >> message instead)

 Andrey> You are right, there was default 2.9.4 from OS, and 2.9.7 from
 Andrey> brew was not used.

 Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
 Andrey> xmllint: using libxml version 20904

I have a complete diagnosis of why it crashes on 2.9.4, and I can see
why it does not crash the same way on 2.9.7, but I would not bet
anything on 2.9.7 not having some comparable issue.

What happens on 2.9.4 is this (this is all inside libxml2):

 - at some point when parsing an element tag, the code decides to raise
   a fatal error and call xmlHaltParser

 - xmlHaltParser works by resetting the input buffer's "base" and "cur"
   pointers to point to a literal "" in the code (thus, a null byte)

 - xmlParseStartTag2 detects that input->base has changed, and assumes
   that this is because the buffer got reallocated; in the process of
   dealing with this, it resets input->cur to input->base + cur where
   "cur" is a local variable holding the previous offset in the buffer
   (which is now of course nonsense, so input->cur points into the
   weeds)

 - something later tries to access the byte at *input->cur and likely
   crashes (depending on many random factors, including load addresses
   of shared libraries and where in the buffer the original error was
   detected)

Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
reallocations differently so it doesn't fail the same way (it no longer
tries to modify input->cur). But there are so many ways that this error
path can screw itself up that I honestly would not trust it for one
second.

--
Andrew (irc:RhodiumToad)


Sorry for the top posting and spelling; T9 and mobile Gmail are not very usable.

Some notes.

If I set xmloption to document, this code works as expected:
postgres=# select d::xml from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
....
postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
 xml_is_well_formed
--------------------
 t
(1 row)

but all other XML functions still crash the server

for example:
postgres=# select  xpath_exists('//СвЮЛ'::text,d::xml) from convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);

There are different parsing methods

 xmlCtxtReadDoc versus xmlParseBalancedChunkMemory

The problem is with xmlParseBalancedChunkMemory

Regards

Pavel


--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 Pavel> There are different parsing methods

 Pavel>  xmlCtxtReadDoc versus xmlParseBalancedChunkMemory

 Pavel> The problem is with xmlParseBalancedChunkMemory

No, it's not.

I don't yet know about 2.9.8, but the crash with 2.9.4 is a bug in
libxml2's error handling which has _nothing to do_ with which API you
use. Read the analysis I posted.

-- 
Andrew.


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Fri, Oct 5, 2018 at 15:22, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 Pavel> There are different parsing methods

 Pavel>  xmlCtxtReadDoc versus xmlParseBalancedChunkMemory

 Pavel> The problem is with xmlParseBalancedChunkMemory

No, it's not.

I don't yet know about 2.9.8, but the crash with 2.9.4 is a bug in
libxml2's error handling which has _nothing to do_ with which API you
use. Read the analysis I posted.

OK. Probably there is more than one issue: 1. the crash, 2. the "huge input lookup" error being raised, and maybe others.

libxml2 is a great library, but I still don't understand their system.

Regards

Pavel


--
Andrew.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> I don't yet know about 2.9.8, but the crash with 2.9.4 is a bug in
 >> libxml2's error handling which has _nothing to do_ with which API
 >> you use. Read the analysis I posted.

 Pavel> ok. Probably there are more than one issue. 1. crash, 2. raising
 Pavel> huge input lookup, maybe other.

It's certainly possible that the error that provokes the crash is
libxml2 complaining that the tag name is too long or whatever.

 Pavel> libxml2 is great library,

HAH. From reading the code, as I have been for much of today, it's
pretty damn bad.

Interestingly, the FreeBSD package build of PG now disables XML by
default on account of libxml2's very poor security record. We need a
better XML library :-(

-- 
Andrew.


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Fri, Oct 5, 2018 at 15:39, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> I don't yet know about 2.9.8, but the crash with 2.9.4 is a bug in
 >> libxml2's error handling which has _nothing to do_ with which API
 >> you use. Read the analysis I posted.

 Pavel> ok. Probably there are more than one issue. 1. crash, 2. raising
 Pavel> huge input lookup, maybe other.

It's certainly possible that the error that provokes the crash is
libxml2 complaining that the tag name is too long or whatever.

 Pavel> libxml2 is great library,

HAH. From reading the code, as I have been for much of today, it's
pretty damn bad.

Interestingly, the FreeBSD package build of PG now disables XML by
default on account of libxml2's very poor security record. We need a
better XML library :-(

Is there another one with a good licence? And with XPath support?

Pavel

--
Andrew.

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


On Fri, Oct 5, 2018 at 6:46 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


On Fri, Oct 5, 2018 at 15:39, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> I don't yet know about 2.9.8, but the crash with 2.9.4 is a bug in
 >> libxml2's error handling which has _nothing to do_ with which API
 >> you use. Read the analysis I posted.

 Pavel> ok. Probably there are more than one issue. 1. crash, 2. raising
 Pavel> huge input lookup, maybe other.

It's certainly possible that the error that provokes the crash is
libxml2 complaining that the tag name is too long or whatever.

 Pavel> libxml2 is great library,

HAH. From reading the code, as I have been for much of today, it's
pretty damn bad.

Interestingly, the FreeBSD package build of PG now disables XML by
default on account of libxml2's very poor security record. We need a
better XML library :-(

Is there another one with a good licence? And with XPath support?

Pavel

--
Andrew.

libxml2 is a very powerful library, but googling around for about two days showed that
there are a number of problems with handling large files.

Here is one of them:
https://github.com/sparklemotion/nokogiri/issues/740

--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Sergey" == Sergey Mirvoda <sergey@mirvoda.com> writes:

 Sergey> libxml2 very powerful library, but googling around for about
 Sergey> two days showed that there are number of problems in handling
 Sergey> large files

 Sergey> Here is one of them
 Sergey> https://github.com/sparklemotion/nokogiri/issues/740

That's pretty old (2012).

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Interestingly, the FreeBSD package build of PG now disables XML by
> default on account of libxml2's very poor security record.

It's hard to argue with their choice.

> We need a better XML library :-(

Is there one anywhere?  (I think writing our own is right out.)

Should we be officially deprecating XML facilities and telling people
to head towards JSON?  There at least we have control of the quality
of implementation ...

            regards, tom lane


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Sergey Mirvoda
Date:


On Fri, Oct 5, 2018 at 7:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Should we be officially deprecating XML facilities and telling people
to head towards JSON?  There at least we have control of the quality
of implementation ...

                        regards, tom lane

I believe the MS SQL Server folks chose the JSON path some years ago.
XML in SQL Server looks effectively deprecated as far as performance goes.
We actually tried to load our data into MS SQL first, but performance was below our needs even with an XML index.


--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Pavel Stehule
Date:


On Fri, Oct 5, 2018 at 16:33, Sergey Mirvoda <sergey@mirvoda.com> wrote:


On Fri, Oct 5, 2018 at 7:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Should we be officially deprecating XML facilities and telling people
to head towards JSON?  There at least we have control of the quality
of implementation ...

                        regards, tom lane

I believe the MS SQL Server folks chose the JSON path some years ago.
XML in SQL Server looks effectively deprecated as far as performance goes.
We actually tried to load our data into MS SQL first, but performance was below our needs even with an XML index.

The bugs in libxml2 are unpleasant; on the other hand, the library is fast and works well most of the time. One of our advantages over other open source databases is good XML support, and it cannot be replaced by JSON support.

Anybody who prefers absolute safety can use PostgreSQL without XML support. We don't hide the fact that we use libxml2, and anybody can check libxml2's reputation.

Moreover, the latest libxml2, 2.9.8, doesn't crash, so maybe this bug was fixed.

Regards

Pavel


--
--Regards, Sergey Mirvoda

Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Sergey" == Sergey Mirvoda <sergey@mirvoda.com> writes:

 Sergey> Ubuntu Version is  2.9.8 (server crashes hard)
 Sergey> administrator@capybara:~$ xmllint --version
 Sergey> xmllint: using libxml version 20908-GITv2.9.8

Unfortunately the version of libxml2 that pg is using on that system is
in fact 2.9.4, not 2.9.8:

root@capybara:~# ldd /usr/lib/postgresql/10/bin/postgres | fgrep xml
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f332517f000)

ii  libxml2:amd64               2.9.4+dfsg1-6.1ubu amd64              GNOME XML library

and poking at it with gdb shows exactly the same bug that I found on my
system, though on yours it fails at a somewhat different place (when
trying to print the file context as part of the error message); the
basic problem is still that input->cur is off in the weeds.

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> We need a better XML library :-(

 Tom> Is there one anywhere?  (I think writing our own is right out.)

I don't know of one. I think the hard part would be finding one that
both does XPath and has a compatible license.

-- 
Andrew (irc:RhodiumToad)