Re: BUG #15420: Server crash. Segmentation fault when parsing xml file - Mailing list pgsql-bugs

From Sergey Mirvoda
Subject Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Date
Msg-id CALkWArjA5ApwXTnWWGMSmw6CFUaaTWHiL5gmJuMZXsMsb0tqeQ@mail.gmail.com
In response to Re: BUG #15420: Server crash. Segmentation fault when parsing xml file  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: BUG #15420: Server crash. Segmentation fault when parsing xmlfile  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs


Thu, 4 Oct 2018, 19:03 Pavel Stehule <pavel.stehule@gmail.com>:


Thu, 4 Oct 2018 at 13:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:


Thu, 4 Oct 2018 at 13:43, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


On 4 Oct 2018, at 16:38, Pavel Stehule <pavel.stehule@gmail.com> wrote:




Actually, we found this error on a very fresh installation of Ubuntu 16.04 with PostgreSQL 10.5.
After that we checked every configuration we have,
and only PostgreSQL 9.4 works as expected.

This issue is related to libxml2 limits, and it cannot work with modern libxml2 libraries.
Yes, the root cause is inside the libxml2 code.

Can we protect the postmaster from crashing on a libxml2 error? There is a bunch of PG_TRY blocks there, but they do not help.

Unfortunately, no. You cannot handle a crash. PostgreSQL doesn't start a separate process for libxml2 calls, and a fault there is fatal.
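To illustrate the point about process isolation, here is a minimal sketch in Python (not PostgreSQL code). It assumes a POSIX system, and `run_in_child` and `CHILD_CODE` are hypothetical names chosen for this example. A segmentation fault is a signal delivered to the process, not an error that an in-process handler can catch, so the only way for a caller to survive it is to run the risky work in a separate process and inspect how that process ended.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for "a library call that segfaults":
# the child process sends SIGSEGV to itself.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_in_child(code: str) -> int:
    """Run Python code in a separate process and return its exit status.

    On POSIX, a negative value -N means the child was killed by signal N.
    """
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


if __name__ == "__main__":
    status = run_in_child(CHILD_CODE)
    if status < 0:
        # The parent is still alive and can report the failure.
        print("child was killed by signal", -status)
```

The parent keeps running because the fault happened in a different address space. A backend that calls libxml2 in-process has no such boundary, which is why the error-handling blocks around the call do not help.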

I played with it, and it looks like a problem between libxml2 and your specific document (maybe too many multibyte characters, I don't know).

I imported a 200 MB XML document with 1M items, so it makes no sense to limit the XML size on the PostgreSQL side.

It looks like your XML document hits some corner case in libxml2 where parsing is extremely memory-expensive. From what I can see, there is a lot of long content inside attributes.

Regards

Pavel, thank you for your interest. 
It is definitely something inside this document. 

Actually, we loaded about 10k different documents like this one, about 10 GB of content, and the crash occurs only on this one.

But every other parser we tried (.NET, Java, Python) handled it just fine.

For now we ended up with a custom PL/Python function for parsing XML, and it is slow as hell.
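The function we actually use is not shown here. As a rough sketch of the general approach only, a Python-side parser could stream through the document with the standard library's `xml.etree.ElementTree.iterparse` instead of building the whole tree. The name `iter_items`, the `item` tag, and the sample document below are assumptions made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_items(source, tag):
    """Yield the attributes of each <tag> element while streaming the document.

    `source` is a file name or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Copy the attributes first, because clear() empties them.
            yield dict(elem.attrib)
            # Drop the element's children and text to keep memory use low.
            elem.clear()


if __name__ == "__main__":
    # Hypothetical sample; the real documents carry long attribute values.
    sample = b"<root><item id='1' name='a'/><item id='2' name='b'/></root>"
    for attrs in iter_items(io.BytesIO(sample), "item"):
        print(attrs)
```

Inside the database, a body like this would be wrapped in a PL/Python function, and each call pays for crossing between the server and the Python interpreter, which is presumably part of why this route is slower than the built-in XML functions.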

This looks like a regression: PostgreSQL 9.4 loads this document without any problem.
