8.3b2 XPath-based function index server crash - Mailing list pgsql-general

From Matt Magoffin
Subject 8.3b2 XPath-based function index server crash
Date
Msg-id 49309.192.168.1.108.1195416986.squirrel@msqr.us
Whole thread Raw
Responses Re: 8.3b2 XPath-based function index server crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello, I currently have a table in Postgres 8.1 with a text column that
contains XML. I use the xml2 module to define several XPath-based function
indices on that column, and this has worked very well.

I'm trying not to evaluate the native XML support in 8.3b2. I dumped this
table from 8.1, then loaded the data into a new table in 8.3 with the text
column re-defined as the xml type. The load happened without any errors.

Now I am trying to define equivalent XPath based function indices on the
xml column, but am running into a problem where the server processes the
'add index' command for a while, then crashes with a seg fault. While
investigating, I found I was able to reproduce the crash consistently by
executing a select statement with a large offset and limit in the query.

Initially I had thought some particular row in the xml column was causing
a problem. However, if I query directly for any specific row by its
primary key, the server does not crash and returns the result without
error.

Here are some details of the xml functions I am trying to perform. In 8.1,
I have an xml2 module function index defined like

"assigned_area_idx" btree (xpath_string(xml,
'/als:auto-lead-service/als:meta[@key="AREA"][1]'::text))

Here "xml" in the text column. In 8.3, then, I was trying to mimic this
same index with this:

create index assigned_area_idx ON lead (
    XMLSERIALIZE( CONTENT
(xpath('/als:auto-lead-service/als:meta[@key="AREA"][1]/text()', xml,
        ARRAY[ARRAY['als','http://autoleadservice.com/xml/als']]))[1] as text)
);

Is this the correct way to define such an index? I tried several different
ways, and this seemed to be the only way I could get it to be accepted.
However, after running for a while, this command fails and the postgres
server crashes.

So I explored with a SELECT statement, thinking there was some specific
XML document causing the crash. I could consistently execute this
statement to get a crash:

select XMLSERIALIZE( CONTENT
(xpath('/als:auto-lead-service/als:meta[@key="AREA"][1]/text()', xml,
        ARRAY[ARRAY['als','http://autoleadservice.com/xml/als']]))[1] as text)
from lead order by id limit 1 offset 83367;

The query would take a long time to execute, and then crash. I took some
samples with Instruments (OS X) and found while the server was processing
this select, it was spending a ton of time in libxml2, as if it was
evaluating the XMLSERIALIZE on every row leading up to the first returned
offset row. Is that expected for this type of query (i.e. I was thinking
it would just find the first offset row, then execute the XMLSERIALIZE
statement on that row)?

If I query for that same row using its primary key instead of the offset,
the query runs fine and returns the expected results, i.e.

select id, XMLSERIALIZE( CONTENT
(xpath('/als:auto-lead-service/als:meta[@key="AREA"][1]/text()', xml,
        ARRAY[ARRAY['als','http://autoleadservice.com/xml/als']]))[1] as text)
from lead where id = 84521;

-- m@





pgsql-general by date:

Previous
From: Sam Mason
Date:
Subject: Re: GIN: any ordering guarantees for the hits returned?
Next
From: Tom Lane
Date:
Subject: Re: 8.3b2 XPath-based function index server crash