Thread: BUG #3860: xpath crashes backend when is querying xmlagg result

BUG #3860: xpath crashes backend when is querying xmlagg result

From
"Sokolov Yura"
Date:
The following bug has been logged online:

Bug reference:      3860
Logged by:          Sokolov Yura
Email address:      funny.falcon@gmail.com
PostgreSQL version: 8.3beta4
Operating system:   Debian Linux 4.0r2 both 32bit and amd64
Description:        xpath crashes backend when is querying xmlagg result
Details:

xpath() crashes backend when is querying particular xmlagg results.

Behavior is unstable, but it is reproduced with command line:

psql < bg***.sql

and initially observed with pgadmin3 1.8

-------- bg1.sql---------------
-- crashed, when queries xmlelement with containment given by xmlagg
-- and type of aggregated elements is different
select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as
el
from ( values
    ( xmlelement(name b, 0) ),  -- integer
    ( xmlelement(name c, 0.1) ) -- numeric, string crashe backend
                                -- on 32bit Debian float4 and float8 does
not
                                -- on 64 bit Debian float4 does but float8
not
    -- order is irrelevant
) as t(el);
-----end bg1.sql --------------

-------- bg2.sql---------------
-- crashed, when queries xml given by xmlagg
-- and types of elements text() is different
select xpath('any_non_empty_expression', xmlagg(el) ) as el
from ( values
    ( xmlelement(name b, 0) ),
    ( xmlelement(name c, 0.1) )
) as t(el);
-----end bg2.sql --------------

-------- bg3.sql---------------
-- this crashes 64bit machine
select xpath('/a/c', xmlagg(el)) as el
from ( values
   ( xmlelement(name b, 0) ),
   ( xmlelement(name c, 0.1::float4) )
) as t(el);
-----end bg3.sql --------------

-------- bg-not31.sql ----------
-- but this works on 64bit machine
select xpath('/b', xmlagg(el)) as el
from ( values
   ( xmlelement(name b, 0) ),
   ( xmlelement(name c, 0.1::float8) )
) as t(el);
-----end bg-not31.sql ----------

-------- bg-not32.sql ----------
-- and, hardly believed, combined file works too
select xpath('/b', xmlagg(el)) as el
from ( values
   ( xmlelement(name b, 0) ),
   ( xmlelement(name c, 0.1::float8) )
) as t(el);
select xpath('/c', xmlagg(el)) as el
from ( values
   ( xmlelement(name b, 0) ),
   ( xmlelement(name c, 0.1::float4) )
) as t(el);
-----end bg-not32.sql ----------

-------- bg-not.sql -----------
-- Empty xpath expression catched - no backend crash
select xpath('',xmlelement(name a, xmlagg(el) )) as el
from ( values
( xmlelement(name b, 0) ),
( xmlelement(name c, 0.1) )
) as t(el);
-----end bg-not.sql -----------

-------- bg-not1.sql -----------
-- when types are same no backend crash
-- with numeric works on 32bit machine, but crashes on 64bit
select xpath('/a/b',xmlelement(name a, xmlagg(el) )) as el
from ( values
( xmlelement(name b, 0.1) ),
( xmlelement(name c, 0.1) )
) as t(el);
-----end bg-not1.sql -----------

-------- bg-not2.sql -----------
-- when elements given explicitly then no backend crash
select xpath('/a/c',
           xmlelement(name a,
           xmlelement(name b, 0),
           xmlelement(name c, 0.1) ));
-----end bg-not2.sql -----------

-------- bg-not3.sql -----------
-- xmlelement itself not crashed
select xmlelement(name a, xmlagg(el) ) as el
from ( values
( xmlelement(name b, 0) ),
( xmlelement(name c, 0.1) )
) as t(el);
-----end bg-not3.sql -----------

-------- bg-not4.sql -----------
-- when elements given with xmlconcat2 then no backend crash
select xpath('/a/c',
           xmlelement(name a,
           xmlconcat2(
             xmlelement(name b, 0),
             xmlelement(name c, 0.1))
           ));
-----end bg-not4.sql -----------

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Sokolov Yura escribió:

> -------- bg1.sql---------------
> -- crashed, when queries xmlelement with containment given by xmlagg
> -- and type of aggregated elements is different
> select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as
> el
> from ( values
>     ( xmlelement(name b, 0) ),  -- integer
>     ( xmlelement(name c, 0.1) ) -- numeric, string crashe backend
>                                 -- on 32bit Debian float4 and float8 does
> not
>                                 -- on 64 bit Debian float4 does but float8
> not
>     -- order is irrelevant
> ) as t(el);
> -----end bg1.sql --------------

I can reproduce this crash.  The backtrace looks like this:

(gdb) bt
#0  0x000000000076a06e in pfree (pointer=0xc4b1c8) at /pgsql/source/00head/src/backend/utils/mmgr/mcxt.c:589
#1  0x000000000072c773 in xml_pfree (ptr=0xc4b1c8) at /pgsql/source/00head/src/backend/utils/adt/xml.c:1342
#2  0x00002b7c3b3ecb12 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2
#3  0x00002b7c3b3f4d75 in xmlCleanupParser () from /usr/lib/libxml2.so.2
#4  0x0000000000730778 in xpath (fcinfo=0x7fff6f909d70) at /pgsql/source/00head/src/backend/utils/adt/xml.c:3441
#5  0x000000000058dc89 in ExecMakeFunctionResult (fcache=0xc40780, econtext=0xc40688, isNull=0xc416f8 "",
    isDone=0xc417b0) at /pgsql/source/00head/src/backend/executor/execQual.c:1351
#6  0x000000000058e587 in ExecEvalFunc (fcache=0xc40780, econtext=0xc40688, isNull=0xc416f8 "", isDone=0xc417b0)
    at /pgsql/source/00head/src/backend/executor/execQual.c:1753
#7  0x0000000000594ed6 in ExecTargetList (targetlist=0xc40f08, econtext=0xc40688, values=0xc416d8,
    isnull=0xc416f8 "", itemIsDone=0xc417b0, isDone=0x0)
    at /pgsql/source/00head/src/backend/executor/execQual.c:4601
#8  0x00000000005953a9 in ExecProject (projInfo=0xc41718, isDone=0x0)
    at /pgsql/source/00head/src/backend/executor/execQual.c:4802
#9  0x000000000059b5e4 in agg_retrieve_direct (aggstate=0xc40378)
    at /pgsql/source/00head/src/backend/executor/nodeAgg.c:989
#10 0x000000000059b2e0 in ExecAgg (node=0xc40378) at /pgsql/source/00head/src/backend/executor/nodeAgg.c:816
#11 0x000000000058b34b in ExecProcNode (node=0xc40378)
    at /pgsql/source/00head/src/backend/executor/execProcnode.c:394
#12 0x0000000000588a60 in ExecutePlan (estate=0xc3fe48, planstate=0xc40378, operation=CMD_SELECT, numberTuples=0,
    direction=ForwardScanDirection, dest=0xc30fc0) at /pgsql/source/00head/src/backend/executor/execMain.c:1233
#13 0x000000000058721a in ExecutorRun (queryDesc=0xc08ed8, direction=ForwardScanDirection, count=0)
    at /pgsql/source/00head/src/backend/executor/execMain.c:267
#14 0x000000000066692b in PortalRunSelect (portal=0xc37228, forward=1 '\001', count=0, dest=0xc30fc0)
    at /pgsql/source/00head/src/backend/tcop/pquery.c:943
#15 0x000000000066657b in PortalRun (portal=0xc37228, count=9223372036854775807, isTopLevel=1 '\001',
    dest=0xc30fc0, altdest=0xc30fc0, completionTag=0x7fff6f90a6e0 "")
    at /pgsql/source/00head/src/backend/tcop/pquery.c:769
#16 0x0000000000660a1f in exec_simple_query (
    query_string=0xbf3868 "select xpath('any_non_empty_expression', xmlelement(name a, xmlagg(el) )) as el\nfrom ( v
alues ( xmlelement(name b, 0) ), ( xmlelement(name c, 0.1) ) ) as t(el);")
    at /pgsql/source/00head/src/backend/tcop/postgres.c:963
#17 0x0000000000664917 in PostgresMain (argc=4, argv=0xb4ca28, username=0xb4c9d8 "alvherre")
    at /pgsql/source/00head/src/backend/tcop/postgres.c:3535
#18 0x000000000062609c in BackendRun (port=0xb72400)
    at /pgsql/source/00head/src/backend/postmaster/postmaster.c:3180
#19 0x00000000006255de in BackendStartup (port=0xb72400)
    at /pgsql/source/00head/src/backend/postmaster/postmaster.c:2803


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I can reproduce this crash.  The backtrace looks like this:

Ah, good.  This looks like it matches a previous report, but we didn't have
a reproducible test case:
http://archives.postgresql.org/pgsql-general/2007-12/msg01086.php

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I can reproduce this crash.  The backtrace looks like this:

Hmm, what I'm seeing is that libxml is apparently trying to pfree
something that it didn't allocate via palloc ...


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, what I'm seeing is that libxml is apparently trying to pfree
> something that it didn't allocate via palloc ...

Not totally surprising --- when we call xmlMemSetup() we are telling
libxml to start using xml_palloc etc rather than its default of malloc
etc.  We had some bugs earlier involving not calling xmlMemSetup soon
enough during some seqauence of operations; this may be another.

Another likely theory is that libxml is trying to pfree something that
had been palloc'd in a previous cycle of function calls, and then went
away in a memory context reset.  We try to shut the library down
completely at exit from xml.c functions, so that it doesn't think it has
any open allocated memory, buut there may be something we've missed
doing.  See the note at lines 27ff of xml.c.

If nothing else comes to mind, try instrumenting xml_palloc and friends
to print a trace of what they're doing, and match up the calls ...

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Hmm, what I'm seeing is that libxml is apparently trying to pfree
> > something that it didn't allocate via palloc ...
>
> Not totally surprising --- when we call xmlMemSetup() we are telling
> libxml to start using xml_palloc etc rather than its default of malloc
> etc.  We had some bugs earlier involving not calling xmlMemSetup soon
> enough during some seqauence of operations; this may be another.

The crash is on free of the character encoding stuff in libxml.  So I
traced its initialization and xmlMemSetup, and my conclusion that this
is not the problem -- xmlMemSetup is called earlier.

> Another likely theory is that libxml is trying to pfree something that
> had been palloc'd in a previous cycle of function calls, and then went
> away in a memory context reset.  We try to shut the library down
> completely at exit from xml.c functions, so that it doesn't think it has
> any open allocated memory, buut there may be something we've missed
> doing.  See the note at lines 27ff of xml.c.

[ ... a lot of playing with gdb later, including clearing up confusion
between tracepoints and watchpoints ... ]

What's happening is that libxml encoding handler table is being
allocated in an ExprContext which apparently is too short-lived.

I'm not seeing very well how to handle this -- one idea would be to
manually call xmlInitCharEncodingHandlers (which is public but supposed
to be called internally by libxml) with a longer-lived context, but I
wonder whether there's some other initialization that will come bite us.
Another idea would be to initialize and then destroy the libxml state
separately for each expression, which perhaps doesn't really work at
all.

Perhaps a better idea is to create a separate LibxmlContext memcxt,
child of QueryContext, and have xml_palloc etc always use that.  That
way it won't be reset between calls.  It probably also means we could
wire xml reset to transaction abort, making it a bit simpler.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> What's happening is that libxml encoding handler table is being
> allocated in an ExprContext which apparently is too short-lived.

> I'm not seeing very well how to handle this -- one idea would be to
> manually call xmlInitCharEncodingHandlers (which is public but supposed
> to be called internally by libxml) with a longer-lived context, but I
> wonder whether there's some other initialization that will come bite us.

Ugh.  This seems like about the worst-case scenario: we don't know when
libxml chooses to allocate or free this data structure, and apparently
we have no way to force it to be freed.

> Another idea would be to initialize and then destroy the libxml state
> separately for each expression, which perhaps doesn't really work at
> all.

That's what we're trying to do now, I thought.

> Perhaps a better idea is to create a separate LibxmlContext memcxt,
> child of QueryContext, and have xml_palloc etc always use that.  That
> way it won't be reset between calls.  It probably also means we could
> wire xml reset to transaction abort, making it a bit simpler.

Might as well go back to letting it use malloc :-(.

I actually don't see a problem with letting it use malloc for stuff that
it is going to manage for itself.  I guess the problem comes with
transient return values of libxml functions; we'd want to explicitly
move those into palloc-based storage and then free() them.

This looks like a rather large rewrite though.  Peter, have you any
better ideas?

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > What's happening is that libxml encoding handler table is being
> > allocated in an ExprContext which apparently is too short-lived.
>
> > I'm not seeing very well how to handle this -- one idea would be to
> > manually call xmlInitCharEncodingHandlers (which is public but supposed
> > to be called internally by libxml) with a longer-lived context, but I
> > wonder whether there's some other initialization that will come bite us.
>
> Ugh.  This seems like about the worst-case scenario: we don't know when
> libxml chooses to allocate or free this data structure, and apparently
> we have no way to force it to be freed.

Yeah, we can free it with xmlCleanupParser.  Too blunt an instrument,
perhaps?

> > Another idea would be to initialize and then destroy the libxml state
> > separately for each expression, which perhaps doesn't really work at
> > all.
>
> That's what we're trying to do now, I thought.

Yeah, apparently it doesn't work very well.

> I actually don't see a problem with letting it use malloc for stuff that
> it is going to manage for itself.  I guess the problem comes with
> transient return values of libxml functions; we'd want to explicitly
> move those into palloc-based storage and then free() them.

I guess the problem is that the objects can contain pointers and we have
no way of following those to copy them.  Perhaps we can make it use
malloc, let it create the object, switch to palloc, create a clone,
revert to malloc, free the original.

What problem you have with it having its own memory context?  I don't
think it has any particular disadvantage.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> What problem you have with it having its own memory context?  I don't
> think it has any particular disadvantage.

I don't object to that per se; I'm just saying it doesn't do much to
fix the problem.  The difficulty we've got here ultimately comes down
to separating long-lived objects from not-long-lived ones.  It seems
libxml doesn't have adequate controls to let us do that.  If we run
libxml in a context we don't ever reset, then we leak memory if an
error occurs while we're fooling around with a function result.
If we do reset the context, then we've got exactly this problem again
of possibly deleting objects that libxml thinks are still there.

We might be able to compromise by only resetting the context after
an error, but this is still only possible if we have a way to make
libxml let go of *all* pointers to alloc'd objects.  I don't understand
your comment that xmlCleanupParser solves it --- we call that already,
and it doesn't seem to be preventing the problem.

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:

> We might be able to compromise by only resetting the context after
> an error, but this is still only possible if we have a way to make
> libxml let go of *all* pointers to alloc'd objects.  I don't understand
> your comment that xmlCleanupParser solves it --- we call that already,
> and it doesn't seem to be preventing the problem.

We call it *after* the context has been reset -- or at least that's what
gdb is showing me.

Hardware watchpoint 3: handlers[0]

Old value = (xmlCharEncodingHandlerPtr) 0xc43580
New value = (xmlCharEncodingHandlerPtr) 0x7f7f7f7f7f7f7f7f
0x00002b579ad84283 in memset () from /lib/libc.so.6
(gdb) bt
#0  0x00002b579ad84283 in memset () from /lib/libc.so.6
#1  0x0000000000768b69 in AllocSetReset (context=0xc1d970)
    at /pgsql/source/00head/src/backend/utils/mmgr/aset.c:456
#2  0x0000000000769f42 in MemoryContextReset (context=0xc1d970)
    at /pgsql/source/00head/src/backend/utils/mmgr/mcxt.c:130
#3  0x000000000059791c in ReScanExprContext (econtext=0xc42050)
    at /pgsql/source/00head/src/backend/executor/execUtils.c:447
#4  0x00000000005a6ed8 in ValuesNext (node=0xc41f38)
    at /pgsql/source/00head/src/backend/executor/nodeValuesscan.c:106
#5  0x00000000005957ec in ExecScan (node=0xc41f38,
    accessMtd=0x5a6d98 <ValuesNext>)
    at /pgsql/source/00head/src/backend/executor/execScan.c:68
#6  0x00000000005a706c in ExecValuesScan (node=0xc41f38)
    at /pgsql/source/00head/src/backend/executor/nodeValuesscan.c:173
#7  0x000000000058b6bd in ExecProcNode (node=0xc41f38)
    at /pgsql/source/00head/src/backend/executor/execProcnode.c:360
#8  0x000000000059b86b in agg_retrieve_direct (aggstate=0xc41448)
    at /pgsql/source/00head/src/backend/executor/nodeAgg.c:921
#9  0x000000000059b6d0 in ExecAgg (node=0xc41448)
    at /pgsql/source/00head/src/backend/executor/nodeAgg.c:816
#10 0x000000000058b73b in ExecProcNode (node=0xc41448)
    at /pgsql/source/00head/src/backend/executor/execProcnode.c:394
#11 0x0000000000588e50 in ExecutePlan (estate=0xc40f18, planstate=0xc41448,
    operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection,
    dest=0xc31748) at /pgsql/source/00head/src/backend/executor/execMain.c:1233
#12 0x000000000058760a in ExecutorRun (queryDesc=0xc09f78,
    direction=ForwardScanDirection, count=0)
    at /pgsql/source/00head/src/backend/executor/execMain.c:267
#13 0x000000000066715b in PortalRunSelect (portal=0xc382f8, forward=1 '\001',
    count=0, dest=0xc31748) at /pgsql/source/00head/src/backend/tcop/pquery.c:943
#14 0x0000000000666dab in PortalRun (portal=0xc382f8, count=9223372036854775807,
    isTopLevel=1 '\001', dest=0xc31748, altdest=0xc31748,
    completionTag=0x7fff10f94cf0 "")
    at /pgsql/source/00head/src/backend/tcop/pquery.c:769
#15 0x000000000066125f in exec_simple_query (
    query_string=0xbf4938 "select xpath('any_non_empty_expression', xmlagg(el) ) as el\nfrom ( values\n    (
xmlelement(nameb, 0) ),\n    ( xmlelement(name c, 0.1) )\n) as t(el);") at
/pgsql/source/00head/src/backend/tcop/postgres.c:963
[...]

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Alvaro Herrera escribió:
> Tom Lane escribió:
>
> > We might be able to compromise by only resetting the context after
> > an error, but this is still only possible if we have a way to make
> > libxml let go of *all* pointers to alloc'd objects.  I don't understand
> > your comment that xmlCleanupParser solves it --- we call that already,
> > and it doesn't seem to be preventing the problem.
>
> We call it *after* the context has been reset -- or at least that's what
> gdb is showing me.
>
> Hardware watchpoint 3: handlers[0]

I forgot to mention that I had a breakpoint on xmlCleanupParser that
didn't fire.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:

> We might be able to compromise by only resetting the context after
> an error, but this is still only possible if we have a way to make
> libxml let go of *all* pointers to alloc'd objects.  I don't understand
> your comment that xmlCleanupParser solves it --- we call that already,
> and it doesn't seem to be preventing the problem.

With the attached patch, it doesn't crash, but I see the added WARNING
four times in the log, which is proof that the cleanup thing is not
called as the code seems to think.

I wonder -- is this thing supposed to be reentrant?  I think that's the
whole problem with it.

(I think what I'm doing in xml_init in the non-first case is bogus
anyway -- but I post the patch to show my point.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
I wonder whether the real issue here isn't that we have some functions
that invoke libxml without ultimately doing xmlCleanupParser() ---
xml_in being the first obvious candidate.  Maybe that is the mechanism
through which libxml ends up with dangling pointers.

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:
> I wonder whether the real issue here isn't that we have some functions
> that invoke libxml without ultimately doing xmlCleanupParser() ---
> xml_in being the first obvious candidate.  Maybe that is the mechanism
> through which libxml ends up with dangling pointers.

Hmm.  I see that xml_in uses xml_parse, which does call
xmlCleanupParser.  However, I see that xml_in calls xmlFreeDoc _after_
xmlCleanupParser.  Maybe that's a bug in itself.  However, I doubt that
explains the issue at hand, because as far as the segmentation fault,
nobody has called xmlCleanupParser yet.  Perhaps look for another
candidate.

Another thing I'm wondering is why xml_parse is wasting time returning
the parsed document, if all callers just call xmlFreeDoc() on it
immediately.

Hmm ... parse_xml_decl is seen calling xml_init and libxml functions,
but not xmlCleanupParser.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Alvaro Herrera escribió:
> Tom Lane escribió:
>
> > We might be able to compromise by only resetting the context after
> > an error, but this is still only possible if we have a way to make
> > libxml let go of *all* pointers to alloc'd objects.  I don't understand
> > your comment that xmlCleanupParser solves it --- we call that already,
> > and it doesn't seem to be preventing the problem.
>
> With the attached patch, it doesn't crash, but I see the added WARNING
> four times in the log, which is proof that the cleanup thing is not
> called as the code seems to think.

Update: it does crash for another of the test cases by Sokolov, if
executed more than once.  Reason: it calls xml_init after
xmlCleanupParser has been called, so the memory context is created
again, only to be reset later.  The culprit seems to be
xml_out_internal.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:

> > Perhaps a better idea is to create a separate LibxmlContext memcxt,
> > child of QueryContext, and have xml_palloc etc always use that.  That
> > way it won't be reset between calls.  It probably also means we could
> > wire xml reset to transaction abort, making it a bit simpler.
>
> Might as well go back to letting it use malloc :-(.
>
> I actually don't see a problem with letting it use malloc for stuff that
> it is going to manage for itself.  I guess the problem comes with
> transient return values of libxml functions; we'd want to explicitly
> move those into palloc-based storage and then free() them.
>
> This looks like a rather large rewrite though.  Peter, have you any
> better ideas?

OK, after a lot of research I think the best way to deal with this is
what I suggest above.  With the attached patch, I cannot cause the
system to crash with any of the given examples.

Furthermore this may help clean up the PG_TRY blocks that are currently
sprinkled through the xml.c code.

Handling of subtransactions is no good at this point, but I think that
could easily be improved.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> + void
> + AtEOXact_xml(void)
> + {
> +     if (LibxmlContext == NULL)
> +         return;
> +
> +     MemoryContextDelete(LibxmlContext);
> +     LibxmlContext = NULL;
> +
> +     xmlCleanupParser();
> + }

[ blink... ]  Shouldn't that be the other way around?  It looks to me
like xmlCleanupParser will be pfree'ing already-deleted data.  Did you
test this with CLOBBER_FREED_MEMORY active?

Also, I don't see how this patch fixes what I believe to be the
fundamental problem, which is that we have code paths that invoke
libxml without having previously initialized the alloc support.
I think the sequence

    if (LibxmlContext == NULL)
    {
        create LibxmlContext;
        xmlMemSetup(...);
    }

ought to be executed before *any* use of libxml stuff (which suggests
factoring it out as a subroutine...)

We also need to do some testing to see if letting the thing go until
transaction end is OK, or whether we will see unacceptable memory leaks
over long series of XML calls.  (In particular I suspect that we'd
better zap the context at subtransaction abort; but even without any
error, are we sure that normal operations don't leak memory?)

One thing I was wondering about earlier today is whether libxml isn't
expecting NULL-return-on-failure from the malloc-substitute routine.
If we take control away from it unexpectedly, I wouldn't be a bit
surprised if its data structures are left corrupt.  This might lead to
failures during cleanup.

I do like the idea of getting rid of the PG_TRY blocks and the
associated cleanup attempts; with the approach you're converging on
here, we need only assume that xmlCleanupParser() will get rid of
all staticly-allocated pointers and not crash, whereas right now we
are assuming a great deal of libxml stuff still works after an ereport
(which makes throwing ereport from xml_palloc even more risky).

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Alvaro Herrera
Date:
Tom Lane escribió:

> One thing I was wondering about earlier today is whether libxml isn't
> expecting NULL-return-on-failure from the malloc-substitute routine.
> If we take control away from it unexpectedly, I wouldn't be a bit
> surprised if its data structures are left corrupt.  This might lead to
> failures during cleanup.

Hmm, this is a very good point.  I quick look at the source shows that
they are not very consistent on its own checking for memory allocation
errors.  For example, see a bug I just reported:

http://bugzilla.gnome.org/show_bug.cgi?id=508662

The problem is that many routines look like this:

xmlXPathNewNodeSet(xmlNodePtr val) {
    xmlXPathObjectPtr ret;

    ret = (xmlXPathObjectPtr) xmlMalloc(sizeof(xmlXPathObject));
    if (ret == NULL) {
        xmlXPathErrMemory(NULL, "creating nodeset\n");
        return(NULL);
    }


and others would call this code and then happily use the return value
without checking for null.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribi�:
>> One thing I was wondering about earlier today is whether libxml isn't
>> expecting NULL-return-on-failure from the malloc-substitute routine.
>> If we take control away from it unexpectedly, I wouldn't be a bit
>> surprised if its data structures are left corrupt.  This might lead to
>> failures during cleanup.

> Hmm, this is a very good point.  I quick look at the source shows that
> they are not very consistent on its own checking for memory allocation
> errors.  For example, see a bug I just reported:
> http://bugzilla.gnome.org/show_bug.cgi?id=508662

Ugh.  So we're pretty much damned if we do and damned if we don't.

Given what you showed, it is certain that we are at risk if we return
NULL, whereas it is merely hypothetical that we are at risk if we
longjmp.  So let's stick to the palloc infrastructure for now.

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Tom Lane
Date:
"Sokolov Yura" <funny.falcon@gmail.com> writes:
> xpath() crashes backend when is querying particular xmlagg results.

Please see whether this patch fixes it:

http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php

The specific examples you give seem to be fixed, but that's not very
much of a test case for such a low-level issue.

            regards, tom lane

Re: BUG #3860: xpath crashes backend when is querying xmlagg result

From
Sokolov Yura
Date:
Tom Lane wrote:
> "Sokolov Yura" <funny.falcon@gmail.com> writes:
>
>> xpath() crashes backend when is querying particular xmlagg results.
>>
>
> Please see whether this patch fixes it:
>
> http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php
>
> The specific examples you give seem to be fixed, but that's not very
> much of a test case for such a low-level issue.
>
>             regards, tom lane
>
>
I have no other tests. I was just playing with the cool new feature.