Re: Another issue with invalid XML values - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Another issue with invalid XML values
Date
Msg-id 23388.1311118974@sss.pgh.pa.us
Whole thread Raw
In response to Re: Another issue with invalid XML values  (Florian Pflug <fgp@phlo.org>)
Responses Re: Another issue with invalid XML values
Re: Another issue with invalid XML values
List pgsql-hackers
Florian Pflug <fgp@phlo.org> writes:
> Updated patch attached. Do you think this is "Ready for Committer"?

I've been looking through this patch.  While it's mostly good, I'm
pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
deliberately designed to be non-reentrant (ie, throw an Assert if
pg_xml_init is called twice without pg_xml_done in between).
There are at least two issues with that:

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

2. I don't think it's entirely unlikely that uses of libxml could be
nested.

xpath_table in particular calls an awful lot of stuff between
pg_xml_init and pg_xml_done, and is at the very least open to loss of
control via an elog before it's called pg_xml_done.

I think this patch has already paid 90% of the notational price for
supporting fully re-entrant use of libxml.  What I'm imagining is
that we move all five of the static variables (xml_strictness,
xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
xml_structuredErrorContext_saved) into a struct that's palloc'd
by pg_xml_init and eventually passed to pg_xml_done.  It could be
passed to xml_errorHandler via the currently-unused context argument.
A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've 
got problems.  But they'd have problems anyway with the current form of
the patch.  We could provide some defense against this by including a
magic identifier value in the palloc'd struct and making
xml_errorHandler check it before doing anything dangerous.  Also, we
could make pg_xml_done warn if libxml's current context pointer is
different from the struct passed to it, which would provide another
means of detection that somebody had missed a cleanup call.

Unless someone sees a major hole in this idea, or a better way to do it,
I'm going to modify the patch along those lines and commit.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: PgWest CFP closes in two weeks
Next
From: Bruce Momjian
Date:
Subject: Re: A few user-level questions on Streaming Replication and pg_upgrade