Thread: XMLPARSE() evaluated multiple times?

XMLPARSE() evaluated multiple times?

From
Tim Landscheidt
Date:
Hi,

I have been playing around with PostgreSQL's XML support
lately (cf.
<URI:news:m3ljmocolf.fsf@passepartout.tim-landscheidt.de>)
and stumbled upon some performance issues related to
XMLPARSE(). In my "application", the XML document is supp-
lied as a string constant via a DBI ? parameter, for testing
purposes I have put it into a separate table:

| tim=# \timing
| Zeitmessung ist an.
| tim=# SELECT LENGTH(XMLasText) FROM tmpTestData;
|  length
| --------
|  364446
| (1 Zeile)

| Zeit: 6,295 ms
| tim=# SELECT SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM tmpTestData;
|  substring
| -----------
|  <
| (1 Zeile)

| Zeit: 40,072 ms
| tim=#

(The SUBSTRING()s above and following are for reasons of
brevity only; the results are comparable when the raw XML is
queried.)

| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 10) AS G(A),
tmpTestData;
|  a  | substring
| ----+-----------
|   1 | <
| [...]
|  10 | <
| (10 Zeilen)

| Zeit: 416,069 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 100) AS G(A),
tmpTestData;
|   a  | substring
| -----+-----------
|    1 | <
| [...]
|  100 | <
| (100 Zeilen)

| Zeit: 3029,196 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 1000) AS G(A),
tmpTestData;
|   a   | substring
| ------+-----------
|     1 | <
|  1000 | <
| (1000 Zeilen)

| Zeit: 30740,626 ms
| tim=#

It seems that XMLPARSE() is called for every row without
PostgreSQL realizing that it is IMMUTABLE. This even seems
to be the case if the XMLPARSE() is part of a WHERE clause:

| tim=# SELECT G.A FROM generate_series(1, 10) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM
tmpTestData))::TEXT;
|  a
| ---
| (0 Zeilen)

| Zeit: 240,626 ms
| tim=# SELECT G.A FROM generate_series(1, 100) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM
tmpTestData))::TEXT;
|  a
| ---
| (0 Zeilen)

| Zeit: 2441,135 ms
| tim=# SELECT G.A FROM generate_series(1, 1000) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM
tmpTestData))::TEXT;
|  a
| ---
| (0 Zeilen)

| Zeit: 25228,180 ms
| tim=#

Obviously, the "problem" can be circumvented by "caching"
the results of the XMLPARSE() in a temporary table (or even
a IMMUTABLE function?), but I would assume that this should
be PostgreSQL's task.

  Any thoughts why this is not the case already? :-)

Tim

Re: XMLPARSE() evaluated multiple times?

From
Tom Lane
Date:
Tim Landscheidt <tim@tim-landscheidt.de> writes:
> It seems that XMLPARSE() is called for every row without
> PostgreSQL realizing that it is IMMUTABLE.

Indeed, the system doesn't consider it immutable.  None of the examples
you show would benefit if it did, though.

I believe there are GUC-parameter dependencies that prevent us from
treating it as truly immutable, but if you want to ignore that
consideration and force constant-folding anyway, you could wrap it
in a SQL function that's marked as IMMUTABLE.

            regards, tom lane