Thread: [HACKERS] bugfix: xpath encoding issue

[HACKERS] bugfix: xpath encoding issue

From
Pavel Stehule
Date:
Hi

While testing the XMLTABLE function I found a bug in the XPATH function, in xpath_internal.

There the xmltype value is not correctly converted to xmlChar because the encoding info in the XML header may be invalid. This can happen when the XML was loaded with the recv function and its encoding is not UTF8.

The functions based on the xml_parse function work well.

The fix is simple:

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
        ns_count = 0;
    }
 
-   datastr = VARDATA(data);
-   len = VARSIZE(data) - VARHDRSZ;
+   datastr = xml_out_internal(data, 0);
+   len = strlen(datastr);
+
    xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
    if (xpath_len == 0)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_EXCEPTION),
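
To illustrate why the raw bytes are a problem: the stored value can still carry an encoding label in its XML declaration that no longer matches the bytes, and passing VARDATA(data) straight to the parser passes that stale label along too. The patch goes through xml_out_internal(data, 0) instead. The sketch below is not PostgreSQL code. strip_decl_encoding is a hypothetical standalone helper that only shows the general idea of dropping the encoding pseudo-attribute from the declaration before parsing. It assumes a declaration that uses plain spaces as separators; the real function may behave differently, for example by omitting the declaration altogether.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: return a malloc'd copy of
 * xml with the encoding pseudo-attribute removed from a leading XML
 * declaration.  Input without a declaration, or without an encoding
 * label, is copied unchanged.  Simplification: only ' ' is treated as
 * whitespace inside the declaration.
 */
char *
strip_decl_encoding(const char *xml)
{
    const char *decl_end;
    const char *enc;
    const char *val_end;
    const char *p;
    char       *out;
    char        quote;

    assert(xml != NULL);

    /* Start from a full copy; it is returned as-is on every early exit. */
    out = malloc(strlen(xml) + 1);
    if (out == NULL)
        return NULL;
    strcpy(out, xml);

    /* Only a declaration at the very start of the document counts. */
    if (strncmp(xml, "<?xml ", 6) != 0)
        return out;
    decl_end = strstr(xml, "?>");
    if (decl_end == NULL)
        return out;

    /* The encoding label must sit inside the declaration itself. */
    enc = strstr(xml, " encoding");
    if (enc == NULL || enc >= decl_end)
        return out;

    /* Skip over: encoding [spaces] = [spaces] quote value quote */
    p = enc + strlen(" encoding");
    while (*p == ' ')
        p++;
    if (*p != '=')
        return out;
    p++;
    while (*p == ' ')
        p++;
    quote = *p;
    if (quote != '"' && quote != '\'')
        return out;
    val_end = strchr(p + 1, quote);
    if (val_end == NULL || val_end >= decl_end)
        return out;

    /* Keep the text before " encoding", then append what follows the value. */
    strcpy(out + (enc - xml), val_end + 1);
    return out;
}
```

With this helper, a declaration such as `<?xml version="1.0" encoding="ISO-8859-2"?>` is reduced to `<?xml version="1.0"?>`, so a parser no longer sees a label that contradicts the actual bytes. The caller is responsible for freeing the returned string.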

Regards

Pavel

Re: [HACKERS] bugfix: xpath encoding issue

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> Hi
> 
> While testing the XMLTABLE function I found a bug in the XPATH function,
> in xpath_internal.
>
> There the xmltype value is not correctly converted to xmlChar because the
> encoding info in the XML header may be invalid. This can happen when the
> XML was loaded with the recv function and its encoding is not UTF8.

Hmm ... is it possible to create a test that verifies this?  I suppose
we could have a non-utf8 value as bytea.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] bugfix: xpath encoding issue

From
Pavel Stehule
Date:


2017-04-13 17:19 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> Pavel Stehule wrote:
> > Hi
> >
> > While testing the XMLTABLE function I found a bug in the XPATH function,
> > in xpath_internal.
> >
> > There the xmltype value is not correctly converted to xmlChar because the
> > encoding info in the XML header may be invalid. This can happen when the
> > XML was loaded with the recv function and its encoding is not UTF8.
>
> Hmm ... is it possible to create a test that verifies this?  I suppose
> we could have a non-utf8 value as bytea.

I have no idea. The problem is with utf8 encoded XML that has a bad header. It depends on the database encoding and the supported locales.

Maybe I can simulate it with a cast without function from bytea to xml, but that looks more than a little bit dirty.

Regards

Pavel

 
