Thread: proposal casting from XML[] to int[], numeric[], text[]

proposal casting from XML[] to int[], numeric[], text[]

From
"Pavel Stehule"
Date:
Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int               FROM generate_series(1, array_upper($1,1))
g(i))
$$ LANGUAGE SQL IMMUTABLE;


CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Regards
Pavel Stehule


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Peter Eisentraut
Date:
Am Dienstag, 25. September 2007 schrieb Pavel Stehule:
> Current result from xpath function isn't indexable. It cannot be
> problem with possibility cast it to some base types.

Nikolay might be able to remind us what happened to the proposed functions 
xpath_bool, xpath_text, etc.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Nikolay Samokhvalov"
Date:
The problem with contrib/xml2's xpath_* functions (that return
scalars) was that they are very specific. If XPath expression
evaluation returns array of values (set of XML pieces), but the
function returns only the first, significant information is lost,
while there is no any gain in speed at all.

The key idea was to create only one generic function at the first
stage -- xpath(), returning an array of XML pieces.

We would create wrappers returning int[], bool[], string[], but there
are several issues with such functions: - if the type of the data located on nodes that match XPath
expression differs from what is expected, what should we do? - in XML world, if you request for a text under some node,
all
descendants should be involved in generating result string (example:
what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL  is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

Regarding GIN indexes, alternative approach would be creating opclass
for xml[], it should be pretty simple (and better than creating
implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
for 8.3 or it's too late? It would be very helpful feature.

Without that, the only way to have indexes is to use functional btree
indexes over XPath expression (smth like "...btree(((xpath('...',
field)[1]::text))" -- pretty ugly construction...)

On 9/25/07, Peter Eisentraut <peter_e@gmx.net> wrote:
> Am Dienstag, 25. September 2007 schrieb Pavel Stehule:
> > Current result from xpath function isn't indexable. It cannot be
> > problem with possibility cast it to some base types.
>
> Nikolay might be able to remind us what happened to the proposed functions
> xpath_bool, xpath_text, etc.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>


-- 
Best regards,
Nikolay


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Pavel Stehule"
Date:
> We would create wrappers returning int[], bool[], string[], but there
> are several issues with such functions:
>   - if the type of the data located on nodes that match XPath
> expression differs from what is expected, what should we do?

raise exception

>   - in XML world, if you request for a text under some node, all
> descendants should be involved in generating result string (example:
> what should be returned for XML like "<em><strong>PostgreSQL</strong>
> is a powerful, open source relational database system</em>" if user
> requests for text under "em" node? In XML world, the correct answer is
> "PostgreSQL  is a powerful, open source relational database system" --
> concatenation of all strings from the node itself and all its
> descendants, in the correct order. Will be this expected for RDBMS
> users?).

It is corect. Or we can disallow any nested elements in casting array.
It's poblem only for text type. Numeric types are clear.

> Regarding GIN indexes, alternative approach would be creating opclass
> for xml[], it should be pretty simple (and better than creating
> implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
> for 8.3 or it's too late? It would be very helpful feature.

It's not practic. If I would to use it for functional indexes for
xpath functions I need constructor for xml[], and I have not it
currently:

xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]

>
> Without that, the only way to have indexes is to use functional btree
> indexes over XPath expression (smth like "...btree(((xpath('...',
> field)[1]::text))" -- pretty ugly construction...)

It's not usefull, if xpath returns more values

Regards
Pavel Stehule


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Nikolay Samokhvalov"
Date:
On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > We would create wrappers returning int[], bool[], string[], but there
> > are several issues with such functions:
> >   - if the type of the data located on nodes that match XPath
> > expression differs from what is expected, what should we do?
>
> raise exception

Will it be convenient for cases when there are many different (various
structures) XMLs in one column (no single DTD)?

>
> >   - in XML world, if you request for a text under some node, all
> > descendants should be involved in generating result string (example:
> > what should be returned for XML like "<em><strong>PostgreSQL</strong>
> > is a powerful, open source relational database system</em>" if user
> > requests for text under "em" node? In XML world, the correct answer is
> > "PostgreSQL  is a powerful, open source relational database system" --
> > concatenation of all strings from the node itself and all its
> > descendants, in the correct order. Will be this expected for RDBMS
> > users?).
>
> It is corect. Or we can disallow any nested elements in casting array.
> It's poblem only for text type. Numeric types are clear.

Actually, casting to numeric types might seem to be odd. But there is
some sense from practical point of view -- it works and that's better
that nothing (like now). But it's too late for 8.3, isn't it?

>
> > Regarding GIN indexes, alternative approach would be creating opclass
> > for xml[], it should be pretty simple (and better than creating
> > implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
> > for 8.3 or it's too late? It would be very helpful feature.
>
> It's not practic. If I would to use it for functional indexes for
> xpath functions I need constructor for xml[], and I have not it
> currently:
>
> xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]

I do not understand. Do you mean that there is no equality comparison
operator for type xml yet?

To implement GIN for xml[] we need to have comparison operator for
xml. Standard says "XML values are not comparable" (subclause 4.2.4 of
the latest draft from wiscorp.com), but without that cannot implement
straight GIN support, what is not good :-/

-- 
Best regards,
Nikolay


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Pavel Stehule"
Date:
2007/9/28, Nikolay Samokhvalov <nikolay@samokhvalov.com>:
> On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > > We would create wrappers returning int[], bool[], string[], but there
> > > are several issues with such functions:
> > >   - if the type of the data located on nodes that match XPath
> > > expression differs from what is expected, what should we do?
> >
> > raise exception
>
> Will it be convenient for cases when there are many different (various
> structures) XMLs in one column (no single DTD)?
>

I don't know

> >
> > >   - in XML world, if you request for a text under some node, all
> > > descendants should be involved in generating result string (example:
> > > what should be returned for XML like "<em><strong>PostgreSQL</strong>
> > > is a powerful, open source relational database system</em>" if user
> > > requests for text under "em" node? In XML world, the correct answer is
> > > "PostgreSQL  is a powerful, open source relational database system" --
> > > concatenation of all strings from the node itself and all its
> > > descendants, in the correct order. Will be this expected for RDBMS
> > > users?).
> >
> > It is corect. Or we can disallow any nested elements in casting array.
> > It's poblem only for text type. Numeric types are clear.
>
> Actually, casting to numeric types might seem to be odd. But there is
> some sense from practical point of view -- it works and that's better
> that nothing (like now). But it's too late for 8.3, isn't it?
>

I thing so SQL based casting like my cust functions are relative
simple for adding to core now.
> >
> > > Regarding GIN indexes, alternative approach would be creating opclass
> > > for xml[], it should be pretty simple (and better than creating
> > > implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
> > > for 8.3 or it's too late? It would be very helpful feature.
> >
> > It's not practic. If I would to use it for functional indexes for
> > xpath functions I need constructor for xml[], and I have not it
> > currently:
> >
> > xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]
>
> I do not understand. Do you mean that there is no equality comparison
> operator for type xml yet?
>

No, I mean some different. Nobody will construct special xml nodes for
quality comparision with xpath function when expect xpath's result as
int[], or float. So when result of xpath is xml[] but is with possible
casting to int[] it's more simple do casting and build index on int[]
because I can search int[].

> To implement GIN for xml[] we need to have comparison operator for
> xml. Standard says "XML values are not comparable" (subclause 4.2.4 of
> the latest draft from wiscorp.com), but without that cannot implement
> straight GIN support, what is not good :-/
>

I belive so xml values are not comparable, but I belive so the are
transferable to some of base types.

Pavel


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Peter Eisentraut
Date:
Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:
> what should be returned for XML like "<em><strong>PostgreSQL</strong>
> is a powerful, open source relational database system</em>" if user
> requests for text under "em" node? In XML world, the correct answer is
> "PostgreSQL  is a powerful, open source relational database system" --
> concatenation of all strings from the node itself and all its
> descendants, in the correct order. Will be this expected for RDBMS
> users?).

Well, if that is the defined behavior for XPath, then that's what we should 
do.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Andrew Dunstan
Date:

Peter Eisentraut wrote:
> Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:
>   
>> what should be returned for XML like "<em><strong>PostgreSQL</strong>
>> is a powerful, open source relational database system</em>" if user
>> requests for text under "em" node? In XML world, the correct answer is
>> "PostgreSQL  is a powerful, open source relational database system" --
>> concatenation of all strings from the node itself and all its
>> descendants, in the correct order. Will be this expected for RDBMS
>> users?).
>>     
>
> Well, if that is the defined behavior for XPath, then that's what we should 
> do.
>
>   

The xpath string value of a single node is the concatentation of the 
text children of the node and all its children in document order, IIRC. 
But that's not what we're dealing with here. xpath() doesn't return a 
single node but a node set (or so say the docs). The string value of a 
node set is in effect the string value of its first member, which seems 
less than useful in this context, or at least no great guide for us.

I think there's probably a good case for a cast from xml[] to text[] if 
we don't have one.

cheers

andrew


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> 
> Peter Eisentraut wrote:
> > Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:
> >   
> >> what should be returned for XML like "<em><strong>PostgreSQL</strong>
> >> is a powerful, open source relational database system</em>" if user
> >> requests for text under "em" node? In XML world, the correct answer is
> >> "PostgreSQL  is a powerful, open source relational database system" --
> >> concatenation of all strings from the node itself and all its
> >> descendants, in the correct order. Will be this expected for RDBMS
> >> users?).
> >>     
> >
> > Well, if that is the defined behavior for XPath, then that's what we should 
> > do.
> >
> >   
> 
> The xpath string value of a single node is the concatentation of the 
> text children of the node and all its children in document order, IIRC. 
> But that's not what we're dealing with here. xpath() doesn't return a 
> single node but a node set (or so say the docs). The string value of a 
> node set is in effect the string value of its first member, which seems 
> less than useful in this context, or at least no great guide for us.
> 
> I think there's probably a good case for a cast from xml[] to text[] if 
> we don't have one.
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Nikolay Samokhvalov"
Date:


On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
               FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;


CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Alternative (and maybe better) approach would be:
 - create comparison functions that work in the same way as string comparison functions do (currently, it's straight forward since XML is stored as string);
 - do NOT create comparison operators to avoid explicit comparing XML values (to follow standard ways to work with XML and to avoid possible unexpected behaviors);
 - create opclass based on these functions and, therefore, obtain GIN indexes support for xml[];
 - describe in the docs, that one can use GIN indexes over XPath expressions, but should be aware that comparison with non-trivial XML constants have to be used carefully because of possible problems with whitespaces, etc (in other words, comparison here is doing letter by letter, as for varchar).

If there are no objections I'll send patch for this.

Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Pavel Stehule"
Date:
On 11/11/2007, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
>
>
> On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Hello
> >
> > Current result from xpath function isn't indexable. It cannot be
> > problem with possibility cast it to some base types.
> >
> > CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
> > RETURNS int[] AS $$
> > SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
> >                FROM generate_series(1, array_upper($1,1)) g(i))
> > $$ LANGUAGE SQL IMMUTABLE;
> >
> >
> > CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);
> >
> > -- now I can build functional index
> > CREATE INDEX fx ON foo USING
> > GIN((xpath('//id/text()',objednavka_v_xml)::int[]));
> >
> > Does anybody know better solution?
> >
>
> Alternative (and maybe better) approach would be:
>  - create comparison functions that work in the same way as string
> comparison functions do (currently, it's straight forward since XML is
> stored as string);
>  - do NOT create comparison operators to avoid explicit comparing XML values
> (to follow standard ways to work with XML and to avoid possible unexpected
> behaviors);
>  - create opclass based on these functions and, therefore, obtain GIN
> indexes support for xml[];
>  - describe in the docs, that one can use GIN indexes over XPath
> expressions, but should be aware that comparison with non-trivial XML
> constants have to be used carefully because of possible problems with
> whitespaces, etc (in other words, comparison here is doing letter by letter,
> as for varchar).
>
> If there are no objections I'll send patch for this.
>
It's good proposal. So only this is solution for indexing. I belive so
casting from xml[] to any others (mainly varchar[] and numeric[] can
be usefull.

Regards
Pavel


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Tom Lane
Date:
"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:
> Alternative (and maybe better) approach would be:
>  - create comparison functions that work in the same way as string
> comparison functions do (currently, it's straight forward since XML is
> stored as string);
>  - do NOT create comparison operators to avoid explicit comparing XML values
> (to follow standard ways to work with XML and to avoid possible unexpected
> behaviors);
>  - create opclass based on these functions and, therefore, obtain GIN
> indexes support for xml[];

I'm not clear on what you're proposing.  There is no such thing as an
opclass with no operators (or at least, not a useful one), so this seems
mutually contradictory.
        regards, tom lane


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Nikolay Samokhvalov"
Date:
On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not clear on what you're proposing.  There is no such thing as an
> opclass with no operators (or at least, not a useful one), so this seems
> mutually contradictory.
>
>                         regards, tom lane
>

You're right, that's my mistake, sorry. So, having casting rules seems
to be the only option..


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Andrew Dunstan
Date:

Nikolay Samokhvalov wrote:
> On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>   
>> I'm not clear on what you're proposing.  There is no such thing as an
>> opclass with no operators (or at least, not a useful one), so this seems
>> mutually contradictory.
>>
>>                         regards, tom lane
>>
>>     
>
> You're right, that's my mistake, sorry. So, having casting rules seems
> to be the only option..
>   


We can already cast as text[], and so we can do this:

andrew=# select 
xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[];xpath
-------{1,2}
(1 row)


So why do we desperately need anything extra at all?

cheers

andrew


Re: proposal casting from XML[] to int[], numeric[], text[]

From
"Pavel Stehule"
Date:
> >
> > You're right, that's my mistake, sorry. So, having casting rules seems
> > to be the only option..
> >
>
>
> We can already cast as text[], and so we can do this:
>
> andrew=# select
> xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[];
>  xpath
> -------
>  {1,2}
> (1 row)
>
>
> So why do we desperately need anything extra at all?
>

I was blind. My problem was with function index over xml array, that
isn't indexable.

I didn't find multiple casting.

Regards
Pavel

> cheers
>
> andrew
>


Re: proposal casting from XML[] to int[], numeric[], text[]

From
Bruce Momjian
Date:
Added to TODO:

* Allow xml arrays to be cast to other data types
 http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php
http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php



---------------------------------------------------------------------------

Pavel Stehule wrote:
> Hello
> 
> Current result from xpath function isn't indexable. It cannot be
> problem with possibility cast it to some base types.
> 
> CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
> RETURNS int[] AS $$
> SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
>                 FROM generate_series(1, array_upper($1,1)) g(i))
> $$ LANGUAGE SQL IMMUTABLE;
> 
> 
> CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);
> 
> -- now I can build functional index
> CREATE INDEX fx ON foo USING
> GIN((xpath('//id/text()',objednavka_v_xml)::int[]));
> 
> Does anybody know better solution?
> 
> Regards
> Pavel Stehule
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +