Thread: proposal casting from XML[] to int[], numeric[], text[]
Hello Current result from xpath function isn't indexable. It cannot be problem with possibility cast it to some base types. CREATE OR REPLACE FUNCTION xml_list_to_int(xml[]) RETURNS int[] AS $$ SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int FROM generate_series(1, array_upper($1,1)) g(i)) $$ LANGUAGE SQL IMMUTABLE; CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]); -- now I can build functional index CREATE INDEX fx ON foo USING GIN((xpath('//id/text()',objednavka_v_xml)::int[])); Does anybody know better solution? Regards Pavel Stehule
Am Dienstag, 25. September 2007 schrieb Pavel Stehule: > Current result from xpath function isn't indexable. It cannot be > problem with possibility cast it to some base types. Nikolay might be able to remind us what happened to the proposed functions xpath_bool, xpath_text, etc. -- Peter Eisentraut http://developer.postgresql.org/~petere/
The problem with contrib/xml2's xpath_* functions (that return scalars) was that they are very specific. If XPath expression evaluation returns array of values (set of XML pieces), but the function returns only the first, significant information is lost, while there is no any gain in speed at all. The key idea was to create only one generic function at the first stage -- xpath(), returning an array of XML pieces. We would create wrappers returning int[], bool[], string[], but there are several issues with such functions: - if the type of the data located on nodes that match XPath expression differs from what is expected, what should we do? - in XML world, if you request for a text under some node, all descendants should be involved in generating result string (example: what should be returned for XML like "<em><strong>PostgreSQL</strong> is a powerful, open source relational database system</em>" if user requests for text under "em" node? In XML world, the correct answer is "PostgreSQL is a powerful, open source relational database system" -- concatenation of all strings from the node itself and all its descendants, in the correct order. Will be this expected for RDBMS users?). Regarding GIN indexes, alternative approach would be creating opclass for xml[], it should be pretty simple (and better than creating implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this for 8.3 or it's too late? It would be very helpful feature. Without that, the only way to have indexes is to use functional btree indexes over XPath expression (smth like "...btree(((xpath('...', field)[1]::text))" -- pretty ugly construction...) On 9/25/07, Peter Eisentraut <peter_e@gmx.net> wrote: > Am Dienstag, 25. September 2007 schrieb Pavel Stehule: > > Current result from xpath function isn't indexable. It cannot be > > problem with possibility cast it to some base types. > > Nikolay might be able to remind us what happened to the proposed functions > xpath_bool, xpath_text, etc. > > -- > Peter Eisentraut > http://developer.postgresql.org/~petere/ > -- Best regards, Nikolay
> We would create wrappers returning int[], bool[], string[], but there > are several issues with such functions: > - if the type of the data located on nodes that match XPath > expression differs from what is expected, what should we do? raise exception > - in XML world, if you request for a text under some node, all > descendants should be involved in generating result string (example: > what should be returned for XML like "<em><strong>PostgreSQL</strong> > is a powerful, open source relational database system</em>" if user > requests for text under "em" node? In XML world, the correct answer is > "PostgreSQL is a powerful, open source relational database system" -- > concatenation of all strings from the node itself and all its > descendants, in the correct order. Will be this expected for RDBMS > users?). It is corect. Or we can disallow any nested elements in casting array. It's poblem only for text type. Numeric types are clear. > Regarding GIN indexes, alternative approach would be creating opclass > for xml[], it should be pretty simple (and better than creating > implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this > for 8.3 or it's too late? It would be very helpful feature. It's not practic. If I would to use it for functional indexes for xpath functions I need constructor for xml[], and I have not it currently: xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222] > > Without that, the only way to have indexes is to use functional btree > indexes over XPath expression (smth like "...btree(((xpath('...', > field)[1]::text))" -- pretty ugly construction...) It's not usefull, if xpath returns more values Regards Pavel Stehule
On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > We would create wrappers returning int[], bool[], string[], but there > > are several issues with such functions: > > - if the type of the data located on nodes that match XPath > > expression differs from what is expected, what should we do? > > raise exception Will it be convenient for cases when there are many different (various structures) XMLs in one column (no single DTD)? > > > - in XML world, if you request for a text under some node, all > > descendants should be involved in generating result string (example: > > what should be returned for XML like "<em><strong>PostgreSQL</strong> > > is a powerful, open source relational database system</em>" if user > > requests for text under "em" node? In XML world, the correct answer is > > "PostgreSQL is a powerful, open source relational database system" -- > > concatenation of all strings from the node itself and all its > > descendants, in the correct order. Will be this expected for RDBMS > > users?). > > It is corect. Or we can disallow any nested elements in casting array. > It's poblem only for text type. Numeric types are clear. Actually, casting to numeric types might seem to be odd. But there is some sense from practical point of view -- it works and that's better that nothing (like now). But it's too late for 8.3, isn't it? > > > Regarding GIN indexes, alternative approach would be creating opclass > > for xml[], it should be pretty simple (and better than creating > > implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this > > for 8.3 or it's too late? It would be very helpful feature. > > It's not practic. If I would to use it for functional indexes for > xpath functions I need constructor for xml[], and I have not it > currently: > > xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222] I do not understand. Do you mean that there is no equality comparison operator for type xml yet? To implement GIN for xml[] we need to have comparison operator for xml. Standard says "XML values are not comparable" (subclause 4.2.4 of the latest draft from wiscorp.com), but without that cannot implement straight GIN support, what is not good :-/ -- Best regards, Nikolay
2007/9/28, Nikolay Samokhvalov <nikolay@samokhvalov.com>: > On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > We would create wrappers returning int[], bool[], string[], but there > > > are several issues with such functions: > > > - if the type of the data located on nodes that match XPath > > > expression differs from what is expected, what should we do? > > > > raise exception > > Will it be convenient for cases when there are many different (various > structures) XMLs in one column (no single DTD)? > I don't know > > > > > - in XML world, if you request for a text under some node, all > > > descendants should be involved in generating result string (example: > > > what should be returned for XML like "<em><strong>PostgreSQL</strong> > > > is a powerful, open source relational database system</em>" if user > > > requests for text under "em" node? In XML world, the correct answer is > > > "PostgreSQL is a powerful, open source relational database system" -- > > > concatenation of all strings from the node itself and all its > > > descendants, in the correct order. Will be this expected for RDBMS > > > users?). > > > > It is corect. Or we can disallow any nested elements in casting array. > > It's poblem only for text type. Numeric types are clear. > > Actually, casting to numeric types might seem to be odd. But there is > some sense from practical point of view -- it works and that's better > that nothing (like now). But it's too late for 8.3, isn't it? > I thing so SQL based casting like my cust functions are relative simple for adding to core now. > > > > > Regarding GIN indexes, alternative approach would be creating opclass > > > for xml[], it should be pretty simple (and better than creating > > > implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this > > > for 8.3 or it's too late? It would be very helpful feature. > > > > It's not practic. If I would to use it for functional indexes for > > xpath functions I need constructor for xml[], and I have not it > > currently: > > > > xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222] > > I do not understand. Do you mean that there is no equality comparison > operator for type xml yet? > No, I mean some different. Nobody will construct special xml nodes for quality comparision with xpath function when expect xpath's result as int[], or float. So when result of xpath is xml[] but is with possible casting to int[] it's more simple do casting and build index on int[] because I can search int[]. > To implement GIN for xml[] we need to have comparison operator for > xml. Standard says "XML values are not comparable" (subclause 4.2.4 of > the latest draft from wiscorp.com), but without that cannot implement > straight GIN support, what is not good :-/ > I belive so xml values are not comparable, but I belive so the are transferable to some of base types. Pavel
Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov: > what should be returned for XML like "<em><strong>PostgreSQL</strong> > is a powerful, open source relational database system</em>" if user > requests for text under "em" node? In XML world, the correct answer is > "PostgreSQL is a powerful, open source relational database system" -- > concatenation of all strings from the node itself and all its > descendants, in the correct order. Will be this expected for RDBMS > users?). Well, if that is the defined behavior for XPath, then that's what we should do. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov: > >> what should be returned for XML like "<em><strong>PostgreSQL</strong> >> is a powerful, open source relational database system</em>" if user >> requests for text under "em" node? In XML world, the correct answer is >> "PostgreSQL is a powerful, open source relational database system" -- >> concatenation of all strings from the node itself and all its >> descendants, in the correct order. Will be this expected for RDBMS >> users?). >> > > Well, if that is the defined behavior for XPath, then that's what we should > do. > > The xpath string value of a single node is the concatentation of the text children of the node and all its children in document order, IIRC. But that's not what we're dealing with here. xpath() doesn't return a single node but a node set (or so say the docs). The string value of a node set is in effect the string value of its first member, which seems less than useful in this context, or at least no great guide for us. I think there's probably a good case for a cast from xml[] to text[] if we don't have one. cheers andrew
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > Peter Eisentraut wrote: > > Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov: > > > >> what should be returned for XML like "<em><strong>PostgreSQL</strong> > >> is a powerful, open source relational database system</em>" if user > >> requests for text under "em" node? In XML world, the correct answer is > >> "PostgreSQL is a powerful, open source relational database system" -- > >> concatenation of all strings from the node itself and all its > >> descendants, in the correct order. Will be this expected for RDBMS > >> users?). > >> > > > > Well, if that is the defined behavior for XPath, then that's what we should > > do. > > > > > > The xpath string value of a single node is the concatentation of the > text children of the node and all its children in document order, IIRC. > But that's not what we're dealing with here. xpath() doesn't return a > single node but a node set (or so say the docs). The string value of a > node set is in effect the string value of its first member, which seems > less than useful in this context, or at least no great guide for us. > > I think there's probably a good case for a cast from xml[] to text[] if > we don't have one. > > cheers > > andrew > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Alternative (and maybe better) approach would be:
- create comparison functions that work in the same way as string comparison functions do (currently, it's straight forward since XML is stored as string);
- do NOT create comparison operators to avoid explicit comparing XML values (to follow standard ways to work with XML and to avoid possible unexpected behaviors);
- create opclass based on these functions and, therefore, obtain GIN indexes support for xml[];
- describe in the docs, that one can use GIN indexes over XPath expressions, but should be aware that comparison with non-trivial XML constants have to be used carefully because of possible problems with whitespaces, etc (in other words, comparison here is doing letter by letter, as for varchar).
If there are no objections I'll send patch for this.
Hello
Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.
CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;
CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);
-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));
Does anybody know better solution?
Alternative (and maybe better) approach would be:
- create comparison functions that work in the same way as string comparison functions do (currently, it's straight forward since XML is stored as string);
- do NOT create comparison operators to avoid explicit comparing XML values (to follow standard ways to work with XML and to avoid possible unexpected behaviors);
- create opclass based on these functions and, therefore, obtain GIN indexes support for xml[];
- describe in the docs, that one can use GIN indexes over XPath expressions, but should be aware that comparison with non-trivial XML constants have to be used carefully because of possible problems with whitespaces, etc (in other words, comparison here is doing letter by letter, as for varchar).
If there are no objections I'll send patch for this.
On 11/11/2007, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > > > On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > Hello > > > > Current result from xpath function isn't indexable. It cannot be > > problem with possibility cast it to some base types. > > > > CREATE OR REPLACE FUNCTION xml_list_to_int(xml[]) > > RETURNS int[] AS $$ > > SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int > > FROM generate_series(1, array_upper($1,1)) g(i)) > > $$ LANGUAGE SQL IMMUTABLE; > > > > > > CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]); > > > > -- now I can build functional index > > CREATE INDEX fx ON foo USING > > GIN((xpath('//id/text()',objednavka_v_xml)::int[])); > > > > Does anybody know better solution? > > > > Alternative (and maybe better) approach would be: > - create comparison functions that work in the same way as string > comparison functions do (currently, it's straight forward since XML is > stored as string); > - do NOT create comparison operators to avoid explicit comparing XML values > (to follow standard ways to work with XML and to avoid possible unexpected > behaviors); > - create opclass based on these functions and, therefore, obtain GIN > indexes support for xml[]; > - describe in the docs, that one can use GIN indexes over XPath > expressions, but should be aware that comparison with non-trivial XML > constants have to be used carefully because of possible problems with > whitespaces, etc (in other words, comparison here is doing letter by letter, > as for varchar). > > If there are no objections I'll send patch for this. > It's good proposal. So only this is solution for indexing. I belive so casting from xml[] to any others (mainly varchar[] and numeric[] can be usefull. Regards Pavel
"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes: > Alternative (and maybe better) approach would be: > - create comparison functions that work in the same way as string > comparison functions do (currently, it's straight forward since XML is > stored as string); > - do NOT create comparison operators to avoid explicit comparing XML values > (to follow standard ways to work with XML and to avoid possible unexpected > behaviors); > - create opclass based on these functions and, therefore, obtain GIN > indexes support for xml[]; I'm not clear on what you're proposing. There is no such thing as an opclass with no operators (or at least, not a useful one), so this seems mutually contradictory. regards, tom lane
On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not clear on what you're proposing. There is no such thing as an > opclass with no operators (or at least, not a useful one), so this seems > mutually contradictory. > > regards, tom lane > You're right, that's my mistake, sorry. So, having casting rules seems to be the only option..
Nikolay Samokhvalov wrote: > On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I'm not clear on what you're proposing. There is no such thing as an >> opclass with no operators (or at least, not a useful one), so this seems >> mutually contradictory. >> >> regards, tom lane >> >> > > You're right, that's my mistake, sorry. So, having casting rules seems > to be the only option.. > We can already cast as text[], and so we can do this: andrew=# select xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[];xpath -------{1,2} (1 row) So why do we desperately need anything extra at all? cheers andrew
> > > > You're right, that's my mistake, sorry. So, having casting rules seems > > to be the only option.. > > > > > We can already cast as text[], and so we can do this: > > andrew=# select > xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[]; > xpath > ------- > {1,2} > (1 row) > > > So why do we desperately need anything extra at all? > I was blind. My problem was with function index over xml array, that isn't indexable. I didn't find multiple casting. Regards Pavel > cheers > > andrew >
Added to TODO: * Allow xml arrays to be cast to other data types http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php --------------------------------------------------------------------------- Pavel Stehule wrote: > Hello > > Current result from xpath function isn't indexable. It cannot be > problem with possibility cast it to some base types. > > CREATE OR REPLACE FUNCTION xml_list_to_int(xml[]) > RETURNS int[] AS $$ > SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int > FROM generate_series(1, array_upper($1,1)) g(i)) > $$ LANGUAGE SQL IMMUTABLE; > > > CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]); > > -- now I can build functional index > CREATE INDEX fx ON foo USING > GIN((xpath('//id/text()',objednavka_v_xml)::int[])); > > Does anybody know better solution? > > Regards > Pavel Stehule > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +