Thread: possible to create multivalued index from xpath() results in 8.3?

possible to create multivalued index from xpath() results in 8.3?

From
"Matt Magoffin"
Date:
I've working with XML in Postgres 8.3 and am trying to find a way to
create a text-based index from an XPath that returns multiple nodes. For
example, if I have an XPath like

/elem[@key="mykey"]/text()

which might return a few text nodes like

value1
value2
value3

I'd like 3 index values associated with this row's index key: {value1,
value2, value3). I was trying to figure out a way to define an index like
this but ran into a couple of issues:

1) The xpath() function returns an array of XML type, but in the above
example the text nodes are joined together into a single xml result node,
like {value1value2value3}. How can I get it to return 3 individual text
nodes, so an array of 3 values instead of 1?

2) Even if I could have an xpath() result return an array with multiple
values, like {value1,value2,value3} I wasn't able to define a GIN index
against the xml[] type. Should this be possible?

-- m@

Re: possible to create multivalued index from xpath() results in 8.3?

From
Tom Lane
Date:
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> 2) Even if I could have an xpath() result return an array with multiple
> values, like {value1,value2,value3} I wasn't able to define a GIN index
> against the xml[] type. Should this be possible?

Dunno about your other questions, but the answer to this one is "no"
--- AFAICS there is no indexing support of any kind for the xml type
in 8.3.  Try casting to text[] instead.

            regards, tom lane

Re: possible to create multivalued index from xpath() results in 8.3?

From
"Matt Magoffin"
Date:
> "Matt Magoffin" <postgresql.org@msqr.us> writes:
>> 2) Even if I could have an xpath() result return an array with multiple
>> values, like {value1,value2,value3} I wasn't able to define a GIN index
>> against the xml[] type. Should this be possible?
>
> Dunno about your other questions, but the answer to this one is "no"
> --- AFAICS there is no indexing support of any kind for the xml type
> in 8.3.  Try casting to text[] instead.

I had tried that, but it does not actually work because of my first issue,
of a way for the XPath to return 3 individual text nodes. If I return 3
elements, like

/elem[@key="mykey"] => {
  <elem key="mykey">value1</elem>,
  <elem key="mykey">value2</elem>,
  <elem key="mykey">value3</elem>
}

and cast that to text[], I get 3 XML strings, including the <elem></elem>.
I want only the element text content.

Should the xpath() function return 3 individual text nodes like this:

/elem[@key="mykey"]/text() => {
  value1,
  value2,
  value3
}

rather than concatenating these into a single text node result? I also
tried something like

string(/elem[@key="mykey"])

but that throws an XPath error. It looks like the code is converting this
to /x/string(/elem[@key="mykey"] internally, which is not a valid XPath.

So if xpath() cannot return individual text node results, would a possible
solution be to write a trigger function that generates a tsvector from the
XML array, and then use text search to locate matches?

-- m@

Re: possible to create multivalued index from xpath() results in 8.3?

From
Tom Lane
Date:
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> Should the xpath() function return 3 individual text nodes like this:

> /elem[@key="mykey"]/text() => {
>   value1,
>   value2,
>   value3
> }

> rather than concatenating these into a single text node result?

AFAICT that's exactly what it does.

regression=#  select xpath('//foo[@key="mykey"]/text()', '<value>ABC<foo key="mykey">XYZ</foo></value><foo
key="mykey">RST</foo><foo>DEF</foo>');
   xpath
-----------
 {XYZ,RST}
(1 row)

regression=#

Of course this is of type xml[], but you can cast to text[] and then
index.

            regards, tom lane

Re: possible to create multivalued index from xpath() results in 8.3?

From
"Matt Magoffin"
Date:
> AFAICT that's exactly what it does.
>
> regression=#  select xpath('//foo[@key="mykey"]/text()', '<value>ABC<foo
> key="mykey">XYZ</foo></value><foo key="mykey">RST</foo><foo>DEF</foo>');
>    xpath
> -----------
>  {XYZ,RST}
> (1 row)
>
> regression=#
>
> Of course this is of type xml[], but you can cast to text[] and then
> index.

Ugh, you're right of course! Somehow I had this wrong. So I tried to
create an index on the xml[] result by casting to text[] but I got the
"function must be immutable" error. Is there any reason the xml[] to
text[] cast is not immutable?

I worked around it by writing a function like

CREATE OR REPLACE FUNCTION xpath_to_text(xml_array xml[]) RETURNS text[] AS
$BODY$
BEGIN
    RETURN xml_array::text[];
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

and wrapping my CREATE INDEX call with that, like:

create index type_flag_idx on lead using gin (
    (xpath_to_text(xpath('/element[@key="foo"]/text()', xml)))
);

-- m@

Re: possible to create multivalued index from xpath() results in 8.3?

From
Tom Lane
Date:
"Matt Magoffin" <postgresql.org@msqr.us> writes:
> Ugh, you're right of course! Somehow I had this wrong. So I tried to
> create an index on the xml[] result by casting to text[] but I got the
> "function must be immutable" error. Is there any reason the xml[] to
> text[] cast is not immutable?

Hmm ... I see that xmltotext() is marked 'stable' in pg_proc.h,
but texttoxml() is marked 'immutable', which is at best inconsistent.
It looks to me like they both depend on the GUC setting "xmloption",
which would mean they should both be stable.  Peter, is there a bug
there?  Also, is there a way to get rid of the GUC dependency so that
there's a reasonably safe way to index XML values?

            regards, tom lane