Thread: Nested literal parsing rules?

Nested literal parsing rules?

From
Maciek Sakrejda
Date:
Hi,

I have a UDT:

CREATE TYPE foo AS (a integer, b integer, c hstore);

I need to construct the literal representation of an array of these
for input to a textual COPY. The hstore escaping rules are fairly
clear (I can quote all keys and values with double quotes, and escape
double quotes in keys and values with a backslash (two backslahses,
actually, due to standard_conforming_strings being off). Putting this
into my UDT literal is also fairly clear (I quote the full hstore
literal with double quotes, I double the double quotes around hstore
keys and values, and add four more backslashes (one for the hstore
literal, one for the UDT literal, and double that for
standard_conforming_strings). So each quote in an hstore key or value
is preceded by six backslashes (and each backslash is preceded by
seven). However, this is where I get lost. I tried following both the
array docs and trying to reverse engineer the behavior by using the
ARRAY[...] syntax, but I can't figure out the logic. Any suggestions
(other than to run screaming from this affront to proper relational
design--I have my reasons for it)?

Thanks,
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Nested literal parsing rules?

From
Tom Lane
Date:
Maciek Sakrejda <msakrejda@truviso.com> writes:
> I have a UDT:

> CREATE TYPE foo AS (a integer, b integer, c hstore);

> I need to construct the literal representation of an array of these
> for input to a textual COPY. The hstore escaping rules are fairly
> clear (I can quote all keys and values with double quotes, and escape
> double quotes in keys and values with a backslash (two backslahses,
> actually, due to standard_conforming_strings being off). Putting this
> into my UDT literal is also fairly clear (I quote the full hstore
> literal with double quotes, I double the double quotes around hstore
> keys and values, and add four more backslashes (one for the hstore
> literal, one for the UDT literal, and double that for
> standard_conforming_strings). So each quote in an hstore key or value
> is preceded by six backslashes (and each backslash is preceded by
> seven). However, this is where I get lost.

On the whole I think you'd be best off doing that in three passes.
First, run the appropriate escaping rules on each hstore key and
value string, and actually assemble a valid input string for hstore.
Then, run the value escaping rules for records on that string, and
assemble it together with the other field values into a valid input
string for a record type.  Finally, run the array escaping rules
on that string, and assemble it together with others into a valid
array input string.

The record and array escaping rules are here:
http://www.postgresql.org/docs/9.0/static/rowtypes.html#AEN7091
http://www.postgresql.org/docs/9.0/static/arrays.html#ARRAYS-IO
and I think the hstore rules are spelled out in the docs for that
contrib module.

> I tried following both the
> array docs and trying to reverse engineer the behavior by using the
> ARRAY[...] syntax, but I can't figure out the logic.

Neither record nor hstore quote exactly like arrays, so that's probably
part of your problem.  (No doubt it'd be better if they did, but we've
got too much historical baggage :-()

            regards, tom lane

Re: Nested literal parsing rules?

From
Maciek Sakrejda
Date:
> The record and array escaping rules are here:
> http://www.postgresql.org/docs/9.0/static/rowtypes.html#AEN7091
> http://www.postgresql.org/docs/9.0/static/arrays.html#ARRAYS-IO
> and I think the hstore rules are spelled out in the docs for that
> contrib module.

Thanks, that helped a lot. I think I was going off of what I read in
Section 8.15.2, but 8.15.5 is much more helpful. Should have kept
reading. And I just realized I'd been doubling backslashes too many
times for standard_conforming_strings, which certainly didn't help
things.

> Neither record nor hstore quote exactly like arrays, so that's probably
> part of your problem.  (No doubt it'd be better if they did, but we've
> got too much historical baggage :-()

I understand completely.

Thanks for your help,
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Nested literal parsing rules?

From
Maciek Sakrejda
Date:
Ok, so, stupid question: how does all this interact with COPY escaping
rules[1]?:

    Backslash characters (\) can be used in the COPY  data to quote
data characters that might otherwise be taken as row or column
delimiters. In particular, the following characters must  be preceded
by a backslash if they appear as part of a column value: backslash
itself, newline, carriage return, and the current delimiter character.


I imagined that this would be just another layer, but in my initial
tests, I was not doing this, and things worked. When I add it in, I
get too many levels of escaping. I can see this with even something
fairly simple:

postgres=# select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.4 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
(1 row)

postgres=# create type foo as (a text);
CREATE TYPE
postgres=# create table bar(a foo[]);
CREATE TABLE
postgres=# copy bar from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {"(\\\\"baz\\\\")}
>> \.
ERROR:  malformed array literal: "{"(\\"baz\\")}"
CONTEXT:  COPY bar, line 1, column a: "{"(\\"baz\\")}"
postgres=# show standard_conforming_strings;
 standard_conforming_strings
-----------------------------
 off
(1 row)

My escaping logic for the above COPY input: "baz" is double-quoted as
a value in a UDT. When I put that UDT in an array literal, I enclose
it in double quotes and I precede all double quote characters with a
backslash. Because of standard_conforming_strings being off, I escape
all backslashes with backslashes (so I now have two backslashes and a
quote). Because of text-mode COPY, I escape each of those again (so I
now have four backslashes and a quote).

I seem to be missing something. Does standard_conforming_strings not
apply during COPY? Or is there something else I'm missing?

Thanks,

[1]: http://www.postgresql.org/docs/8.3/static/sql-copy.html
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
www.truviso.com

Re: Nested literal parsing rules?

From
Tom Lane
Date:
Maciek Sakrejda <msakrejda@truviso.com> writes:
> I seem to be missing something. Does standard_conforming_strings not
> apply during COPY?

It does not.  That setting is about literal strings in SQL commands.
The COPY escaping rules do not depend on it.

            regards, tom lane