Thread: Differences in Escaped bytea's when creating a plain pg_dump

Differences in Escaped bytea's when creating a plain pg_dump

From

Date:

23 June 2022, 15:13:50

Hello community,

I've some trouble in restoring a plain text pg_dump.
Postgres version is 13 x64 running on Windows10x64 installed from EDB 
package.

The database has bytea_output = 'escape' option because of some 
processing software needs it for historical reasons.

Dump command is:

pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 
--schema=public --host=localhost --username=myuser --inserts dbname

We have two tables that have a bytea-row.

But when I look at the dumpfile there is a difference between the 
escaped bytea-string. (note: both INSERT's from the same run of pg_dump 
in the dumpfile)


SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

[snip]

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, 

'\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000',

500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

[snip]

INSERT INTO public.profiles VALUES (1, 1, 's', 152, 

'\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000',

'2016-08-25 00:00:00+02');

[snip]

When I restore them via pgadmin4 query tool, both INSERTS do work.

But when I read them with my c++ written software from file and send 
them with pqxx-Library as a transaction, the first bytea-string 
generates a fault 21020. (0x00 is not a valid utf8 sequence). I also 
checked the read string in c++ debugger, the single backslashes in the 
one case and the double backslashes in the other case are there.

So my questions are:
Why do we get one bytea-string with double backslashes (which works) and 
another one with single backslashes (which actually not works with 
pqxx-transactions)?

Can I convince pg_dump somehow, to generate double backslashes in all 
bytea-strings?

Why does pgadmin understand both formats. pqxx-transaction does not?


Thank you for this great database-system. I really like it.
Wolfgang



-- 
May the source be with you

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

Date:

23 June 2022, 16:21:37

Am 23.06.2022 um 17:13 schrieb WR:
> Hello community,
> 
> I've some trouble in restoring a plain text pg_dump.
> Postgres version is 13 x64 running on Windows10x64 installed from EDB 
> package.
> 
> The database has bytea_output = 'escape' option because of some 
> processing software needs it for historical reasons.
> 
> Dump command is:
> 
> pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 
> --schema=public --host=localhost --username=myuser --inserts dbname
> 
> We have two tables that have a bytea-row.
> 
> But when I look at the dumpfile there is a difference between the 
> escaped bytea-string. (note: both INSERT's from the same run of pg_dump 
> in the dumpfile)
> 
> 
> SET statement_timeout = 0;
> SET lock_timeout = 0;
> SET idle_in_transaction_session_timeout = 0;
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = on;
> SELECT pg_catalog.set_config('search_path', '', false);
> SET check_function_bodies = false;
> SET xmloption = content;
> SET client_min_messages = warning;
> SET row_security = off;
> 
> [snip]
> 
> INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
> 2000, 2400, 2400, 
>
'\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000',

> 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);
> 
> [snip]
> 
> INSERT INTO public.profiles VALUES (1, 1, 's', 152, 
>
'\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000',

> '2016-08-25 00:00:00+02');
> 
> [snip]
> 
> When I restore them via pgadmin4 query tool, both INSERTS do work.
> 
> But when I read them with my c++ written software from file and send 
> them with pqxx-Library as a transaction, the first bytea-string 
> generates a fault 21020. (0x00 is not a valid utf8 sequence). I also 
> checked the read string in c++ debugger, the single backslashes in the 
> one case and the double backslashes in the other case are there.
> 
> So my questions are:
> Why do we get one bytea-string with double backslashes (which works) and 
> another one with single backslashes (which actually not works with 
> pqxx-transactions)?
> 
> Can I convince pg_dump somehow, to generate double backslashes in all 
> bytea-strings?
> 
> Why does pgadmin understand both formats. pqxx-transaction does not?
> 
> 
> Thank you for this great database-system. I really like it.
> Wolfgang
> 
> 
> 

Hello again,

I've found one mistake in the data of the second table 
(public.profiles). They seem to be really "double escaped" somehow. So 
they are not valid anymore.

Now I know pg_dump doesn't make any difference between the two tables. 
The only valid data is from table (public.oned_figures) with one 
backslash. That was my fault, sorry.

But one question is left.

When I read this valid data into a c++ std::string (and I checked that 
the single backslashes are still there). Why can't I put this 
SQL-command to a pqxx-transaction and execute it. It looks like the 
pqxx-transaction unescapes the bytea-string and then it finds the 0x00 
bytes, which are not allowed in text-strings but should be in bytea-strings.



-- 
May the source be with you

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

"Daniel Verite"

Date:

23 June 2022, 21:14:27

    WR wrote:

> But one question is left.
>
> When I read this valid data into a c++ std::string (and I checked that
> the single backslashes are still there). Why can't I put this
> SQL-command to a pqxx-transaction and execute it. It looks like the
> pqxx-transaction unescapes the bytea-string and then it finds the 0x00
> bytes, which are not allowed in text-strings but should be in bytea-strings.

You may check your server logs. They would have the error messages
with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be
incorrect if standard_conforming_strings if set to off (not
recommended since version 8.1).
In that case the backslashes need to be doubled.

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR:    invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
 bytea
-------
 \x00
(1 row)

There might also be a problem with how the string is being fed with
the C++ code, but you'd need to show us the code to get feedback
on it.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

Date:

27 June 2022, 06:44:59

I'm back at my problem today:



> Example:
> 
> 
> postgres=# set standard_conforming_strings to off;
> SET
> 
> 
> postgres=# set escape_string_warning to off;
> SET
> 
> 
> postgres=# select '\000'::bytea;
> ERROR:    invalid byte sequence for encoding "UTF8": 0x00
> 
> 
> postgres=# select '\\000'::bytea;
>  bytea 
> -------
>  \x00
> (1 row)

I made some test with pgadmin. Pgadmin (5.2) also reports this error 
now. And it doesn't matter if standard_conforming_strings is on or off.


SET standard_conforming_strings = off;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', 
'2021-08-31 11:53:22.442801', 0, 1);

And

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', 
'2021-08-31 11:53:22.442801', 0, 1);


Both do report:

WARNUNG:  nicht standardkonforme Verwendung von Escape in 
Zeichenkettenkonstante
LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
                                                              ^
HINT:  Verwenden Sie die Syntax für Escape-Zeichenketten, z.B. E'\r\n'.

ERROR: FEHLER:  ungültige Byte-Sequenz für Kodierung »UTF8«: 0x00


The warning can be avoided by adding the E before the string constant.

The only solution to avoid the error is, to double-backslash.

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, E'\\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', 
'2021-08-31 11:53:22.442801', 0, 1);

I also  dumped the table again after INSERT, and the data was correct 
(but again without E and with single backslash)

What I cant understand: why does pg_dump produce the string without the 
E and without double-backslash, when it is needed? Now I have to write a 
correction routine in c++, what fixes the dumps, before using them.


-- 
May the source be with you

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

Date:

27 June 2022, 07:21:23

Another strange thing is:

in my first mail I wrote: running the dump in in pgadmin works, in the 
last mail I wrote pgadmin also produces the error. I played a little bit 
how this could be happend.

Everytime ich used the following sql text in the querytool:

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', 
'2021-08-31 11:53:22.442801', 0, 1);

After each run I deleted the line with a View/Edit Data Panel of the Table.

First run worked.
Second run worked.
Then I changed to SET standard_conforming_strings = off;
Third run worked.
Fourth run throw the error
Then I changed back to  SET standard_conforming_strings = on;
Fifth run throw the error too.
And only adding E and second backslash helped.

I could reproduce this behavior everytime I close the query tool and 
opened it again.

But this looks more like a pgadmin-bug.



-- 
May the source be with you

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

"David G. Johnston"

Date:

27 June 2022, 07:32:05

On Sunday, June 26, 2022, WR <wolle321@freenet.de> wrote:

I made some test with pgadmin. Pgadmin (5.2) also reports this error now. And it doesn't matter if standard_conforming_strings is on or off.

SET standard_conforming_strings = off;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

And

SET standard_conforming_strings = on;

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

Both do report:

WARNUNG: nicht standardkonforme Verwendung von Escape in Zeichenkettenkonstante
LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
^
HINT: Verwenden Sie die Syntax für Escape-Zeichenketten, z.B. E'\r\n'.

ERROR: FEHLER: ungültige Byte-Sequenz für Kodierung »UTF8«: 0x00

I’m not in a position to test/experiment on any guaranteed timeframe but your observation that the outcome of those two commands is independent of value of standard_conforming_strings is either a straight up bug or you’ve made a mistake somewhere. The warning is fully conditioned on that setting being off.

E’\\000’ and ‘\000’ passed as string literals to the bytea input routine are defined to be equivalent inputs under standard_conforming_strings and neither can produce the warning in that case.

I suggest doing self-contained examples that demonstrate the documented behavior not working as documented (or not being functional even if intended) to pinpoint any bug that might be lurking here. With only fragments and statements that seem impossible we are left to assume operator error. pg_dump is completely correct in what it is producing (non-escape literal \000).

I also suggest using psql and pg_dump directly, and not pgAdmin, to demonstrate a core PostgreSQL bug.

David J.

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

Wolfgang Rißler

Date:

27 June 2022, 09:31:54

Am 27.06.2022 um 09:32 schrieb David G. Johnston:

[snip]


> I suggest doing self-contained examples that demonstrate the documented 
> behavior not working as documented (or not being functional even if 
> intended) to pinpoint any bug that might be lurking here.  With only 
> fragments and statements that seem impossible we are left to assume 
> operator error.  pg_dump is completely correct in what it is producing 
> (non-escape literal \000).
> 
> I also suggest using psql and pg_dump directly, and not pgAdmin, to 
> demonstrate a core PostgreSQL bug.
> 
> David J.
> 

Thank you David,
I followed you advice, using pg_dump and psql directly. And the in     in 
contrast to pgAdmin psql works like expected and reproducable again and 
again.
With
SET standard_conforming_strings = on;

an INSERT without E and double backslash works.

SET standard_conforming_strings = off;

I get the warning and the error. So there is no core PostgreSQL bug, I 
think.

PgAdmin has different result, when running the same sql commands 
repeatedly. Before filing a bug there, I should update to the actual 
release.

Now I will test our c++ code and will hopefully find out, why I can't 
run the dump from a sql-file (where is SET standard_conforming_strings = 
on;) as a pqxx-transaction...


-- 

Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

"Daniel Verite"

Date:

27 June 2022, 10:12:53

    WR wrote:

> First run worked.
> Second run worked.
> Then I changed to SET standard_conforming_strings = off;
> Third run worked.
> Fourth run throw the error
> Then I changed back to  SET standard_conforming_strings = on;
> Fifth run throw the error too.
> And only adding E and second backslash helped.

This kind of unstable behavior can be seen if the SET may not be
executed by the same session (same connection to the server) as the
subsequent queries.
SET affects only the session it's being run in.

For instance a connection pooler configured in statement mode may
produce that behavior. The solution in the case of a connection pooler
is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the statements
in the same SQL window, I would find it quite surprising that it doesn't
run them by the same session.
Or maybe you're mixing queries from different SQL windows that each
have their own connection, and in some cases you do the SET
in a window and the INSERT in another window.
Or maybe it's a pgAdmin bug.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Differences in Escaped bytea's when creating a plain pg_dump

From

Wolfgang Rißler

Date:

27 June 2022, 11:00:45

Am 27.06.2022 um 12:12 schrieb Daniel Verite:
>     WR wrote:
> 
>> First run worked.
>> Second run worked.
>> Then I changed to SET standard_conforming_strings = off;
>> Third run worked.
>> Fourth run throw the error
>> Then I changed back to  SET standard_conforming_strings = on;
>> Fifth run throw the error too.
>> And only adding E and second backslash helped.
> 
> This kind of unstable behavior can be seen if the SET may not be
> executed by the same session (same connection to the server) as the
> subsequent queries.
> SET affects only the session it's being run in.
> 
> For instance a connection pooler configured in statement mode may
> produce that behavior. The solution in the case of a connection pooler
> is to group related statements into a transaction.
> 
> Maybe pgAdmin has a pooler like that, but if you're issuing the statements
> in the same SQL window, I would find it quite surprising that it doesn't
> run them by the same session.
> Or maybe you're mixing queries from different SQL windows that each
> have their own connection, and in some cases you do the SET
> in a window and the INSERT in another window.
> Or maybe it's a pgAdmin bug.
> 
> 

Ok, thank you Daniel,

in all tests I put the SET statement before the INSERT in the same query 
tool and run it as a whole. (no statement was marked by mouse for single 
execution). So I agree with you, that one transmission is used by 
pgAdmin to run SET and INSERT.
I always had a SET before each INSERT, so there could be no unknown 
state of standard_conforming_strings, no matter if pdAdmin uses 
different connections for each run of the script or the same again.

The supplement of the second backslash in the c++ code costs me some 
nerves. Sending an statement std::string without doublebackslashes to 
pqxx::transaction results in the "0x00 is not a valid UTF8 sequence" 
error 21020 (as I wrote already). I'm afraid, I can't simply replace all 
backslashes in the whole dumptext with two of them.
Since I found out now, that psql works nice with my dumpfiles, maybe its 
easier, to call psql from my code and not to use a pqxx::transaction. So 
I can also save myself loading the dump from the sql-file. The more I 
think about it, the more I love this idea.

Thanks a lot, Wolfgang

-- 

Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -