Thread: Differences in escaped byteas when creating a plain pg_dump
Hello community,

I'm having some trouble restoring a plain-text pg_dump. The Postgres version is 13 x64, running on Windows 10 x64, installed from the EDB package.

The database has bytea_output = 'escape' set, because some processing software needs it for historical reasons.

The dump command is:

pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 --schema=public --host=localhost --username=myuser --inserts dbname

We have two tables that each have a bytea column. But when I look at the dump file, there is a difference between the escaped bytea strings. (Note: both INSERTs are from the same run of pg_dump, in the same dump file.)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

[snip]

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

[snip]

INSERT INTO public.profiles VALUES (1, 1, 's', 152, '\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000', '2016-08-25 00:00:00+02');

[snip]

When I restore them via the pgAdmin 4 query tool, both INSERTs work.

But when I read them from the file with my C++ program and send them with the pqxx library as a transaction, the first bytea string produces error 21020 (0x00 is not a valid UTF8 sequence). I also checked the string in the C++ debugger: the single backslashes in the one case and the double backslashes in the other case are really there.

So my questions are:

Why do we get one bytea string with double backslashes (which works) and another one with single backslashes (which does not work with pqxx transactions)?

Can I convince pg_dump somehow to generate double backslashes in all bytea strings?

Why does pgAdmin understand both formats while a pqxx transaction does not?

Thank you for this great database system. I really like it.

Wolfgang

--
May the source be with you
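[For context, a stripped-down sketch of the kind of C++/pqxx code described above might look like the following. This is only an illustration, not the actual program: the file name and the connection string are placeholders, and it assumes the whole dump script is small enough to be sent as one batch.]

#include <fstream>
#include <pqxx/pqxx>
#include <sstream>
#include <string>

int main()
{
    // Read the plain-format dump produced by pg_dump (placeholder file name).
    std::ifstream in("mydump.sql");
    std::stringstream buf;
    buf << in.rdbuf();
    const std::string dump_sql = buf.str();

    // Placeholder connection string.
    pqxx::connection conn{"host=localhost dbname=dbname user=myuser"};

    // Send the dumped statements in one transaction; this is the point where
    // the first bytea literal reportedly triggers the "invalid byte sequence"
    // error.
    pqxx::work txn{conn};
    txn.exec(dump_sql);
    txn.commit();
}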
On 23.06.2022 17:13, WR wrote:

> [snip]

Hello again,

I've found a mistake in the data of the second table (public.profiles). Those values really are "double escaped" somehow, so they are not valid anymore. Now I know that pg_dump doesn't treat the two tables differently; the only valid data is in the table with single backslashes (public.oned_figures). That was my fault, sorry.

But one question is left.
When I read this valid data into a C++ std::string (and I checked that the single backslashes are still there), why can't I pass this SQL command to a pqxx transaction and execute it? It looks like the pqxx transaction unescapes the bytea string and then finds the 0x00 bytes, which are not allowed in text strings but should be allowed in bytea strings.

--
May the source be with you
WR wrote:

> But one question is left.
>
> When I read this valid data into a C++ std::string (and I checked that
> the single backslashes are still there), why can't I pass this SQL command
> to a pqxx transaction and execute it? It looks like the pqxx transaction
> unescapes the bytea string and then finds the 0x00 bytes, which are not
> allowed in text strings but should be allowed in bytea strings.

You may check your server logs. They would have the error messages along with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be incorrect if standard_conforming_strings is set to off (not recommended since version 8.1). In that case the backslashes need to be doubled.

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR:  invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
 bytea
-------
 \x00
(1 row)

There might also be a problem with how the string is being fed in by the C++ code, but you'd need to show us the code to get feedback on it.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
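[To double-check which string-literal rules the failing session actually uses, the setting can be read back over the same libpqxx connection that will run the INSERTs. A minimal sketch, assuming libpqxx 6 or later; the connection string is a placeholder.]

#include <iostream>
#include <pqxx/pqxx>

int main()
{
    // Placeholder connection string -- adjust to your environment.
    pqxx::connection conn{"host=localhost dbname=dbname user=myuser"};
    pqxx::work txn{conn};

    // Ask the server which string-literal rules apply to *this* session.
    pqxx::row r = txn.exec1("SHOW standard_conforming_strings");
    std::cout << "standard_conforming_strings = " << r[0].c_str() << "\n";

    // With the setting on, a single-backslash escape-format bytea literal
    // such as '\000...' reaches the bytea input function unchanged; with it
    // off, the string parser consumes the backslashes first.
    txn.commit();
}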
I'm back at my problem today:

> Example:
>
> postgres=# set standard_conforming_strings to off;
> SET
>
> postgres=# set escape_string_warning to off;
> SET
>
> postgres=# select '\000'::bytea;
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00
>
> postgres=# select '\\000'::bytea;
>  bytea
> -------
>  \x00
> (1 row)

I made some tests with pgAdmin. pgAdmin (5.2) also reports this error now, and it doesn't matter whether standard_conforming_strings is on or off.

SET standard_conforming_strings = off;
INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

and

SET standard_conforming_strings = on;
INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

Both report:

WARNING:  nonstandard use of escape in a string literal
LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
                                                             ^
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR:  invalid byte sequence for encoding "UTF8": 0x00

The warning can be avoided by adding the E before the string constant. The only way to avoid the error is to double the backslashes:

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, E'\\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

I also dumped the table again after the INSERT, and the data was correct (but again without the E and with single backslashes).

What I can't understand: why does pg_dump produce the string without the E and without double backslashes when they are needed?

Now I have to write a correction routine in C++ that fixes the dumps before using them.

--
May the source be with you
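[As an aside, when bytea values are built in the C++ program itself rather than replayed from a dump, libpqxx can do the escaping. A minimal sketch, assuming libpqxx 6.x; the connection string and the temporary table are made-up placeholders.]

#include <pqxx/pqxx>
#include <string>

int main()
{
    // Placeholder connection string.
    pqxx::connection conn{"host=localhost dbname=dbname user=myuser"};
    pqxx::work txn{conn};

    // Self-contained demo table.
    txn.exec0("CREATE TEMP TABLE bytea_demo (data bytea)");

    // Raw binary payload containing NUL bytes -- the case that breaks a
    // hand-built string literal.
    std::string payload{'\x00', '\x00', '\x26', '\x01'};

    // quote_raw() escapes the bytes and wraps them in quotes so the result
    // can be spliced into SQL for this connection's settings.
    std::string sql =
        "INSERT INTO bytea_demo (data) VALUES (" + txn.quote_raw(payload) + ")";
    txn.exec0(sql);
    txn.commit();
}

[That sidesteps the single-versus-double-backslash question, because the library escapes the data for the session's actual settings.]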
Another strange thing: in my first mail I wrote that running the dump in pgAdmin works, and in the last mail I wrote that pgAdmin also produces the error. I played around a bit to see how this could happen. Every time, I used the following SQL text in the query tool:

SET standard_conforming_strings = on;
INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);

After each run I deleted the row with a View/Edit Data panel on the table.

First run worked.
Second run worked.
Then I changed to SET standard_conforming_strings = off;
Third run worked.
Fourth run threw the error.
Then I changed back to SET standard_conforming_strings = on;
Fifth run threw the error too.
And only adding the E and a second backslash helped.

I could reproduce this behavior every time I closed the query tool and opened it again. But this looks more like a pgAdmin bug.

--
May the source be with you
On Sunday, June 26, 2022, WR <wolle321@freenet.de> wrote:
> I made some tests with pgAdmin. pgAdmin (5.2) also reports this error now, and it doesn't matter whether standard_conforming_strings is on or off.
>
> SET standard_conforming_strings = off;
> INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);
>
> and
>
> SET standard_conforming_strings = on;
> INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);
>
> Both report:
>
> WARNING:  nonstandard use of escape in a string literal
> LINE 8: ...(1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&...
>                                                              ^
> HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00
I'm not in a position to test or experiment on any guaranteed timeframe, but your observation that the outcome of those two commands is independent of the value of standard_conforming_strings is either a straight-up bug or you've made a mistake somewhere. The warning is fully conditioned on that setting being off.
E'\\000' and '\000', passed as string literals to the bytea input routine, are defined to be equivalent inputs when standard_conforming_strings is on, and neither can produce the warning in that case.
I suggest doing self-contained examples that demonstrate the documented behavior not working as documented (or not being functional even if intended) to pinpoint any bug that might be lurking here. With only fragments and statements that seem impossible we are left to assume operator error. pg_dump is completely correct in what it is producing (non-escape literal \000).
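[As an illustration, such a self-contained check could be as small as the following libpqxx sketch. It is not the original poster's program and the connection string is a placeholder; it simply runs the SET and a single-backslash bytea literal in one session.]

#include <iostream>
#include <pqxx/pqxx>

int main()
{
    // Placeholder connection string.
    pqxx::connection conn{"host=localhost dbname=dbname user=myuser"};
    pqxx::work txn{conn};

    // Same session for both statements, so the SET is guaranteed to apply.
    txn.exec0("SET standard_conforming_strings = on");

    // A raw string literal keeps the SQL exactly as pg_dump writes it:
    // a single-backslash escape-format bytea literal.
    pqxx::row r = txn.exec1(R"(SELECT '\000\001'::bytea)");

    // Prints the value in whatever bytea_output format the session uses.
    std::cout << "round-tripped value: " << r[0].c_str() << "\n";

    txn.commit();
}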
I also suggest using psql and pg_dump directly, and not pgAdmin, to demonstrate a core PostgreSQL bug.
David J.
On 27.06.2022 09:32, David G. Johnston wrote:

[snip]

> I suggest doing self-contained examples that demonstrate the documented
> behavior not working as documented (or not being functional even if
> intended) to pinpoint any bug that might be lurking here. With only
> fragments and statements that seem impossible we are left to assume
> operator error. pg_dump is completely correct in what it is producing
> (non-escape literal \000).
>
> I also suggest using psql and pg_dump directly, and not pgAdmin, to
> demonstrate a core PostgreSQL bug.
>
> David J.

Thank you David,

I followed your advice and used pg_dump and psql directly. In contrast to pgAdmin, psql works as expected, reproducibly, again and again.

With SET standard_conforming_strings = on; an INSERT without the E and without double backslashes works.
With SET standard_conforming_strings = off; I get the warning and the error.

So there is no core PostgreSQL bug, I think. pgAdmin gives different results when running the same SQL commands repeatedly. Before filing a bug there, I should update to the current release.

Now I will test our C++ code and will hopefully find out why I can't run the dump from an SQL file (which contains SET standard_conforming_strings = on;) as a pqxx transaction...

--
Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -
WR wrote:

> First run worked.
> Second run worked.
> Then I changed to SET standard_conforming_strings = off;
> Third run worked.
> Fourth run threw the error.
> Then I changed back to SET standard_conforming_strings = on;
> Fifth run threw the error too.
> And only adding the E and a second backslash helped.

This kind of unstable behavior can be seen if the SET is not executed by the same session (the same connection to the server) as the subsequent queries. SET affects only the session it's being run in.

For instance, a connection pooler configured in statement mode can produce that behavior. The solution in the case of a connection pooler is to group related statements into a transaction.

Maybe pgAdmin has a pooler like that, but if you're issuing the statements in the same SQL window, I would find it quite surprising that it doesn't run them in the same session. Or maybe you're mixing queries from different SQL windows that each have their own connection, and in some cases you do the SET in one window and the INSERT in another. Or maybe it's a pgAdmin bug.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
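[For what it's worth, the "same session" property can be verified from the client side: every statement issued through one pqxx transaction runs on one backend. A minimal sketch, with a placeholder connection string.]

#include <iostream>
#include <pqxx/pqxx>

int main()
{
    // Placeholder connection string.
    pqxx::connection conn{"host=localhost dbname=dbname user=myuser"};
    pqxx::work txn{conn};

    // Both pg_backend_pid() calls run on the same backend, so a SET issued
    // in between is visible to everything else in this transaction.
    int pid_before = txn.exec1("SELECT pg_backend_pid()")[0].as<int>();
    txn.exec0("SET standard_conforming_strings = on");
    int pid_after  = txn.exec1("SELECT pg_backend_pid()")[0].as<int>();

    std::cout << "backend pid before: " << pid_before
              << ", after: " << pid_after << "\n";
    txn.commit();
}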
On 27.06.2022 12:12, Daniel Verite wrote:

> WR wrote:
>
>> First run worked.
>> Second run worked.
>> Then I changed to SET standard_conforming_strings = off;
>> Third run worked.
>> Fourth run threw the error.
>> Then I changed back to SET standard_conforming_strings = on;
>> Fifth run threw the error too.
>> And only adding the E and a second backslash helped.
>
> This kind of unstable behavior can be seen if the SET is not executed by
> the same session (the same connection to the server) as the subsequent
> queries. SET affects only the session it's being run in.
>
> For instance, a connection pooler configured in statement mode can
> produce that behavior. The solution in the case of a connection pooler
> is to group related statements into a transaction.
>
> Maybe pgAdmin has a pooler like that, but if you're issuing the statements
> in the same SQL window, I would find it quite surprising that it doesn't
> run them in the same session. Or maybe you're mixing queries from
> different SQL windows that each have their own connection, and in some
> cases you do the SET in one window and the INSERT in another.
> Or maybe it's a pgAdmin bug.

Ok, thank you Daniel.

In all tests I put the SET statement before the INSERT in the same query tool and ran it as a whole (no statement was selected with the mouse for single execution). So I agree with you that pgAdmin uses one transmission to run the SET and the INSERT. I always had a SET before each INSERT, so there could be no unknown state of standard_conforming_strings, no matter whether pgAdmin uses a different connection for each run of the script or the same one again.

Adding the second backslash in the C++ code cost me some nerves. Sending a statement std::string without double backslashes to a pqxx::transaction results in the "0x00 is not a valid UTF8 sequence" error 21020 (as I already wrote). I'm afraid I can't simply replace every backslash in the whole dump text with two of them.

Since I have now found out that psql works nicely with my dump files, maybe it's easier to call psql from my code instead of using a pqxx::transaction. Then I can also save myself loading the dump from the SQL file. The more I think about it, the more I love this idea.

Thanks a lot,
Wolfgang

--
Wolfgang Rißler
mailto: wolfgang.rissler@freenet.de
mobil: +49 1520 9191759
- may the source be with you -
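[A minimal sketch of that approach, shelling out to psql with std::system. The file name and connection details are placeholders; real credentials would normally come from a .pgpass file or the PGPASSWORD environment variable rather than the command line.]

#include <cstdlib>   // std::system
#include <iostream>
#include <string>

int main()
{
    // Placeholder path to the plain-format dump.
    const std::string dump_file = "mydump.sql";

    // --set ON_ERROR_STOP=1 makes psql stop at the first failed statement,
    // --single-transaction wraps the whole script in one transaction.
    const std::string cmd =
        "psql --set ON_ERROR_STOP=1 --single-transaction "
        "--host=localhost --username=myuser --dbname=dbname "
        "--file=" + dump_file;

    const int rc = std::system(cmd.c_str());
    if (rc != 0)
        std::cerr << "psql failed with status " << rc << "\n";
    return rc == 0 ? 0 : 1;
}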