Thread: Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

On 2025-Apr-04, Andrew Dunstan wrote:

> Non text modes for pg_dumpall, correspondingly change pg_restore

I think the new oid_string_list stuff in this commit is unnecessary, and
we can remove a bunch of lines by simplifying that to using
SimplePtrList, as the attached illustrates.  (Perhaps the names of
members of the proposed struct can be improved.)

I also propose to remove the open-coded implementation of pg_get_line.

WDYT?

I'm also not sure about the double sscanf() business there ... There
must be a better way to do this.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachment
On 2025-04-14 Mo 8:20 AM, Álvaro Herrera wrote:
> On 2025-Apr-04, Andrew Dunstan wrote:
>
>> Non text modes for pg_dumpall, correspondingly change pg_restore
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates.  (Perhaps the names of
> members of the proposed struct can be improved.)


Sure, we can do it that way.


>
> I also propose to remove the open-coded implementation of pg_get_line.
>
> WDYT?


seems reasonable


>
> I'm also not sure about the double sscanf() business there ... There
> must be a better way to do this.



Yes, probably. I'll look into that if you like.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote:
>
>
>>
>> I'm also not sure about the double sscanf() business there ... There
>> must be a better way to do this.
>
>
>
> Yes, probably. I'll look into that if you like.
>
>
>

something like this?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment

Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

From
Mahendra Singh Thalor
Date:
On Mon, 14 Apr 2025 at 21:39, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Apr-04, Andrew Dunstan wrote:
>
> > Non text modes for pg_dumpall, correspondingly change pg_restore
>
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates.  (Perhaps the names of
> members of the proposed struct can be improved.)

Thanks Álvaro for the patch.

I took this patch and did some testing. Patch looks good to me. I was not able to use "git am" in my setup due to CR's in patch but no warning in the patch when I used "patch -p".

One review comment:
We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In code, many places we are using this hard coded value of 64 size for names.

On Tue, 15 Apr 2025 at 01:44, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote:
> >
> >
> >>
> >> I'm also not sure about the double sscanf() business there ... There
> >> must be a better way to do this.
> >
> >
> >
> > Yes, probably. I'll look into that if you like.
> >
> >
> >
>
> something like this?
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Thanks Andrew.

Your patch looks good to me. I took your patch and did 1-2 line adjustments as per below. Please have a look.

--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -1053,15 +1053,19 @@ get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbn
        while ((fgets(line, MAXPGPATH, pfile)) != NULL)
        {
                Oid                     db_oid = InvalidOid;
-               char            db_oid_str[MAXPGPATH + 1] = "";
                char       *dbname;
+               char            *p = line;
 
                /* Extract dboid. */
                sscanf(line, "%u", &db_oid);
-               sscanf(line, "%20s", db_oid_str);
+
+               while(isdigit(*p))
+                       p++;
+
+               Assert(*p == ' ');
 
                /* dbname is the rest of the line */
-               dbname = line + strlen(db_oid_str) + 1;
+               dbname = ++p;
 
                /* Remove \n from dbname. */
                dbname[strlen(dbname) - 1] = '\0';

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


On 2025-04-14 Mo 6:55 PM, Mahendra Singh Thalor wrote:


--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -1053,15 +1053,19 @@ get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbn
        while ((fgets(line, MAXPGPATH, pfile)) != NULL)
        {
                Oid                     db_oid = InvalidOid;
-               char            db_oid_str[MAXPGPATH + 1] = "";
                char       *dbname;
+               char            *p = line;
 
                /* Extract dboid. */
                sscanf(line, "%u", &db_oid);
-               sscanf(line, "%20s", db_oid_str);
+
+               while(isdigit(*p))
+                       p++;
+
+               Assert(*p == ' ');
 
                /* dbname is the rest of the line */
-               dbname = line + strlen(db_oid_str) + 1;
+               dbname = ++p;
 
                /* Remove \n from dbname. */
                dbname[strlen(dbname) - 1] = '\0';


I don't think an Assert is the right thing here. It's for things that should not be possible, and won't trigger anything in a non-assertion build. This condition should cause a pg_fatal().


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 2025-Apr-15, Mahendra Singh Thalor wrote:

> I took this patch and did some testing. Patch looks good to me.

Hello, thank you very much for looking.

> I was not able to use "git am" in my setup due to CR's in patch but no
> warning in the patch when I used "patch -p".

Hmm, don't blame me; rather I think your email rig is broken and
produces CRs for no reason (pretty normal on Windows I think).  Here's
what hexdump shows for the attached file direct from the email:

000002a0  2d 2d 2d 20 61 2f 73 72  63 2f 62 69 6e 2f 70 67  |--- a/src/bin/pg|
000002b0  5f 64 75 6d 70 2f 70 67  5f 72 65 73 74 6f 72 65  |_dump/pg_restore|
000002c0  2e 63 0a 2b 2b 2b 20 62  2f 73 72 63 2f 62 69 6e  |.c.+++ b/src/bin|
000002d0  2f 70 67 5f 64 75 6d 70  2f 70 67 5f 72 65 73 74  |/pg_dump/pg_rest|
000002e0  6f 72 65 2e 63 0a 40 40  20 2d 36 37 2c 31 30 20  |ore.c.@@ -67,10 |
000002f0  2b 36 37 2c 32 30 20 40  40 20 73 74 61 74 69 63  |+67,20 @@ static|
00000300  20 69 6e 74 09 70 72 6f  63 65 73 73 5f 67 6c 6f  | int.process_glo|
00000310  62 61 6c 5f 73 71 6c 5f  63 6f 6d 6d 61 6e 64 73  |bal_sql_commands|
00000320  28 50 47 63 6f 6e 6e 20  2a 63 6f 6e 6e 2c 20 63  |(PGconn *conn, c|
00000330  6f 6e 73 74 20 63 68 61  72 20 2a 64 75 6d 70 64  |onst char *dumpd|
00000340  69 72 70 61 74 68 2c 0a  20 09 09 09 09 09 09 09  |irpath,. .......|
00000350  09 09 09 63 6f 6e 73 74  20 63 68 61 72 20 2a 6f  |...const char *o|
00000360  75 74 66 69 6c 65 29 3b  0a 20 73 74 61 74 69 63  |utfile);. static|

In bytes 340ff you can see that the comma (2c) is followed by 0a ("line
feed" or newline) and then a bunch of tabs (09).  There's no carriage
return (0d) anywhere.

Maybe you would be able to convince git not to complain by downloading
the file from the email[1] directly onto it somehow, without letting
Windows mess things up.  Maybe something like this on a command line
  curl "https://www.postgresql.org/message-id/attachment/175873/0001-remove-unnecessary-oid_string-list-stuff.patch" |
gitam -
 
assuming you can get curl to run on a command line at all.

[1] https://postgr.es/m/202504141220.343fmoxfsbj4@alvherre.pgsql

> *One review comment:*
> We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In
> code, many places we are using this hard coded value of 64 size for names.

Well, yes we can, but then we'd be allocating a bunch of null bytes for
no reason.  The current allocation strategy, which comes from the
original commit not my code, is to allocate a struct with the OID and
just enough bytes to store the database name, which is known.  I don't
see any reason to modify this tbh.  The reason to use NAMEDATALEN in
various places is to have enough room to store whatever name the user
specifies, without having to know the length before the allocation
occurs.

FLEXIBLE_ARRAY_MEMBER is there to document exactly what's happening
here, and used thoroughly across the codebase.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)



Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore

From
Mahendra Singh Thalor
Date:
On Wed, 16 Apr 2025 at 01:52, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Apr-15, Mahendra Singh Thalor wrote:
>
> > I took this patch and did some testing. Patch looks good to me.
>
> Hello, thank you very much for looking.
>
> > I was not able to use "git am" in my setup due to CR's in patch but no
> > warning in the patch when I used "patch -p".
>
> Hmm, don't blame me; rather I think your email rig is broken and
> produces CRs for no reason (pretty normal on Windows I think).  Here's
> what hexdump shows for the attached file direct from the email:

Sorry for the noise. My mistake, I will fix my git setup.

Actually "git am" was working but "git apply" was failing. I think my
git setup is missing something.

I am downloading on a Mac and I am doing testing in a VM with Centos.

Below are the outputs of git commands.

[mst@localhost postgres]$ git apply
0001-remove-unnecessary-oid_string-list-stuff.patch
error: patch failed: src/bin/pg_dump/pg_restore.c:67
error: src/bin/pg_dump/pg_restore.c: patch does not apply
error: patch failed: src/fe_utils/simple_list.c:192
error: src/fe_utils/simple_list.c: patch does not apply
error: patch failed: src/include/fe_utils/simple_list.h:55
error: src/include/fe_utils/simple_list.h: patch does not apply
error: patch failed: src/tools/pgindent/typedefs.list:615
error: src/tools/pgindent/typedefs.list: patch does not apply
[mst@localhost postgres]$

[mst@localhost postgres]$ git am
0001-remove-unnecessary-oid_string-list-stuff.patch
Applying: remove unnecessary oid_string list stuff
[mst@localhost postgres]$

[mst@localhost postgres]$ git reset --hard HEAD~1
HEAD is now at 3b35f9a4c5e Fix an incorrect check in get_memoize_path
[mst@localhost postgres]$
[mst@localhost postgres]$ patch -p 1 <
0001-remove-unnecessary-oid_string-list-stuff.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/bin/pg_dump/pg_restore.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/fe_utils/simple_list.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/include/fe_utils/simple_list.h
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/tools/pgindent/typedefs.list

>
> 000002a0  2d 2d 2d 20 61 2f 73 72  63 2f 62 69 6e 2f 70 67  |--- a/src/bin/pg|
> 000002b0  5f 64 75 6d 70 2f 70 67  5f 72 65 73 74 6f 72 65  |_dump/pg_restore|
> 000002c0  2e 63 0a 2b 2b 2b 20 62  2f 73 72 63 2f 62 69 6e  |.c.+++ b/src/bin|
> 000002d0  2f 70 67 5f 64 75 6d 70  2f 70 67 5f 72 65 73 74  |/pg_dump/pg_rest|
> 000002e0  6f 72 65 2e 63 0a 40 40  20 2d 36 37 2c 31 30 20  |ore.c.@@ -67,10 |
> 000002f0  2b 36 37 2c 32 30 20 40  40 20 73 74 61 74 69 63  |+67,20 @@ static|
> 00000300  20 69 6e 74 09 70 72 6f  63 65 73 73 5f 67 6c 6f  | int.process_glo|
> 00000310  62 61 6c 5f 73 71 6c 5f  63 6f 6d 6d 61 6e 64 73  |bal_sql_commands|
> 00000320  28 50 47 63 6f 6e 6e 20  2a 63 6f 6e 6e 2c 20 63  |(PGconn *conn, c|
> 00000330  6f 6e 73 74 20 63 68 61  72 20 2a 64 75 6d 70 64  |onst char *dumpd|
> 00000340  69 72 70 61 74 68 2c 0a  20 09 09 09 09 09 09 09  |irpath,. .......|
> 00000350  09 09 09 63 6f 6e 73 74  20 63 68 61 72 20 2a 6f  |...const char *o|
> 00000360  75 74 66 69 6c 65 29 3b  0a 20 73 74 61 74 69 63  |utfile);. static|
>
> In bytes 340ff you can see that the comma (2c) is followed by 0a ("line
> feed" or newline) and then a bunch of tabs (09).  There's no carriage
> return (0d) anywhere.
>
> Maybe you would be able to convince git not to complain by downloading
> the file from the email[1] directly onto it somehow, without letting
> Windows mess things up.  Maybe something like this on a command line
>   curl "https://www.postgresql.org/message-id/attachment/175873/0001-remove-unnecessary-oid_string-list-stuff.patch"
|git am - 
> assuming you can get curl to run on a command line at all.
>
> [1] https://postgr.es/m/202504141220.343fmoxfsbj4@alvherre.pgsql
>
> > *One review comment:*
> > We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In
> > code, many places we are using this hard coded value of 64 size for names.
>
> Well, yes we can, but then we'd be allocating a bunch of null bytes for
> no reason.  The current allocation strategy, which comes from the
> original commit not my code, is to allocate a struct with the OID and
> just enough bytes to store the database name, which is known.  I don't
> see any reason to modify this tbh.  The reason to use NAMEDATALEN in
> various places is to have enough room to store whatever name the user
> specifies, without having to know the length before the allocation
> occurs.

Thanks for feedback. Yes, FLEXIBLE_ARRAY_MEMBER is added in the original commit.

In other code parts also, we can use FLEXIBLE_ARRAY_MEMBER instead
NAMEDATALEN. I will try if this is possible to do in other code parts.

>
> FLEXIBLE_ARRAY_MEMBER is there to document exactly what's happening
> here, and used thoroughly across the codebase.
>
> --
> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> "E pur si muove" (Galileo Galilei)

-
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com