Thread: Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Álvaro Herrera
Date:
On 2025-Apr-04, Andrew Dunstan wrote: > Non text modes for pg_dumpall, correspondingly change pg_restore I think the new oid_string_list stuff in this commit is unnecessary, and we can remove a bunch of lines by simplifying that to using SimplePtrList, as the attached illustrates. (Perhaps the names of members of the proposed struct can be improved.) I also propose to remove the open-coded implementation of pg_get_line. WDYT? I'm also not sure about the double sscanf() business there ... There must be a better way to do this. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Andrew Dunstan
Date:
On 2025-04-14 Mo 8:20 AM, Álvaro Herrera wrote: > On 2025-Apr-04, Andrew Dunstan wrote: > >> Non text modes for pg_dumpall, correspondingly change pg_restore > I think the new oid_string_list stuff in this commit is unnecessary, and > we can remove a bunch of lines by simplifying that to using > SimplePtrList, as the attached illustrates. (Perhaps the names of > members of the proposed struct can be improved.) Sure, we can do it that way. > > I also propose to remove the open-coded implementation of pg_get_line. > > WDYT? seems reasonable > > I'm also not sure about the double sscanf() business there ... There > must be a better way to do this. Yes, probably. I'll look into that if you like. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Andrew Dunstan
Date:
On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote: > > >> >> I'm also not sure about the double sscanf() business there ... There >> must be a better way to do this. > > > > Yes, probably. I'll look into that if you like. > > > something like this? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Mahendra Singh Thalor
Date:
On Mon, 14 Apr 2025 at 21:39, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Apr-04, Andrew Dunstan wrote:
>
> > Non text modes for pg_dumpall, correspondingly change pg_restore
>
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates. (Perhaps the names of
> members of the proposed struct can be improved.)
>
> On 2025-Apr-04, Andrew Dunstan wrote:
>
> > Non text modes for pg_dumpall, correspondingly change pg_restore
>
> I think the new oid_string_list stuff in this commit is unnecessary, and
> we can remove a bunch of lines by simplifying that to using
> SimplePtrList, as the attached illustrates. (Perhaps the names of
> members of the proposed struct can be improved.)
Thanks Álvaro for the patch.
I took this patch and did some testing. Patch looks good to me. I was not able to use "git am" in my setup due to CR's in patch but no warning in the patch when I used "patch -p".
One review comment:
We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In code, many places we are using this hard coded value of 64 size for names.
>
>
> On 2025-04-14 Mo 12:28 PM, Andrew Dunstan wrote:
> >
> >
> >>
> >> I'm also not sure about the double sscanf() business there ... There
> >> must be a better way to do this.
> >
> >
> >
> > Yes, probably. I'll look into that if you like.
> >
> >
> >
>
> something like this?
Your patch looks good to me. I took your patch and did 1-2 line adjustments as per below. Please have a look.
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -1053,15 +1053,19 @@ get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbn
while ((fgets(line, MAXPGPATH, pfile)) != NULL)
{
Oid db_oid = InvalidOid;
- char db_oid_str[MAXPGPATH + 1] = "";
char *dbname;
+ char *p = line;
/* Extract dboid. */
sscanf(line, "%u", &db_oid);
- sscanf(line, "%20s", db_oid_str);
+
+ while(isdigit(*p))
+ p++;
+
+ Assert(*p == ' ');
/* dbname is the rest of the line */
- dbname = line + strlen(db_oid_str) + 1;
+ dbname = ++p;
/* Remove \n from dbname. */
dbname[strlen(dbname) - 1] = '\0';
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Andrew Dunstan
Date:
On 2025-04-14 Mo 6:55 PM, Mahendra Singh Thalor wrote:
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -1053,15 +1053,19 @@ get_dbname_oid_list_from_mfile(const char *dumpdirpath, SimpleOidStringList *dbn
while ((fgets(line, MAXPGPATH, pfile)) != NULL)
{
Oid db_oid = InvalidOid;
- char db_oid_str[MAXPGPATH + 1] = "";
char *dbname;
+ char *p = line;
/* Extract dboid. */
sscanf(line, "%u", &db_oid);
- sscanf(line, "%20s", db_oid_str);
+
+ while(isdigit(*p))
+ p++;
+
+ Assert(*p == ' ');
/* dbname is the rest of the line */
- dbname = line + strlen(db_oid_str) + 1;
+ dbname = ++p;
/* Remove \n from dbname. */
dbname[strlen(dbname) - 1] = '\0';
I don't think an Assert is the right thing here. It's for things that should not be possible, and won't trigger anything in a non-assertion build. This condition should cause a pg_fatal().
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Álvaro Herrera
Date:
On 2025-Apr-15, Mahendra Singh Thalor wrote: > I took this patch and did some testing. Patch looks good to me. Hello, thank you very much for looking. > I was not able to use "git am" in my setup due to CR's in patch but no > warning in the patch when I used "patch -p". Hmm, don't blame me; rather I think your email rig is broken and produces CRs for no reason (pretty normal on Windows I think). Here's what hexdump shows for the attached file direct from the email: 000002a0 2d 2d 2d 20 61 2f 73 72 63 2f 62 69 6e 2f 70 67 |--- a/src/bin/pg| 000002b0 5f 64 75 6d 70 2f 70 67 5f 72 65 73 74 6f 72 65 |_dump/pg_restore| 000002c0 2e 63 0a 2b 2b 2b 20 62 2f 73 72 63 2f 62 69 6e |.c.+++ b/src/bin| 000002d0 2f 70 67 5f 64 75 6d 70 2f 70 67 5f 72 65 73 74 |/pg_dump/pg_rest| 000002e0 6f 72 65 2e 63 0a 40 40 20 2d 36 37 2c 31 30 20 |ore.c.@@ -67,10 | 000002f0 2b 36 37 2c 32 30 20 40 40 20 73 74 61 74 69 63 |+67,20 @@ static| 00000300 20 69 6e 74 09 70 72 6f 63 65 73 73 5f 67 6c 6f | int.process_glo| 00000310 62 61 6c 5f 73 71 6c 5f 63 6f 6d 6d 61 6e 64 73 |bal_sql_commands| 00000320 28 50 47 63 6f 6e 6e 20 2a 63 6f 6e 6e 2c 20 63 |(PGconn *conn, c| 00000330 6f 6e 73 74 20 63 68 61 72 20 2a 64 75 6d 70 64 |onst char *dumpd| 00000340 69 72 70 61 74 68 2c 0a 20 09 09 09 09 09 09 09 |irpath,. .......| 00000350 09 09 09 63 6f 6e 73 74 20 63 68 61 72 20 2a 6f |...const char *o| 00000360 75 74 66 69 6c 65 29 3b 0a 20 73 74 61 74 69 63 |utfile);. static| In bytes 340ff you can see that the comma (2c) is followed by 0a ("line feed" or newline) and then a bunch of tabs (09). There's no carriage return (0d) anywhere. Maybe you would be able to convince git not to complain by downloading the file from the email[1] directly onto it somehow, without letting Windows mess things up. Maybe something like this on a command line curl "https://www.postgresql.org/message-id/attachment/175873/0001-remove-unnecessary-oid_string-list-stuff.patch" | gitam - assuming you can get curl to run on a command line at all. [1] https://postgr.es/m/202504141220.343fmoxfsbj4@alvherre.pgsql > *One review comment:* > We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In > code, many places we are using this hard coded value of 64 size for names. Well, yes we can, but then we'd be allocating a bunch of null bytes for no reason. The current allocation strategy, which comes from the original commit not my code, is to allocate a struct with the OID and just enough bytes to store the database name, which is known. I don't see any reason to modify this tbh. The reason to use NAMEDATALEN in various places is to have enough room to store whatever name the user specifies, without having to know the length before the allocation occurs. FLEXIBLE_ARRAY_MEMBER is there to document exactly what's happening here, and used thoroughly across the codebase. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "E pur si muove" (Galileo Galilei)
Re: pgsql: Non text modes for pg_dumpall, correspondingly change pg_restore
From
Mahendra Singh Thalor
Date:
On Wed, 16 Apr 2025 at 01:52, Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Apr-15, Mahendra Singh Thalor wrote: > > > I took this patch and did some testing. Patch looks good to me. > > Hello, thank you very much for looking. > > > I was not able to use "git am" in my setup due to CR's in patch but no > > warning in the patch when I used "patch -p". > > Hmm, don't blame me; rather I think your email rig is broken and > produces CRs for no reason (pretty normal on Windows I think). Here's > what hexdump shows for the attached file direct from the email: Sorry for the noise. My mistake, I will fix my git setup. Actually "git am" was working but "git apply" was failing. I think my git setup is missing something. I am downloading on a Mac and I am doing testing in a VM with Centos. Below are the outputs of git commands. [mst@localhost postgres]$ git apply 0001-remove-unnecessary-oid_string-list-stuff.patch error: patch failed: src/bin/pg_dump/pg_restore.c:67 error: src/bin/pg_dump/pg_restore.c: patch does not apply error: patch failed: src/fe_utils/simple_list.c:192 error: src/fe_utils/simple_list.c: patch does not apply error: patch failed: src/include/fe_utils/simple_list.h:55 error: src/include/fe_utils/simple_list.h: patch does not apply error: patch failed: src/tools/pgindent/typedefs.list:615 error: src/tools/pgindent/typedefs.list: patch does not apply [mst@localhost postgres]$ [mst@localhost postgres]$ git am 0001-remove-unnecessary-oid_string-list-stuff.patch Applying: remove unnecessary oid_string list stuff [mst@localhost postgres]$ [mst@localhost postgres]$ git reset --hard HEAD~1 HEAD is now at 3b35f9a4c5e Fix an incorrect check in get_memoize_path [mst@localhost postgres]$ [mst@localhost postgres]$ patch -p 1 < 0001-remove-unnecessary-oid_string-list-stuff.patch (Stripping trailing CRs from patch; use --binary to disable.) patching file src/bin/pg_dump/pg_restore.c (Stripping trailing CRs from patch; use --binary to disable.) patching file src/fe_utils/simple_list.c (Stripping trailing CRs from patch; use --binary to disable.) patching file src/include/fe_utils/simple_list.h (Stripping trailing CRs from patch; use --binary to disable.) patching file src/tools/pgindent/typedefs.list > > 000002a0 2d 2d 2d 20 61 2f 73 72 63 2f 62 69 6e 2f 70 67 |--- a/src/bin/pg| > 000002b0 5f 64 75 6d 70 2f 70 67 5f 72 65 73 74 6f 72 65 |_dump/pg_restore| > 000002c0 2e 63 0a 2b 2b 2b 20 62 2f 73 72 63 2f 62 69 6e |.c.+++ b/src/bin| > 000002d0 2f 70 67 5f 64 75 6d 70 2f 70 67 5f 72 65 73 74 |/pg_dump/pg_rest| > 000002e0 6f 72 65 2e 63 0a 40 40 20 2d 36 37 2c 31 30 20 |ore.c.@@ -67,10 | > 000002f0 2b 36 37 2c 32 30 20 40 40 20 73 74 61 74 69 63 |+67,20 @@ static| > 00000300 20 69 6e 74 09 70 72 6f 63 65 73 73 5f 67 6c 6f | int.process_glo| > 00000310 62 61 6c 5f 73 71 6c 5f 63 6f 6d 6d 61 6e 64 73 |bal_sql_commands| > 00000320 28 50 47 63 6f 6e 6e 20 2a 63 6f 6e 6e 2c 20 63 |(PGconn *conn, c| > 00000330 6f 6e 73 74 20 63 68 61 72 20 2a 64 75 6d 70 64 |onst char *dumpd| > 00000340 69 72 70 61 74 68 2c 0a 20 09 09 09 09 09 09 09 |irpath,. .......| > 00000350 09 09 09 63 6f 6e 73 74 20 63 68 61 72 20 2a 6f |...const char *o| > 00000360 75 74 66 69 6c 65 29 3b 0a 20 73 74 61 74 69 63 |utfile);. static| > > In bytes 340ff you can see that the comma (2c) is followed by 0a ("line > feed" or newline) and then a bunch of tabs (09). There's no carriage > return (0d) anywhere. > > Maybe you would be able to convince git not to complain by downloading > the file from the email[1] directly onto it somehow, without letting > Windows mess things up. Maybe something like this on a command line > curl "https://www.postgresql.org/message-id/attachment/175873/0001-remove-unnecessary-oid_string-list-stuff.patch" |git am - > assuming you can get curl to run on a command line at all. > > [1] https://postgr.es/m/202504141220.343fmoxfsbj4@alvherre.pgsql > > > *One review comment:* > > We are using FLEXIBLE_ARRAY_MEMBER for dbname. Can we use NAMEDATALEN? In > > code, many places we are using this hard coded value of 64 size for names. > > Well, yes we can, but then we'd be allocating a bunch of null bytes for > no reason. The current allocation strategy, which comes from the > original commit not my code, is to allocate a struct with the OID and > just enough bytes to store the database name, which is known. I don't > see any reason to modify this tbh. The reason to use NAMEDATALEN in > various places is to have enough room to store whatever name the user > specifies, without having to know the length before the allocation > occurs. Thanks for feedback. Yes, FLEXIBLE_ARRAY_MEMBER is added in the original commit. In other code parts also, we can use FLEXIBLE_ARRAY_MEMBER instead NAMEDATALEN. I will try if this is possible to do in other code parts. > > FLEXIBLE_ARRAY_MEMBER is there to document exactly what's happening > here, and used thoroughly across the codebase. > > -- > Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ > "E pur si muove" (Galileo Galilei) - Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com