Thread: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18828
Logged by:          Robins Tharakan
Email address:      tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   Ubuntu
Description:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

SQL
===
CREATE EXTENSION pg_logicalinspect;
SELECT pg_get_logical_snapshot_meta('');

Commit
======
.
.
Checking (56ba0463d38~950) - 3c7d78427e - fail
Checking (56ba0463d38~1110) - dc68515968 - Pass
Checking (56ba0463d38~1030) - 67a54b9e83 - Pass
Checking (56ba0463d38~990) - e2fd615ecc - Pass
Checking (56ba0463d38~970) - 665785d85f - fail
Checking (56ba0463d38~980) - b360d1762b - fail
Checking (56ba0463d38~985) - 04bec894a0 - fail
Checking (56ba0463d38~987) - d5ca15ee54 - fail
Checking (56ba0463d38~988) - 2453196107 - fail
Checking (56ba0463d38~989) - 7cdfeee320 - fail
Current=7cdfeee320e72162b62dddddee638e713c2b8680 - Commit Found.


Error Log
=========
2025-03-02 04:28:02.355 ACDT [3826851] PANIC:  could not open file
"pg_logical/snapshots/": Is a directory
2025-03-02 04:28:02.355 ACDT [3826851] STATEMENT:  select
pg_get_logical_snapshot_meta('');
2025-03-02 04:28:02.896 ACDT [3745459] LOG:  client backend (PID 3826851)
was terminated by signal 6: Aborted
2025-03-02 04:28:02.896 ACDT [3745459] DETAIL:  Failed process was running:
select pg_get_logical_snapshot_meta('');
2025-03-02 04:28:02.896 ACDT [3745459] LOG:  terminating any other active
server processes


Output
======
smith@camry:~/proj/sqith$ psql -c "select
pg_get_logical_snapshot_meta('');"
PANIC:  could not open file "pg_logical/snapshots/": Is a directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost


PG Bug reporting form <noreply@postgresql.org> writes:
> Passing an empty string to the recently added extension function
> pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for noticing.  It looks like this function is far too trusting
of its input.  Another way to crash it is to point to a directory:

regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
PANIC:  could not open file "pg_logical/snapshots/../snapshots": Is a directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

This example incidentally shows that it's not sanitizing the request
to prevent reference to files outside the snapshots directory, which
possibly would be a security bug down the road.  I realize that we
restrict access to the function, but still we shouldn't be *this*
trusting of the argument.

Possibly some of the defense for this ought to be in
SnapBuildRestoreSnapshot: it looks like that function isn't
considering the possibility that OpenTransientFile will succeed on a
directory.  Does it have any other callers passing
less-than-fully-trustworthy paths?

Another interesting question is why the error is being promoted
to PANIC.  That sure seems unnecessary, and there's a comment
in SnapBuildRestoreSnapshot that claims it won't happen.

            regards, tom lane



On Sat, Mar 1, 2025 at 12:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> PG Bug reporting form <noreply@postgresql.org> writes:
> > Passing an empty string to the recently added extension function
> > pg_get_logical_snapshot_meta() triggers PANIC / Abort.
>
> Thanks for noticing.  It looks like this function is far too trusting
> of its input.  Another way to crash it is to point to a directory:
>
> regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
> PANIC:  could not open file "pg_logical/snapshots/../snapshots": Is a directory
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
>
> This example incidentally shows that it's not sanitizing the request
> to prevent reference to files outside the snapshots directory, which
> possibly would be a security bug down the road.  I realize that we
> restrict access to the function, but still we shouldn't be *this*
> trusting of the argument.

Agreed.

>
> Possibly some of the defense for this ought to be in
> SnapBuildRestoreSnapshot: it looks like that function isn't
> considering the possibility that OpenTransientFile will succeed on a
> directory.  Does it have any other callers passing
> less-than-fully-trustworthy paths?

SnapBuildRestoreSnapshot() is called by pg_logicalinspect functions
(pg_get_logical_snapshot_meta() and pg_get_logical_snapshot_info())
and SnapBuildRestore(). For the latter, SnapBuildRestore() surely
passes the file name, so it would not be a problem.

One way to fix this issue is that SnapBuildRestoreSnapshot() takes an
LSN corresponding to the serialized snapshot instead of a path, and it
constructs the .snap file path correctly. This fix would also prevent
it from being a security bug.

pg_logicalinspect functions, pg_get_logical_snapshot_info() and
pg_get_logical_snapshot_meta(), would also need to be changed so that
they extract the LSN from the given file name. If the given file name
is incorrect, it should raise an error.

>
> Another interesting question is why the error is being promoted
> to PANIC.  That sure seems unnecessary, and there's a comment
> in SnapBuildRestoreSnapshot that claims it won't happen.
>

This PANIC occurred because fsync_fname() with isdir=false was called
for the directory, pg_logical/snapshots/' in the above case
(data_sync_retry is false by default). I think you referred to the
following comment in SnapBuildRestoreSnapshot():

    /* ----
     * Make sure the snapshot had been stored safely to disk, that's normally
     * cheap.
     * Note that we do not need PANIC here, nobody will be able to use the
     * slot without fsyncing, and saving it won't succeed without an fsync()
     * either...
     * ----
     */
    fsync_fname(path, false);
    fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true);

These comments have been present since the initial logical decoding
commit, whereas data_sync_retry was introduced at a later stage. While
the comment's reference to PANIC suggests that a critical section
isn't necessary here, the actual PANIC occurred in an unanticipated
manner. SnapBuildRestoreSnapshot() works correctly as long as it takes
a correct file path but 7cdfeee320e72 violated this assumption.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Hi,

On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > Passing an empty string to the recently added extension function
> > pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for the report!

> Thanks for noticing.  It looks like this function is far too trusting
> of its input.  Another way to crash it is to point to a directory:
> 
> regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
> PANIC:  could not open file "pg_logical/snapshots/../snapshots": Is a directory
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> 
> This example incidentally shows that it's not sanitizing the request
> to prevent reference to files outside the snapshots directory, which
> possibly would be a security bug down the road.  I realize that we
> restrict access to the function, but still we shouldn't be *this*
> trusting of the argument.
>

Indeed that's bad.

> Possibly some of the defense for this ought to be in
> SnapBuildRestoreSnapshot: it looks like that function isn't
> considering the possibility that OpenTransientFile will succeed on a
> directory.  Does it have any other callers passing
> less-than-fully-trustworthy paths?

I don't think so. The other caller is in snapbuild.c where the input is
build based on a lsn. 

Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
which suffers for the same issues reported above) should take a lsn as input
parameter. That's how it has been proposed initially and that would simplify
things. Thoughts?

> Another interesting question is why the error is being promoted
> to PANIC.  That sure seems unnecessary, and there's a comment
> in SnapBuildRestoreSnapshot that claims it won't happen.
> 

Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in 
9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).

I'm not sure we should change the logic here, maybe just update the comment?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote:
> > PG Bug reporting form <noreply@postgresql.org> writes:
> > > Passing an empty string to the recently added extension function
> > > pg_get_logical_snapshot_meta() triggers PANIC / Abort.
>
> Thanks for the report!
>
> > Thanks for noticing.  It looks like this function is far too trusting
> > of its input.  Another way to crash it is to point to a directory:
> >
> > regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
> > PANIC:  could not open file "pg_logical/snapshots/../snapshots": Is a directory
> > server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> >
> > This example incidentally shows that it's not sanitizing the request
> > to prevent reference to files outside the snapshots directory, which
> > possibly would be a security bug down the road.  I realize that we
> > restrict access to the function, but still we shouldn't be *this*
> > trusting of the argument.
> >
>
> Indeed that's bad.
>
> > Possibly some of the defense for this ought to be in
> > SnapBuildRestoreSnapshot: it looks like that function isn't
> > considering the possibility that OpenTransientFile will succeed on a
> > directory.  Does it have any other callers passing
> > less-than-fully-trustworthy paths?
>
> I don't think so. The other caller is in snapbuild.c where the input is
> build based on a lsn.
>
> Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
> which suffers for the same issues reported above) should take a lsn as input
> parameter. That's how it has been proposed initially and that would simplify
> things. Thoughts?

Yeah, that's one solution. But it would make pg_logicalinspect
functions hard to work with the pg_ls_loigcalsnapdir(). Users would
need to parse the filename by themselve and pass the LSN to them. I've
attached a draft patch for a different approach where we extract LSN
from the given file name in pg_logicalinspect functions. Thoughts?

>
> > Another interesting question is why the error is being promoted
> > to PANIC.  That sure seems unnecessary, and there's a comment
> > in SnapBuildRestoreSnapshot that claims it won't happen.
> >
>
> Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
> to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in
> 9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).
>
> I'm not sure we should change the logic here, maybe just update the comment?

I think that we don't need to change the logic. It might be a good
idea to clarify the comment, though.

Regards,

--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Mon, Mar 03, 2025 at 12:42:01PM -0800, Masahiko Sawada wrote:
> On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
> > which suffers for the same issues reported above) should take a lsn as input
> > parameter. That's how it has been proposed initially and that would simplify
> > things. Thoughts?
> 
> Yeah, that's one solution. But it would make pg_logicalinspect
> functions hard to work with the pg_ls_loigcalsnapdir(). Users would
> need to parse the filename by themselve and pass the LSN to them. I've
> attached a draft patch for a different approach where we extract LSN
> from the given file name in pg_logicalinspect functions. Thoughts?

Yeah I agree that your proposal is more user friendly. Thanks for the patch,
you beat me on it! (I was waiting that we agree on the approach before working
on it). 

Looking at your patch, some comments:

=== 1

+       if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)
+               ereport(ERROR,

sscanf would not return an error if the file being used is say "0-40796E18.foo":

postgres=# SELECT pg_get_logical_snapshot_info('0-40796E18.foo');
                      pg_get_logical_snapshot_info
-------------------------------------------------------------------------
 (consistent,751,751,0/40796AF8,0/40796AF8,0,f,f,0/0,0,0,,2,"{751,752}")

I think that as long as it finds the &hi and &lo then it does not return an
error.

PFA 0001 a new version to also parse the ".snap".

=== 2

+ errmsg("invalid serialized snapshot file name \"%s\"", filename));

In the extension doc we mention "a snapshot file" so I think it makes sense to
remove "serialized" from the error message: also done in 0001 attached.

=== 3

+ * Extract LSN from the given serialized snapshot

s/Extract LSN/Extract the LSN/? (done in 0001 attached).

=== 4

Also adding some regression tests in 0001 attached to validate the imputs
and the permissions.

> > Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
> > to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in
> > 9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).
> >
> > I'm not sure we should change the logic here, maybe just update the comment?
> 
> I think that we don't need to change the logic. It might be a good
> idea to clarify the comment, though.

Agree. I think that we can just get rid of it: done in 0002 attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:
> PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:
> > PFA 0001 a new version to also parse the ".snap".
> 
> PFA v3 (a slightly different version) to "correctly" use the new report_error
> introduced in v2.

It looks like that CheckPointSnapBuild() also rely on sscanf() to check
for the snapshot file extension. As seen, up-thread we can't rely on sscanf()
to do so. 

Attached a c file to show the "issue":

$ gcc -o sscanf_file_ext sscanf_file_ext.c
$ ./sscanf_file_ext 0-40796E18.snap
File: "0-40796E18.snap"
Parse result: 2 (2 means success)

$ ./sscanf_file_ext 0-40796E18.foo
File: "0-40796E18.foo"
Parse result: 2 (2 means success)

So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing
message and "continue" even if the file name extension is not ".snap".

That's not a big deal because, in pratice, it still removes the .$pid.tmp files
(given they have the "hex" part well formated). But, it does not look like it was
the original intent, so  maybe we should also re-think CheckPointSnapBuild() and
open a dedicated thread? (If so, I'm happy to do so).

Note that there might be other places that may need the same kind of attention:
pg_archivecleanup.c?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:
> > PFA 0001 a new version to also parse the ".snap".
>
> PFA v3 (a slightly different version) to "correctly" use the new report_error
> introduced in v2.

Thank you for the patch! I agree to have some regression tests for that.

Looking at the 0001 patch, it seems that the pg_logicalinspect
functions still accepts invalid filenames as follows:

postgres(1:2713976)=# select
pg_get_logical_snapshot_info('0-40796E18.foo.snap');
ERROR:  could not open file "pg_logical/snapshots/0-40796E18.snap": No
such file or directory

Regarding the 0002 patch, TBH I'm confused about what the original
comment intended to mean. I thought that it means that we don't need
to use a critical section for fsync_fname() calls to promote the fsync
error. If that's right, it would be better to update the comment
instead of removing it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 4, 2025 at 7:05 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:
> > > PFA 0001 a new version to also parse the ".snap".
> >
> > PFA v3 (a slightly different version) to "correctly" use the new report_error
> > introduced in v2.
>
> It looks like that CheckPointSnapBuild() also rely on sscanf() to check
> for the snapshot file extension. As seen, up-thread we can't rely on sscanf()
> to do so.
>
> Attached a c file to show the "issue":
>
> $ gcc -o sscanf_file_ext sscanf_file_ext.c
> $ ./sscanf_file_ext 0-40796E18.snap
> File: "0-40796E18.snap"
> Parse result: 2 (2 means success)
>
> $ ./sscanf_file_ext 0-40796E18.foo
> File: "0-40796E18.foo"
> Parse result: 2 (2 means success)
>
> So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing
> message and "continue" even if the file name extension is not ".snap".
>
> That's not a big deal because, in pratice, it still removes the .$pid.tmp files
> (given they have the "hex" part well formated). But, it does not look like it was
> the original intent, so  maybe we should also re-think CheckPointSnapBuild() and
> open a dedicated thread? (If so, I'm happy to do so).

I believe that the situation is a bit different in these cases; in
CheckPointSnapBuild () we iterate over all files in
pg_logical/snapshots directory and extract the LSN from the filenames
whereas in pg_logicalinspect case, user can pass arbitrary filename.
Therefore, as for the former cases, it might make sense to check the
file extension as well if we want to prevent a problem where someone
injects files into pg_logical/snapshots directory. But it might be too
much and I guess that there are other places where we assume that
there are no malformed files injected under $PGDATA.

>
> Note that there might be other places that may need the same kind of attention:
> pg_archivecleanup.c?

In pg_archivecleanup.c, we use sscanf() two places: for partial WAL
files and for backup history files. IIUC we check the format of the
filename using IsPartialXLogFileName() and IsBackupHistoryFileName()
before using sscanf() so I think it's okay.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Mar 04, 2025 at 12:11:07PM -0800, Masahiko Sawada wrote:
> On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:
> > > PFA 0001 a new version to also parse the ".snap".
> >
> > PFA v3 (a slightly different version) to "correctly" use the new report_error
> > introduced in v2.
> 
> Thank you for the patch! I agree to have some regression tests for that.
> 
> Looking at the 0001 patch, it seems that the pg_logicalinspect
> functions still accepts invalid filenames as follows:
> 
> postgres(1:2713976)=# select
> pg_get_logical_snapshot_info('0-40796E18.foo.snap');
> ERROR:  could not open file "pg_logical/snapshots/0-40796E18.snap": No
> such file or directory

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is 
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

> Regarding the 0002 patch, TBH I'm confused about what the original
> comment intended to mean. I thought that it means that we don't need
> to use a critical section for fsync_fname() calls to promote the fsync
> error.

Yeah I think the same.

> If that's right, it would be better to update the comment
> instead of removing it.

hmm I'm not sure what to say then. It was saying that there is not need to promote
to PANIC because it won't be usable anyway (if not fsynced).

Since 9ccdd7f66e3 we do promote to PANIC so I thought the most simple to avoid
any extra confusion is to remove the comment (since it's wrong since 9ccdd7f66e3).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:
> Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
> in parse_snapshot_filename() is not needed per say because the function is 
> currently executed in a short-lived memory context. It's there for safety reason
> in case it's called outside those SQL apis in the future.

After sleeping on it, PFA a simplified version.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:
> > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
> > in parse_snapshot_filename() is not needed per say because the function is
> > currently executed in a short-lived memory context. It's there for safety reason
> > in case it's called outside those SQL apis in the future.
>
> After sleeping on it, PFA a simplified version.
>

Thank you for updating the patch.

I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead. I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

On Thu, 6 Mar 2025 at 17:13, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> After sleeping on it, PFA a simplified version.

Running tests on v5 patch for a few hours was uneventful (which is good).

 
I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead. I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.


I've updated that with this patch too. Will update if something comes up.

-
robins
Hi,

On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote:
> On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:
> > > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
> > > in parse_snapshot_filename() is not needed per say because the function is
> > > currently executed in a short-lived memory context. It's there for safety reason
> > > in case it's called outside those SQL apis in the future.
> >
> > After sleeping on it, PFA a simplified version.
> >
> 
> Thank you for updating the patch.
> 
> I think we don't need to even do palloc() for the buffer as we can use
> the char[MAXPGPATH] instead.

Sure.

> I've attached the patch to improve the
> parse_snapshot_filename() function and add some regression tests.
> Please review these changes.

Thanks for the patch!

=== 1

-parse_snapshot_filename(const char *filename)
+parse_snapshot_filename(char *filename)

Why?

=== 2

-   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
+   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)

We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
(sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
still pass.

So, I think it's better to remove the .snap here as it could give the "wrong"
impression that it's "useful".

The attached removes the .snap and adds a comment like:

"
 * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
 * converted values (%X), not literal text matches.
"

I think it makes sense to document this behavior.

=== 3

+   /*
+    * Bring back the LSN to the snapshot file format and compare
+    * it to  the given name to see if the extracted LSN is sane.
+    */
+   sprintf(tmpfname, "%X-%X.snap", hi, lo);
+   if (strcmp(tmpfname, filename) != 0)

The idea was also to ensure that there are no extra characters between the LSN
values and the .snap extension: Adding this as an extra comment in the attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote:
> > On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:
> > > > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
> > > > in parse_snapshot_filename() is not needed per say because the function is
> > > > currently executed in a short-lived memory context. It's there for safety reason
> > > > in case it's called outside those SQL apis in the future.
> > >
> > > After sleeping on it, PFA a simplified version.
> > >
> >
> > Thank you for updating the patch.
> >
> > I think we don't need to even do palloc() for the buffer as we can use
> > the char[MAXPGPATH] instead.
>
> Sure.
>
> > I've attached the patch to improve the
> > parse_snapshot_filename() function and add some regression tests.
> > Please review these changes.
>
> Thanks for the patch!
>
> === 1
>
> -parse_snapshot_filename(const char *filename)
> +parse_snapshot_filename(char *filename)
>
> Why?

Sorry, that's a leftover that I attempted. We should use 'const'.

>
> === 2
>
> -   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
> +   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)
>
> We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
> (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
> still pass.
>
> So, I think it's better to remove the .snap here as it could give the "wrong"
> impression that it's "useful".
>
> The attached removes the .snap and adds a comment like:
>
> "
>  * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
>  * converted values (%X), not literal text matches.
> "

Yes, but we include the suffix when doing sscanf() in many other
places. I don't see the specific benefit of doing it in another way
only in this case.

>
> I think it makes sense to document this behavior.
>
> === 3
>
> +   /*
> +    * Bring back the LSN to the snapshot file format and compare
> +    * it to  the given name to see if the extracted LSN is sane.
> +    */
> +   sprintf(tmpfname, "%X-%X.snap", hi, lo);
> +   if (strcmp(tmpfname, filename) != 0)
>
> The idea was also to ensure that there are no extra characters between the LSN
> values and the .snap extension:

Right. How about the following comment?

Bring back the extracted LSN to the snapshot file format and compare
it to the given filename. This check strictly checks if the given filename
follows the format of the snapshot filename.

FYI I've analyzed the idea of extracting LSN from the filename and
bringing it back to the format to validate the given filename. IIUC we
strictly validate the given filename. For example, we accept
'0-ABC.snap' but not '0-0ABC.snap, which is probably fine. Also, I was
concerned that sscanf() could be affected by locale, but it's fine
particularly in this case as we scan only hex numbers.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote:
> On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > === 1
> >
> > -parse_snapshot_filename(const char *filename)
> > +parse_snapshot_filename(char *filename)
> >
> > Why?
> 
> Sorry, that's a leftover that I attempted. We should use 'const'.

No problem at all! Okay, re-added in the attached.

> >
> > === 2
> >
> > -   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
> > +   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)
> >
> > We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
> > (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
> > still pass.
> >
> > So, I think it's better to remove the .snap here as it could give the "wrong"
> > impression that it's "useful".
> >
> > The attached removes the .snap and adds a comment like:
> >
> > "
> >  * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
> >  * converted values (%X), not literal text matches.
> > "
> 
> Yes, but we include the suffix when doing sscanf() in many other
> places. I don't see the specific benefit of doing it in another way
> only in this case.

The idea was to not give the "wrong" impression that it's "useful" to add the
suffix. That said I agree, it would be better to also update the other places
where the suffix is being used. I think I'll start a dedicated thread for
that (to at least put some comments around those places).

As far our current patch then I propose to add a comment though, like in the
attached?

> > === 3
> >
> > +   /*
> > +    * Bring back the LSN to the snapshot file format and compare
> > +    * it to  the given name to see if the extracted LSN is sane.
> > +    */
> > +   sprintf(tmpfname, "%X-%X.snap", hi, lo);
> > +   if (strcmp(tmpfname, filename) != 0)
> >
> > The idea was also to ensure that there are no extra characters between the LSN
> > values and the .snap extension:
> 
> Right. How about the following comment?
> 
> Bring back the extracted LSN to the snapshot file format and compare
> it to the given filename. This check strictly checks if the given filename
> follows the format of the snapshot filename.

Yeah sounds good, done that way in the attached.

> FYI I've analyzed the idea of extracting LSN from the filename and
> bringing it back to the format to validate the given filename. IIUC we
> strictly validate the given filename. For example, we accept
> '0-ABC.snap' but not '0-0ABC.snap',

Yeah, and so '0A-ABC.snap' for example.

> which is probably fine.

I think so, as long as the snapshot files are generated with %X (and not %08X
for example) in SnapBuildSerialize():

"
    sprintf(path, "%s/%X-%X.snap",
            PG_LOGICAL_SNAPSHOTS_DIR,
            LSN_FORMAT_ARGS(lsn));
"

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Mar 6, 2025 at 11:02 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote:
> > On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > === 1
> > >
> > > -parse_snapshot_filename(const char *filename)
> > > +parse_snapshot_filename(char *filename)
> > >
> > > Why?
> >
> > Sorry, that's a leftover that I attempted. We should use 'const'.
>
> No problem at all! Okay, re-added in the attached.
>
> > >
> > > === 2
> > >
> > > -   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
> > > +   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)
> > >
> > > We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
> > > (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
> > > still pass.
> > >
> > > So, I think it's better to remove the .snap here as it could give the "wrong"
> > > impression that it's "useful".
> > >
> > > The attached removes the .snap and adds a comment like:
> > >
> > > "
> > >  * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
> > >  * converted values (%X), not literal text matches.
> > > "
> >
> > Yes, but we include the suffix when doing sscanf() in many other
> > places. I don't see the specific benefit of doing it in another way
> > only in this case.
>
> The idea was to not give the "wrong" impression that it's "useful" to add the
> suffix. That said I agree, it would be better to also update the other places
> where the suffix is being used. I think I'll start a dedicated thread for
> that (to at least put some comments around those places).
>
> As far our current patch then I propose to add a comment though, like in the
> attached?
>
> > > === 3
> > >
> > > +   /*
> > > +    * Bring back the LSN to the snapshot file format and compare
> > > +    * it to  the given name to see if the extracted LSN is sane.
> > > +    */
> > > +   sprintf(tmpfname, "%X-%X.snap", hi, lo);
> > > +   if (strcmp(tmpfname, filename) != 0)
> > >
> > > The idea was also to ensure that there are no extra characters between the LSN
> > > values and the .snap extension:
> >
> > Right. How about the following comment?
> >
> > Bring back the extracted LSN to the snapshot file format and compare
> > it to the given filename. This check strictly checks if the given filename
> > follows the format of the snapshot filename.
>
> Yeah sounds good, done that way in the attached.
>
> > FYI I've analyzed the idea of extracting LSN from the filename and
> > bringing it back to the format to validate the given filename. IIUC we
> > strictly validate the given filename. For example, we accept
> > '0-ABC.snap' but not '0-0ABC.snap',
>
> Yeah, and so '0A-ABC.snap' for example.
>
> > which is probably fine.
>
> I think so, as long as the snapshot files are generated with %X (and not %08X
> for example) in SnapBuildSerialize():
>
> "
>     sprintf(path, "%s/%X-%X.snap",
>             PG_LOGICAL_SNAPSHOTS_DIR,
>             LSN_FORMAT_ARGS(lsn));
> "
>

Thank you for updating the patch. I've made some cosmetic changes and
attached a new version patch. Could you review it?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote:
> Thank you for updating the patch. I've made some cosmetic changes and
> attached a new version patch. Could you review it?

LGTM, thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Sat, Mar 8, 2025 at 12:02 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote:
> > Thank you for updating the patch. I've made some cosmetic changes and
> > attached a new version patch. Could you review it?
>
> LGTM, thanks!
>

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 11, 2025 at 03:34:18PM -0700, Masahiko Sawada wrote:
> Pushed.

I'm seeing warnings for the CI build on Windows [0] that appear to be
related to this commit:

    c:\cirrus\contrib\pg_logicalinspect\pg_logicalinspect.c(88) : warning C4715: 'parse_snapshot_filename': not all
controlpaths return a value
 

I suspect we just need to do something like commit fe4ecd0 and return
InvalidXLogRecPtr after the ERROR.

[0] https://api.cirrus-ci.com/v1/task/4876492255002624/logs/build.log

-- 
nathan



On Wed, Mar 12, 2025 at 12:03 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 03:34:18PM -0700, Masahiko Sawada wrote:
> > Pushed.
>
> I'm seeing warnings for the CI build on Windows [0] that appear to be
> related to this commit:
>
>         c:\cirrus\contrib\pg_logicalinspect\pg_logicalinspect.c(88) : warning C4715: 'parse_snapshot_filename': not
allcontrol paths return a value 
>
> I suspect we just need to do something like commit fe4ecd0 and return
> InvalidXLogRecPtr after the ERROR.

Thank you for the report! David also reported the same issue[1]. I've
just pushed the fix.

BTW how did you catch the warning? I checked before the push if the
change had passed the CI tests (completed with green status). But when
checking the build log now, I can see the warning in the build logs.

Regards,

[1] https://www.postgresql.org/message-id/CAApHDvqrhFfnetbcwgGkJ=z63T8HfQ_OyP=vX8BYiXyxFKt67w@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 12, 2025 at 02:33:36PM -0700, Masahiko Sawada wrote:
> Thank you for the report! David also reported the same issue[1]. I've
> just pushed the fix.

Oh, sorry, I should've checked that list first.

> BTW how did you catch the warning? I checked before the push if the
> change had passed the CI tests (completed with green status). But when
> checking the build log now, I can see the warning in the build logs.

I noticed a ⚠ next to the results on http://cfbot.cputube.org/.

-- 
nathan



On Wed, Mar 12, 2025 at 3:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Mar 12, 2025 at 02:33:36PM -0700, Masahiko Sawada wrote:
> > Thank you for the report! David also reported the same issue[1]. I've
> > just pushed the fix.
>
> Oh, sorry, I should've checked that list first.
>
> > BTW how did you catch the warning? I checked before the push if the
> > change had passed the CI tests (completed with green status). But when
> > checking the build log now, I can see the warning in the build logs.
>
> I noticed a ⚠ next to the results on http://cfbot.cputube.org/.

That's good to know. Thank you. I'll add it to my pre-commit check routine.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Thu, 13 Mar 2025 at 10:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> BTW how did you catch the warning? I checked before the push if the
> change had passed the CI tests (completed with green status). But when
> checking the build log now, I can see the warning in the build logs.

My workflow is, when still working in the feature branch, to always do
"perl src/tools/pgindent/pgindent --commit HEAD" once I'm ready for it
to be pushed (assuming there's only 1 commit). If there's any diff
I'll "git commit -a --fixup HEAD && git rebase -i master" and adjust
the commit type of the fixup commit as "fixup" (or "f"). I'll then do
a final "git format-patch" and "git am" that to git master and run the
tests before pushing. I tend to never do any fixups in git master as
I'm too scared I'll one day push one of them accidentally.

If the pgindent diff has anything unexpected in it, I'll consider if I
need to update typedefs.list to fix.

David



On Wed, Mar 12, 2025 at 5:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 13 Mar 2025 at 10:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > BTW how did you catch the warning? I checked before the push if the
> > change had passed the CI tests (completed with green status). But when
> > checking the build log now, I can see the warning in the build logs.
>
> My workflow is, when still working in the feature branch, to always do
> "perl src/tools/pgindent/pgindent --commit HEAD" once I'm ready for it
> to be pushed (assuming there's only 1 commit). If there's any diff
> I'll "git commit -a --fixup HEAD && git rebase -i master" and adjust
> the commit type of the fixup commit as "fixup" (or "f"). I'll then do
> a final "git format-patch" and "git am" that to git master and run the
> tests before pushing.

Thank you for sharing. I'll integrate the workflow with mine.

>  I tend to never do any fixups in git master as
> I'm too scared I'll one day push one of them accidentally.

Yeah, that's important.

Thank you for fixing the indentation issue (b955df44340). I've checked
the issue with CI before pushing but I missed the fix for some reason.
That's my fault.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com