Thread: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18828 Logged by: Robins Tharakan Email address: tharakan@gmail.com PostgreSQL version: Unsupported/Unknown Operating system: Ubuntu Description: Passing an empty string to the recently added extension function pg_get_logical_snapshot_meta() triggers PANIC / Abort. SQL === CREATE EXTENSION pg_logicalinspect; SELECT pg_get_logical_snapshot_meta(''); Commit ====== . . Checking (56ba0463d38~950) - 3c7d78427e - fail Checking (56ba0463d38~1110) - dc68515968 - Pass Checking (56ba0463d38~1030) - 67a54b9e83 - Pass Checking (56ba0463d38~990) - e2fd615ecc - Pass Checking (56ba0463d38~970) - 665785d85f - fail Checking (56ba0463d38~980) - b360d1762b - fail Checking (56ba0463d38~985) - 04bec894a0 - fail Checking (56ba0463d38~987) - d5ca15ee54 - fail Checking (56ba0463d38~988) - 2453196107 - fail Checking (56ba0463d38~989) - 7cdfeee320 - fail Current=7cdfeee320e72162b62dddddee638e713c2b8680 - Commit Found. Error Log ========= 2025-03-02 04:28:02.355 ACDT [3826851] PANIC: could not open file "pg_logical/snapshots/": Is a directory 2025-03-02 04:28:02.355 ACDT [3826851] STATEMENT: select pg_get_logical_snapshot_meta(''); 2025-03-02 04:28:02.896 ACDT [3745459] LOG: client backend (PID 3826851) was terminated by signal 6: Aborted 2025-03-02 04:28:02.896 ACDT [3745459] DETAIL: Failed process was running: select pg_get_logical_snapshot_meta(''); 2025-03-02 04:28:02.896 ACDT [3745459] LOG: terminating any other active server processes Output ====== smith@camry:~/proj/sqith$ psql -c "select pg_get_logical_snapshot_meta('');" PANIC: could not open file "pg_logical/snapshots/": Is a directory server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost
PG Bug reporting form <noreply@postgresql.org> writes: > Passing an empty string to the recently added extension function > pg_get_logical_snapshot_meta() triggers PANIC / Abort. Thanks for noticing. It looks like this function is far too trusting of its input. Another way to crash it is to point to a directory: regression=# SELECT pg_get_logical_snapshot_meta('../snapshots'); PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. This example incidentally shows that it's not sanitizing the request to prevent reference to files outside the snapshots directory, which possibly would be a security bug down the road. I realize that we restrict access to the function, but still we shouldn't be *this* trusting of the argument. Possibly some of the defense for this ought to be in SnapBuildRestoreSnapshot: it looks like that function isn't considering the possibility that OpenTransientFile will succeed on a directory. Does it have any other callers passing less-than-fully-trustworthy paths? Another interesting question is why the error is being promoted to PANIC. That sure seems unnecessary, and there's a comment in SnapBuildRestoreSnapshot that claims it won't happen. regards, tom lane
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Sat, Mar 1, 2025 at 12:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > PG Bug reporting form <noreply@postgresql.org> writes: > > Passing an empty string to the recently added extension function > > pg_get_logical_snapshot_meta() triggers PANIC / Abort. > > Thanks for noticing. It looks like this function is far too trusting > of its input. Another way to crash it is to point to a directory: > > regression=# SELECT pg_get_logical_snapshot_meta('../snapshots'); > PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > This example incidentally shows that it's not sanitizing the request > to prevent reference to files outside the snapshots directory, which > possibly would be a security bug down the road. I realize that we > restrict access to the function, but still we shouldn't be *this* > trusting of the argument. Agreed. > > Possibly some of the defense for this ought to be in > SnapBuildRestoreSnapshot: it looks like that function isn't > considering the possibility that OpenTransientFile will succeed on a > directory. Does it have any other callers passing > less-than-fully-trustworthy paths? SnapBuildRestoreSnapshot() is called by pg_logicalinspect functions (pg_get_logical_snapshot_meta() and pg_get_logical_snapshot_info()) and SnapBuildRestore(). For the latter, SnapBuildRestore() surely passes the file name, so it would not be a problem. One way to fix this issue is that SnapBuildRestoreSnapshot() takes an LSN corresponding to the serialized snapshot instead of a path, and it constructs the .snap file path correctly. This fix would also prevent it from being a security bug. pg_logicalinspect functions, pg_get_logical_snapshot_info() and pg_get_logical_snapshot_meta(), would also need to be changed so that they extract the LSN from the given file name. If the given file name is incorrect, it should raise an error. > > Another interesting question is why the error is being promoted > to PANIC. That sure seems unnecessary, and there's a comment > in SnapBuildRestoreSnapshot that claims it won't happen. > This PANIC occurred because fsync_fname() with isdir=false was called for the directory, pg_logical/snapshots/' in the above case (data_sync_retry is false by default). I think you referred to the following comment in SnapBuildRestoreSnapshot(): /* ---- * Make sure the snapshot had been stored safely to disk, that's normally * cheap. * Note that we do not need PANIC here, nobody will be able to use the * slot without fsyncing, and saving it won't succeed without an fsync() * either... * ---- */ fsync_fname(path, false); fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true); These comments have been present since the initial logical decoding commit, whereas data_sync_retry was introduced at a later stage. While the comment's reference to PANIC suggests that a critical section isn't necessary here, the actual PANIC occurred in an unanticipated manner. SnapBuildRestoreSnapshot() works correctly as long as it takes a correct file path but 7cdfeee320e72 violated this assumption. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: > > Passing an empty string to the recently added extension function > > pg_get_logical_snapshot_meta() triggers PANIC / Abort. Thanks for the report! > Thanks for noticing. It looks like this function is far too trusting > of its input. Another way to crash it is to point to a directory: > > regression=# SELECT pg_get_logical_snapshot_meta('../snapshots'); > PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > This example incidentally shows that it's not sanitizing the request > to prevent reference to files outside the snapshots directory, which > possibly would be a security bug down the road. I realize that we > restrict access to the function, but still we shouldn't be *this* > trusting of the argument. > Indeed that's bad. > Possibly some of the defense for this ought to be in > SnapBuildRestoreSnapshot: it looks like that function isn't > considering the possibility that OpenTransientFile will succeed on a > directory. Does it have any other callers passing > less-than-fully-trustworthy paths? I don't think so. The other caller is in snapbuild.c where the input is build based on a lsn. Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info() which suffers for the same issues reported above) should take a lsn as input parameter. That's how it has been proposed initially and that would simplify things. Thoughts? > Another interesting question is why the error is being promoted > to PANIC. That sure seems unnecessary, and there's a comment > in SnapBuildRestoreSnapshot that claims it won't happen. > Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in 9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate). I'm not sure we should change the logic here, maybe just update the comment? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote: > > PG Bug reporting form <noreply@postgresql.org> writes: > > > Passing an empty string to the recently added extension function > > > pg_get_logical_snapshot_meta() triggers PANIC / Abort. > > Thanks for the report! > > > Thanks for noticing. It looks like this function is far too trusting > > of its input. Another way to crash it is to point to a directory: > > > > regression=# SELECT pg_get_logical_snapshot_meta('../snapshots'); > > PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > > > This example incidentally shows that it's not sanitizing the request > > to prevent reference to files outside the snapshots directory, which > > possibly would be a security bug down the road. I realize that we > > restrict access to the function, but still we shouldn't be *this* > > trusting of the argument. > > > > Indeed that's bad. > > > Possibly some of the defense for this ought to be in > > SnapBuildRestoreSnapshot: it looks like that function isn't > > considering the possibility that OpenTransientFile will succeed on a > > directory. Does it have any other callers passing > > less-than-fully-trustworthy paths? > > I don't think so. The other caller is in snapbuild.c where the input is > build based on a lsn. > > Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info() > which suffers for the same issues reported above) should take a lsn as input > parameter. That's how it has been proposed initially and that would simplify > things. Thoughts? Yeah, that's one solution. But it would make pg_logicalinspect functions hard to work with the pg_ls_loigcalsnapdir(). Users would need to parse the filename by themselve and pass the LSN to them. I've attached a draft patch for a different approach where we extract LSN from the given file name in pg_logicalinspect functions. Thoughts? > > > Another interesting question is why the error is being promoted > > to PANIC. That sure seems unnecessary, and there's a comment > > in SnapBuildRestoreSnapshot that claims it won't happen. > > > > Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes > to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in > 9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate). > > I'm not sure we should change the logic here, maybe just update the comment? I think that we don't need to change the logic. It might be a good idea to clarify the comment, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Mon, Mar 03, 2025 at 12:42:01PM -0800, Masahiko Sawada wrote: > On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info() > > which suffers for the same issues reported above) should take a lsn as input > > parameter. That's how it has been proposed initially and that would simplify > > things. Thoughts? > > Yeah, that's one solution. But it would make pg_logicalinspect > functions hard to work with the pg_ls_loigcalsnapdir(). Users would > need to parse the filename by themselve and pass the LSN to them. I've > attached a draft patch for a different approach where we extract LSN > from the given file name in pg_logicalinspect functions. Thoughts? Yeah I agree that your proposal is more user friendly. Thanks for the patch, you beat me on it! (I was waiting that we agree on the approach before working on it). Looking at your patch, some comments: === 1 + if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) + ereport(ERROR, sscanf would not return an error if the file being used is say "0-40796E18.foo": postgres=# SELECT pg_get_logical_snapshot_info('0-40796E18.foo'); pg_get_logical_snapshot_info ------------------------------------------------------------------------- (consistent,751,751,0/40796AF8,0/40796AF8,0,f,f,0/0,0,0,,2,"{751,752}") I think that as long as it finds the &hi and &lo then it does not return an error. PFA 0001 a new version to also parse the ".snap". === 2 + errmsg("invalid serialized snapshot file name \"%s\"", filename)); In the extension doc we mention "a snapshot file" so I think it makes sense to remove "serialized" from the error message: also done in 0001 attached. === 3 + * Extract LSN from the given serialized snapshot s/Extract LSN/Extract the LSN/? (done in 0001 attached). === 4 Also adding some regression tests in 0001 attached to validate the imputs and the permissions. > > Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes > > to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in > > 9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate). > > > > I'm not sure we should change the logic here, maybe just update the comment? > > I think that we don't need to change the logic. It might be a good > idea to clarify the comment, though. Agree. I think that we can just get rid of it: done in 0002 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote: > PFA 0001 a new version to also parse the ".snap". PFA v3 (a slightly different version) to "correctly" use the new report_error introduced in v2. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote: > Hi, > > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote: > > PFA 0001 a new version to also parse the ".snap". > > PFA v3 (a slightly different version) to "correctly" use the new report_error > introduced in v2. It looks like that CheckPointSnapBuild() also rely on sscanf() to check for the snapshot file extension. As seen, up-thread we can't rely on sscanf() to do so. Attached a c file to show the "issue": $ gcc -o sscanf_file_ext sscanf_file_ext.c $ ./sscanf_file_ext 0-40796E18.snap File: "0-40796E18.snap" Parse result: 2 (2 means success) $ ./sscanf_file_ext 0-40796E18.foo File: "0-40796E18.foo" Parse result: 2 (2 means success) So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing message and "continue" even if the file name extension is not ".snap". That's not a big deal because, in pratice, it still removes the .$pid.tmp files (given they have the "hex" part well formated). But, it does not look like it was the original intent, so maybe we should also re-think CheckPointSnapBuild() and open a dedicated thread? (If so, I'm happy to do so). Note that there might be other places that may need the same kind of attention: pg_archivecleanup.c? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote: > > PFA 0001 a new version to also parse the ".snap". > > PFA v3 (a slightly different version) to "correctly" use the new report_error > introduced in v2. Thank you for the patch! I agree to have some regression tests for that. Looking at the 0001 patch, it seems that the pg_logicalinspect functions still accepts invalid filenames as follows: postgres(1:2713976)=# select pg_get_logical_snapshot_info('0-40796E18.foo.snap'); ERROR: could not open file "pg_logical/snapshots/0-40796E18.snap": No such file or directory Regarding the 0002 patch, TBH I'm confused about what the original comment intended to mean. I thought that it means that we don't need to use a critical section for fsync_fname() calls to promote the fsync error. If that's right, it would be better to update the comment instead of removing it. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Tue, Mar 4, 2025 at 7:05 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote: > > Hi, > > > > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote: > > > PFA 0001 a new version to also parse the ".snap". > > > > PFA v3 (a slightly different version) to "correctly" use the new report_error > > introduced in v2. > > It looks like that CheckPointSnapBuild() also rely on sscanf() to check > for the snapshot file extension. As seen, up-thread we can't rely on sscanf() > to do so. > > Attached a c file to show the "issue": > > $ gcc -o sscanf_file_ext sscanf_file_ext.c > $ ./sscanf_file_ext 0-40796E18.snap > File: "0-40796E18.snap" > Parse result: 2 (2 means success) > > $ ./sscanf_file_ext 0-40796E18.foo > File: "0-40796E18.foo" > Parse result: 2 (2 means success) > > So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing > message and "continue" even if the file name extension is not ".snap". > > That's not a big deal because, in pratice, it still removes the .$pid.tmp files > (given they have the "hex" part well formated). But, it does not look like it was > the original intent, so maybe we should also re-think CheckPointSnapBuild() and > open a dedicated thread? (If so, I'm happy to do so). I believe that the situation is a bit different in these cases; in CheckPointSnapBuild () we iterate over all files in pg_logical/snapshots directory and extract the LSN from the filenames whereas in pg_logicalinspect case, user can pass arbitrary filename. Therefore, as for the former cases, it might make sense to check the file extension as well if we want to prevent a problem where someone injects files into pg_logical/snapshots directory. But it might be too much and I guess that there are other places where we assume that there are no malformed files injected under $PGDATA. > > Note that there might be other places that may need the same kind of attention: > pg_archivecleanup.c? In pg_archivecleanup.c, we use sscanf() two places: for partial WAL files and for backup history files. IIUC we check the format of the filename using IsPartialXLogFileName() and IsBackupHistoryFileName() before using sscanf() so I think it's okay. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Tue, Mar 04, 2025 at 12:11:07PM -0800, Masahiko Sawada wrote: > On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote: > > > PFA 0001 a new version to also parse the ".snap". > > > > PFA v3 (a slightly different version) to "correctly" use the new report_error > > introduced in v2. > > Thank you for the patch! I agree to have some regression tests for that. > > Looking at the 0001 patch, it seems that the pg_logicalinspect > functions still accepts invalid filenames as follows: > > postgres(1:2713976)=# select > pg_get_logical_snapshot_info('0-40796E18.foo.snap'); > ERROR: could not open file "pg_logical/snapshots/0-40796E18.snap": No > such file or directory Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree() in parse_snapshot_filename() is not needed per say because the function is currently executed in a short-lived memory context. It's there for safety reason in case it's called outside those SQL apis in the future. > Regarding the 0002 patch, TBH I'm confused about what the original > comment intended to mean. I thought that it means that we don't need > to use a critical section for fsync_fname() calls to promote the fsync > error. Yeah I think the same. > If that's right, it would be better to update the comment > instead of removing it. hmm I'm not sure what to say then. It was saying that there is not need to promote to PANIC because it won't be usable anyway (if not fsynced). Since 9ccdd7f66e3 we do promote to PANIC so I thought the most simple to avoid any extra confusion is to remove the comment (since it's wrong since 9ccdd7f66e3). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote: > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree() > in parse_snapshot_filename() is not needed per say because the function is > currently executed in a short-lived memory context. It's there for safety reason > in case it's called outside those SQL apis in the future. After sleeping on it, PFA a simplified version. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote: > > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree() > > in parse_snapshot_filename() is not needed per say because the function is > > currently executed in a short-lived memory context. It's there for safety reason > > in case it's called outside those SQL apis in the future. > > After sleeping on it, PFA a simplified version. > Thank you for updating the patch. I think we don't need to even do palloc() for the buffer as we can use the char[MAXPGPATH] instead. I've attached the patch to improve the parse_snapshot_filename() function and add some regression tests. Please review these changes. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Robins Tharakan
Date:
On Thu, 6 Mar 2025 at 17:13, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> After sleeping on it, PFA a simplified version.
Running tests on v5 patch for a few hours was uneventful (which is good).
I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead. I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.
I've updated that with this patch too. Will update if something comes up.
-
robins
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote: > On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote: > > > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree() > > > in parse_snapshot_filename() is not needed per say because the function is > > > currently executed in a short-lived memory context. It's there for safety reason > > > in case it's called outside those SQL apis in the future. > > > > After sleeping on it, PFA a simplified version. > > > > Thank you for updating the patch. > > I think we don't need to even do palloc() for the buffer as we can use > the char[MAXPGPATH] instead. Sure. > I've attached the patch to improve the > parse_snapshot_filename() function and add some regression tests. > Please review these changes. Thanks for the patch! === 1 -parse_snapshot_filename(const char *filename) +parse_snapshot_filename(char *filename) Why? === 2 - if (sscanf(filename, "%X-%X", &hi, &lo) != 2) + if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would still pass. So, I think it's better to remove the .snap here as it could give the "wrong" impression that it's "useful". The attached removes the .snap and adds a comment like: " * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts * converted values (%X), not literal text matches. " I think it makes sense to document this behavior. === 3 + /* + * Bring back the LSN to the snapshot file format and compare + * it to the given name to see if the extracted LSN is sane. + */ + sprintf(tmpfname, "%X-%X.snap", hi, lo); + if (strcmp(tmpfname, filename) != 0) The idea was also to ensure that there are no extra characters between the LSN values and the .snap extension: Adding this as an extra comment in the attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote: > > On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote: > > > > Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree() > > > > in parse_snapshot_filename() is not needed per say because the function is > > > > currently executed in a short-lived memory context. It's there for safety reason > > > > in case it's called outside those SQL apis in the future. > > > > > > After sleeping on it, PFA a simplified version. > > > > > > > Thank you for updating the patch. > > > > I think we don't need to even do palloc() for the buffer as we can use > > the char[MAXPGPATH] instead. > > Sure. > > > I've attached the patch to improve the > > parse_snapshot_filename() function and add some regression tests. > > Please review these changes. > > Thanks for the patch! > > === 1 > > -parse_snapshot_filename(const char *filename) > +parse_snapshot_filename(char *filename) > > Why? Sorry, that's a leftover that I attempted. We should use 'const'. > > === 2 > > - if (sscanf(filename, "%X-%X", &hi, &lo) != 2) > + if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) > > We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with > (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would > still pass. > > So, I think it's better to remove the .snap here as it could give the "wrong" > impression that it's "useful". > > The attached removes the .snap and adds a comment like: > > " > * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts > * converted values (%X), not literal text matches. > " Yes, but we include the suffix when doing sscanf() in many other places. I don't see the specific benefit of doing it in another way only in this case. > > I think it makes sense to document this behavior. > > === 3 > > + /* > + * Bring back the LSN to the snapshot file format and compare > + * it to the given name to see if the extracted LSN is sane. > + */ > + sprintf(tmpfname, "%X-%X.snap", hi, lo); > + if (strcmp(tmpfname, filename) != 0) > > The idea was also to ensure that there are no extra characters between the LSN > values and the .snap extension: Right. How about the following comment? Bring back the extracted LSN to the snapshot file format and compare it to the given filename. This check strictly checks if the given filename follows the format of the snapshot filename. FYI I've analyzed the idea of extracting LSN from the filename and bringing it back to the format to validate the given filename. IIUC we strictly validate the given filename. For example, we accept '0-ABC.snap' but not '0-0ABC.snap, which is probably fine. Also, I was concerned that sscanf() could be affected by locale, but it's fine particularly in this case as we scan only hex numbers. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote: > On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > === 1 > > > > -parse_snapshot_filename(const char *filename) > > +parse_snapshot_filename(char *filename) > > > > Why? > > Sorry, that's a leftover that I attempted. We should use 'const'. No problem at all! Okay, re-added in the attached. > > > > === 2 > > > > - if (sscanf(filename, "%X-%X", &hi, &lo) != 2) > > + if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) > > > > We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with > > (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would > > still pass. > > > > So, I think it's better to remove the .snap here as it could give the "wrong" > > impression that it's "useful". > > > > The attached removes the .snap and adds a comment like: > > > > " > > * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts > > * converted values (%X), not literal text matches. > > " > > Yes, but we include the suffix when doing sscanf() in many other > places. I don't see the specific benefit of doing it in another way > only in this case. The idea was to not give the "wrong" impression that it's "useful" to add the suffix. That said I agree, it would be better to also update the other places where the suffix is being used. I think I'll start a dedicated thread for that (to at least put some comments around those places). As far our current patch then I propose to add a comment though, like in the attached? > > === 3 > > > > + /* > > + * Bring back the LSN to the snapshot file format and compare > > + * it to the given name to see if the extracted LSN is sane. > > + */ > > + sprintf(tmpfname, "%X-%X.snap", hi, lo); > > + if (strcmp(tmpfname, filename) != 0) > > > > The idea was also to ensure that there are no extra characters between the LSN > > values and the .snap extension: > > Right. How about the following comment? > > Bring back the extracted LSN to the snapshot file format and compare > it to the given filename. This check strictly checks if the given filename > follows the format of the snapshot filename. Yeah sounds good, done that way in the attached. > FYI I've analyzed the idea of extracting LSN from the filename and > bringing it back to the format to validate the given filename. IIUC we > strictly validate the given filename. For example, we accept > '0-ABC.snap' but not '0-0ABC.snap', Yeah, and so '0A-ABC.snap' for example. > which is probably fine. I think so, as long as the snapshot files are generated with %X (and not %08X for example) in SnapBuildSerialize(): " sprintf(path, "%s/%X-%X.snap", PG_LOGICAL_SNAPSHOTS_DIR, LSN_FORMAT_ARGS(lsn)); " Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Thu, Mar 6, 2025 at 11:02 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote: > > On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > === 1 > > > > > > -parse_snapshot_filename(const char *filename) > > > +parse_snapshot_filename(char *filename) > > > > > > Why? > > > > Sorry, that's a leftover that I attempted. We should use 'const'. > > No problem at all! Okay, re-added in the attached. > > > > > > > === 2 > > > > > > - if (sscanf(filename, "%X-%X", &hi, &lo) != 2) > > > + if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) > > > > > > We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with > > > (sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would > > > still pass. > > > > > > So, I think it's better to remove the .snap here as it could give the "wrong" > > > impression that it's "useful". > > > > > > The attached removes the .snap and adds a comment like: > > > > > > " > > > * Note: We deliberately don't use "%X-%X.snap" because sscanf only counts > > > * converted values (%X), not literal text matches. > > > " > > > > Yes, but we include the suffix when doing sscanf() in many other > > places. I don't see the specific benefit of doing it in another way > > only in this case. > > The idea was to not give the "wrong" impression that it's "useful" to add the > suffix. That said I agree, it would be better to also update the other places > where the suffix is being used. I think I'll start a dedicated thread for > that (to at least put some comments around those places). > > As far our current patch then I propose to add a comment though, like in the > attached? > > > > === 3 > > > > > > + /* > > > + * Bring back the LSN to the snapshot file format and compare > > > + * it to the given name to see if the extracted LSN is sane. > > > + */ > > > + sprintf(tmpfname, "%X-%X.snap", hi, lo); > > > + if (strcmp(tmpfname, filename) != 0) > > > > > > The idea was also to ensure that there are no extra characters between the LSN > > > values and the .snap extension: > > > > Right. How about the following comment? > > > > Bring back the extracted LSN to the snapshot file format and compare > > it to the given filename. This check strictly checks if the given filename > > follows the format of the snapshot filename. > > Yeah sounds good, done that way in the attached. > > > FYI I've analyzed the idea of extracting LSN from the filename and > > bringing it back to the format to validate the given filename. IIUC we > > strictly validate the given filename. For example, we accept > > '0-ABC.snap' but not '0-0ABC.snap', > > Yeah, and so '0A-ABC.snap' for example. > > > which is probably fine. > > I think so, as long as the snapshot files are generated with %X (and not %08X > for example) in SnapBuildSerialize(): > > " > sprintf(path, "%s/%X-%X.snap", > PG_LOGICAL_SNAPSHOTS_DIR, > LSN_FORMAT_ARGS(lsn)); > " > Thank you for updating the patch. I've made some cosmetic changes and attached a new version patch. Could you review it? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Bertrand Drouvot
Date:
Hi, On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote: > Thank you for updating the patch. I've made some cosmetic changes and > attached a new version patch. Could you review it? LGTM, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Sat, Mar 8, 2025 at 12:02 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote: > > Thank you for updating the patch. I've made some cosmetic changes and > > attached a new version patch. Could you review it? > > LGTM, thanks! > Pushed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Nathan Bossart
Date:
On Tue, Mar 11, 2025 at 03:34:18PM -0700, Masahiko Sawada wrote: > Pushed. I'm seeing warnings for the CI build on Windows [0] that appear to be related to this commit: c:\cirrus\contrib\pg_logicalinspect\pg_logicalinspect.c(88) : warning C4715: 'parse_snapshot_filename': not all controlpaths return a value I suspect we just need to do something like commit fe4ecd0 and return InvalidXLogRecPtr after the ERROR. [0] https://api.cirrus-ci.com/v1/task/4876492255002624/logs/build.log -- nathan
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Wed, Mar 12, 2025 at 12:03 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Mar 11, 2025 at 03:34:18PM -0700, Masahiko Sawada wrote: > > Pushed. > > I'm seeing warnings for the CI build on Windows [0] that appear to be > related to this commit: > > c:\cirrus\contrib\pg_logicalinspect\pg_logicalinspect.c(88) : warning C4715: 'parse_snapshot_filename': not allcontrol paths return a value > > I suspect we just need to do something like commit fe4ecd0 and return > InvalidXLogRecPtr after the ERROR. Thank you for the report! David also reported the same issue[1]. I've just pushed the fix. BTW how did you catch the warning? I checked before the push if the change had passed the CI tests (completed with green status). But when checking the build log now, I can see the warning in the build logs. Regards, [1] https://www.postgresql.org/message-id/CAApHDvqrhFfnetbcwgGkJ=z63T8HfQ_OyP=vX8BYiXyxFKt67w@mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Nathan Bossart
Date:
On Wed, Mar 12, 2025 at 02:33:36PM -0700, Masahiko Sawada wrote: > Thank you for the report! David also reported the same issue[1]. I've > just pushed the fix. Oh, sorry, I should've checked that list first. > BTW how did you catch the warning? I checked before the push if the > change had passed the CI tests (completed with green status). But when > checking the build log now, I can see the warning in the build logs. I noticed a ⚠ next to the results on http://cfbot.cputube.org/. -- nathan
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Wed, Mar 12, 2025 at 3:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Mar 12, 2025 at 02:33:36PM -0700, Masahiko Sawada wrote: > > Thank you for the report! David also reported the same issue[1]. I've > > just pushed the fix. > > Oh, sorry, I should've checked that list first. > > > BTW how did you catch the warning? I checked before the push if the > > change had passed the CI tests (completed with green status). But when > > checking the build log now, I can see the warning in the build logs. > > I noticed a ⚠ next to the results on http://cfbot.cputube.org/. That's good to know. Thank you. I'll add it to my pre-commit check routine. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
David Rowley
Date:
On Thu, 13 Mar 2025 at 10:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > BTW how did you catch the warning? I checked before the push if the > change had passed the CI tests (completed with green status). But when > checking the build log now, I can see the warning in the build logs. My workflow is, when still working in the feature branch, to always do "perl src/tools/pgindent/pgindent --commit HEAD" once I'm ready for it to be pushed (assuming there's only 1 commit). If there's any diff I'll "git commit -a --fixup HEAD && git rebase -i master" and adjust the commit type of the fixup commit as "fixup" (or "f"). I'll then do a final "git format-patch" and "git am" that to git master and run the tests before pushing. I tend to never do any fixups in git master as I'm too scared I'll one day push one of them accidentally. If the pgindent diff has anything unexpected in it, I'll consider if I need to update typedefs.list to fix. David
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string
From
Masahiko Sawada
Date:
On Wed, Mar 12, 2025 at 5:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 13 Mar 2025 at 10:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > BTW how did you catch the warning? I checked before the push if the > > change had passed the CI tests (completed with green status). But when > > checking the build log now, I can see the warning in the build logs. > > My workflow is, when still working in the feature branch, to always do > "perl src/tools/pgindent/pgindent --commit HEAD" once I'm ready for it > to be pushed (assuming there's only 1 commit). If there's any diff > I'll "git commit -a --fixup HEAD && git rebase -i master" and adjust > the commit type of the fixup commit as "fixup" (or "f"). I'll then do > a final "git format-patch" and "git am" that to git master and run the > tests before pushing. Thank you for sharing. I'll integrate the workflow with mine. > I tend to never do any fixups in git master as > I'm too scared I'll one day push one of them accidentally. Yeah, that's important. Thank you for fixing the indentation issue (b955df44340). I've checked the issue with CI before pushing but I missed the fix for some reason. That's my fault. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com