Thread: Re: Small memory fixes for pg_createsubcriber
On Mon, Feb 10, 2025, at 1:31 PM, Ranier Vilela wrote:
Coverity has some reports about pg_createsubcriber.CID 1591322: (#1 of 1): Resource leak (RESOURCE_LEAK)10. leaked_storage: Variable dbname going out of scope leaks the storage it points to.Additionally there are several calls that are out of the ordinary, according to the Postgres API.According to the documentation:The correct function to free memory when using PQescapeLiteral and PQescapeIdentifier would be PQfreemem.
Hi,
From src/common/fe_memutils.c:
void
pg_free(void *ptr)
{
free(ptr);
}
From src/interfaces/libpq/fe-exec.c:
void
PQfreemem(void *ptr)
{
free(ptr);
}
There is no bug. They are the same behind the scenes. I'm fine changing it. It
is a new code and it wouldn't cause a lot of pain to backpatch patches in the
future.
@@ -1130,6 +1130,7 @@ check_and_drop_existing_subscriptions(PGconn *conn,
PQclear(res);
destroyPQExpBuffer(query);
+ PQfreemem(dbname);
}
Even if the pg_createsubscriber aims to run in a small amount of time, hence,
it is fine to leak memory, the initial commit cleaned up all variables but a
subsequent commit 4867f8a555c apparently didn't. Although it is just a low
impact improvement, it is better to be strict and shut up SASTs.
Em qua., 12 de fev. de 2025 às 00:54, Michael Paquier <michael@paquier.xyz> escreveu:
On Tue, Feb 11, 2025 at 01:32:32PM -0300, Euler Taveira wrote:
> There is no bug. They are the same behind the scenes. I'm fine changing it. It
> is a new code and it wouldn't cause a lot of pain to backpatch patches in the
> future.
On consistency grounds, and as this is documented in fe-exec.c at the
top of PQfreemem(), I can get behind the switch.
> Even if the pg_createsubscriber aims to run in a small amount of time, hence,
> it is fine to leak memory, the initial commit cleaned up all variables but a
> subsequent commit 4867f8a555c apparently didn't. Although it is just a low
> impact improvement, it is better to be strict and shut up SASTs.
check_and_drop_existing_subscriptions() is called once per database in
setup_subscriber(), and we are not going to have millions of them in
this list. We don't usually care for such short-lived things, but as
the original commit did the effort in d44032d01463, I don't see why we
cannot do it here, either.
Thanks Michael.
best regards,
Ranier Vilela
Hi, On 2025-02-11 13:32:32 -0300, Euler Taveira wrote: > On Mon, Feb 10, 2025, at 1:31 PM, Ranier Vilela wrote: > > Coverity has some reports about pg_createsubcriber. > > > > CID 1591322: (#1 of 1): Resource leak (RESOURCE_LEAK) > > 10. leaked_storage: Variable dbname going out of scope leaks the storage it points to. > > Additionally there are several calls that are out of the ordinary, according to the Postgres API. > > > > According to the documentation: > > libpq-exec <https://www.postgresql.org/docs/current/libpq-exec.html> > > > > > > The correct function to free memory when using PQescapeLiteral and PQescapeIdentifier would be PQfreemem. > > > > Hi, > > From src/common/fe_memutils.c: > > void > pg_free(void *ptr) > { > free(ptr); > } > > From src/interfaces/libpq/fe-exec.c: > > void > PQfreemem(void *ptr) > { > free(ptr); > } > > There is no bug. They are the same behind the scenes. I'm fine changing it. It > is a new code and it wouldn't cause a lot of pain to backpatch patches in the > future. That *is* a bug. On windows the allocator that a shared library (i.e. libpq) uses, may *not* be the same as the one that an executable (i.e. pg_createsubscriber). It's not correct to free memory allocated in a shared library just with free, it has to go through the library's free. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2025-02-11 13:32:32 -0300, Euler Taveira wrote: >> There is no bug. They are the same behind the scenes. > That *is* a bug. On windows the allocator that a shared library (i.e. libpq) > uses, may *not* be the same as the one that an executable > (i.e. pg_createsubscriber). It's not correct to free memory allocated in a > shared library just with free, it has to go through the library's free. Indeed. This is particularly pernicious because it will work even on Windows under common scenarios (which no doubt explains the lack of field reports). From the last time we discussed this [1]: It seems to work fine as long as a debug-readline is paired with a debug-psql or a release-readline is paired with a release-psql. I wish we had some way to detect misuses automatically ... This seems like the sort of bug that Coverity could detect if only it knew to look, but I have no idea if it could be configured that way. Maybe some weird lashup with a debugging malloc library would be another way. regards, tom lane [1] https://www.postgresql.org/message-id/20240709225934.746y5fg3kgxkyant@awork3.anarazel.de
Hi, On 2025-02-12 11:02:04 -0500, Tom Lane wrote: > I wish we had some way to detect misuses automatically ... > > This seems like the sort of bug that Coverity could detect if only it > knew to look, but I have no idea if it could be configured that way. > Maybe some weird lashup with a debugging malloc library would be > another way. Recent gcc actually has a fairly good way to detect this kind of issue: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute in particular, the variant of the attribute with "deallocator". I'd started on a patch to add those, but ran out of cycles for 18. I quickly checked out my branch and added the relevant attributes to PQescapeLiteral(), PQescapeIdentifier() and that indeed finds the issue pointed out in this thread: ../../../../../home/andres/src/postgresql/src/bin/pg_basebackup/pg_createsubscriber.c: In function 'create_logical_replication_slot': ../../../../../home/andres/src/postgresql/src/bin/pg_basebackup/pg_createsubscriber.c:1332:9: warning: 'pg_free' called onpointer returned from a mismatched allocation function [-Wmismatched-dealloc] 1332 | pg_free(slot_name_esc); | ^~~~~~~~~~~~~~~~~~~~~~ ../../../../../home/andres/src/postgresql/src/bin/pg_basebackup/pg_createsubscriber.c:1326:25: note: returned from 'PQescapeLiteral' 1326 | slot_name_esc = PQescapeLiteral(conn, slot_name, strlen(slot_name)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... but also finds one more: [62/214 42 28%] Compiling C object src/bin/pg_amcheck/pg_amcheck.p/pg_amcheck.c.o ../../../../../home/andres/src/postgresql/src/bin/pg_amcheck/pg_amcheck.c: In function 'main': ../../../../../home/andres/src/postgresql/src/bin/pg_amcheck/pg_amcheck.c:563:25: warning: 'pfree' called on pointer returnedfrom a mismatched allocation function [-Wmismatched-dealloc] 563 | pfree(schema); | ^~~~~~~~~~~~~ ../../../../../home/andres/src/postgresql/src/bin/pg_amcheck/pg_amcheck.c:556:34: note: returned from 'PQescapeIdentifier' 556 | schema = PQescapeIdentifier(conn, opts.install_schema, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 557 | strlen(opts.install_schema)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that that doesn't just require adding the attributes to PQescapeIdentifier() etc, but also to pg_malloc(), as the latter is how gcc knows that pg_pfree() is a deallocating function. Particularly for something like libpq it's not quitetrivial to add attributes like this, of course. We can't even depend on pg_config.h. One way would be to define them in libpq-fe.h, guarded by an #ifdef, that's "armed" by a commandline -D flag, if the compiler is supported? Greetings, Andres Freund
Michael Paquier <michael@paquier.xyz> writes: > I have looked at bit at the uses of PQescapeLiteral() and > PQescapeIdentifier() in the tree. On top of the one in pg_amcheck you > are just pointing to, there is an inconsistency in pg_upgrade.c for > set_locale_and_encoding() where datlocale_literal may be allocated > with a pg_strdup() or a PQescapeLiteral() depending on the path. The > code has been using PQfreemem() for the pg_strdup() allocation, which > is logically incorrect. Yeah, I suspected there would be places like that. It just hasn't mattered in practice up to now. (I have a vague recollection that Windows used to be pickier about this, but evidently not in recent years.) I spent a little time earlier today seeing what I could do with the use-dmalloc patch I posted earlier. It turns out you can get through initdb after s/free/PQfreemem/ in just two places, and then the backend works fine. But psql is a frickin' disaster --- there's free's of strings made with PQExpBuffer all over its backslash-command handling, and no easy way to clean it up. Maybe other clients will be less of a mess, but I'm not betting on that. regards, tom lane
On Thu, Feb 13, 2025 at 01:50:29PM +0900, Michael Paquier wrote: > On Wed, Feb 12, 2025 at 07:08:31PM -0500, Tom Lane wrote: >> I spent a little time earlier today seeing what I could do with the >> use-dmalloc patch I posted earlier. It turns out you can get through >> initdb after s/free/PQfreemem/ in just two places, and then the >> backend works fine. But psql is a frickin' disaster --- there's >> free's of strings made with PQExpBuffer all over its backslash-command >> handling, and no easy way to clean it up. Maybe other clients will >> be less of a mess, but I'm not betting on that. > > Hmm. Okay. It sounds like it would be better to group everything > that has been pointed together towards what should be a more generic > solution than what I have posted. So I am holding on touching > anything. Two weeks later. Would there be any objections for doing something in the lines of [1] for pg_upgrade and pg_amcheck? The pg_upgrade bit is a bit strange, for sure, still it's better than the current state of things. [1]: https://www.postgresql.org/message-id/Z601RQxTmIUohdkV@paquier.xyz -- Michael
Attachment
Em ter., 25 de fev. de 2025 às 03:02, Michael Paquier <michael@paquier.xyz> escreveu:
On Thu, Feb 13, 2025 at 01:50:29PM +0900, Michael Paquier wrote:
> On Wed, Feb 12, 2025 at 07:08:31PM -0500, Tom Lane wrote:
>> I spent a little time earlier today seeing what I could do with the
>> use-dmalloc patch I posted earlier. It turns out you can get through
>> initdb after s/free/PQfreemem/ in just two places, and then the
>> backend works fine. But psql is a frickin' disaster --- there's
>> free's of strings made with PQExpBuffer all over its backslash-command
>> handling, and no easy way to clean it up. Maybe other clients will
>> be less of a mess, but I'm not betting on that.
>
> Hmm. Okay. It sounds like it would be better to group everything
> that has been pointed together towards what should be a more generic
> solution than what I have posted. So I am holding on touching
> anything.
Two weeks later. Would there be any objections for doing something in
the lines of [1] for pg_upgrade and pg_amcheck?
I prefer not to mix api.
We already have a branch for decision.
Let's use it then.
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index d95c491fb5..b2c8e8e420 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -455,7 +455,9 @@ set_locale_and_encoding(void)
locale->db_locale,
strlen(locale->db_locale));
else
- datlocale_literal = pg_strdup("NULL");
+ datlocale_literal = PQescapeLiteral(conn_new_template1,
+ "NULL",
+ strlen("NULL"));
index d95c491fb5..b2c8e8e420 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -455,7 +455,9 @@ set_locale_and_encoding(void)
locale->db_locale,
strlen(locale->db_locale));
else
- datlocale_literal = pg_strdup("NULL");
+ datlocale_literal = PQescapeLiteral(conn_new_template1,
+ "NULL",
+ strlen("NULL"));
best regards,
Ranier Vilela
Attachment
Hi, On 2025-02-12 19:08:31 -0500, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > I have looked at bit at the uses of PQescapeLiteral() and > > PQescapeIdentifier() in the tree. On top of the one in pg_amcheck you > > are just pointing to, there is an inconsistency in pg_upgrade.c for > > set_locale_and_encoding() where datlocale_literal may be allocated > > with a pg_strdup() or a PQescapeLiteral() depending on the path. The > > code has been using PQfreemem() for the pg_strdup() allocation, which > > is logically incorrect. > > Yeah, I suspected there would be places like that. It just hasn't > mattered in practice up to now. (I have a vague recollection that > Windows used to be pickier about this, but evidently not in recent > years.) It's a question of compiler / link options. If you e.g. have a debug psql runtime linked against a non-debug libpq, the allocators for both will differ. Or if you link an application statically while a library is built for a shared C runtime. Or a few other such things. But when building in the BF / CI we'll not typically encounter this issue between libraries and binaries in our source tree, because they'll all be built the same. However, we have more recently hit this with external libraries we link to. While it's not recommended, we commonly build (in BF/CI) a debug postgres against production openssl, zstd etc. Which means they don't share allocators, file descriptors etc. Greetings, Andres Freund
On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote: > @@ -455,7 +455,9 @@ set_locale_and_encoding(void) > locale->db_locale, > strlen(locale->db_locale)); > else > - datlocale_literal = pg_strdup("NULL"); > + datlocale_literal = PQescapeLiteral(conn_new_template1, > + "NULL", > + strlen("NULL")); Yeah, I've considered that but hardcoding NULL twice felt a bit weird, as well. Perhaps it's slightly cleaner to use an intermediate variable given then as an argument of PQescapeLiteral()? -- Michael
Attachment
Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <michael@paquier.xyz> escreveu:
On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> locale->db_locale,
> strlen(locale->db_locale));
> else
> - datlocale_literal = pg_strdup("NULL");
> + datlocale_literal = PQescapeLiteral(conn_new_template1,
> + "NULL",
> + strlen("NULL"));
Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
as well. Perhaps it's slightly cleaner to use an intermediate
variable given then as an argument of PQescapeLiteral()?
Yeah, I also think it would look good like this.
v1 attached.
best regards,
Ranier Vilela
Attachment
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote: > Yeah, I also think it would look good like this. It's the least confusing option, indeed. I've reduced a bit the diffs and done that down to v16 for the pg_upgrade part where this exists. Double-checking the tree, it does not seem that we have simolar holes now.. I hope that I'm not wrong. -- Michael
Attachment
Em qui., 27 de fev. de 2025 às 22:19, Michael Paquier <michael@paquier.xyz> escreveu:
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Yeah, I also think it would look good like this.
It's the least confusing option, indeed. I've reduced a bit the diffs
and done that down to v16 for the pg_upgrade part where this exists.
Thanks Michael.
Double-checking the tree, it does not seem that we have simolar holes
now.. I hope that I'm not wrong.
I think so.
best regards,
Ranier Vilela
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote: > Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <michael@paquier.xyz> > escreveu: > > > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote: > > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void) > > > locale->db_locale, > > > strlen(locale->db_locale)); > > > else > > > - datlocale_literal = pg_strdup("NULL"); > > > + datlocale_literal = PQescapeLiteral(conn_new_template1, > > > + "NULL", > > > + strlen("NULL")); > > > > Yeah, I've considered that but hardcoding NULL twice felt a bit weird, > > as well. Perhaps it's slightly cleaner to use an intermediate > > variable given then as an argument of PQescapeLiteral()? > > > Yeah, I also think it would look good like this. This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory freeing". PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale to contain datlocale='NULL'::text instead of datlocale IS NULL.
Em ter., 1 de abr. de 2025 às 15:39, Noah Misch <noah@leadboat.com> escreveu:
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <michael@paquier.xyz>
> escreveu:
>
> > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> > > locale->db_locale,
> > > strlen(locale->db_locale));
> > > else
> > > - datlocale_literal = pg_strdup("NULL");
> > > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > > + "NULL",
> > > + strlen("NULL"));
> >
> > Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> > as well. Perhaps it's slightly cleaner to use an intermediate
> > variable given then as an argument of PQescapeLiteral()?
> >
> Yeah, I also think it would look good like this.
This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory freeing".
PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale to
contain datlocale='NULL'::text instead of datlocale IS NULL.
Yeah, thanks for pointing this out.
The patch maintained the previous behavior.
I'll take a look.
I'll take a look.
best regards,
Ranier Vilela
Em ter., 1 de abr. de 2025 às 15:39, Noah Misch <noah@leadboat.com> escreveu:
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier <michael@paquier.xyz>
> escreveu:
>
> > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote:
> > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void)
> > > locale->db_locale,
> > > strlen(locale->db_locale));
> > > else
> > > - datlocale_literal = pg_strdup("NULL");
> > > + datlocale_literal = PQescapeLiteral(conn_new_template1,
> > > + "NULL",
> > > + strlen("NULL"));
> >
> > Yeah, I've considered that but hardcoding NULL twice felt a bit weird,
> > as well. Perhaps it's slightly cleaner to use an intermediate
> > variable given then as an argument of PQescapeLiteral()?
> >
> Yeah, I also think it would look good like this.
This became commit 2a083ab "pg_upgrade: Fix inconsistency in memory freeing".
PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale to
contain datlocale='NULL'::text instead of datlocale IS NULL.
I believe the intention of the query is:
For example:
UPDATE pg_catalog.pg_database
SET encoding = 6,
datlocprovider = 'c',
datlocale = NULL
WHERE datname = 'template0';
For example:
UPDATE pg_catalog.pg_database
SET encoding = 6,
datlocprovider = 'c',
datlocale = NULL
WHERE datname = 'template0';
Where datlocale with NULL is correct no?
Because:
UPDATE pg_catalog.pg_database
SET encoding = 6,
datlocprovider = 'c',
datlocale IS NULL
WHERE datname = 'template0';
SET encoding = 6,
datlocprovider = 'c',
datlocale IS NULL
WHERE datname = 'template0';
ERROR: syntax error at or near "IS"
LINE 4: datlocale IS NULL
LINE 4: datlocale IS NULL
best regards,
Ranier Vilela
On Tue, Apr 01, 2025 at 04:28:34PM -0300, Ranier Vilela wrote: > Em ter., 1 de abr. de 2025 às 15:39, Noah Misch <noah@leadboat.com> > escreveu: > > > On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote: > > > Em qui., 27 de fev. de 2025 às 02:51, Michael Paquier < > > michael@paquier.xyz> > > > escreveu: > > > > > > > On Tue, Feb 25, 2025 at 08:54:31AM -0300, Ranier Vilela wrote: > > > > > @@ -455,7 +455,9 @@ set_locale_and_encoding(void) > > > > > locale->db_locale, > > > > > strlen(locale->db_locale)); > > > > > else > > > > > - datlocale_literal = pg_strdup("NULL"); > > > > > + datlocale_literal = PQescapeLiteral(conn_new_template1, > > > > > + "NULL", > > > > > + strlen("NULL")); > > This became 2a083ab "pg_upgrade: Fix inconsistency in memory > > freeing". > > PQescapeLiteral("NULL") is "'NULL'", so this causes pg_database.datlocale > > to > > contain datlocale='NULL'::text instead of datlocale IS NULL. > > > I believe the intention of the query is: > For example: > UPDATE pg_catalog.pg_database > SET encoding = 6, > datlocprovider = 'c', > datlocale = NULL > WHERE datname = 'template0'; > > Where datlocale with NULL is correct no? Right. Commit 2a083ab changed it to: UPDATE pg_catalog.pg_database SET encoding = 6, datlocprovider = 'c', datlocale = 'NULL' WHERE datname = 'template0'; > Because: > UPDATE pg_catalog.pg_database > SET encoding = 6, > datlocprovider = 'c', > datlocale IS NULL > WHERE datname = 'template0'; > > ERROR: syntax error at or near "IS" > LINE 4: datlocale IS NULL Yes, pg_upgrade should not do that. I was trying to explain the difference between NULL and 'NULL'. I didn't mean pg_upgrade should emit "IS NULL".