Thread: Replace remaining StrNCpy() by strlcpy()

Replace remaining StrNCpy() by strlcpy()

From
Peter Eisentraut
Date:
I propose to replace the remaining uses of StrNCpy() with strlcpy() and 
remove the former.  It's clear that strlcpy() has won the popularity 
contest, and the presence of the former is just confusing now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Replace remaining StrNCpy() by strlcpy()

From
David Rowley
Date:
On Mon, 3 Aug 2020 at 18:59, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I propose to replace the remaining uses of StrNCpy() with strlcpy() and
> remove the former.  It's clear that strlcpy() has won the popularity
> contest, and the presence of the former is just confusing now.

It certainly would be good to get rid of some of these, but are some
of the changes not a bit questionable?

e.g:

@@ -4367,7 +4367,7 @@ pgstat_send_archiver(const char *xlog, bool failed)
  */
  pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ARCHIVER);
  msg.m_failed = failed;
- StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
+ strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
  msg.m_timestamp = GetCurrentTimestamp();
  pgstat_send(&msg, sizeof(msg));

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

David



Re: Replace remaining StrNCpy() by strlcpy()

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> - StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
> + strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));

> Will mean that we'll now no longer zero the full length of the m_xlog
> field after the end of the string. Won't that mean we'll start writing
> junk bytes to the stats collector?

StrNCpy doesn't zero-fill the destination today either (except for
the very last byte).  If you need that, you need to memset the
dest buffer ahead of time.

I didn't review the patch in complete detail, but the principle
seems sound to me, and strlcpy is surely more standard than StrNCpy.

            regards, tom lane



Re: Replace remaining StrNCpy() by strlcpy()

From
Tom Lane
Date:
I wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Will mean that we'll now no longer zero the full length of the m_xlog
>> field after the end of the string. Won't that mean we'll start writing
>> junk bytes to the stats collector?

> StrNCpy doesn't zero-fill the destination today either (except for
> the very last byte).

Oh, no, I take that back --- didn't read all of the strncpy man
page :-(.  Yeah, this is a point.  We'd need to check each call
site to see whether the zero-padding matters.

In the specific case of the stats collector, if you don't want
to be sending junk bytes then you'd better be memset'ing the
whole message buffer not just this string field.  So I'm not
sure that the argument has any force there.  But in places
like namecpy() and namestrcpy() we absolutely do mean to be
zeroing the whole destination buffer.

memset plus strlcpy might still be preferable to StrNCpy for
readability by people new to Postgres; but it's less of a
slam dunk than I thought.

            regards, tom lane



Re: Replace remaining StrNCpy() by strlcpy()

From
Peter Eisentraut
Date:
On 2020-08-03 14:12, Tom Lane wrote:
> I wrote:
>> David Rowley <dgrowleyml@gmail.com> writes:
>>> Will mean that we'll now no longer zero the full length of the m_xlog
>>> field after the end of the string. Won't that mean we'll start writing
>>> junk bytes to the stats collector?
> 
>> StrNCpy doesn't zero-fill the destination today either (except for
>> the very last byte).
> 
> Oh, no, I take that back --- didn't read all of the strncpy man
> page :-(.  Yeah, this is a point.  We'd need to check each call
> site to see whether the zero-padding matters.

Oh, that's easy to miss.

> In the specific case of the stats collector, if you don't want
> to be sending junk bytes then you'd better be memset'ing the
> whole message buffer not just this string field.  So I'm not
> sure that the argument has any force there.  But in places
> like namecpy() and namestrcpy() we absolutely do mean to be
> zeroing the whole destination buffer.

That's easy to fix, but it's perhaps wondering briefly why it needs to 
be zero-padded.  hashname() doesn't care, heap_form_tuple() doesn't 
care.  Does anything care?

While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff 
like namein()?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Replace remaining StrNCpy() by strlcpy()

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2020-08-03 14:12, Tom Lane wrote:
>> In the specific case of the stats collector, if you don't want
>> to be sending junk bytes then you'd better be memset'ing the
>> whole message buffer not just this string field.  So I'm not
>> sure that the argument has any force there.  But in places
>> like namecpy() and namestrcpy() we absolutely do mean to be
>> zeroing the whole destination buffer.

> That's easy to fix, but it's perhaps wondering briefly why it needs to 
> be zero-padded.  hashname() doesn't care, heap_form_tuple() doesn't 
> care.  Does anything care?

We do have an expectation that there are no undefined bytes in values to
be stored on-disk.  There's even some code in coerce_type() that will
complain about this:

         * For pass-by-reference data types, repeat the conversion to see if
         * the input function leaves any uninitialized bytes in the result. We
         * can only detect that reliably if RANDOMIZE_ALLOCATED_MEMORY is
         * enabled, so we don't bother testing otherwise.  The reason we don't
         * want any instability in the input function is that comparison of
         * Const nodes relies on bytewise comparison of the datums, so if the
         * input function leaves garbage then subexpressions that should be
         * identical may not get recognized as such.  See pgsql-hackers
         * discussion of 2008-04-04.

> While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff 
> like namein()?

Excellent point --- probably so, unless the callers are all truncating
in advance, which I doubt.

            regards, tom lane



Re: Replace remaining StrNCpy() by strlcpy()

From
Peter Eisentraut
Date:
On 2020-08-03 19:39, Tom Lane wrote:
>> That's easy to fix, but it's perhaps wondering briefly why it needs to
>> be zero-padded.  hashname() doesn't care, heap_form_tuple() doesn't
>> care.  Does anything care?
> 
> We do have an expectation that there are no undefined bytes in values to
> be stored on-disk.  There's even some code in coerce_type() that will
> complain about this:

Okay, here is a new patch with improved implementations of namecpy() and 
namestrcpy().  I didn't see any other places that relied on the 
zero-filling behavior of strncpy().

>> While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff
>> like namein()?
> 
> Excellent point --- probably so, unless the callers are all truncating
> in advance, which I doubt.

I will look into that separately.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Replace remaining StrNCpy() by strlcpy()

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Okay, here is a new patch with improved implementations of namecpy() and 
> namestrcpy().  I didn't see any other places that relied on the 
> zero-filling behavior of strncpy().

I've looked through this patch, and I concur with your conclusion that
noplace else is depending on zero-fill, with the exception of the one
place in pgstat.c that David already noted.  But the issue there is only
that valgrind might bitch about send()'ing undefined bytes, and ISTM
that the existing suppressions in our valgrind.supp should already
handle it, since we already have other pgstat messages with pad bytes.

However I do see one remaining nit to pick, in CreateInitDecodingContext:
 
     /* register output plugin name with slot */
     SpinLockAcquire(&slot->mutex);
-    StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN);
+    namestrcpy(&slot->data.plugin, plugin);
     SpinLockRelease(&slot->mutex);

This is already a pro-forma violation of our rule about "only
straight-line code inside a spinlock".  Now I'm not terribly concerned
about that right now, and the patch as it stands is only changing things
cosmetically.  But if you modify namestrcpy to do pg_mbcliplen then all
of a sudden there is a whole lot of code that could get reached within
the spinlock, and I'm not a bit happy about that prospect.

The least-risk fix would be to namestrcpy() into a local variable
and then just use a plain memcpy() inside the spinlock.  There might
be better ways if we're willing to make assumptions about what the
plugin name can be.  For that matter, do we really need a spinlock
here at all?  Why is the plugin name critical but the rest of the
slot not?

BTW, while we're here I think we ought to change namecpy and namestrcpy
to return void (no caller checks their results) and drop their checks
for null-pointer inputs.  AFAICS a null pointer would be a caller bug in
every case, and if it isn't, why is failing to initialize the
destination an OK outcome?  I find the provisions for nulls in namestrcmp
pretty useless too, although it looks like at least some thought has
been spent there.

I think this is committable once these points are addressed.

            regards, tom lane



Re: Replace remaining StrNCpy() by strlcpy()

From
David Rowley
Date:
On Tue, 4 Aug 2020 at 00:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > David Rowley <dgrowleyml@gmail.com> writes:
> >> Will mean that we'll now no longer zero the full length of the m_xlog
> >> field after the end of the string. Won't that mean we'll start writing
> >> junk bytes to the stats collector?
>
> > StrNCpy doesn't zero-fill the destination today either (except for
> > the very last byte).
>
> Oh, no, I take that back --- didn't read all of the strncpy man
> page :-(.  Yeah, this is a point.  We'd need to check each call
> site to see whether the zero-padding matters.

I just had a thought that even strlcpy() is not really ideal for many
of our purposes for it.

Currently we still have cruddy code like:

strlcpy(fullname, pg_TZDIR(), sizeof(fullname));
if (strlen(fullname) + 1 + strlen(name) >= MAXPGPATH)
return -1; /* not gonna fit */
strcat(fullname, "/");
strcat(fullname, name);

If strlcpy() had been designed differently to take a signed size and
do nothing when the size is <= 0 then we could have had much cleaner
implementations of the above:

size_t len;
len = strlcpy(fullname, pg_TZDIR(), sizeof(fullname));
len += strlcpy(fullname + len, "/", sizeof(fullname) - len);
len += strlcpy(fullname + len, name, sizeof(fullname) - len);
if (len >= sizeof(fullname))
return -1; /* didn't fit */

This should be much more efficient, in general, due to the lack of
strlen() calls and the concatenation not having to refind the end of
the string again each time.

Now, I'm not saying we should change strlcpy() to take a signed type
instead of size_t to allow this to work. Reusing that name for another
purpose is likely a bad idea that will lead to misuse and confusion.
What I am saying is that perhaps strlcpy() is not all that it's
cracked up to be and it could have been done better.  Perhaps we can
invent our own version that fixes the shortcomings.

Just a thought.

On the other hand, perhaps we're not using the return value of
strlcpy() enough for such a change to make sense. However, a quick
glance shows we certainly could use it more often, e.g:

if (parsed->xinfo & XACT_XINFO_HAS_GID)
{
strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
data += strlen(data) + 1;
}

David



Re: Replace remaining StrNCpy() by strlcpy()

From
Peter Eisentraut
Date:
On 2020-08-05 17:49, Tom Lane wrote:
> However I do see one remaining nit to pick, in CreateInitDecodingContext:
>   
>       /* register output plugin name with slot */
>       SpinLockAcquire(&slot->mutex);
> -    StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN);
> +    namestrcpy(&slot->data.plugin, plugin);
>       SpinLockRelease(&slot->mutex);
> 
> This is already a pro-forma violation of our rule about "only
> straight-line code inside a spinlock".  Now I'm not terribly concerned
> about that right now, and the patch as it stands is only changing things
> cosmetically.  But if you modify namestrcpy to do pg_mbcliplen then all
> of a sudden there is a whole lot of code that could get reached within
> the spinlock, and I'm not a bit happy about that prospect.

fixed

> BTW, while we're here I think we ought to change namecpy and namestrcpy
> to return void (no caller checks their results) and drop their checks
> for null-pointer inputs.  AFAICS a null pointer would be a caller bug in
> every case, and if it isn't, why is failing to initialize the
> destination an OK outcome?  I find the provisions for nulls in namestrcmp
> pretty useless too, although it looks like at least some thought has
> been spent there.

fixed

I removed namecpy() altogether because you can just use struct assignment.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Replace remaining StrNCpy() by strlcpy()

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I removed namecpy() altogether because you can just use struct assignment.

Makes sense, and I notice it was unused anyway.

v3 passes eyeball examination (I didn't bother running tests), with
only one remaining nit: the proposed commit message says

    They are equivalent,

which per this thread is incorrect.  Somebody might possibly refer to this
commit for guidance in updating third-party code, so I don't think we want
to leave a misleading claim here.  Perhaps something like

    They are equivalent, except that StrNCpy zero-fills the entire
    destination buffer instead of providing just one trailing zero.
    For all but a tiny number of callers, that's just overhead rather
    than being desirable.

            regards, tom lane



Re: Replace remaining StrNCpy() by strlcpy()

From
Peter Eisentraut
Date:
On 2020-08-08 18:09, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> I removed namecpy() altogether because you can just use struct assignment.
> 
> Makes sense, and I notice it was unused anyway.
> 
> v3 passes eyeball examination (I didn't bother running tests), with
> only one remaining nit: the proposed commit message says
> 
>     They are equivalent,
> 
> which per this thread is incorrect.  Somebody might possibly refer to this
> commit for guidance in updating third-party code, so I don't think we want
> to leave a misleading claim here.  Perhaps something like
> 
>     They are equivalent, except that StrNCpy zero-fills the entire
>     destination buffer instead of providing just one trailing zero.
>     For all but a tiny number of callers, that's just overhead rather
>     than being desirable.

Committed with that change.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services