Thread: Replace remaining StrNCpy() by strlcpy()
I propose to replace the remaining uses of StrNCpy() with strlcpy() and remove the former. It's clear that strlcpy() has won the popularity contest, and the presence of the former is just confusing now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, 3 Aug 2020 at 18:59, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I propose to replace the remaining uses of StrNCpy() with strlcpy() and > remove the former. It's clear that strlcpy() has won the popularity > contest, and the presence of the former is just confusing now. It certainly would be good to get rid of some of these, but are some of the changes not a bit questionable? e.g: @@ -4367,7 +4367,7 @@ pgstat_send_archiver(const char *xlog, bool failed) */ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ARCHIVER); msg.m_failed = failed; - StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog)); + strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog)); msg.m_timestamp = GetCurrentTimestamp(); pgstat_send(&msg, sizeof(msg)); Will mean that we'll now no longer zero the full length of the m_xlog field after the end of the string. Won't that mean we'll start writing junk bytes to the stats collector? David
David Rowley <dgrowleyml@gmail.com> writes: > - StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog)); > + strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog)); > Will mean that we'll now no longer zero the full length of the m_xlog > field after the end of the string. Won't that mean we'll start writing > junk bytes to the stats collector? StrNCpy doesn't zero-fill the destination today either (except for the very last byte). If you need that, you need to memset the dest buffer ahead of time. I didn't review the patch in complete detail, but the principle seems sound to me, and strlcpy is surely more standard than StrNCpy. regards, tom lane
I wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> Will mean that we'll now no longer zero the full length of the m_xlog >> field after the end of the string. Won't that mean we'll start writing >> junk bytes to the stats collector? > StrNCpy doesn't zero-fill the destination today either (except for > the very last byte). Oh, no, I take that back --- didn't read all of the strncpy man page :-(. Yeah, this is a point. We'd need to check each call site to see whether the zero-padding matters. In the specific case of the stats collector, if you don't want to be sending junk bytes then you'd better be memset'ing the whole message buffer not just this string field. So I'm not sure that the argument has any force there. But in places like namecpy() and namestrcpy() we absolutely do mean to be zeroing the whole destination buffer. memset plus strlcpy might still be preferable to StrNCpy for readability by people new to Postgres; but it's less of a slam dunk than I thought. regards, tom lane
On 2020-08-03 14:12, Tom Lane wrote: > I wrote: >> David Rowley <dgrowleyml@gmail.com> writes: >>> Will mean that we'll now no longer zero the full length of the m_xlog >>> field after the end of the string. Won't that mean we'll start writing >>> junk bytes to the stats collector? > >> StrNCpy doesn't zero-fill the destination today either (except for >> the very last byte). > > Oh, no, I take that back --- didn't read all of the strncpy man > page :-(. Yeah, this is a point. We'd need to check each call > site to see whether the zero-padding matters. Oh, that's easy to miss. > In the specific case of the stats collector, if you don't want > to be sending junk bytes then you'd better be memset'ing the > whole message buffer not just this string field. So I'm not > sure that the argument has any force there. But in places > like namecpy() and namestrcpy() we absolutely do mean to be > zeroing the whole destination buffer. That's easy to fix, but it's perhaps wondering briefly why it needs to be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't care. Does anything care? While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff like namein()? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2020-08-03 14:12, Tom Lane wrote: >> In the specific case of the stats collector, if you don't want >> to be sending junk bytes then you'd better be memset'ing the >> whole message buffer not just this string field. So I'm not >> sure that the argument has any force there. But in places >> like namecpy() and namestrcpy() we absolutely do mean to be >> zeroing the whole destination buffer. > That's easy to fix, but it's perhaps wondering briefly why it needs to > be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't > care. Does anything care? We do have an expectation that there are no undefined bytes in values to be stored on-disk. There's even some code in coerce_type() that will complain about this: * For pass-by-reference data types, repeat the conversion to see if * the input function leaves any uninitialized bytes in the result. We * can only detect that reliably if RANDOMIZE_ALLOCATED_MEMORY is * enabled, so we don't bother testing otherwise. The reason we don't * want any instability in the input function is that comparison of * Const nodes relies on bytewise comparison of the datums, so if the * input function leaves garbage then subexpressions that should be * identical may not get recognized as such. See pgsql-hackers * discussion of 2008-04-04. > While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff > like namein()? Excellent point --- probably so, unless the callers are all truncating in advance, which I doubt. regards, tom lane
On 2020-08-03 19:39, Tom Lane wrote: >> That's easy to fix, but it's perhaps wondering briefly why it needs to >> be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't >> care. Does anything care? > > We do have an expectation that there are no undefined bytes in values to > be stored on-disk. There's even some code in coerce_type() that will > complain about this: Okay, here is a new patch with improved implementations of namecpy() and namestrcpy(). I didn't see any other places that relied on the zero-filling behavior of strncpy(). >> While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff >> like namein()? > > Excellent point --- probably so, unless the callers are all truncating > in advance, which I doubt. I will look into that separately. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Okay, here is a new patch with improved implementations of namecpy() and > namestrcpy(). I didn't see any other places that relied on the > zero-filling behavior of strncpy(). I've looked through this patch, and I concur with your conclusion that noplace else is depending on zero-fill, with the exception of the one place in pgstat.c that David already noted. But the issue there is only that valgrind might bitch about send()'ing undefined bytes, and ISTM that the existing suppressions in our valgrind.supp should already handle it, since we already have other pgstat messages with pad bytes. However I do see one remaining nit to pick, in CreateInitDecodingContext: /* register output plugin name with slot */ SpinLockAcquire(&slot->mutex); - StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN); + namestrcpy(&slot->data.plugin, plugin); SpinLockRelease(&slot->mutex); This is already a pro-forma violation of our rule about "only straight-line code inside a spinlock". Now I'm not terribly concerned about that right now, and the patch as it stands is only changing things cosmetically. But if you modify namestrcpy to do pg_mbcliplen then all of a sudden there is a whole lot of code that could get reached within the spinlock, and I'm not a bit happy about that prospect. The least-risk fix would be to namestrcpy() into a local variable and then just use a plain memcpy() inside the spinlock. There might be better ways if we're willing to make assumptions about what the plugin name can be. For that matter, do we really need a spinlock here at all? Why is the plugin name critical but the rest of the slot not? BTW, while we're here I think we ought to change namecpy and namestrcpy to return void (no caller checks their results) and drop their checks for null-pointer inputs. AFAICS a null pointer would be a caller bug in every case, and if it isn't, why is failing to initialize the destination an OK outcome? I find the provisions for nulls in namestrcmp pretty useless too, although it looks like at least some thought has been spent there. I think this is committable once these points are addressed. regards, tom lane
On Tue, 4 Aug 2020 at 00:12, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > >> Will mean that we'll now no longer zero the full length of the m_xlog > >> field after the end of the string. Won't that mean we'll start writing > >> junk bytes to the stats collector? > > > StrNCpy doesn't zero-fill the destination today either (except for > > the very last byte). > > Oh, no, I take that back --- didn't read all of the strncpy man > page :-(. Yeah, this is a point. We'd need to check each call > site to see whether the zero-padding matters. I just had a thought that even strlcpy() is not really ideal for many of our purposes for it. Currently we still have cruddy code like: strlcpy(fullname, pg_TZDIR(), sizeof(fullname)); if (strlen(fullname) + 1 + strlen(name) >= MAXPGPATH) return -1; /* not gonna fit */ strcat(fullname, "/"); strcat(fullname, name); If strlcpy() had been designed differently to take a signed size and do nothing when the size is <= 0 then we could have had much cleaner implementations of the above: size_t len; len = strlcpy(fullname, pg_TZDIR(), sizeof(fullname)); len += strlcpy(fullname + len, "/", sizeof(fullname) - len); len += strlcpy(fullname + len, name, sizeof(fullname) - len); if (len >= sizeof(fullname)) return -1; /* didn't fit */ This should be much more efficient, in general, due to the lack of strlen() calls and the concatenation not having to refind the end of the string again each time. Now, I'm not saying we should change strlcpy() to take a signed type instead of size_t to allow this to work. Reusing that name for another purpose is likely a bad idea that will lead to misuse and confusion. What I am saying is that perhaps strlcpy() is not all that it's cracked up to be and it could have been done better. Perhaps we can invent our own version that fixes the shortcomings. Just a thought. On the other hand, perhaps we're not using the return value of strlcpy() enough for such a change to make sense. However, a quick glance shows we certainly could use it more often, e.g: if (parsed->xinfo & XACT_XINFO_HAS_GID) { strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid)); data += strlen(data) + 1; } David
On 2020-08-05 17:49, Tom Lane wrote: > However I do see one remaining nit to pick, in CreateInitDecodingContext: > > /* register output plugin name with slot */ > SpinLockAcquire(&slot->mutex); > - StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN); > + namestrcpy(&slot->data.plugin, plugin); > SpinLockRelease(&slot->mutex); > > This is already a pro-forma violation of our rule about "only > straight-line code inside a spinlock". Now I'm not terribly concerned > about that right now, and the patch as it stands is only changing things > cosmetically. But if you modify namestrcpy to do pg_mbcliplen then all > of a sudden there is a whole lot of code that could get reached within > the spinlock, and I'm not a bit happy about that prospect. fixed > BTW, while we're here I think we ought to change namecpy and namestrcpy > to return void (no caller checks their results) and drop their checks > for null-pointer inputs. AFAICS a null pointer would be a caller bug in > every case, and if it isn't, why is failing to initialize the > destination an OK outcome? I find the provisions for nulls in namestrcmp > pretty useless too, although it looks like at least some thought has > been spent there. fixed I removed namecpy() altogether because you can just use struct assignment. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I removed namecpy() altogether because you can just use struct assignment. Makes sense, and I notice it was unused anyway. v3 passes eyeball examination (I didn't bother running tests), with only one remaining nit: the proposed commit message says They are equivalent, which per this thread is incorrect. Somebody might possibly refer to this commit for guidance in updating third-party code, so I don't think we want to leave a misleading claim here. Perhaps something like They are equivalent, except that StrNCpy zero-fills the entire destination buffer instead of providing just one trailing zero. For all but a tiny number of callers, that's just overhead rather than being desirable. regards, tom lane
On 2020-08-08 18:09, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> I removed namecpy() altogether because you can just use struct assignment. > > Makes sense, and I notice it was unused anyway. > > v3 passes eyeball examination (I didn't bother running tests), with > only one remaining nit: the proposed commit message says > > They are equivalent, > > which per this thread is incorrect. Somebody might possibly refer to this > commit for guidance in updating third-party code, so I don't think we want > to leave a misleading claim here. Perhaps something like > > They are equivalent, except that StrNCpy zero-fills the entire > destination buffer instead of providing just one trailing zero. > For all but a tiny number of callers, that's just overhead rather > than being desirable. Committed with that change. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services