Re: Text <-> C string - Mailing list pgsql-hackers

From Brendan Jurd
Subject Re: Text <-> C string
Date
Msg-id 37ed240d0803251033h60fe993bp694dd4ff657f442a@mail.gmail.com
Whole thread Raw
In response to Re: Text <-> C string  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Text <-> C string
List pgsql-hackers
On 26/03/2008, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  Brendan's patch also included "cstring_text_limit(const char *s, int len)"
>  which was defined as copying Min(len, strlen(s)) bytes.  I didn't find
>  this to be particularly useful.  In the first place, all potential
>  callers are passing the exact desired length, so the strlen() call is
>  just a waste of cycles.  In the second place, at least some callers pass
>  text that is not embedded in a known-to-be-null-terminated string (it
>  could be a section of a text datum instead); which means there is a
>  nonzero chance of the strlen running off the end of memory and dumping
>  core.  So I propose instead
>
>  extern text *cstring_to_text_with_len(const char *s, int len);
>

That all makes sense to me.  I think the new name is good.  It's
pretty long, but I'm not seeing a shorter name that accurately
describes the function.

>  which just takes the given length as gospel.  Brendan had also proposed
>  "text_to_cstring_limit(const text *t, int len)" with similar Min()
>  semantics, but what this was doing was replacing copies into
>  limited-size local buffers with a palloc.  If we did that we might
>  as well just use text_to_cstring.  What I think is more useful is
>  a strlcpy()-like function that copies into a caller-supplied buffer
>  of limited size.  For lack of a better idea I propose defining it
>  *exactly* like strlcpy:
>
>  extern size_t textlcpy(char *dst, const text *src, size_t siz);
>

I'm all for providing a function with this behaviour, but is
textlcpy() a bit ambiguous?  It's not clear from the name whether the
function copies text -> text, text -> cstring or cstring -> text.  In
fact, if I didn't already know better I'd probably assume that the
function copied text -> text with length, in the same way strlcpy
copies string -> string.

A text_to_cstring_with_len() or text_to_cstring_limit() might be more
to the point, and more consistent with the other functions in the
family.

On the other hand, maybe some difference in naming would help make it
obvious to callers that, unlike its siblings, textlcpy() takes the
destination string as an argument rather than returning it.
text_to_cstring_lcpy()?

>  I've also found that there are lots and lots of places where the
>  text end of the conversion needs to be a Datum not a text *,
>  so it seems worthwhile to introduce a couple of macros to minimize
>  notation in that case:
>
>  #define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
>  #define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))
>

Yes, I recall coming across a number of sites where these macros would
come in handy.

>  Lastly, the originally submitted text-to-something functions would
>  work correctly on plain and 1-byte-header datums, but not on
>  compressed or toasted-out-of-line datums.  There are a whole lot of
>  places where that's not good enough.  Rather than expecting the caller
>  to use the right detoasting macro everywhere, it seems best to make
>  these functions cope with any variant.  That also avoids memory
>  leakage by allowing the intermediate copy to be pfree'd.  (I had
>  suggested that the pfree might be pointless, but I reconsidered ---
>  if the text object is large enough to be compressed or toasted,
>  we're talking about at least several K, so it's worth not leaking.)
>

Excellent.  My patch didn't contemplate dealing with
compressed/toasted datums because, quite frankly, I didn't know *how*
to deal with them correctly.  Much to learn about varlenas, I still
have.

Cheers,
BJ


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: partial dump of patch queue to wiki
Next
From: Tom Lane
Date:
Subject: Re: Text <-> C string