Re: Text <-> C string - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Text <-> C string |
Date | |
Msg-id | 23752.1206462883@sss.pgh.pa.us |
In response to | Re: Text <-> C string (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Text <-> C string; Re: Text <-> C string |
List | pgsql-hackers |
I've been working some more on Brendan Jurd's patch to simplify text <-> C string conversions. It seems we have consensus on the names for the base operations:

    extern text *cstring_to_text(const char *s);
    extern char *text_to_cstring(const text *t);

Brendan's patch also included "cstring_text_limit(const char *s, int len)" which was defined as copying Min(len, strlen(s)) bytes. I didn't find this to be particularly useful. In the first place, all potential callers are passing the exact desired length, so the strlen() call is just a waste of cycles. In the second place, at least some callers pass text that is not embedded in a known-to-be-null-terminated string (it could be a section of a text datum instead); which means there is a nonzero chance of the strlen running off the end of memory and dumping core. So I propose instead

    extern text *cstring_to_text_with_len(const char *s, int len);

which just takes the given length as gospel.

Brendan had also proposed "text_to_cstring_limit(const text *t, int len)" with similar Min() semantics, but what this was doing was replacing copies into limited-size local buffers with a palloc. If we did that we might as well just use text_to_cstring. What I think is more useful is a strlcpy()-like function that copies into a caller-supplied buffer of limited size. For lack of a better idea I propose defining it *exactly* like strlcpy:

    extern size_t textlcpy(char *dst, const text *src, size_t siz);

I've also found that there are lots and lots of places where the text end of the conversion needs to be a Datum not a text *, so it seems worthwhile to introduce a couple of macros to minimize notation in that case:

    #define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
    #define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))

Lastly, the originally submitted text-to-something functions would work correctly on plain and 1-byte-header datums, but not on compressed or toasted-out-of-line datums.
There are a whole lot of places where that's not good enough. Rather than expecting the caller to use the right detoasting macro everywhere, it seems best to make these functions cope with any variant. That also avoids memory leakage by allowing the intermediate copy to be pfree'd. (I had suggested that the pfree might be pointless, but I reconsidered --- if the text object is large enough to be compressed or toasted, we're talking about at least several K, so it's worth not leaking.)

In short, the infrastructure I'm currently testing is the above definitions with the attached implementation. Last call for objections ...

            regards, tom lane

    /*
     * cstring_to_text
     *
     * Create a text value from a null-terminated C string.
     *
     * The new text value is freshly palloc'd with a full-size VARHDR.
     */
    text *
    cstring_to_text(const char *s)
    {
        return cstring_to_text_with_len(s, strlen(s));
    }

    /*
     * cstring_to_text_with_len
     *
     * Same as cstring_to_text except the caller specifies the string length;
     * the string need not be null-terminated.
     */
    text *
    cstring_to_text_with_len(const char *s, int len)
    {
        text       *result = (text *) palloc(len + VARHDRSZ);

        SET_VARSIZE(result, len + VARHDRSZ);
        memcpy(VARDATA(result), s, len);

        return result;
    }

    /*
     * text_to_cstring
     *
     * Create a palloc'd, null-terminated C string from a text value.
     *
     * We support being passed a compressed or toasted text value.
     * This is a bit bogus since such values shouldn't really be referred to as
     * "text *", but it seems useful for robustness.  If we didn't handle that
     * case here, we'd need another routine that did, anyway.
     */
    char *
    text_to_cstring(const text *t)
    {
        char       *result;
        text       *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
        int         len = VARSIZE_ANY_EXHDR(tunpacked);

        result = (char *) palloc(len + 1);
        memcpy(result, VARDATA_ANY(tunpacked), len);
        result[len] = '\0';

        if (tunpacked != t)
            pfree(tunpacked);

        return result;
    }

    /*
     * textlcpy --- exactly like strlcpy(), except source is a text value.
     *
     * Copy src to string dst of size siz.  At most siz-1 characters
     * will be copied.  Always NUL terminates (unless siz == 0).
     * Returns strlen(src); if retval >= siz, truncation occurred.
     *
     * We support being passed a compressed or toasted text value.
     * This is a bit bogus since such values shouldn't really be referred to as
     * "text *", but it seems useful for robustness.  If we didn't handle that
     * case here, we'd need another routine that did, anyway.
     */
    size_t
    textlcpy(char *dst, const text *src, size_t siz)
    {
        text       *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
        size_t      srclen = VARSIZE_ANY_EXHDR(srcunpacked);

        if (siz > 0)
        {
            siz--;
            if (siz >= srclen)
                siz = srclen;
            else                /* ensure truncation is encoding-safe */
                siz = pg_mbcliplen(VARDATA_ANY(srcunpacked), srclen, siz);
            memcpy(dst, VARDATA_ANY(srcunpacked), siz);
            dst[siz] = '\0';
        }

        if (srcunpacked != src)
            pfree(srcunpacked);

        return srclen;
    }
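
The contracts described in this message (an explicit length taken as gospel, and strlcpy-style truncation reporting) can be tried outside the backend with a simplified stand-in. The sketch below is not the PostgreSQL implementation. The `demo_text` struct and every `demo_`-prefixed function are hypothetical names invented for illustration, `malloc` stands in for `palloc`, and there is no detoasting and no encoding-aware clipping (nothing like `pg_mbcliplen`), so truncation here is purely byte-wise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a text value: a length header followed by
 * the bytes, with NO trailing NUL. This is not PostgreSQL's varlena. */
typedef struct demo_text
{
    size_t len;
    char   data[];      /* C99 flexible array member */
} demo_text;

/* Build a demo_text from exactly len bytes at s. The length is trusted;
 * s need not be NUL-terminated and strlen() is never called on it. */
demo_text *
demo_cstring_to_text_with_len(const char *s, size_t len)
{
    demo_text *result;

    assert(s != NULL);
    result = malloc(sizeof(demo_text) + len);
    if (result == NULL)
        return NULL;
    result->len = len;
    memcpy(result->data, s, len);
    return result;
}

/* Convenience wrapper for NUL-terminated input. */
demo_text *
demo_cstring_to_text(const char *s)
{
    assert(s != NULL);
    return demo_cstring_to_text_with_len(s, strlen(s));
}

/* Return a freshly malloc'd, NUL-terminated copy of the value. */
char *
demo_text_to_cstring(const demo_text *t)
{
    char *result;

    assert(t != NULL);
    result = malloc(t->len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, t->data, t->len);
    result[t->len] = '\0';
    return result;
}

/* strlcpy-style copy into a caller-supplied buffer of size siz.
 * Copies at most siz-1 bytes, always NUL-terminates unless siz == 0,
 * and returns the full source length so the caller can detect
 * truncation (return value >= siz). Byte-wise only in this sketch. */
size_t
demo_textlcpy(char *dst, const demo_text *src, size_t siz)
{
    assert(src != NULL);
    if (siz > 0)
    {
        size_t n = siz - 1;

        if (n > src->len)
            n = src->len;
        memcpy(dst, src->data, n);
        dst[n] = '\0';
    }
    return src->len;
}
```

For example, calling `demo_cstring_to_text_with_len("hello world", 5)` and then `demo_textlcpy` with a 4-byte buffer leaves "hel" in the buffer and returns 5. Because 5 is not less than the buffer size, the caller knows the copy was truncated, which is the same check a strlcpy caller would make.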