Correct Allocation of UNICODE string in C - Mailing list pgsql-general

From	Steffen Macke
Subject	Correct Allocation of UNICODE string in C
Date	August 1, 2003 11:28:03
Msg-id	3F29017A.1070400@web.de
List	pgsql-general

Hello All,

I'm struggling with the correct allocation of a
UNICODE text in a C function for PostgreSQL.
The resulting strings are sometimes truncated, and
sometimes garbage bytes are appended at the end.

Is there a code example that takes a UNICODE (UTF-8) text
of unknown length, allocates the PostgreSQL structure and copies
the data correctly?

The function in question is below;
the full sources are available from
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/dcmms/arabic/

The problem is that the arabic_reshape() function returns texts
that can be longer or shorter than the original text. In the
PostgreSQL sources I only found examples where texts are copied,
but none showing how to allocate a "fresh" UTF-8 string.

Best Regards,

Steffen Macke

text *
shape_arabic(text *t)
{
   glong items_read;
   glong items_written;
   long len;
   long i;
   text *new_t;
   text *utf8_t;

   len = g_utf8_strlen(VARDATA(t), -1);
   new_t = (text *) palloc(VARHDRSZ+(len*4)+4);
   VARATT_SIZEP(new_t) = VARHDRSZ+(len*4)+4;
   utf8_t = (text *) palloc(VARSIZE(t)+4);
   VARATT_SIZEP(utf8_t) = VARSIZE(t)+4;
   memset(VARDATA(new_t), 0, (len*4)+4);
   memset(VARDATA(utf8_t), 0, VARSIZE(utf8_t)-VARHDRSZ);
   len = len*2;
   arabic_reshape(&len, VARDATA(t), VARDATA(new_t), ar_unifont);
   g_ucs4_to_utf8(VARDATA(new_t), VARDATA(utf8_t), -1, &items_read,
                  &items_written);
   len = g_utf8_strlen(VARDATA(utf8_t), -1);
   return utf8_t;
}
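
Two details in the function above are consistent with the symptoms
described. The payload returned by VARDATA() is not NUL-terminated,
so measuring it with a length of -1 can read past its end. The size
header is also set to the allocated capacity rather than to the number
of bytes actually written. The following is a minimal, standalone
sketch of the usual pattern: allocate for the worst case, write, then
record the real length. It is not PostgreSQL code. demo_text,
DEMO_HDRSZ, utf8_encode, utf8_char_count, text_payload_len and
text_from_ucs4 are hypothetical names introduced for illustration, and
malloc stands in for palloc so that the sketch compiles without the
server headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's "text": a total-size header
 * followed by payload bytes that are NOT NUL-terminated. */
typedef struct {
    uint32_t size;   /* header + payload in bytes (plays the role of VARSIZE) */
    char     data[]; /* payload (plays the role of VARDATA) */
} demo_text;

#define DEMO_HDRSZ ((uint32_t) sizeof(uint32_t)) /* plays the role of VARHDRSZ */

/* Payload length in bytes, taken from the header, never from strlen(). */
size_t text_payload_len(const demo_text *t)
{
    return t->size - DEMO_HDRSZ;
}

/* Count UTF-8 characters in an explicit byte range by skipping
 * continuation bytes (10xxxxxx). No NUL terminator is required. */
size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++)
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Encode one code point as UTF-8 and return the bytes written (1 to 4).
 * Assumes cp is a valid Unicode scalar value (<= 0x10FFFF). */
static size_t utf8_encode(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char) (0xE0 | (cp >> 12));
        out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char) (0xF0 | (cp >> 18));
    out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Build a fresh length-prefixed UTF-8 value from n UCS-4 code points. */
demo_text *text_from_ucs4(const uint32_t *cps, size_t n)
{
    /* Worst case: every code point needs 4 bytes in UTF-8. */
    demo_text *result = malloc(DEMO_HDRSZ + n * 4);
    size_t used = 0;

    if (result == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        used += utf8_encode(cps[i], result->data + used);
    assert(used <= n * 4);

    /* Record the bytes actually written, not the allocated capacity,
     * so no unused trailing bytes become part of the value. */
    result->size = DEMO_HDRSZ + (uint32_t) used;
    return result;
}
```

Carried over to the function in the post, this would mean taking the
input length from VARSIZE(t) - VARHDRSZ instead of passing -1, sizing
the output buffer from the number of reshaped code points rather than
from VARSIZE(t), and setting VARATT_SIZEP() of the result to VARHDRSZ
plus the number of UTF-8 bytes produced by the conversion.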
