Thread: Best way to construct PostgreSQL ArrayType (_int4) from C int array
At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:

const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);

for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);

ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');

Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?

Cheers,
Adrian
On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
> At the moment I am using the following code to construct a PostgreSQL
> array from a C array in my C extension but I am not so sure if this is
> really the best solution:
>
> const int *data = array.data(); // C array
> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>
> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>
> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>
> Is this okay or is there a better solution (existing function in the
> PostgreSQL source for example)?

That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32-bit mask/assignment. Constructing
the array at once is going to be a lot better than incrementally
creating it. Do you expect the arrays to be large, say bigger than
10k elements?

merlin
The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.

Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):

128:  61ms
256:  80ms
512:  681ms
1024: 1065ms
2048: 7682ms
4096: 21332ms

That's why I thought that the construction of the PostgreSQL array was
not optimal.

On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>> At the moment I am using the following code to construct a PostgreSQL
>> array from a C array in my C extension but I am not so sure if this is
>> really the best solution:
>>
>> const int *data = array.data(); // C array
>> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>>
>> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>>
>> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>>
>> Is this okay or is there a better solution (existing function in the
>> PostgreSQL source for example)?
>
> That's pretty much the best way AFAIK. Int32GetDatum doesn't do
> anything fancy -- it's just a 32 bit mask/assignment. constructing
> the array at once is going to be a lot better than incrementally
> creating it. Do you expect the arrays to be large, say bigger than
> 10k elements?
>
> merlin
On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):
>
> 128:  61ms
> 256:  80ms
> 512:  681ms
> 1024: 1065ms
> 2048: 7682ms
> 4096: 21332ms

hm, I'm not following you exactly -- what SQL are you running? This
scales pretty well for me:

select array_dims(array(select generate_series(1,1000000)));

etc

merlin
On Wed, Apr 27, 2011 at 18:06, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>> The largest arrays I expect at the moment are more or less sparse
>> vectors of around 4.8k elements and I have noticed that the
>> input/output (C/C++ extension) does not scale well with the number of
>> elements in the array.
>>
>> Using a function that sums all elements in the array, this is the time
>> it takes for ~150k arrays of various sizes (including ordering desc
>> and limit 10):
>>
>> 128:  61ms
>> 256:  80ms
>> 512:  681ms
>> 1024: 1065ms
>> 2048: 7682ms
>> 4096: 21332ms
>
> hm, I'm not following you exactly -- what sql are you running? This
> scales pretty well for me:
> select array_dims(array(select generate_series(1,1000000)));
> etc
>
> merlin

I have a C extension function that creates _int4 arrays of a specified
size with random elements, in this case 128, 256, 512 and so on.
Another function from my extension returns the sum of an array. To
benchmark the extension I created a table with around 150k arrays; the
query sums each array in the table and returns the 10 highest sums.
The C extension is actually a wrapper around the Eigen 3 template
library, which works pretty well -- now I am trying to tweak the
input/output functions to get better performance with larger arrays.
2011/4/27 Adrian Schreyer <ams214@cam.ac.uk>:
> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):
>

PostgreSQL doesn't use an index for access to array fields, so access
to the fields of packed arrays can be slower for higher subscripts.

Regards

Pavel Stehule

> 128:  61ms
> 256:  80ms
> 512:  681ms
> 1024: 1065ms
> 2048: 7682ms
> 4096: 21332ms
>
> That's why I thought that the construction of the PostgreSQL array was
> not optimal.
>
> On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214@cam.ac.uk> wrote:
>>> At the moment I am using the following code to construct a PostgreSQL
>>> array from a C array in my C extension but I am not so sure if this is
>>> really the best solution:
>>>
>>> const int *data = array.data(); // C array
>>> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>>>
>>> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>>>
>>> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>>>
>>> Is this okay or is there a better solution (existing function in the
>>> PostgreSQL source for example)?
>>
>> That's pretty much the best way AFAIK. Int32GetDatum doesn't do
>> anything fancy -- it's just a 32 bit mask/assignment. constructing
>> the array at once is going to be a lot better than incrementally
>> creating it. Do you expect the arrays to be large, say bigger than
>> 10k elements?
>>
>> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general