Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date
Msg-id CAFj8pRDg9oLcOHrGrZC_A-Nv_HHa-TR7AYy5q3SxSPqrFyJZWQ@mail.gmail.com
Whole thread Raw
In response to ToDo: fast update of arrays with fixed length fields for PL/pgSQL  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers



2013/11/25 Heikki Linnakangas <heikki@localhost.vmware.com>
On 07.10.2013 17:00, Pavel Stehule wrote:
Hello

I fixed patch - there was a missing cast to domain when it was used (and
all regress tests are ok now).

This doesn't work correctly for varlen datatypes. I modified your quicksort example to work with text[], and got this:

postgres=#  SELECT array_upper(quicksort(1,20000,array_agg((g::text))),1) FROM generate_series(1,20000) g;
ERROR:  invalid input syntax for integer: " "
CONTEXT:  PL/pgSQL function quicksort(integer,integer,text[]) line 10 at assignment
PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment

The handling of array domains is also wrong. You replace the new value to the array and only check the domain constraint after that. If the constraint is violated, it's too late to roll that back. For example:

both are bug, and should be fixed
 

postgres=# create domain posarray as int[] check (value[1] > 0);
CREATE DOMAIN
postgres=# do $$
declare
  iarr posarray;
begin
  begin
    iarr[1] := 1;
    iarr[1] := -1;
  exception when others then raise notice 'got error: %', sqlerrm; end;
  raise notice 'value: %', iarr[1];
end;
$$;
NOTICE:  got error: value for domain posarray violates check constraint "posarray_check"
NOTICE:  value: -1
DO

Without the patch, it prints 1, but with the patch, -1.


In general, I'm not convinced this patch is worth the trouble. The speedup isn't all that great; manipulating large arrays in PL/pgSQL is still so slow that if you care about performance you'll want to rewrite your function in some other language or use temporary tables. And you only get a gain with arrays of fixed-length element type with no NULLs.

I understand to your objection, but I disagree so particular solution is not enough. Update of fixed array should be fast - for this task there are no any workaround in plpgsql, so using a array as fast cache for calculation has some (sometimes very high) hidden costs. A very advanced users can use a different PL or can prepare C extensions, but for common users, this cost is hidden and unsolvable.
 

So I think we should drop this patch in its current form. If we want to make array manipulation in PL/pgSQL, I think we'll have to do something similar to how we handle "row" variables, or something else entirely. But that's a much bigger patch, and I'm not sure it's worth the trouble either.

Sorry, I disagree again - using same mechanism for arrays of fixed types and arrays of varlena types should be ineffective.  Slow part of this area is manipulation with large memory blocks - there we fighting with physical CPU and access to memory limits.

Now, I know about few significant PL/pgSQL issues related to manipulation with variables

* repeated detoast
* missing preallocation
* low effectiveness manipulation with large arrays
* low effectiveness with casting (using IO casting instead binary)

A final solution of these issues needs redesign of work with variables in PL/pgSQL - some in Perl style (maybe). It can be nice, but I don't know somebody who will work on this early.

I would to concentrate to integration of plpgsql_check to core - and maybe later I can work on this topic (maybe after 2 years).

So any particular solution (should not be mine) can serve well two three years. This code is 100% hidden on top, and we can change it safely, so particular solution can be enough. 

Regards

Pavel


Looking at perf profile and the functions involved, though, I see some low-hanging fruit: in array_set, the new array is allocated with palloc0'd, but we only need to zero out a few alignment bytes where the new value is copied. Replacing it with palloc() will save some cycles, per the attached patch. Nowhere near as much as your patch, but this is pretty uncontroversial.

- Heikki


pgsql-hackers by date:

Previous
From: Sawada Masahiko
Date:
Subject: psql shows line number
Next
From: Rajeev rastogi
Date:
Subject: Re: PostgreSQL Service on Windows does not start. ~ "is not a valid Win32 application"