Thread: Strange plpgsql performance -- arithmetic, numeric() type, arrays
FYI

Postgresql 8.0.1

$ uname -a
Linux example.example.com 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 19 01:53:23 GMT 2005 i686 i686 i386 GNU/Linux

Incrementing the loop counter by a factor of 10, from 1000 to 10000
makes the process take more than 100 times longer. (I only saw
this happen when I happened upon using a numeric() datatype
and then dividing i/100 to avoid overflow. It does not happen
without the array and working with other, much larger, arrays
of other data types you see no slowdown.)

create or replace function baz()
  returns void
  language plpgsql
  as $func$
declare
  a numeric(4,2)[] := '{}';
begin
  for i in 1..1000 loop
    a[i] := -9.0;
    a[i] := a[i] + i/100;
  end loop;
  return;
end;
$func$;

------loop size of 1000

=> explain analyze select baz();
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=1116.873..1116.874 rows=1 loops=1)
 Total runtime: 1116.894 ms
(2 rows)

-----loop size of 10000

=> explain analyze select baz();
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=134312.457..134312.458 rows=1 loops=1)
 Total runtime: 134312.487 ms
(2 rows)

Postgresql recompiled from the source rpm with
rpmbuild --target=i686-centos-linux

$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux
Thread model: posix
gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-42)

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.800
cache size      : 1024 KB
physical id     : 0
siblings        : 2
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm
bogomips        : 5976.88

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.800
cache size      : 1024 KB
physical id     : 0
siblings        : 2
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm
bogomips        : 5976.88

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
> Incrementing the loop counter by a factor of 10, from 1000 to 10000
> makes the process take more than 100 times longer. (I only saw
> this happen when I happened upon using a numeric() datatype
> and then dividing i/100 to avoid overflow. It does not happen
> without the array and working with other, much larger, arrays
> of other data types you see no slowdown.)
>

It's not a bug, it's a feature ;-). plpgsql isn't a good language for
initializing big arrays. If possible, use plperl, for example.

CREATE OR REPLACE FUNCTION speed(integer) RETURNS numeric(7,2)[] AS $$
  $i = 0.00;
  @myarray = ();
  while ($i < $_[0]) {
    push @myarray, $i;
    $i = $i + 1;
  }
  return '{'.join(',', @myarray).'}';
$$ LANGUAGE plperlu;

select speed(100);

CREATE OR REPLACE FUNCTION speed2(integer) RETURNS numeric(7,2)[] AS $$
DECLARE a numeric(7,2)[] = '{}';
BEGIN
  FOR _i IN 1..$1 LOOP
    a[_i] := _i;
  END LOOP;
  RETURN a;
END;
$$ LANGUAGE plpgsql;

tarif=# select speed(10000);
Time: 28,269 ms
tarif=# select speed2(10000);
Time: 91186,199 ms

Regards
Pavel Stehule
Hello,

I found that an array initialized from plperl behaves differently from
one initialized from plpgsql. I am using the code from my previous mail.

tarif=# select speed(10);
         speed
-----------------------
 {0,1,2,3,4,5,6,7,8,9}
(1 row)

Time: 2,304 ms
tarif=# select speed2(10);
                        speed2
------------------------------------------------------
 {1.00,2.00,3.00,4.00,5.00,6.00,7.00,8.00,9.00,10.00}
(1 row)

Time: 0,863 ms

The array from speed2 is OK, but the array from speed is malformed. I
declared both functions as returning numeric(7,2)[].

When I change the initial value to 0.01, the results are equal.

regards
Pavel Stehule
Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:
> the array from speed2 is ok, but array from speed is mal formated.

They both look OK to me.

> I declare all function as numeric(7,2)[]

Type modifiers applied to function arguments and results are generally
ignored. What you have here is functions returning numeric[], and not
anything else. In the plpgsql example the coercion to numeric(7,2)
happens because you stored into a local variable declared that way,
but there's nothing to make it happen in the plperl example.

regards, tom lane
"Karl O. Pinc" <kop@meme.com> writes:
> Incrementing the loop counter by a factor of 10, from 1000 to 10000
> makes the process take more than 100 times longer. (I only saw
> this happen when I happened upon using a numeric() datatype
> and then dividing i/100 to avoid overflow. It does not happen
> without the array and working with other, much larger, arrays
> of other data types you see no slowdown.)

It's the array access, not the arithmetic, that's getting you.
Since numeric is not a fixed-width datatype, accessing the N'th
element of the array requires O(N) time to find that element.

regards, tom lane
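[Editor's note] The O(N) lookup described above can be illustrated with a toy model. This is only a sketch in Python: the length-prefixed layout and the names pack, fetch and total_skips are assumptions made for illustration, not PostgreSQL's actual array format or code. It shows why touching every element of a variable-width array by index costs roughly N*N/2 steps in total.

```python
# Toy model of a packed array of variable-width elements.
# ASSUMPTION: a 1-byte length header per element; this is NOT the real
# PostgreSQL array layout, only an illustration of the access cost.

def pack(values):
    """Store strings back to back, each preceded by a 1-byte length."""
    buf = bytearray()
    for v in values:
        data = v.encode("ascii")
        buf.append(len(data))      # header; elements must be < 256 bytes
        buf.extend(data)
    return bytes(buf)

def fetch(buf, n):
    """Return (element n, elements skipped to reach it); n counts from 0."""
    pos = 0
    skipped = 0
    while skipped < n:
        # The start of element k+1 is only known after reading the
        # header of element k, so every earlier element is visited.
        pos += 1 + buf[pos]
        skipped += 1
    length = buf[pos]
    value = buf[pos + 1 : pos + 1 + length].decode("ascii")
    return value, skipped

def total_skips(size):
    """Total elements skipped when each of `size` elements is read once."""
    values = [f"{-9 + i / 100:.2f}" for i in range(size)]  # mixed widths
    buf = pack(values)
    return sum(fetch(buf, i)[1] for i in range(size))

if __name__ == "__main__":
    for size in (100, 1000):
        print(size, total_skips(size))
```

In this model the total is size*(size-1)/2, so ten times as many elements means roughly one hundred times as many skips, which is the same order of growth as the timings reported at the top of the thread.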
On 04/03/2005 08:04:27 PM, Tom Lane wrote:
> "Karl O. Pinc" <kop@meme.com> writes:
> > Incrementing the loop counter by a factor of 10, from 1000 to 10000
> > makes the process take more than 100 times longer. (I only saw
> > this happen when I happened upon using a numeric() datatype
> > and then dividing i/100 to avoid overflow. It does not happen
> > without the array and working with other, much larger, arrays
> > of other data types you see no slowdown.)
>
> It's the array access, not the arithmetic, that's getting you.
> Since numeric is not a fixed-width datatype, accessing the N'th
> element of the array requires O(N) time to find that element.

Makes sense. Thanks.

(Makes me think that these sorts of arrays should be implemented
with an extra level of indirection, an array of pointers to the
varying data, which may not be the best way to represent arrays
on disk because storage requirements go up and disk is slow....
Anyhow, I leave it to the coders.)

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
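[Editor's note] The extra level of indirection suggested above can be sketched as a separate table of offsets kept next to the packed data. This is a hypothetical Python illustration, not a description of how PostgreSQL stores arrays; the names pack_with_offsets and fetch_indexed are invented for the example.

```python
# Sketch of the "array of pointers" idea: keep one (start, length) pair
# per element so that any element can be located without a scan.
# ASSUMPTION: purely illustrative layout, not PostgreSQL's storage format.

def pack_with_offsets(values):
    """Return (packed bytes, list of (start, length) pairs)."""
    buf = bytearray()
    offsets = []
    for v in values:
        data = v.encode("ascii")
        offsets.append((len(buf), len(data)))  # where this element lives
        buf.extend(data)
    return bytes(buf), offsets

def fetch_indexed(buf, offsets, n):
    """Return element n (counted from 0) with a single table lookup."""
    start, length = offsets[n]
    return buf[start : start + length].decode("ascii")

if __name__ == "__main__":
    buf, offsets = pack_with_offsets(["-9.00", "-8.99", "1.5"])
    print(fetch_indexed(buf, offsets, 1))
    print(len(buf), len(offsets))
```

The trade-off is the one named in the message: each lookup becomes constant time, but the offset table adds one entry per element to the storage cost.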