In PG12, query with float calculations is slower than PG11 - Mailing list pgsql-hackers
From | keisuke kuroda |
---|---|
Subject | In PG12, query with float calculations is slower than PG11 |
Date | |
Msg-id | CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com Whole thread Raw |
Responses |
Re: In PG12, query with float calculations is slower than PG11
|
List | pgsql-hackers |
Hi,
I am testing performance both PG12 and PG11.
I found the case of performance degradation in PG12.
Amit Langote help me to analyze and to create patch.
Thanks!
* environment
CentOS Linux release 7.6.1810 (Core)
postgresql 12.1
postgresql 11.6
* postgresql.conf
shared_buffers = 2048MB
max_parallel_workers_per_gather = 0
work_mem = '64MB'
jit = off
* test case
CREATE TABLE realtest(a real, b real, c real, d real, e real);
INSERT INTO realtest SELECT i,i,i,i,i FROM generate_series(0,10000000) AS i;
EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
* result
PG12.1 5878.389 ms
PG11.6 4533.554 ms
** PostgreSQL 12.1
pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.040..5195.328 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.051 ms
Execution Time: 5878.389 ms
(5 行)
Samples: 6K of event 'cpu-clock', Event count (approx.): 1577750000
Overhead Command Shared Object Symbol
25.48% postgres postgres [.] ExecInterpExpr
★18.65% postgres libc-2.17.so [.] __isinf
14.36% postgres postgres [.] float84mul
8.54% postgres [vdso] [.] __vdso_clock_gettime
4.02% postgres postgres [.] ExecScan
3.69% postgres postgres [.] tts_buffer_heap_getsomeattrs
2.63% postgres libc-2.17.so [.] __clock_gettime
2.55% postgres postgres [.] HeapTupleSatisfiesVisibility
2.00% postgres postgres [.] heapgettup_pagemode
** PostgreSQL 11.6
pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.012..3845.480 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.033 ms
Execution Time: 4533.554 ms
(5 行)
Samples: 4K of event 'cpu-clock', Event count (approx.): 1192000000
Overhead Command Shared Object Symbol
32.30% postgres postgres [.] ExecInterpExpr
14.95% postgres postgres [.] float84mul
10.57% postgres [vdso] [.] __vdso_clock_gettime
★ 6.84% postgres libc-2.17.so [.] __isinf
3.96% postgres postgres [.] ExecScan
3.50% postgres libc-2.17.so [.] __clock_gettime
3.31% postgres postgres [.] heap_getnext
3.08% postgres postgres [.] HeapTupleSatisfiesMVCC
2.77% postgres postgres [.] slot_deform_tuple
2.37% postgres postgres [.] ExecProcNodeInstr
2.08% postgres postgres [.] standard_ExecutorRun
* cause
Obviously, even in common cases where no overflow occurs,
you can tell that PG 12 is performing isinf() 3 times on every call of
float8_mul() once for each of val1, val2, result where as PG 11
is performing only once for result.
That's because check_float8_val() (in PG 12) is a function
whose arguments must be evaluated before
it is called (it is inline, but that's irrelevant),
whereas CHECKFLOATVAL() (in PG11) is a macro
whose arguments are only substituted into its body.
By the way, this change of float8mul() implementation is
mostly due to the following commit in PG 12 development cycle:
commit 6bf0bc842bd75877e31727eb559c6a69e237f831
Especially, the following diff:
@@ -894,13 +746,8 @@ float8mul(PG_FUNCTION_ARGS) {
float8 arg1 = PG_GETARG_FLOAT8(0);
float8 arg2 = PG_GETARG_FLOAT8(1);
- float8 result;
-
- result = arg1 * arg2;
- CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2),
- arg1 == 0 || arg2 == 0);
- PG_RETURN_FLOAT8(result);
+ PG_RETURN_FLOAT8(float8_mul(arg1, arg2));
}
* patch
This patch uses MACRO which was used by PG11.
I tried attached patch, which can be applied to PG 12 source and performed a benchmark:
PG12.1 5878.389 ms
PG11.6 4533.554 ms
PG12.1 + Patch 4679.162 ms
** PostgreSQL 12.1 + Patch
postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..307328.38 rows=10828150 width=40) (actual time=0.012..4009.012 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.038 ms
Execution Time: 4679.162 ms
(5 rows)
Samples: 5K of event 'cpu-clock', Event count (approx.): 1376750000
Overhead Command Shared Object Symbol
31.43% postgres postgres [.] ExecInterpExpr
14.24% postgres postgres [.] float84mul
10.40% postgres [vdso] [.] __vdso_clock_gettime
★ 5.41% postgres libc-2.17.so [.] __isinf
4.63% postgres postgres [.] tts_buffer_heap_getsomeattrs
4.03% postgres postgres [.] ExecScan
3.54% postgres libc-2.17.so [.] __clock_gettime
3.12% postgres postgres [.] HeapTupleSatisfiesVisibility
2.36% postgres postgres [.] heap_getnextslot
2.16% postgres postgres [.] heapgettup_pagemode
2.09% postgres postgres [.] standard_ExecutorRun
2.07% postgres postgres [.] SeqNext
2.03% postgres postgres [.] ExecProcNodeInstr
2.03% postgres postgres [.] tts_virtual_clear
PG 12 is still slower compared to PG 11, but the __isinf() situation is better with the patch.
Best Regards,
Keisuke Kuroda
I am testing performance both PG12 and PG11.
I found the case of performance degradation in PG12.
Amit Langote help me to analyze and to create patch.
Thanks!
* environment
CentOS Linux release 7.6.1810 (Core)
postgresql 12.1
postgresql 11.6
* postgresql.conf
shared_buffers = 2048MB
max_parallel_workers_per_gather = 0
work_mem = '64MB'
jit = off
* test case
CREATE TABLE realtest(a real, b real, c real, d real, e real);
INSERT INTO realtest SELECT i,i,i,i,i FROM generate_series(0,10000000) AS i;
EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
* result
PG12.1 5878.389 ms
PG11.6 4533.554 ms
** PostgreSQL 12.1
pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.040..5195.328 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.051 ms
Execution Time: 5878.389 ms
(5 行)
Samples: 6K of event 'cpu-clock', Event count (approx.): 1577750000
Overhead Command Shared Object Symbol
25.48% postgres postgres [.] ExecInterpExpr
★18.65% postgres libc-2.17.so [.] __isinf
14.36% postgres postgres [.] float84mul
8.54% postgres [vdso] [.] __vdso_clock_gettime
4.02% postgres postgres [.] ExecScan
3.69% postgres postgres [.] tts_buffer_heap_getsomeattrs
2.63% postgres libc-2.17.so [.] __clock_gettime
2.55% postgres postgres [.] HeapTupleSatisfiesVisibility
2.00% postgres postgres [.] heapgettup_pagemode
** PostgreSQL 11.6
pgbench=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.012..3845.480 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.033 ms
Execution Time: 4533.554 ms
(5 行)
Samples: 4K of event 'cpu-clock', Event count (approx.): 1192000000
Overhead Command Shared Object Symbol
32.30% postgres postgres [.] ExecInterpExpr
14.95% postgres postgres [.] float84mul
10.57% postgres [vdso] [.] __vdso_clock_gettime
★ 6.84% postgres libc-2.17.so [.] __isinf
3.96% postgres postgres [.] ExecScan
3.50% postgres libc-2.17.so [.] __clock_gettime
3.31% postgres postgres [.] heap_getnext
3.08% postgres postgres [.] HeapTupleSatisfiesMVCC
2.77% postgres postgres [.] slot_deform_tuple
2.37% postgres postgres [.] ExecProcNodeInstr
2.08% postgres postgres [.] standard_ExecutorRun
* cause
Obviously, even in common cases where no overflow occurs,
you can tell that PG 12 is performing isinf() 3 times on every call of
float8_mul() once for each of val1, val2, result where as PG 11
is performing only once for result.
That's because check_float8_val() (in PG 12) is a function
whose arguments must be evaluated before
it is called (it is inline, but that's irrelevant),
whereas CHECKFLOATVAL() (in PG11) is a macro
whose arguments are only substituted into its body.
By the way, this change of float8mul() implementation is
mostly due to the following commit in PG 12 development cycle:
commit 6bf0bc842bd75877e31727eb559c6a69e237f831
Especially, the following diff:
@@ -894,13 +746,8 @@ float8mul(PG_FUNCTION_ARGS) {
float8 arg1 = PG_GETARG_FLOAT8(0);
float8 arg2 = PG_GETARG_FLOAT8(1);
- float8 result;
-
- result = arg1 * arg2;
- CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2),
- arg1 == 0 || arg2 == 0);
- PG_RETURN_FLOAT8(result);
+ PG_RETURN_FLOAT8(float8_mul(arg1, arg2));
}
* patch
This patch uses MACRO which was used by PG11.
I tried attached patch, which can be applied to PG 12 source and performed a benchmark:
PG12.1 5878.389 ms
PG11.6 4533.554 ms
PG12.1 + Patch 4679.162 ms
** PostgreSQL 12.1 + Patch
postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
select (2 * a) , (2 * b) , (2 * c), (2 * d), (2 * e)
from realtest;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.realtest (cost=0.00..307328.38 rows=10828150 width=40) (actual time=0.012..4009.012 rows=10000001 loops=1)
Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
Buffers: shared hit=63695
Planning Time: 0.038 ms
Execution Time: 4679.162 ms
(5 rows)
Samples: 5K of event 'cpu-clock', Event count (approx.): 1376750000
Overhead Command Shared Object Symbol
31.43% postgres postgres [.] ExecInterpExpr
14.24% postgres postgres [.] float84mul
10.40% postgres [vdso] [.] __vdso_clock_gettime
★ 5.41% postgres libc-2.17.so [.] __isinf
4.63% postgres postgres [.] tts_buffer_heap_getsomeattrs
4.03% postgres postgres [.] ExecScan
3.54% postgres libc-2.17.so [.] __clock_gettime
3.12% postgres postgres [.] HeapTupleSatisfiesVisibility
2.36% postgres postgres [.] heap_getnextslot
2.16% postgres postgres [.] heapgettup_pagemode
2.09% postgres postgres [.] standard_ExecutorRun
2.07% postgres postgres [.] SeqNext
2.03% postgres postgres [.] ExecProcNodeInstr
2.03% postgres postgres [.] tts_virtual_clear
PG 12 is still slower compared to PG 11, but the __isinf() situation is better with the patch.
Best Regards,
Keisuke Kuroda
Attachment
pgsql-hackers by date: