Re: asynchronous and vectorized execution - Mailing list pgsql-hackers

From Andres Freund
Subject Re: asynchronous and vectorized execution
Date
Msg-id 20160511005016.j3m7wkk6cafx2ccr@alap3.anarazel.de
Whole thread Raw
In response to Re: asynchronous and vectorized execution  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: asynchronous and vectorized execution  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2016-05-10 12:56:17 -0400, Robert Haas wrote:
> I suspect the number of queries that are being hurt by fmgr overhead
> is really large, and I think it would be nice to attack that problem
> more directly.  It's a bit hard to discuss what's worthwhile in the
> abstract, without performance numbers, but when you vectorize, how
> much is the benefit from using SIMD instructions and how much is the
> benefit from just not going through the fmgr every time?

I think fmgr overhead is an issue, but in most profiles of execution
heavy loads I've seen the bottlenecks are elsewhere. They often seem to
roughly look like
+   15.47%  postgres  postgres           [.] slot_deform_tuple
+   12.99%  postgres  postgres           [.] slot_getattr
+   10.36%  postgres  postgres           [.] ExecMakeFunctionResultNoSets
+    9.76%  postgres  postgres           [.] heap_getnext
+    6.34%  postgres  postgres           [.] HeapTupleSatisfiesMVCC
+    5.09%  postgres  postgres           [.] heapgetpage
+    4.59%  postgres  postgres           [.] hash_search_with_hash_value
+    4.36%  postgres  postgres           [.] ExecQual
+    3.30%  postgres  postgres           [.] ExecStoreTuple
+    3.29%  postgres  postgres           [.] ExecScan

or

-   33.67%  postgres  postgres           [.] ExecMakeFunctionResultNoSets  - ExecMakeFunctionResultNoSets     + 99.11%
ExecEvalOr    + 0.89% ExecQual
 
+   14.32%  postgres  postgres           [.] slot_getattr
+    5.66%  postgres  postgres           [.] ExecEvalOr
+    5.06%  postgres  postgres           [.] check_stack_depth
+    5.02%  postgres  postgres           [.] slot_deform_tuple
+    4.05%  postgres  postgres           [.] pgstat_end_function_usage
+    3.69%  postgres  postgres           [.] heap_getnext
+    3.41%  postgres  postgres           [.] ExecEvalScalarVarFast
+    3.36%  postgres  postgres           [.] ExecEvalConst


with a healthy dose of _bt_compare, heap_hot_search_buffer in more index
heavy workloads.

(yes, I just pulled these example profiles from somewhere, but I've more
often seen them look like this, than very fmgr heavy).


That seems to suggest that we need to restructure how we get to calling
fmgr functions, before worrying about the actual fmgr call.


Tomas, Mark, IIRC you'd both generated perf profiles for TPC-H (IIRC?)
queries at some point. Any chance the results are online somewhere?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Next
From: Bruce Momjian
Date:
Subject: Re: ALTER TABLE lock downgrades have broken pg_upgrade