Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations |
Date | |
Msg-id | 20230217034000.ddcklsr6hfdifyqk@awork3.anarazel.de Whole thread Raw |
In response to | Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations ("Jonah H. Harris" <jonah.harris@gmail.com>) |
Responses |
Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations
|
List | pgsql-hackers |
Hi, On 2023-02-16 21:34:18 -0500, Jonah H. Harris wrote: > On Thu, Feb 16, 2023 at 7:32 PM Andres Freund <andres@anarazel.de> wrote: > Given not much changed regarding that allocation context IIRC, I’d think > all recents. It was observed in 13, 14, and 15. We did have a fair bit of changes in related code in the last few years, including some in 16. I wouldn't expect them to help *hugely*, but also wouldn't be surprised if it showed. > I’ll have to create one - it was most evident on a TPC-C or sysbench test > using the Postgres, MySQL, SQLite, and Oracle FDWs. It may be reproducible > with pgbench as well. I'd like a workload that hits a perf issue with this, because I think there likely are some general performance improvements that we could make, without changing the initial size or the "growth rate". Perhaps, as a starting point, you could get MemoryContextStats(queryDesc->estate->es_query_cxt) both at the end of standard_ExecutorStart() and at the beginning of standard_ExecutorFinish(), for one of the queries triggering the performance issues? > > If the issue is a specific FDW needing to make a lot of allocations, I can > > see > > adding an API to tell a memory context that it ought to be ready to > > allocate a > > certain amount of memory efficiently (e.g. by increasing the block size of > > the > > next allocation by more than 2x). > > > While I’m happy to be wrong, it seems is an inherent problem not really > specific to FDW implementations themselves but the general expectation that > all FDWs are using more of that context than non-FDW cases for similar > types of operations, which wasn’t really a consideration in that allocation > over time. Lots of things can end up in the query context, it's really not FDW specific for it to be of nontrivial size. E.g. most tuples passed around end up in it. Similar performance issues also exists for plenty other memory contexts. Which for me that's a reason *not* to make it configurable just for CreateExecutorState. Or were you proposing to make ALLOCSET_DEFAULT_INITSIZE configurable? That would end up with a lot of waste, I think. The executor context case might actually be a comparatively easy case to address. There's really two "phases" of use for es_query_ctx. First, we create the entire executor tree in it, during standard_ExecutorStart(). Second, during query execution, we allocate things with query lifetime (be that because they need to live till the end, or because they are otherwise managed, like tuples). Even very simple queries end up with multiple blocks at the end: E.g. SELECT relname FROM pg_class WHERE relkind = 'r' AND relname = 'frak'; yields: ExecutorState: 43784 total in 3 blocks; 8960 free (5 chunks); 34824 used ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used Grand total: 51976 bytes in 4 blocks; 16888 free (5 chunks); 35088 used So quite justifiably we could just increase the hardcoded size in CreateExecutorState. I'd expect that starting a few size classes up would help noticeably. But I think we likely could do better here. The amount of memory that ends up in es_query_cxt during "phase 1" strongly correlates with the complexity of the statement, as the whole executor tree ends up in it. Using information about the complexity of the planned statement to influence es_query_cxt's block sizes would make sense to me. I suspect it's a decent enough proxy for "phase 2" as well. Medium-long term I really want to allocate at least all the executor nodes themselves in a single allocation. But that's a bit further out than what we're talking about here. Greetings, Andres Freund
pgsql-hackers by date: