Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id 3082578.5fSG56mABF@aivenronan
In response to Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Use generation context to speed up tuplesorts
List pgsql-hackers
On Friday, December 31, 2021 at 22:26:37 CET, David Rowley wrote:
> I've attached some benchmark results that I took recently.  The
> spreadsheet contains results from 3 versions. master, master + 0001 -
> 0002, then master + 0001 - 0003.  The 0003 patch makes the code a bit
> more conservative about the chunk sizes it allocates and also tries to
> allocate the tuple array according to the number of tuples we expect
> to be able to sort in a single batch for when the sort is not
> estimated to fit inside work_mem.

(Sorry for trying to merge back the discussion on the two sides of the thread)

In  https://www.postgresql.org/message-id/4776839.iZASKD2KPV%40aivenronan, I
expressed the idea of being able to tune glibc's malloc behaviour.

I implemented that (patch 0001) as a new hook which is called on backend
startup, and any time work_mem is set. This hook is #defined depending on the
malloc implementation: currently a default, no-op implementation is provided,
as well as one for glibc's malloc.
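
To make the dispatch concrete, it amounts to something like this (a sketch
only, using my own names here rather than the patch's exact identifiers):

/*
 * Resolved at compile time: a no-op unless we know how to tune the
 * malloc implementation we are built against.
 */
#ifdef __GLIBC__
#define MallocConfigurationHook()   MallocTuneGlibc()
#else
#define MallocConfigurationHook()   ((void) 0)   /* default: no-op */
#endif

The hook is then invoked from backend startup and whenever work_mem is
assigned (the exact call sites are in the patch; this is just the shape).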

The glibc's malloc implementation relies on a new GUC,
glibc_malloc_max_trim_threshold. When set to its default value of -1, we
don't tune malloc at all, exactly as in HEAD. If a different value is provided,
we set M_MMAP_THRESHOLD to half this value, and M_TRIM_THRESHOLD to this value,
capped by work_mem / 2 and work_mem respectively.
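
In code, that boils down to something like the following sketch (not the
patch's actual code; I'm assuming the new GUC is in kilobytes, like work_mem,
while mallopt() expects bytes):

#include <malloc.h>

/* Names follow this mail, not necessarily the patch. */
extern int work_mem;                          /* existing GUC, in kB */
extern int glibc_malloc_max_trim_threshold;   /* new GUC, in kB */

static void
MallocTuneGlibc(void)
{
    int     trim_kb, mmap_kb;

    if (glibc_malloc_max_trim_threshold < 0)
        return;                     /* -1: keep glibc's dynamic tuning */

    trim_kb = glibc_malloc_max_trim_threshold;
    if (trim_kb > work_mem)
        trim_kb = work_mem;         /* cap M_TRIM_THRESHOLD at work_mem */

    mmap_kb = glibc_malloc_max_trim_threshold / 2;
    if (mmap_kb > work_mem / 2)
        mmap_kb = work_mem / 2;     /* cap M_MMAP_THRESHOLD at work_mem / 2 */

    mallopt(M_TRIM_THRESHOLD, trim_kb * 1024);
    mallopt(M_MMAP_THRESHOLD, mmap_kb * 1024);
}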

The net result is that, if the DBA chooses to, we can keep more unused memory
at the top of the heap and use mmap less frequently. Another possible use case
would be the opposite: limiting the memory retained by idle backends to a
minimum.

The reasoning behind this is that, by default, glibc's malloc adjusts those
two thresholds dynamically, adapting them to the size of the last freed
mmaped block.

I've run the same "up to 32 columns" benchmark as you did, with this new patch
applied on top of both HEAD and your v2 patchset incorporating planner
estimates for the block sizes. Those are called "aset" and "generation" in the
attached spreadsheet. For each, I've run it with
glibc_malloc_max_trim_threshold set to -1, 1MB, 4MB and 64MB. In each case
I've measured two things:
 - query latency, as reported by pgbench
 - total memory allocated by malloc at backend exit after running each query
three times. This represents the "idle" memory consumption, and thus what we
waste in malloc instead of releasing back to the system. This measurement has
been performed using the very small module presented in patch 0002. Please
note that I in no way propose that we include this module; it was just a
convenient way for me to measure memory footprint.
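
For anyone wanting to reproduce the measurement without the module, glibc's
mallinfo2() (glibc >= 2.33) exposes roughly the same numbers; it amounts to
something like this (a sketch, not the module's actual code):

#include <malloc.h>
#include <stdio.h>

/* Report how much memory malloc currently holds from the OS: the main
 * (sbrk) arena plus mmap'd regions, versus what is actually in use.
 * The difference is the "idle" footprint discussed above. */
static void
report_malloc_footprint(void)
{
    struct mallinfo2 mi = mallinfo2();

    fprintf(stderr,
            "arena: %zu bytes, mmap'd: %zu bytes, in use: %zu bytes\n",
            mi.arena, mi.hblkhd, mi.uordblks);
}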

My conclusion is that the impressive gains you see from using the generation
context with bigger blocks mostly come from the fact that we allocate bigger
blocks, and that this moves the mmap thresholds accordingly. I wonder how much
of a difference it would make on other malloc implementations: I'm afraid the
optimisation presented here would in fact be specific to glibc's malloc, since
we get almost the same gains with both allocators when tuning malloc to keep
more memory. I still think both approaches are useful, and would be necessary.

Since this affects all memory allocations, I need to come up with other
meaningful scenarios to benchmark.


--
Ronan Dunklau
