Re: slab allocator performance issues - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: slab allocator performance issues
Msg-id a5ccda91-d9fc-49c5-b3c7-c81528b938c5@enterprisedb.com
In response to Re: slab allocator performance issues  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: slab allocator performance issues  (David Rowley <dgrowleyml@gmail.com>)
Re: slab allocator performance issues  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

I've been investigating the regressions in some of the benchmark 
results, together with the generation context benchmarks [1].

Turns out it's pretty difficult to benchmark this, because the results 
strongly depend on what the backend did before. For example if I run 
slab_bench_fifo with the "decreasing" test for 32kB blocks and 512B 
chunks, I get this:

   select * from slab_bench_fifo(1000000, 32768, 512, 100, 10000, 5000);

    mem_allocated | alloc_ms | free_ms
   ---------------+----------+---------
        528547840 |   155394 |   87440


i.e. palloc() takes ~155ms and pfree() ~87ms (and these results are 
stable, the numbers don't change much with more runs).

But if I run a set of "lifo" tests in the backend first, the results 
look like this:

    mem_allocated | alloc_ms | free_ms
   ---------------+----------+---------
        528547840 |    41728 |   71524
   (1 row)

so the pallocs are suddenly ~4x faster. Clearly, what the backend did 
before may have a pretty dramatic impact on the results, even for simple 
benchmarks like this.
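
To put numbers on the difference, here is a small sketch that computes 
the ratios from the two result tables above. The values are copied 
as reported; the dictionary and function names are only illustrative.

```python
# Timings copied verbatim from the two result tables (units as reported).
baseline = {"alloc": 155394, "free": 87440}    # fifo test run on its own
after_lifo = {"alloc": 41728, "free": 71524}   # fifo test after the "lifo" tests


def speedup(before: int, after: int) -> float:
    """How many times faster the second measurement is than the first."""
    return before / after


alloc_speedup = speedup(baseline["alloc"], after_lifo["alloc"])
free_speedup = speedup(baseline["free"], after_lifo["free"])

if __name__ == "__main__":
    print(f"alloc: {alloc_speedup:.2f}x faster, free: {free_speedup:.2f}x faster")
```

So the effect is concentrated in the allocation path; the frees improve 
too, but far less.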

Note: The benchmark was a single SQL script, running all the different 
workloads in the same backend.

I did a fair amount of perf profiling, and the main difference between 
the fast and slow runs seems to be this:

                  0      page-faults:u
                  0      minor-faults:u
                  0      major-faults:u

vs

         20,634,153      page-faults:u
         20,634,153      minor-faults:u
                  0      major-faults:u

Attached is a more complete perf stat output, but the page faults seem 
to be the main issue. My theory is that in the "fast" case, the past 
backend activity puts the glibc memory management into a state that 
prevents page faults in the benchmark.
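
As an illustration of the mechanism only (this is not the benchmark, and 
it does not go through glibc malloc or palloc), here is a minimal sketch 
that touches a fresh anonymous mapping twice and reads the minor-fault 
counter around each pass. It assumes a Unix-like system where 
getrusage() reports ru_minflt; the function names are made up for the 
example. The first pass would normally fault once per page (fewer with 
transparent huge pages), the second pass should fault rarely, if at all.

```python
import mmap
import resource


def minor_faults() -> int:
    """Minor page faults charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(buf: mmap.mmap, page_size: int) -> int:
    """Write one byte into every page of buf; return the pages touched."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
        touched += 1
    return touched


def measure(size: int = 64 * 1024 * 1024):
    """Return (pages, faults_in_first_pass, faults_in_second_pass)."""
    page_size = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)  # anonymous mapping, not populated yet
    try:
        before = minor_faults()
        pages = touch_pages(buf, page_size)  # cold pass: pages get faulted in
        middle = minor_faults()
        touch_pages(buf, page_size)          # warm pass: pages already mapped
        after = minor_faults()
    finally:
        buf.close()
    return pages, middle - before, after - middle


if __name__ == "__main__":
    pages, cold, warm = measure()
    print(f"pages={pages} cold_faults={cold} warm_faults={warm}")
```

If the allocator keeps memory mapped between runs, the benchmark behaves 
like the second pass; if it hands the memory back to the kernel, every 
run pays for the first pass again.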

But of course, this theory may be incomplete - for example it's not 
clear why running the benchmark repeatedly would not "condition" the 
backend the same way. But it doesn't - it's ~150ms even for repeated runs.

Secondly, I'm not sure this explains why some of the timings actually 
got much slower with the 0003 patch, when the sequence of the steps is 
still the same. Of course, it's possible 0003 changes the allocation 
pattern a bit, interfering with glibc memory management.

This leads to a couple of interesting questions, I think:

1) I've only tested this on Linux, with glibc. I wonder how it'd behave 
on other platforms, or with other allocators.

2) Which cases are more important? When the backend was warmed up, or 
when each benchmark runs in a new backend? It seems the "new backend" is 
something like a "worst case" leading to more page faults, so maybe 
that's the thing to watch. OTOH a completely new backend is an unlikely 
case, so maybe not.

3) Can this teach us something about how to allocate stuff, to better 
"prepare" the backend for future allocations? For example, it's a bit 
strange that repeated runs of the same benchmark don't do the trick, for 
some reason.



regards


[1] 
https://www.postgresql.org/message-id/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
