Re: [PATCH] Add support for choosing huge page size - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [PATCH] Add support for choosing huge page size
Msg-id: CA+hUKG+gdWThHi0v6TmiLgUE_rqqQ+PKw2t+kT6w08H36qzxpw@mail.gmail.com
In response to: Re: [PATCH] Add support for choosing huge page size (Odin Ugedal <odin@ugedal.com>)
List: pgsql-hackers
Hi Odin,

Documentation syntax error: "<literal>2MB<literal>" (the closing tag is missing its slash) shows up as:

    config.sgml:1605: parser error : Opening and ending tag mismatch: literal line 1602 and para
    </para>
          ^

Please install the documentation tools (https://www.postgresql.org/docs/devel/docguide-toolsets.html), rerun configure and run "make docs" to see these kinds of errors.

The build is currently failing on Windows:

    undefined symbol: HAVE_DECL_MAP_HUGE_MASK at src/include/pg_config.h line 143 at src/tools/msvc/Mkvcbuild.pm line 851.

I think that's telling us that you need to add this stuff into src/tools/msvc/Solution.pm, so that we can say it doesn't have it. I don't have Windows, but whenever you post a new version we'll see if Windows likes it here: http://cfbot.cputube.org/odin-ugedal.html

When using huge_pages=on, huge_page_size=1GB, but default shared_buffers, I noticed that the error message reports the wrong (unrounded) size in this message:

    2020-06-18 02:06:30.407 UTC [73552] HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 149069824 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

The request size was actually 1GB:

    mmap(NULL, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB|30<<MAP_HUGE_SHIFT, -1, 0) = -1 ENOMEM (Cannot allocate memory)

1GB pages are so big that it becomes a little tricky to set shared_buffers large enough without wasting RAM. What I mean is, if I want to use shared_buffers=16GB, I need to have at least 17 huge pages available, but the 17th page is nearly entirely wasted! Imagine that on POWER with its 16GB pages. That makes me wonder if we should actually redefine these GUCs so that you state the total, or at least use the rounded memory for buffers... I think we could consider that to be a separate problem with a separate patch, though.
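To make the two details above concrete, here is a minimal Linux-specific sketch (not PostgreSQL's actual code) showing how a request is rounded up to a whole number of huge pages, and how log2 of the page size is encoded into the mmap() flag bits above MAP_HUGE_SHIFT, matching the "30<<MAP_HUGE_SHIFT" seen in the strace output:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MAP_HUGE_SHIFT
    #define MAP_HUGE_SHIFT 26
    #endif

    int main(void)
    {
        size_t huge_page_size = 1024UL * 1024 * 1024;   /* 1GB pages */
        size_t request = 149069824;                     /* size from the HINT above */

        /* Round up to a multiple of the huge page size: the ~142MB
         * request becomes a full 1GB mapping. */
        size_t rounded = (request + huge_page_size - 1) & ~(huge_page_size - 1);

        /* log2(page size) goes into the flag bits above MAP_HUGE_SHIFT,
         * so 1GB pages are encoded as 30 << MAP_HUGE_SHIFT. */
        int shift = __builtin_ctzl(huge_page_size);
        int flags = MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB
                    | (shift << MAP_HUGE_SHIFT);

        printf("rounded request: %zu bytes, page-size shift: %d\n",
               rounded, shift);
        (void) flags;               /* would be passed to mmap() */
        return 0;
    }

This prints "rounded request: 1073741824 bytes, page-size shift: 30", i.e. the number the HINT arguably ought to report.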
Just for fun, I compared 4KB, 2MB and 1GB pages for a hash join of a 3.5GB table against itself. Hash joins are the perfect way to exercise the TLB, because they're very likely to miss. I also applied my patch[1] to allow parallel queries to use shared memory from the main shared memory area, so that they benefit from the configured page size, using pages that are allocated once at start up. (Without that, you'd have to mess around with /dev/shm mount options, and then hope that pages were available at query time, and it'd also be slower for other stupid implementation reasons.)

    # echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # echo 8500 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    # echo 17 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

    shared_buffers=8GB
    dynamic_shared_memory_main_size=8GB

    create table t as select generate_series(1, 100000000)::int i;
    alter table t set (parallel_workers = 7);
    create extension pg_prewarm;
    select pg_prewarm('t');
    set max_parallel_workers_per_gather=7;
    set work_mem='1GB';
    select count(*) from t t1 join t t2 using (i);

    4KB pages: 12.42 seconds
    2MB pages:  9.12 seconds
    1GB pages:  9.07 seconds

Unfortunately I can't access the TLB miss counters on this system due to virtualisation restrictions, and the systems where I can don't have 1GB pages. According to cpuid(1) this system has a fairly typical setup:

    cache and TLB information (2):
      0x63: data TLB: 2M/4M pages, 4-way, 32 entries
            data TLB: 1G pages, 4-way, 4 entries
      0x03: data TLB: 4K pages, 4-way, 64 entries

This operation is touching about 8GB of data (scanning 3.5GB of table, building a 4.5GB hash table), so 4 x 1GB is not enough to do this without TLB misses. Let's try that again, except this time with shared_buffers=4GB, dynamic_shared_memory_main_size=4GB, and only half as many tuples in t, so it ought to fit:

    4KB pages: 6.37 seconds
    2MB pages: 4.96 seconds
    1GB pages: 5.07 seconds

Well, that's disappointing.
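For what it's worth, the TLB reach implied by those cpuid numbers (reach = entries x page size) can be tabulated with a trivial back-of-the-envelope calculation; the figures below are just illustrative arithmetic, not measurements:

    #include <stdio.h>

    int main(void)
    {
        /* Data-TLB geometry from the cpuid(1) output quoted above. */
        struct { const char *name; long entries; long page_bytes; } tlb[] = {
            { "4KB", 64, 4096L },
            { "2MB", 32, 2L * 1024 * 1024 },
            { "1GB", 4,  1024L * 1024 * 1024 },
        };
        for (int i = 0; i < 3; i++)
            printf("%s pages: %ld entries -> TLB reach %ld KB\n",
                   tlb[i].name, tlb[i].entries,
                   tlb[i].entries * tlb[i].page_bytes / 1024);
        return 0;
    }

That gives 256KB of reach with 4KB pages, 64MB with 2MB pages and 4GB with 1GB pages, which is why the first (8GB) workload can't fit in the 1GB-page TLB but the halved one ought to.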
I wondered if this was something to do with NUMA effects on this two-node box, so I tried running that again with postgres under numactl --cpunodebind 0 --membind 0 and I got:

    4KB pages: 5.43 seconds
    2MB pages: 4.05 seconds
    1GB pages: 4.00 seconds

From this I can't really conclude that it's terribly useful to use larger page sizes, but it's certainly useful to have the ability to do further testing using the proposed GUC.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLAE2QBv-WgGp%2BD9P_J-%3Dyne3zof9nfMaqq1h3EGHFXYQ%40mail.gmail.com