Re: Estimating HugePages Requirements? - Mailing list pgsql-admin

From Don Seiler
Subject Re: Estimating HugePages Requirements?
Date
Msg-id CAHJZqBAZ+SYR4jZ-Jy5nHYwUP3vYF+UjPGKwCR+gZm0z8vyoag@mail.gmail.com
In response to Estimating HugePages Requirements?  (Don Seiler <don@seiler.us>)
List pgsql-admin
On Thu, Jun 10, 2021 at 7:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
> On Wed, Jun 9, 2021, 21:03 P C <puravc@gmail.com> wrote:
>
> > I agree, it's confusing for many, and that confusion arises from the fact
> > that you usually talk of shared_buffers in MB or GB whereas huge pages
> > have to be configured in units of 2MB. But once they understand that,
> > they realize it's pretty simple.
> >
> > Don, we have experienced the same not just with Postgres but also with
> > Oracle. I haven't been able to get to the root of it, but what we usually
> > do is add another 100-200 pages, and that works for us. If the SGA or
> > shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
> > hundred MBs may be wasted (because the moment you configure huge pages,
> > the operating system considers them used and does not use them for
> > anything else), but nowadays servers easily have 64 or 128 GB of RAM, and
> > wasting that 500MB to 1GB does not really hurt.
>
> I don't have a problem with the math; I just wanted to know if it was
> possible to better estimate what the actual requirements would be at
> deployment time. My fallback will probably be to do what you did and just
> pad with an extra 512MB by default.

It's because the huge allocation isn't just shared_buffers, but also
wal_buffers:

| The amount of shared memory used for WAL data that has not yet been written to disk.
| The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ...

.. and other stuff:

src/backend/storage/ipc/ipci.c
        /*
         * Size of the Postgres shared-memory block is estimated via
         * moderately-accurate estimates for the big hogs, plus 100K for the
         * stuff that's too small to bother with estimating.
         *
         * We take some care during this phase to ensure that the total size
         * request doesn't overflow size_t.  If this gets through, we don't
         * need to be so careful during the actual allocation phase.
         */
        size = 100000;
        size = add_size(size, PGSemaphoreShmemSize(numSemas));
        size = add_size(size, SpinlockSemaSize());
        size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
                                                 sizeof(ShmemIndexEnt)));
        size = add_size(size, dsm_estimate_size());
        size = add_size(size, BufferShmemSize());
        size = add_size(size, LockShmemSize());
        size = add_size(size, PredicateLockShmemSize());
        size = add_size(size, ProcGlobalShmemSize());
        size = add_size(size, XLOGShmemSize());
        size = add_size(size, CLOGShmemSize());
        size = add_size(size, CommitTsShmemSize());
        size = add_size(size, SUBTRANSShmemSize());
        size = add_size(size, TwoPhaseShmemSize());
        size = add_size(size, BackgroundWorkerShmemSize());
        size = add_size(size, MultiXactShmemSize());
        size = add_size(size, LWLockShmemSize());
        size = add_size(size, ProcArrayShmemSize());
        size = add_size(size, BackendStatusShmemSize());
        size = add_size(size, SInvalShmemSize());
        size = add_size(size, PMSignalShmemSize());
        size = add_size(size, ProcSignalShmemSize());
        size = add_size(size, CheckpointerShmemSize());
        size = add_size(size, AutoVacuumShmemSize());
        size = add_size(size, ReplicationSlotsShmemSize());
        size = add_size(size, ReplicationOriginShmemSize());
        size = add_size(size, WalSndShmemSize());
        size = add_size(size, WalRcvShmemSize());
        size = add_size(size, PgArchShmemSize());
        size = add_size(size, ApplyLauncherShmemSize());
        size = add_size(size, SnapMgrShmemSize());
        size = add_size(size, BTreeShmemSize());
        size = add_size(size, SyncScanShmemSize());
        size = add_size(size, AsyncShmemSize());
#ifdef EXEC_BACKEND
        size = add_size(size, ShmemBackendArraySize());
#endif

        /* freeze the addin request size and include it */
        addin_request_allowed = false;
        size = add_size(size, total_addin_request);

        /* might as well round it off to a multiple of a typical page size */
        size = add_size(size, 8192 - (size % 8192));

BTW, I think it'd be nice if this were a NOTICE:
| elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", allocsize);
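Since huge_pages = try falls back silently when the pool is too small, one quick sanity check after startup is to look at the HugePages_* counters in /proc/meminfo. A minimal sketch of that check, assuming a Linux host where nothing else is using huge pages (the script and its reading of the counters are only approximate):

#!/usr/bin/env python3
# Rough sketch: report huge page usage from /proc/meminfo after starting Postgres.
# Pages already faulted in show up as HugePages_Total - HugePages_Free; pages
# reserved by the server's mmap() but not yet touched show up in HugePages_Rsvd,
# so (Total - Free) + Rsvd roughly approximates what the instance has claimed.

def hugepage_stats(path="/proc/meminfo"):
    stats = {}
    with open(path) as f:
        for line in f:
            if line.startswith(("HugePages_", "Hugepagesize")):
                name, value = line.split(":", 1)
                stats[name] = int(value.split()[0])  # page counts; kB for Hugepagesize
    return stats

if __name__ == "__main__":
    s = hugepage_stats()
    claimed = s["HugePages_Total"] - s["HugePages_Free"] + s["HugePages_Rsvd"]
    print(f"page size: {s['Hugepagesize']} kB")
    print(f"total={s['HugePages_Total']} free={s['HugePages_Free']} "
          f"rsvd={s['HugePages_Rsvd']} claimed~={claimed}")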

Great detail. I did some trial and error around just a few variables (shared_buffers, wal_buffers, max_connections) and came up with a formula that seems to be "good enough" for at least a rough default estimate.

The pseudo-code is basically:

ceiling((shared_buffers + 200 + (25 * shared_buffers/1024) + 10*(max_connections-100)/200 + wal_buffers-16)/2)
 
This assumes that all values are in MB and, obviously, that wal_buffers is set to something other than its default of -1. I decided to set wal_buffers to 16MB explicitly in our environments, since that's what -1 should resolve to, per the documentation, for instances with shared_buffers sized the way our deployments are.
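For reference, a minimal sketch of that heuristic in Python, assuming all inputs are already plain MB integers (the function name estimate_hugepages and the example numbers below are only illustrative):

import math

def estimate_hugepages(shared_buffers_mb, max_connections, wal_buffers_mb=16):
    """Rough vm.nr_hugepages estimate (2MB pages) for a PostgreSQL instance.

    Direct translation of the heuristic above: shared_buffers plus a flat
    200MB pad, plus 25MB per GB of shared_buffers, plus 10MB per 200
    connections above 100, plus whatever wal_buffers exceeds 16MB, all
    divided into 2MB pages and rounded up.
    """
    total_mb = (shared_buffers_mb
                + 200
                + 25 * shared_buffers_mb / 1024
                + 10 * (max_connections - 100) / 200
                + (wal_buffers_mb - 16))
    return math.ceil(total_mb / 2)

# e.g. 32GB shared_buffers, 500 connections, 16MB wal_buffers:
print(estimate_hugepages(32768, 500))   # 16894 pages, i.e. ~33GB reserved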

This formula did come up a little short (by 2MB) when I used a low shared_buffers value of 2GB; raising that starting 200 value to something like 250 would take care of that. Otherwise it held up in the limited testing I did against the range of values we see across our production deployments. Please let me know what you folks think. I know I'm ignoring a lot of other factors, especially given what Justin recently shared.

The remaining trick for me now is to calculate this in Chef, since the shared_buffers and wal_buffers attributes are strings with the unit ("MB") in them rather than plain numeric values. I'm thinking of changing those attributes to bare numbers and assuming/requiring MB, to make the calculations easier.
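For illustration only, since the real calculation would live in the Chef recipe (Ruby): the unit-stripping itself is simple enough, something along these lines (the helper name to_mb is made up):

import re

# Illustrative only: turn a GUC-style memory string such as "8192MB" or "8GB"
# into a plain number of MB. The equivalent in a Chef recipe would be a few
# lines of Ruby doing the same regex-and-multiply.
_UNITS_TO_MB = {"kB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}

def to_mb(value):
    m = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB|TB)?\s*", str(value))
    if not m:
        raise ValueError(f"unrecognized memory setting: {value!r}")
    number, unit = int(m.group(1)), m.group(2) or "MB"   # bare numbers treated as MB
    return int(number * _UNITS_TO_MB[unit])

print(to_mb("8192MB"), to_mb("8GB"), to_mb(16))   # 8192 8192 16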

--
Don Seiler
www.seiler.us
