Thread: Large writable variables
Hi, while briefly thinking about the per-process overhead of postgres, I looked at the section sizes of a modern postgres. In particular which memory areas are *not* shared between processes. If you look at the section sizes that are mapped read-write: $ size --format=sysv src/backend/postgres src/backend/postgres : section size addr .rodata 1593617 5730304 (read-only, for reference) .data.rel.ro 134944 8039904 (read only after start) .data 53888 8178912 (read-write, initialized) .bss 510416 8232800 (read-write, uninitialized) Total 52417197 So we have 500kb of not-initialized memory mapped into every process. That's, uh, not nothing. top unitialized allocations: $ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head 0000000008251872 0000000000131144 b LagTracker 0000000008585088 0000000000131104 b hist_entries 0000000008435040 0000000000085280 b DCHCache 0000000008391168 0000000000043840 b NUMCache 0000000008560224 0000000000023440 b tzdefrules_s 0000000008536704 0000000000023440 b gmtmem.7009 0000000008716192 0000000000016384 b hist_start 0000000008238336 0000000000008192 b PqRecvBuffer 0000000008734208 0000000000005136 B BackendWritebackContext 0000000008386368 0000000000003200 b held_lwlocks So we have a two variables sized 130kb. Yikes. struct { XLogRecPtr last_lsn; /* 0 8 */ WalTimeSample buffer[8192]; /* 8 131072 */ /* --- cacheline 2048 boundary (131072 bytes) was 8 bytes ago --- */ int write_head; /* 131080 4 */ int read_heads[3]; /* 131084 12 */ WalTimeSample last_read[3]; /* 131096 48 */ /* --- cacheline 2049 boundary (131136 bytes) was 8 bytes ago --- */ /* size: 131144, cachelines: 2050, members: 5 */ /* last cacheline: 8 bytes */ }; that's not actually used very often, nor in all processes... Thomas? hist_entries is pg_lzcompress; DCHCache,NUMCache are formatting.c. top itialized allocations: $ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[dD]\b'|head 0000000008086944 0000000000087904 D fmgr_builtins 0000000008201120 0000000000017280 d ConfigureNamesInt 0000000008218400 0000000000013680 d ConfigureNamesBool 0000000008189248 0000000000008512 d ConfigureNamesString 0000000008077344 0000000000007040 D ScanKeywords 0000000008184928 0000000000004320 d ConfigureNamesEnum 0000000008197760 0000000000003360 d ConfigureNamesReal 0000000008062976 0000000000002304 d DCH_keywords 0000000008069952 0000000000002016 D pg_wchar_table 0000000008075552 0000000000001776 d encoding_match_list fmgr_builtins isn't readonly even though declared a const - I assume because it's full of addresses that will be mapped differently from execution to execution. ConfigureNames* isn't marked as const, because we update them: /* Rather than requiring vartype to be filled in by hand, do this: */ conf->gen.vartype = PGC_BOOL; I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly section. Greetings, Andres Freund
Hi, On 2018-10-15 13:07:54 -0700, Andres Freund wrote: > top itialized allocations: > $ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[dD]\b'|head > 0000000008086944 0000000000087904 D fmgr_builtins > 0000000008201120 0000000000017280 d ConfigureNamesInt > 0000000008218400 0000000000013680 d ConfigureNamesBool > 0000000008189248 0000000000008512 d ConfigureNamesString > 0000000008077344 0000000000007040 D ScanKeywords > 0000000008184928 0000000000004320 d ConfigureNamesEnum > 0000000008197760 0000000000003360 d ConfigureNamesReal > 0000000008062976 0000000000002304 d DCH_keywords > 0000000008069952 0000000000002016 D pg_wchar_table > 0000000008075552 0000000000001776 d encoding_match_list > > fmgr_builtins isn't readonly even though declared a const - I assume > because it's full of addresses that will be mapped differently from > execution to execution. > > ConfigureNames* isn't marked as const, because we update them: > /* Rather than requiring vartype to be filled in by hand, do this: */ > conf->gen.vartype = PGC_BOOL; > > I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly > section. It's because they contain pointers to strings, which are affected by relocations (and position independent executables force everything to be relocatable). They do go into .data.rel.ro however: $ objdump -t ~/build/postgres/dev-optimize/vpath/src/backend/postgres|grep -E '\b(ScanKeywords|fmgr_builtins|DCH_keywords|pg_wchar_table)\b' 00000000007b0800 l O .data.rel.ro 0000000000000900 DCH_keywords 00000000007b4020 g O .data.rel.ro 0000000000001b80 ScanKeywords 00000000007b65a0 g O .data.rel.ro 0000000000015760 fmgr_builtins 00000000007b2340 g O .data.rel.ro 00000000000007e0 pg_wchar_table as long as we're using forking rather than EXEC_BACKEND, that's perfectly fine. I don't quite know how windows handles any of this, so I can't say whether it's a problem there. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > So we have 500kb of not-initialized memory mapped into every > process. That's, uh, not nothing. Bleah. > 0000000008585088 0000000000131104 b hist_entries > 0000000008716192 0000000000016384 b hist_start I'm unsure what fraction of processes would have use for these. > 0000000008435040 0000000000085280 b DCHCache > 0000000008391168 0000000000043840 b NUMCache We could surely allocate these on first use. > 0000000008560224 0000000000023440 b tzdefrules_s > 0000000008536704 0000000000023440 b gmtmem.7009 I think that tzdefrules_s is not used in common cases (though I could be wrong about that), so we could win by alloc-on-first-use. The same might be true for gmtmem, but there's a sticking point: there is no provision for failure there, so I'm unsure how we avoid crashing on OOM. > 0000000008238336 0000000000008192 b PqRecvBuffer > 0000000008734208 0000000000005136 B BackendWritebackContext > 0000000008386368 0000000000003200 b held_lwlocks These are below my personal threshold of pain. > fmgr_builtins isn't readonly even though declared a const - I assume > because it's full of addresses that will be mapped differently from > execution to execution. Check. > I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly > section. I think it's the same problem: pointers can't be truly const because they have to be changed if one relocates the executable. We could possibly fix these by changing the data structure so that what's in a ScanKeywords entry is an offset into some giant string constant somewhere. No idea how that would affect performance, but I do notice that we could reduce the sizeof(ScanKeyword), which can't hurt. regards, tom lane
Hi, On 2018-10-15 16:36:26 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > So we have 500kb of not-initialized memory mapped into every > > process. That's, uh, not nothing. > > Bleah. Yea... > > 0000000008585088 0000000000131104 b hist_entries > > 0000000008716192 0000000000016384 b hist_start > > I'm unsure what fraction of processes would have use for these. Yea, I'm not sure either. Although I suspect that given the cost of compression having an "allocate on first use" check should be quite doable. > > 0000000008435040 0000000000085280 b DCHCache > > 0000000008391168 0000000000043840 b NUMCache > > We could surely allocate these on first use. yep. > > 0000000008560224 0000000000023440 b tzdefrules_s > > 0000000008536704 0000000000023440 b gmtmem.7009 > > I think that tzdefrules_s is not used in common cases (though I could be > wrong about that), so we could win by alloc-on-first-use. The same might > be true for gmtmem, but there's a sticking point: there is no provision > for failure there, so I'm unsure how we avoid crashing on OOM. I guess we could return false / NULL to the caller. Not perfect, but there's not that many callers. We could wrap them in wrappers that throw errors... > > 0000000008238336 0000000000008192 b PqRecvBuffer > > 0000000008734208 0000000000005136 B BackendWritebackContext > > 0000000008386368 0000000000003200 b held_lwlocks > > These are below my personal threshold of pain. Yea, agreed. PqRecvBuffer and held_lwlocks are common enough that it makes sense to pre-allocate them anyway. I guess you could argue that BackendWritebackContext should be dynamically allocated. > > I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly > > section. > > I think it's the same problem: pointers can't be truly const because > they have to be changed if one relocates the executable. Right. It's, as I noticed when looking via objdupm, in .data.rel.ro, so I think it's not that bad. > We could possibly fix these by changing the data structure so that > what's in a ScanKeywords entry is an offset into some giant string > constant somewhere. No idea how that would affect performance, but > I do notice that we could reduce the sizeof(ScanKeyword), which can't > hurt. Yea, that might even help performancewise. Alternatively we could change ScanKeyword to store the keyword name inline, but that'd be a measurable size increase... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-10-15 16:36:26 -0400, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> 0000000008585088 0000000000131104 b hist_entries >>> 0000000008716192 0000000000016384 b hist_start >> I'm unsure what fraction of processes would have use for these. > Yea, I'm not sure either. Although I suspect that given the cost of > compression having an "allocate on first use" check should be quite > doable. I don't think the extra check would be a problem, but if we end up needing the space in most processes, we're not really buying anything. It'd need some investigation. >> We could possibly fix these by changing the data structure so that >> what's in a ScanKeywords entry is an offset into some giant string >> constant somewhere. No idea how that would affect performance, but >> I do notice that we could reduce the sizeof(ScanKeyword), which can't >> hurt. > Yea, that might even help performancewise. Alternatively we could change > ScanKeyword to store the keyword name inline, but that'd be a measurable > size increase... Yeah. It also seems like doing it this way would improve locality of access: the pieces of the giant string would presumably be in the same order as the ScanKeywords entries, whereas with the current setup, who knows where the compiler has put 'em or in what order. We'd need some tooling to generate the constants that way, though; I can't see how to make it directly from kwlist.h. regards, tom lane
Hi, On 2018-10-15 16:54:53 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-10-15 16:36:26 -0400, Tom Lane wrote: > >> Andres Freund <andres@anarazel.de> writes: > >>> 0000000008585088 0000000000131104 b hist_entries > >>> 0000000008716192 0000000000016384 b hist_start > > >> I'm unsure what fraction of processes would have use for these. > > > Yea, I'm not sure either. Although I suspect that given the cost of > > compression having an "allocate on first use" check should be quite > > doable. > > I don't think the extra check would be a problem, but if we end up > needing the space in most processes, we're not really buying anything. > It'd need some investigation. I don't think it's particularly uncommon to have processes that don't use any toasted datums. I'm not sure however how to put numbers on that? After all, it'll be drastically workload dependant. > >> We could possibly fix these by changing the data structure so that > >> what's in a ScanKeywords entry is an offset into some giant string > >> constant somewhere. No idea how that would affect performance, but > >> I do notice that we could reduce the sizeof(ScanKeyword), which can't > >> hurt. > > > Yea, that might even help performancewise. Alternatively we could change > > ScanKeyword to store the keyword name inline, but that'd be a measurable > > size increase... > > Yeah. It also seems like doing it this way would improve locality of > access: the pieces of the giant string would presumably be in the same > order as the ScanKeywords entries, whereas with the current setup, > who knows where the compiler has put 'em or in what order. I assume you're talking about the offset approach. Performancewise I assume that my suggestion of inlining the names into the struct would be faster. Are there many realistic cases where performance matters enough to warrant the size increase? > We'd need some tooling to generate the constants that way, though; > I can't see how to make it directly from kwlist.h. I guess we could create a struct with all the strings as members. But that seems fairly nasty. Greetings, Andres Freund
On Tue, Oct 16, 2018 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > $ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head > 0000000008251872 0000000000131144 b LagTracker ... > So we have a two variables sized 130kb. Yikes. ... > that's not actually used very often, nor in all processes... Thomas? Yeah, here's a patch to move it in to the heap. -- Thomas Munro http://www.enterprisedb.com
Attachment
Andres Freund <andres@anarazel.de> writes: > On 2018-10-15 16:54:53 -0400, Tom Lane wrote: >> Yeah. It also seems like doing it this way would improve locality of >> access: the pieces of the giant string would presumably be in the same >> order as the ScanKeywords entries, whereas with the current setup, >> who knows where the compiler has put 'em or in what order. > I assume you're talking about the offset approach. Performancewise I > assume that my suggestion of inlining the names into the struct would be > faster. Are there many realistic cases where performance matters enough > to warrant the size increase? Doubt it, because there'd be an awful lot of wasted space due to the need to set the struct size large enough for the longest keyword. (Plus it would likely not come out to be a power-of-2 size, slowing array indexing.) If you want this to be cache-friendly, I'd think the smaller the better. regards, tom lane
On 2018-10-16 10:16:34 +1300, Thomas Munro wrote: > On Tue, Oct 16, 2018 at 9:07 AM Andres Freund <andres@anarazel.de> wrote: > > $ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head > > 0000000008251872 0000000000131144 b LagTracker > ... > > So we have a two variables sized 130kb. Yikes. > ... > > that's not actually used very often, nor in all processes... Thomas? > > Yeah, here's a patch to move it in to the heap. > > -- > Thomas Munro > http://www.enterprisedb.com > From f3fb64e67c1e86e11dfafc8a712e9750f650f60b Mon Sep 17 00:00:00 2001 > From: Thomas Munro <thomas.munro@enterprisedb.com> > Date: Tue, 16 Oct 2018 10:08:57 +1300 > Subject: [PATCH] Move the replication lag tracker into heap memory. > > Andres Freund complained about the 128KB of .bss occupied by LagTracker. > It's only needed in the walsender process, so allocate it in heap > memory there. Cool, let's do that. Greetings, Andres Freund
On Tue, Oct 16, 2018 at 10:57 AM Andres Freund <andres@anarazel.de> wrote: > > Subject: [PATCH] Move the replication lag tracker into heap memory. > > > > Andres Freund complained about the 128KB of .bss occupied by LagTracker. > > It's only needed in the walsender process, so allocate it in heap > > memory there. > > Cool, let's do that. Pushed. -- Thomas Munro http://www.enterprisedb.com
I wrote: > Andres Freund <andres@anarazel.de> writes: >> 0000000008560224 0000000000023440 b tzdefrules_s >> 0000000008536704 0000000000023440 b gmtmem.7009 > I think that tzdefrules_s is not used in common cases (though I could be > wrong about that), so we could win by alloc-on-first-use. The same might > be true for gmtmem, but there's a sticking point: there is no provision > for failure there, so I'm unsure how we avoid crashing on OOM. So my intuition was exactly backwards on this. 1. tzdefrules_s is filled in the postmaster, and if it's not filled there it'd be filled by every child process, so there's zero point in converting it to malloc-on-the-fly style. This is because tzparse() always wants it filled. That's actually a tad annoying, because it looks to me like with many timezone settings the data will not get used, but I'm hesitant to mess with the IANA code's logic enough to improve that. Maybe I'll try submitting an upstream patch and see what they think of it first. 2. gmtmem, on the other hand, is only used if somebody calls pg_gmtime, which already has a perfectly good error-report convention, cf gmtime(3). We have exactly one caller, which was not bothering to test for error :-( and would dump core on failure. And that caller isn't reached in most processes, at least not in the regression tests. So this side of it is easy to improve. Hence I propose the attached patch for point 2, which I'd want to backpatch to keep our copies of the IANA code in sync. Point 1 maybe can be addressed in future. regards, tom lane diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c index ba15a29..449164a 100644 *** a/src/backend/utils/adt/timestamp.c --- b/src/backend/utils/adt/timestamp.c *************** GetEpochTime(struct pg_tm *tm) *** 2008,2013 **** --- 2008,2016 ---- t0 = pg_gmtime(&epoch); + if (t0 == NULL) + elog(ERROR, "could not convert epoch to timestamp: %m"); + tm->tm_year = t0->tm_year; tm->tm_mon = t0->tm_mon; tm->tm_mday = t0->tm_mday; diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c index 31b06b0..04a1013 100644 *** a/src/timezone/localtime.c --- b/src/timezone/localtime.c *************** gmtsub(pg_time_t const *timep, int32 off *** 1328,1340 **** struct pg_tm *result; /* GMT timezone state data is kept here */ ! static struct state gmtmem; ! static bool gmt_is_set = false; ! #define gmtptr (&gmtmem) ! if (!gmt_is_set) { ! gmt_is_set = true; gmtload(gmtptr); } result = timesub(timep, offset, gmtptr, tmp); --- 1328,1341 ---- struct pg_tm *result; /* GMT timezone state data is kept here */ ! static struct state *gmtptr = NULL; ! if (gmtptr == NULL) { ! /* Allocate on first use. */ ! gmtptr = (struct state *) malloc(sizeof(struct state)); ! if (gmtptr == NULL) ! return NULL; /* errno should be set by malloc */ gmtload(gmtptr); } result = timesub(timep, offset, gmtptr, tmp);
Hi, I've a patch making .data variables constant, fwiw. Will post in a bit. Just so we don't duplicate work too much. On 2018-10-15 19:41:49 -0400, Tom Lane wrote: > I wrote: > > Andres Freund <andres@anarazel.de> writes: > >> 0000000008560224 0000000000023440 b tzdefrules_s > >> 0000000008536704 0000000000023440 b gmtmem.7009 > > > I think that tzdefrules_s is not used in common cases (though I could be > > wrong about that), so we could win by alloc-on-first-use. The same might > > be true for gmtmem, but there's a sticking point: there is no provision > > for failure there, so I'm unsure how we avoid crashing on OOM. > > So my intuition was exactly backwards on this. > > 1. tzdefrules_s is filled in the postmaster, and if it's not filled there > it'd be filled by every child process, so there's zero point in converting > it to malloc-on-the-fly style. This is because tzparse() always wants > it filled. That's actually a tad annoying, because it looks to me like > with many timezone settings the data will not get used, but I'm hesitant > to mess with the IANA code's logic enough to improve that. Maybe I'll try > submitting an upstream patch and see what they think of it first. Ok, that makes sense. > 2. gmtmem, on the other hand, is only used if somebody calls pg_gmtime, > which already has a perfectly good error-report convention, cf gmtime(3). > We have exactly one caller, which was not bothering to test for error :-( > and would dump core on failure. And that caller isn't reached in most > processes, at least not in the regression tests. So this side of it is > easy to improve. > > Hence I propose the attached patch for point 2, which I'd want to > backpatch to keep our copies of the IANA code in sync. That makes sense, too. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > top unitialized allocations: > 0000000008435040 0000000000085280 b DCHCache > 0000000008391168 0000000000043840 b NUMCache Here's a patch to improve that situation. regards, tom lane diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index b3ff133..43a0307 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -93,6 +93,7 @@ #include "utils/float.h" #include "utils/formatting.h" #include "utils/int8.h" +#include "utils/memutils.h" #include "utils/numeric.h" #include "utils/pg_locale.h" @@ -385,12 +386,12 @@ typedef struct } NUMCacheEntry; /* global cache for date/time format pictures */ -static DCHCacheEntry DCHCache[DCH_CACHE_ENTRIES]; +static DCHCacheEntry *DCHCache[DCH_CACHE_ENTRIES]; static int n_DCHCache = 0; /* current number of entries */ static int DCHCounter = 0; /* aging-event counter */ /* global cache for number format pictures */ -static NUMCacheEntry NUMCache[NUM_CACHE_ENTRIES]; +static NUMCacheEntry *NUMCache[NUM_CACHE_ENTRIES]; static int n_NUMCache = 0; /* current number of entries */ static int NUMCounter = 0; /* aging-event counter */ @@ -3368,13 +3369,13 @@ DCH_cache_getnew(const char *str) { DCHCacheEntry *ent; - /* counter overflow check - paranoia? */ + /* handle counter overflow by resetting all ages */ if (DCHCounter >= (INT_MAX - DCH_CACHE_ENTRIES)) { DCHCounter = 0; - for (ent = DCHCache; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++) - ent->age = (++DCHCounter); + for (int i = 0; i < n_DCHCache; i++) + DCHCache[i]->age = (++DCHCounter); } /* @@ -3382,15 +3383,16 @@ DCH_cache_getnew(const char *str) */ if (n_DCHCache >= DCH_CACHE_ENTRIES) { - DCHCacheEntry *old = DCHCache + 0; + DCHCacheEntry *old = DCHCache[0]; #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "cache is full (%d)", n_DCHCache); #endif if (old->valid) { - for (ent = DCHCache + 1; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++) + for (int i = 1; i < DCH_CACHE_ENTRIES; i++) { + ent = DCHCache[i]; if (!ent->valid) { old = ent; @@ -3414,7 +3416,9 @@ DCH_cache_getnew(const char *str) #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "NEW (%d)", n_DCHCache); #endif - ent = DCHCache + n_DCHCache; + Assert(DCHCache[n_DCHCache] == NULL); + DCHCache[n_DCHCache] = ent = (DCHCacheEntry *) + MemoryContextAllocZero(TopMemoryContext, sizeof(DCHCacheEntry)); ent->valid = false; StrNCpy(ent->str, str, DCH_CACHE_SIZE + 1); ent->age = (++DCHCounter); @@ -3428,20 +3432,19 @@ DCH_cache_getnew(const char *str) static DCHCacheEntry * DCH_cache_search(const char *str) { - int i; - DCHCacheEntry *ent; - - /* counter overflow check - paranoia? */ + /* handle counter overflow by resetting all ages */ if (DCHCounter >= (INT_MAX - DCH_CACHE_ENTRIES)) { DCHCounter = 0; - for (ent = DCHCache; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++) - ent->age = (++DCHCounter); + for (int i = 0; i < n_DCHCache; i++) + DCHCache[i]->age = (++DCHCounter); } - for (i = 0, ent = DCHCache; i < n_DCHCache; i++, ent++) + for (int i = 0; i < n_DCHCache; i++) { + DCHCacheEntry *ent = DCHCache[i]; + if (ent->valid && strcmp(ent->str, str) == 0) { ent->age = (++DCHCounter); @@ -4047,13 +4050,13 @@ NUM_cache_getnew(const char *str) { NUMCacheEntry *ent; - /* counter overflow check - paranoia? */ + /* handle counter overflow by resetting all ages */ if (NUMCounter >= (INT_MAX - NUM_CACHE_ENTRIES)) { NUMCounter = 0; - for (ent = NUMCache; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++) - ent->age = (++NUMCounter); + for (int i = 0; i < n_NUMCache; i++) + NUMCache[i]->age = (++NUMCounter); } /* @@ -4061,15 +4064,16 @@ NUM_cache_getnew(const char *str) */ if (n_NUMCache >= NUM_CACHE_ENTRIES) { - NUMCacheEntry *old = NUMCache + 0; + NUMCacheEntry *old = NUMCache[0]; #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "Cache is full (%d)", n_NUMCache); #endif if (old->valid) { - for (ent = NUMCache + 1; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++) + for (int i = 1; i < NUM_CACHE_ENTRIES; i++) { + ent = NUMCache[i]; if (!ent->valid) { old = ent; @@ -4093,7 +4097,9 @@ NUM_cache_getnew(const char *str) #ifdef DEBUG_TO_FROM_CHAR elog(DEBUG_elog_output, "NEW (%d)", n_NUMCache); #endif - ent = NUMCache + n_NUMCache; + Assert(NUMCache[n_NUMCache] == NULL); + NUMCache[n_NUMCache] = ent = (NUMCacheEntry *) + MemoryContextAllocZero(TopMemoryContext, sizeof(NUMCacheEntry)); ent->valid = false; StrNCpy(ent->str, str, NUM_CACHE_SIZE + 1); ent->age = (++NUMCounter); @@ -4107,20 +4113,19 @@ NUM_cache_getnew(const char *str) static NUMCacheEntry * NUM_cache_search(const char *str) { - int i; - NUMCacheEntry *ent; - - /* counter overflow check - paranoia? */ + /* handle counter overflow by resetting all ages */ if (NUMCounter >= (INT_MAX - NUM_CACHE_ENTRIES)) { NUMCounter = 0; - for (ent = NUMCache; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++) - ent->age = (++NUMCounter); + for (int i = 0; i < n_NUMCache; i++) + NUMCache[i]->age = (++NUMCounter); } - for (i = 0, ent = NUMCache; i < n_NUMCache; i++, ent++) + for (int i = 0; i < n_NUMCache; i++) { + NUMCacheEntry *ent = NUMCache[i]; + if (ent->valid && strcmp(ent->str, str) == 0) { ent->age = (++NUMCounter);
Hi, On 2018-10-15 16:45:03 -0700, Andres Freund wrote: > I've a patch making .data variables constant, fwiw. Will post in a > bit. Just so we don't duplicate work too much. I pushed a few very obvious ones. An additional fix, for the system attributes in heap.c, is attached. But that's a bit more invasive, so I wanted to get some input. I think all of the changes are fairly obvious and the new const attributes just document what already had to be the case. But to correctly propagate the const through typedef-the-pointer-away typedefs, I had to use the underlying type (e.g. use const NameData *n2 instad of Name n2). To me that's the correct fix, but then I hate these typedefs with a passion anyway. An alternative fix is to add additional typedefs like ConstName, but I don't find that particularly advantageous. Comments? Arguably we should use consts more aggresively in signatures anyway, but I think that's a separate effort. Greetings, Andres Freund
Attachment
Hi, After the last few changes (including a proposed one), we now have in the .data segment (i.e. mutable initialized variables): $ objdump -j .data -t ~/build/postgres/dev-optimize/vpath/src/backend/postgres|awk '{print $4, $5, $6}'|sort -r|lessGreetings, .data 0000000000004380 ConfigureNamesInt .data 0000000000003570 ConfigureNamesBool .data 0000000000002140 ConfigureNamesString .data 00000000000010e0 ConfigureNamesEnum .data 0000000000000d20 ConfigureNamesReal These are by far the largest chunk of the .data segment (like 47152 bytes out of 51168 bytes). It's not entirely trivial to fix. While we can slap consts onto large parts of the functions taking a struct config_generic*, there's a few places where *do* modify them. ISTM, to actually fix, we'd have to split struct config_generic { /* constant fields, must be set correctly in initial value: */ const char *name; /* name of variable - MUST BE FIRST */ GucContext context; /* context required to set the variable */ enum config_group group; /* to help organize variables by function */ const char *short_desc; /* short desc. of this variable's purpose */ const char *long_desc; /* long desc. of this variable's purpose */ int flags; /* flag bits, see guc.h */ /* variable fields, initialized at runtime: */ enum config_type vartype; /* type of variable (set only at startup) */ int status; /* status bits, see below */ GucSource source; /* source of the current actual value */ GucSource reset_source; /* source of the reset_value */ GucContext scontext; /* context that set the current value */ GucContext reset_scontext; /* context that set the reset value */ GucStack *stack; /* stacked prior values */ void *extra; /* "extra" pointer for current actual value */ char *sourcefile; /* file current setting is from (NULL if not * set in config file) */ int sourceline; /* line in source file */ }; into two different arrays. Namely the already denoted 'constant fields' and 'variable fields. But it's a bit more complicated than that: The size doesn't come just from the base config_struct, but also from the config_{bool,int,real,string,enum} arrays. Where we again have separate 'constant fields' and 'variable fields'. So we'd have to refactor more heavily than just splitting the above array into two. I kind of wonder, if we shouldn't remove the separate ConfigureNamesInt* int arrays and change things so we have one struct config_def { const char *name; /* name of variable - MUST BE FIRST */ GucContext context; /* context required to set the variable */ enum config_group group; /* to help organize variables by function */ const char *short_desc; /* short desc. of this variable's purpose */ const char *long_desc; /* long desc. of this variable's purpose */ int flags; /* flag bits, see guc.h */ enum config_type vartype; /* type of variable (set only at startup) */ union { struct { bool *variable; bool boot_val; GucBoolCheckHook check_hook; GucBoolAssignHook assign_hook; GucShowHook show_hook; } config_bool; struct { int *variable; int boot_val; int min; int max; GucIntCheckHook check_hook; GucIntAssignHook assign_hook; GucShowHook show_hook; } config_int ... } pertype; } and then a corresponding struct config_val with a similar union. Besides reducing modified memory, it'd also have the benefit of making the grouping within the various arrays much more sensible, because related fields could be grouped together. .data 0000000000000420 intRelOpts .data 00000000000001c0 realRelOpts .data 0000000000000118 boolRelOpts .data 00000000000000a8 stringRelOpts these should be readonly, but the code doesn't make that easy. There's pending refactorings, I don't know how they effect this. .data 0000000000000068 SnapshotSelfData .data 0000000000000068 SnapshotAnyData .data 0000000000000068 SecondarySnapshotData .data 0000000000000068 CurrentSnapshotData .data 0000000000000068 CatalogSnapshotData The obviously actually are modifyable. .data 0000000000000028 spi_printtupDR .data 0000000000000028 printsimpleDR .data 0000000000000028 donothingDR .data 0000000000000028 debugtupDR These we could actually make constant, but CreateDestReceiver() as an API makes that inconvenient. They also are pretty darn small... There's a security benefit in making them constant and casting the constness away - I think that might not be insane. - Andres
Hi, On 2018-10-15 21:50:51 -0700, Andres Freund wrote: > .data 0000000000000028 spi_printtupDR > .data 0000000000000028 printsimpleDR > .data 0000000000000028 donothingDR > .data 0000000000000028 debugtupDR > > These we could actually make constant, but CreateDestReceiver() as an > API makes that inconvenient. They also are pretty darn small... There's > a security benefit in making them constant and casting the constness > away - I think that might not be insane. I.e. do something like the attached. Greetings, Andres Freund
Attachment
Andres Freund <andres@anarazel.de> writes: > [ let's rearrange the GUC structs ] I find it hard to believe that the API breaks you'll cause for extensions are a good trade for a few kB reduction in the size of the .data segment. regards, tom lane
I wrote: > Andres Freund <andres@anarazel.de> writes: >> top unitialized allocations: >> 0000000008435040 0000000000085280 b DCHCache >> 0000000008391168 0000000000043840 b NUMCache > Here's a patch to improve that situation. Hm, looking at that more closely, there's a problem with the idea of allocating the cache slots one-at-a-time. Currently, sizeof(DCHCacheEntry) and sizeof(NUMCacheEntry) are each just a bit more than a power of 2, which would cause palloc to waste nearly 50% of the allocation it makes for them. We could forget the one-at-a-time idea and just allocate the whole array on first use, but I feel like that's probably not a good answer. I'm inclined to instead reduce DCH_CACHE_SIZE and NUM_CACHE_SIZE enough to make the structs just less than powers of 2 instead of just more --- AFAICS, both of those numbers were picked out of the air, and so there's no reason not to pick a different number out of the air. Also, I noticed that the biggest part of those structs are arrays of FormatNode, which has been designed with complete lack of thought about size or padding issues. We can very easily cut it in half on 64-bit machines. The attached patch does both of those things. It's independent of the other one but is important to make that one efficient. regards, tom lane diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index b3ff133..9c2649e 100644 *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *************** *** 124,133 **** */ typedef struct { ! char *name; /* suffix string */ int len, /* suffix length */ id, /* used in node->suffix */ ! type; /* prefix / postfix */ } KeySuffix; /* ---------- --- 124,133 ---- */ typedef struct { ! const char *name; /* suffix string */ int len, /* suffix length */ id, /* used in node->suffix */ ! type; /* prefix / postfix */ } KeySuffix; /* ---------- *************** typedef struct *** 155,164 **** typedef struct { ! int type; /* NODE_TYPE_XXX, see below */ ! const KeyWord *key; /* if type is ACTION */ char character[MAX_MULTIBYTE_CHAR_LEN + 1]; /* if type is CHAR */ ! int suffix; /* keyword prefix/suffix code, if any */ } FormatNode; #define NODE_TYPE_END 1 --- 155,164 ---- typedef struct { ! uint8 type; /* NODE_TYPE_XXX, see below */ char character[MAX_MULTIBYTE_CHAR_LEN + 1]; /* if type is CHAR */ ! uint8 suffix; /* keyword prefix/suffix code, if any */ ! const KeyWord *key; /* if type is ACTION */ } FormatNode; #define NODE_TYPE_END 1 *************** typedef struct *** 358,370 **** * For simplicity, the cache entries are fixed-size, so they allow for the * worst case of a FormatNode for each byte in the picture string. * * The max number of entries in the caches is DCH_CACHE_ENTRIES * resp. NUM_CACHE_ENTRIES. * ---------- */ ! #define NUM_CACHE_SIZE 64 #define NUM_CACHE_ENTRIES 20 ! #define DCH_CACHE_SIZE 128 #define DCH_CACHE_ENTRIES 20 typedef struct --- 358,374 ---- * For simplicity, the cache entries are fixed-size, so they allow for the * worst case of a FormatNode for each byte in the picture string. * + * The CACHE_SIZE constants are chosen to make sizeof(DCHCacheEntry) and + * sizeof(NUMCacheEntry) be powers of 2, or just less than that, so that + * we don't waste too much space by palloc'ing them individually. + * * The max number of entries in the caches is DCH_CACHE_ENTRIES * resp. NUM_CACHE_ENTRIES. * ---------- */ ! #define NUM_CACHE_SIZE 56 #define NUM_CACHE_ENTRIES 20 ! #define DCH_CACHE_SIZE 119 #define DCH_CACHE_ENTRIES 20 typedef struct *************** do { \ *** 496,502 **** *****************************************************************************/ /* ---------- ! * Suffixes: * ---------- */ #define DCH_S_FM 0x01 --- 500,506 ---- *****************************************************************************/ /* ---------- ! * Suffixes (FormatNode.suffix is an OR of these codes) * ---------- */ #define DCH_S_FM 0x01
Hi, On 2018-10-16 01:49:22 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > [ let's rearrange the GUC structs ] > > I find it hard to believe that the API breaks you'll cause for extensions > are a good trade for a few kB reduction in the size of the .data segment. I'm doubtful that it's worth it too. But more because it seems like a fair bit of work. I don't think that many extensions use guc_tables.h - shouldn't most use guc.h, which wouldn't be affected? Greetings, Andres Freund
On 2018-10-15 22:02:00 -0700, Andres Freund wrote: > Hi, > > On 2018-10-15 21:50:51 -0700, Andres Freund wrote: > > .data 0000000000000028 spi_printtupDR > > .data 0000000000000028 printsimpleDR > > .data 0000000000000028 donothingDR > > .data 0000000000000028 debugtupDR > > > > These we could actually make constant, but CreateDestReceiver() as an > > API makes that inconvenient. They also are pretty darn small... There's > > a security benefit in making them constant and casting the constness > > away - I think that might not be insane. > > I.e. do something like the attached. This just reminded me that a couple times I wanted a cast that casts away const, but otherwise makes sure the type stays the same. I don't think there's a way to do that in C, but we can write one that verifies the cast doesn't do something bad if gcc is used: #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) #define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst)(var) #else #define unconstify(cst, var) ((cst) (var)) #endif Does anybody besides me see value in adding a cleaned up version of that? Greetings, Andres Freund
Hi, On 2018-10-16 01:59:00 -0400, Tom Lane wrote: > I wrote: > > Andres Freund <andres@anarazel.de> writes: > >> top unitialized allocations: > >> 0000000008435040 0000000000085280 b DCHCache > >> 0000000008391168 0000000000043840 b NUMCache > > > Here's a patch to improve that situation. > > Hm, looking at that more closely, there's a problem with the idea of > allocating the cache slots one-at-a-time. Currently, > sizeof(DCHCacheEntry) and sizeof(NUMCacheEntry) are each just a bit more > than a power of 2, which would cause palloc to waste nearly 50% of the > allocation it makes for them. Hm, that's a bit annoying... > We could forget the one-at-a-time idea and just allocate the whole > array on first use, but I feel like that's probably not a good answer. I suspect it'd be fine, but obviously we can do better. > Also, I noticed that the biggest part of those structs are arrays of > FormatNode, which has been designed with complete lack of thought about > size or padding issues. We can very easily cut it in half on 64-bit > machines. Heh, neat. I feel like we've paid very little attention to that in a myriad of places :( Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-10-16 01:59:00 -0400, Tom Lane wrote: >> Also, I noticed that the biggest part of those structs are arrays of >> FormatNode, which has been designed with complete lack of thought about >> size or padding issues. We can very easily cut it in half on 64-bit >> machines. > Heh, neat. I feel like we've paid very little attention to that in a > myriad of places :( Most of the time, we probably *shouldn't* pay attention to it; logical field ordering is worth a good deal IMO. But in a case like this, where there are large arrays of the things and it's not very painful to avoid padding waste, it's worth the trouble. regards, tom lane
On Tue, Oct 16, 2018 at 2:30 AM Andres Freund <andres@anarazel.de> wrote: > This just reminded me that a couple times I wanted a cast that casts > away const, but otherwise makes sure the type stays the same. I don't > think there's a way to do that in C, but we can write one that verifies > the cast doesn't do something bad if gcc is used: > > #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) > #define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst)(var) > #else > #define unconstify(cst, var) ((cst) (var)) > #endif > > Does anybody besides me see value in adding a cleaned up version of > that? Under what circumstances would we consider this to be a legitimate thing to use? I think if we add something this, we'd better accompany it with some detailed and very clearly-written statements about when you're allowed to use it. Otherwise, I predict that people will use it in cases where it's not actually safe, and we'll end up with low-grade bugs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-10-16 11:22:31 -0400, Robert Haas wrote: > On Tue, Oct 16, 2018 at 2:30 AM Andres Freund <andres@anarazel.de> wrote: > > This just reminded me that a couple times I wanted a cast that casts > > away const, but otherwise makes sure the type stays the same. I don't > > think there's a way to do that in C, but we can write one that verifies > > the cast doesn't do something bad if gcc is used: > > > > #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) > > #define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"),(cst) (var) > > #else > > #define unconstify(cst, var) ((cst) (var)) > > #endif > > > > Does anybody besides me see value in adding a cleaned up version of > > that? > > Under what circumstances would we consider this to be a legitimate thing to use? When the variable actually *will not* be modified, but language or API design reasons makes it unfeasiable to express that. Look e.g. DestReceiver * CreateDestReceiver(CommandDest dest); some of the returned receivers (e.g. donothingDR, printsimpleDR) are statically allocated and *any* modification would be a bug. But other return values will be modified, e.g. CreateIntoRelDestReceiver(). It's safe to cast constness away if the variable will not actually be modified after. Which is e.g. the case above. But making the static allocations const will a) save memory b) trigger sigbuses if you modify them. So the casting constness away here *increases* robustness. The problem is that just adding a cast like case DestNone: return (DestReceiver *) &donothingDR; also hides errors. If you e.g. changed the type of donothingDR you'd still not get an error. So I was wishing for a form of a cast that only casts the const away, but errors out if there's any other type difference. That's the above, I think. > I think if we add something this, we'd better accompany it with some > detailed and very clearly-written statements about when you're allowed > to use it. Otherwise, I predict that people will use it in cases > where it's not actually safe, and we'll end up with low-grade bugs. Well, right now people will (and have) just cast the const away like above. So I don't really see it being more likely to cause problems than we're doing now. But yea, it definitely should have a big red warning label. Greetings, Andres Freund
On Tue, Oct 16, 2018 at 12:06 PM Andres Freund <andres@anarazel.de> wrote: > But yea, it definitely should have a big red warning > label. That's all I'm asking. And the warning label shouldn't just say "use with caution!" but should rather explain how to know whether you're doing the right thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-10-16 12:19:55 -0400, Robert Haas wrote: > On Tue, Oct 16, 2018 at 12:06 PM Andres Freund <andres@anarazel.de> wrote: > > But yea, it definitely should have a big red warning > > label. > > That's all I'm asking. And the warning label shouldn't just say "use > with caution!" but should rather explain how to know whether you're > doing the right thing. I'm thinking of something like: /* * Macro that allows to cast constness away from a variable, but doesn't * allow changing the underlying type. Enforcement of the latter * currently only works for gcc like compilers. * * Please note IT IS NOT SAFE to cast constness away if the variable will ever * be modified (it would be undefined behaviour). Doing so anyway can cause * compiler misoptimizations or runtime crashes (modifying readonly memory). * It is only safe to use when the the variabble will not be modified, but API * design or language restrictions prevent you from declaring that * (e.g. because a function returns both const and non-const variables). */ Greetings, Andres Freund
On Tue, Oct 16, 2018 at 12:26 PM Andres Freund <andres@anarazel.de> wrote: > /* > * Macro that allows to cast constness away from a variable, but doesn't > * allow changing the underlying type. Enforcement of the latter > * currently only works for gcc like compilers. > * > * Please note IT IS NOT SAFE to cast constness away if the variable will ever > * be modified (it would be undefined behaviour). Doing so anyway can cause > * compiler misoptimizations or runtime crashes (modifying readonly memory). > * It is only safe to use when the the variabble will not be modified, but API > * design or language restrictions prevent you from declaring that > * (e.g. because a function returns both const and non-const variables). > */ "variabble" is a little too rich in "b"s. In terms of a function that returns both const and non-const variables, it seems a bit sketchy that the caller would know what the function is doing in particular cases and make decisions based on it, but maybe that's just how life is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2018-10-16 12:41:44 -0400, Robert Haas wrote: > On Tue, Oct 16, 2018 at 12:26 PM Andres Freund <andres@anarazel.de> wrote: > > /* > > * Macro that allows to cast constness away from a variable, but doesn't > > * allow changing the underlying type. Enforcement of the latter > > * currently only works for gcc like compilers. > > * > > * Please note IT IS NOT SAFE to cast constness away if the variable will ever > > * be modified (it would be undefined behaviour). Doing so anyway can cause > > * compiler misoptimizations or runtime crashes (modifying readonly memory). > > * It is only safe to use when the the variabble will not be modified, but API > > * design or language restrictions prevent you from declaring that > > * (e.g. because a function returns both const and non-const variables). > > */ > > "variabble" is a little too rich in "b"s. variababble. > In terms of a function that returns both const and non-const > variables, it seems a bit sketchy that the caller would know what the > function is doing in particular cases and make decisions based on it, > but maybe that's just how life is. I don't think it's necessary the callers doing so in most cases. E.g. in the DestReceiver case, it'll be the choice of the testreceiver (say intorel_receive modifying things for DestIntoRel), not the caller choosing when to modify things. The caller / users of dest receivers won't necessarily know. I agree it's not pretty, but I don't quite see any really realistic other approaches here. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-10-16 12:41:44 -0400, Robert Haas wrote: >> In terms of a function that returns both const and non-const >> variables, it seems a bit sketchy that the caller would know what the >> function is doing in particular cases and make decisions based on it, >> but maybe that's just how life is. > I don't think it's necessary the callers doing so in most cases. E.g. in > the DestReceiver case, it'll be the choice of the testreceiver (say > intorel_receive modifying things for DestIntoRel), not the caller > choosing when to modify things. The caller / users of dest receivers > won't necessarily know. Yeah, I think the use-case is more like "this API specifies non-const pointers, but some instances can return pointers to const objects, while others return non-const objects". Changing the API to const isn't better, it just moves where you have to cast away const. regards, tom lane
On Mon, Oct 15, 2018 at 4:08 PM Andres Freund <andres@anarazel.de> wrote: > So we have 500kb of not-initialized memory mapped into every > process. That's, uh, not nothing. Thinking about this a bit more, why is this bad? I mean, if the memory is never touched, the OS does not really need to allocate or zero any pages, or even make any page table entries. If somebody actually accesses the data, then we'll take a page fault and have to really allocate, but otherwise I would think we could have 50MB of unused bss floating around and it wouldn't really matter, let alone 500kB. What am I missing? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2018-10-15 13:07:54 -0700, Andres Freund wrote: > Hi, > > while briefly thinking about the per-process overhead of postgres, I > looked at the section sizes of a modern postgres. In particular which > memory areas are *not* shared between processes. > > If you look at the section sizes that are mapped read-write: > > $ size --format=sysv src/backend/postgres > src/backend/postgres : > section size addr > .rodata 1593617 5730304 (read-only, for reference) > .data.rel.ro 134944 8039904 (read only after start) > .data 53888 8178912 (read-write, initialized) > .bss 510416 8232800 (read-write, uninitialized) > Total 52417197 We improve a fair bit over the last two days: text data bss dec hex filename 8030289 191888 227024 8449201 80ecb1 src/backend/postgres 8029097 192880 510416 8732393 853ee9 src/backend/postgres.before breakdown: section size addr .rodata 1594577 5730304 (read-only, for reference) .data.rel.ro 136928 8037920 (read only after start) .data 50912 8178912 (read-write, initialized) .bss 227024 8229824 (read-write, uninitialized) As to be expected .rodata and .data.rel.ro gained a bit in size (as they contain constant data and we made more vars constant), whereas .data shrunk a bit, and .bss shrunk drastically by moving things to be dynamically allocated. Nice progress, thanks everyone! The top .bss entries (uninitialized modifyable vars) are now: .bss 0000000000020020 hist_entries .bss 0000000000005b90 tzdefrules_s .bss 0000000000004000 hist_start .bss 0000000000002000 PqRecvBuffer .bss 0000000000001410 BackendWritebackContext .bss 0000000000000c80 held_lwlocks .bss 0000000000000b00 re_array .bss 0000000000000800 buffer_lists .bss 0000000000000600 syscache_callback_list .bss 0000000000000560 reverse_dispatch_table .bss 0000000000000400 tzdir.7228 .bss 0000000000000400 pkglib_path .bss 0000000000000400 OutputFileName .bss 0000000000000400 my_exec_path .bss 0000000000000400 ExtraOptions .bss 0000000000000398 errordata .bss 0000000000000320 seq_scan_tables The top .data entries (initialized modifyable vars) are now: .data 0000000000004380 ConfigureNamesInt .data 0000000000003570 ConfigureNamesBool .data 0000000000002140 ConfigureNamesString .data 00000000000010e0 ConfigureNamesEnum .data 0000000000000d20 ConfigureNamesReal .data 0000000000000420 intRelOpts .data 00000000000001c0 realRelOpts .data 0000000000000118 boolRelOpts .data 00000000000000a8 stringRelOpts .data 0000000000000068 SnapshotSelfData .data 0000000000000068 SnapshotAnyData .data 0000000000000068 SecondarySnapshotData .data 0000000000000068 CurrentSnapshotData .data 0000000000000068 CatalogSnapshotData - Andres
Hi, On 2018-10-16 15:13:43 -0400, Robert Haas wrote: > On Mon, Oct 15, 2018 at 4:08 PM Andres Freund <andres@anarazel.de> wrote: > > So we have 500kb of not-initialized memory mapped into every > > process. That's, uh, not nothing. > > Thinking about this a bit more, why is this bad? I mean, if the > memory is never touched, the OS does not really need to allocate or > zero any pages, or even make any page table entries. If somebody > actually accesses the data, then we'll take a page fault and have to > really allocate, but otherwise I would think we could have 50MB of > unused bss floating around and it wouldn't really matter, let alone > 500kB. > > What am I missing? For one the OS will actually reserve all that memory when you use memory overcommit = 2 (which you should). There's no need to reserve memory for data that's guaranteed to be shared. It also reduces the frequency of page faults when looking at other global variables laid out nearby - we could do something about that by giving the linker placement instructions, but that's also work. The size of the pagetable also has performance effects, although it'll often be dwarfed by the shared buffers entry if not using huge pages. WRT .data entries, marking them as const where appropriate both reduces memory usage, possibly allows more compiler optimizations, and allows to catch bugs. There's also the fact that not every OS has COW pagetables, or we use processes in a way that don't allow that (say windows, where we don't fork). Greetings, Andres Freund
On 2018-10-16 10:16:33 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2018-10-16 01:59:00 -0400, Tom Lane wrote: > >> Also, I noticed that the biggest part of those structs are arrays of > >> FormatNode, which has been designed with complete lack of thought about > >> size or padding issues. We can very easily cut it in half on 64-bit > >> machines. > > > Heh, neat. I feel like we've paid very little attention to that in a > > myriad of places :( > > Most of the time, we probably *shouldn't* pay attention to it; logical > field ordering is worth a good deal IMO. Sure. But there's plenty structs which we allocate a bunch off, that are frequently accessed, where a lot of space is wasted to padding. I agree that we don't need to contort many structs, but there's plenty where we should. Often enough it's possible to reorder without making things make meaningfully less sense. > But in a case like this, > where there are large arrays of the things and it's not very painful > to avoid padding waste, it's worth the trouble. Attached is a patch that shrinks fmgr_builtins by 25%. That seems worthwhile, it's pretty frequently accessed, making it more dense is helpful. Unless somebody protests soon, I'm going to apply that... Greetings, Andres Freund
Attachment
Andres Freund <andres@anarazel.de> writes: > Attached is a patch that shrinks fmgr_builtins by 25%. That seems > worthwhile, it's pretty frequently accessed, making it more dense is > helpful. Unless somebody protests soon, I'm going to apply that... Hah. I'm pretty sure that struct *was* set up with an eye to padding ... on 32-bit machines. This does make it shorter on 64-bit, but also makes the size not a power of 2, which might add a few cycles to array indexing calculations. Might be worth checking whether that's going to be an issue anywhere. What's the point of the extra const decoration on funcName? ISTM the whole struct should be const, or not. regards, tom lane
On 17/10/2018 09:36, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: >> Attached is a patch that shrinks fmgr_builtins by 25%. That seems >> worthwhile, it's pretty frequently accessed, making it more dense is >> helpful. Unless somebody protests soon, I'm going to apply that... > Hah. I'm pretty sure that struct *was* set up with an eye to padding ... > on 32-bit machines. This does make it shorter on 64-bit, but also > makes the size not a power of 2, which might add a few cycles to > array indexing calculations. Might be worth checking whether that's > going to be an issue anywhere. > > What's the point of the extra const decoration on funcName? ISTM > the whole struct should be const, or not. > > regards, tom lane > Would it be useful to add dummy variable(s) to bring it up to a power of 2? Cheers, Gavin
Hi, On 2018-10-17 09:38:18 +1300, Gavin Flower wrote: > On 17/10/2018 09:36, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > Attached is a patch that shrinks fmgr_builtins by 25%. That seems > > > worthwhile, it's pretty frequently accessed, making it more dense is > > > helpful. Unless somebody protests soon, I'm going to apply that... > > Hah. I'm pretty sure that struct *was* set up with an eye to padding ... > > on 32-bit machines. This does make it shorter on 64-bit, but also > > makes the size not a power of 2, which might add a few cycles to > > array indexing calculations. Might be worth checking whether that's > > going to be an issue anywhere. > > > > What's the point of the extra const decoration on funcName? ISTM > > the whole struct should be const, or not. > Would it be useful to add dummy variable(s) to bring it up to a power of 2? Err. Reread what we're talking about. The point was to reduce the size, it's a power of two right now (32). We could of course also just do nothing (re-add a dummy variable), which would, drumroll, do nothing. Greetings, Andres Freund
On 2018-10-16 16:36:12 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Attached is a patch that shrinks fmgr_builtins by 25%. That seems > > worthwhile, it's pretty frequently accessed, making it more dense is > > helpful. Unless somebody protests soon, I'm going to apply that... > > Hah. I'm pretty sure that struct *was* set up with an eye to padding ... > on 32-bit machines. Possible, the new layout should work just as well there, luckily. > This does make it shorter on 64-bit, but also > makes the size not a power of 2, which might add a few cycles to > array indexing calculations. Might be worth checking whether that's > going to be an issue anywhere. I can't imagine that that outweight the cost of additional cache misses on any platform where performance matters. On x86 I assume indexing into an array with 24byte stride, will normally be just two leas (lea eax, [eax + eax * 2]; lea eax, [ebx + eax * 8]; where eax initially is the index, and ebx the array base). Indexing also plays less of a role than in the past, because previously we did a binary search, but now we normally look up the index via fmgr_builtin_oid_index. > What's the point of the extra const decoration on funcName? ISTM > the whole struct should be const, or not. The whole array is const. There's cases where that allows the compiler more freedom - but I guess it's really a bit redundant here. Greetings, Andres Freund
I wrote: > 1. tzdefrules_s is filled in the postmaster, and if it's not filled there > it'd be filled by every child process, so there's zero point in converting > it to malloc-on-the-fly style. This is because tzparse() always wants > it filled. That's actually a tad annoying, because it looks to me like > with many timezone settings the data will not get used, but I'm hesitant > to mess with the IANA code's logic enough to improve that. Maybe I'll try > submitting an upstream patch and see what they think of it first. I spent some time studying this, and realized that the reason that they always try to load the TZDEFRULES zone data is that they want to absorb whatever leap-seconds data it has. We don't want that and would be glad to just drop said data, but I'm sure that's intentional on their end, so they are not going to be interested in a patch here. However ... if you assume that any leap-seconds data in that zone can be ignored, then the only case where the data need be loaded is when parsing a POSIX-style zone name that has a DST component but does not specify a POSIX-style transition date rule. That's possible in our code but I do not think it is a mainstream case; in particular, that seems never to be reached when loading a real zone data file. Hence, the attached patch postpones the TZDEFRULES load until we find we actually need it, and ignores any leap-second data therein, and incidentally makes the data space be malloc-on-demand rather than static. This is actually less of a hack compared to the upstream code than what we were doing before, so I feel pretty pleased with it. regards, tom lane diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c index a2260e5..e3029d8 100644 *** a/src/timezone/localtime.c --- b/src/timezone/localtime.c *************** static const char gmt[] = "GMT"; *** 54,60 **** * PG: We cache the result of trying to load the TZDEFRULES zone here. * tzdefrules_loaded is 0 if not tried yet, +1 if good, -1 if failed. */ ! static struct state tzdefrules_s; static int tzdefrules_loaded = 0; /* --- 54,60 ---- * PG: We cache the result of trying to load the TZDEFRULES zone here. * tzdefrules_loaded is 0 if not tried yet, +1 if good, -1 if failed. */ ! static struct state *tzdefrules_s = NULL; static int tzdefrules_loaded = 0; /* *************** tzparse(const char *name, struct state * *** 908,927 **** stdname = name; if (lastditch) { ! /* ! * This is intentionally somewhat different from the IANA code. We do ! * not want to invoke tzload() in the lastditch case: we can't assume ! * pg_open_tzfile() is sane yet, and we don't care about leap seconds ! * anyway. ! */ stdlen = strlen(name); /* length of standard zone name */ name += stdlen; - if (stdlen >= sizeof sp->chars) - stdlen = (sizeof sp->chars) - 1; - charcnt = stdlen + 1; stdoffset = 0; - sp->goback = sp->goahead = false; /* simulate failed tzload() */ - load_ok = false; } else { --- 908,917 ---- stdname = name; if (lastditch) { ! /* Unlike IANA, don't assume name is exactly "GMT" */ stdlen = strlen(name); /* length of standard zone name */ name += stdlen; stdoffset = 0; } else { *************** tzparse(const char *name, struct state * *** 945,971 **** name = getoffset(name, &stdoffset); if (name == NULL) return false; - charcnt = stdlen + 1; - if (sizeof sp->chars < charcnt) - return false; - - /* - * This bit also differs from the IANA code, which doesn't make any - * attempt to avoid repetitive loadings of the TZDEFRULES zone. - */ - if (tzdefrules_loaded == 0) - { - if (tzload(TZDEFRULES, NULL, &tzdefrules_s, false) == 0) - tzdefrules_loaded = 1; - else - tzdefrules_loaded = -1; - } - load_ok = (tzdefrules_loaded > 0); - if (load_ok) - memcpy(sp, &tzdefrules_s, sizeof(struct state)); } ! if (!load_ok) ! sp->leapcnt = 0; /* so, we're off a little */ if (*name != '\0') { if (*name == '<') --- 935,957 ---- name = getoffset(name, &stdoffset); if (name == NULL) return false; } ! charcnt = stdlen + 1; ! if (sizeof sp->chars < charcnt) ! return false; ! ! /* ! * The IANA code always tries tzload(TZDEFRULES) here. We do not want to ! * do that; it would be bad news in the lastditch case, where we can't ! * assume pg_open_tzfile() is sane yet. Moreover, the only reason to do ! * it unconditionally is to absorb the TZDEFRULES zone's leap second info, ! * which we don't want to do anyway. Without that, we only need to load ! * TZDEFRULES if the zone name specifies DST but doesn't incorporate a ! * POSIX-style transition date rule, which is not a common case. ! */ ! sp->goback = sp->goahead = false; /* simulate failed tzload() */ ! sp->leapcnt = 0; /* intentionally assume no leap seconds */ ! if (*name != '\0') { if (*name == '<') *************** tzparse(const char *name, struct state * *** 996,1003 **** } else dstoffset = stdoffset - SECSPERHOUR; ! if (*name == '\0' && !load_ok) ! name = TZDEFRULESTRING; if (*name == ',' || *name == ';') { struct rule start; --- 982,1019 ---- } else dstoffset = stdoffset - SECSPERHOUR; ! if (*name == '\0') ! { ! /* ! * The POSIX zone name does not provide a transition-date rule. ! * Here we must load the TZDEFRULES zone, if possible, to serve as ! * source data for the transition dates. Unlike the IANA code, we ! * try to cache the data so it's only loaded once. ! */ ! if (tzdefrules_loaded == 0) ! { ! /* Allocate on first use */ ! if (tzdefrules_s == NULL) ! tzdefrules_s = (struct state *) malloc(sizeof(struct state)); ! if (tzdefrules_s != NULL) ! { ! if (tzload(TZDEFRULES, NULL, tzdefrules_s, false) == 0) ! tzdefrules_loaded = 1; ! else ! tzdefrules_loaded = -1; ! /* In any case, we ignore leap-second data from the file */ ! tzdefrules_s->leapcnt = 0; ! } ! } ! load_ok = (tzdefrules_loaded > 0); ! if (load_ok) ! memcpy(sp, tzdefrules_s, sizeof(struct state)); ! else ! { ! /* If we can't load TZDEFRULES, fall back to hard-wired rule */ ! name = TZDEFRULESTRING; ! } ! } if (*name == ',' || *name == ';') { struct rule start;
BTW, I looked around for .o files with large BSS numbers, and came across $ size src/interfaces/ecpg/ecpglib/prepare.o text data bss dec hex filename 4023 4 1048576 1052603 100fbb src/interfaces/ecpg/ecpglib/prepare.o That megabyte is from a statically allocated statement cache array. Seems a bit unfriendly to users of ecpglib, given that many apps would never use the statement cache (AFAICT you have to explicitly ask for auto-prepare mode to get to that code). Doesn't look hard to fix though. regards, tom lane
On 2018-10-16 19:48:42 -0400, Tom Lane wrote: > BTW, I looked around for .o files with large BSS numbers, and came across > > $ size src/interfaces/ecpg/ecpglib/prepare.o > text data bss dec hex filename > 4023 4 1048576 1052603 100fbb src/interfaces/ecpg/ecpglib/prepare.o Btw, I think one can fairly argue that large .data vars have a higher cost than .bss. That's not to say we shouldn't fix .bss ones ;) I've this awful shell script to figure out the sizes: for file in $(find . -type f -name '*.o'); do objdump -j .data -t $file|tail -n +5|grep -v '^$'|awk -v "file=$file" '{print$4, $5, $6, file}';done|sort -nr|less .data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o .data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o .data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o .data 0000000000000100 sqlca_init ./src/interfaces/ecpg/ecpglib/misc.o .data 0000000000000100 sqlca_init ./src/interfaces/ecpg/compatlib/informix.o .data 0000000000000068 day_tab ./src/interfaces/ecpg/pgtypeslib/dt_common.o .data 0000000000000030 ecpg_query ./src/interfaces/ecpg/preproc/preproc.o .data 0000000000000030 ecpg_no_indicator ./src/interfaces/ecpg/preproc/preproc.o .data 0000000000000030 descriptor_type.5770 ./src/interfaces/ecpg/preproc/descriptor.o .data 000000000000002b ssl_nomem ./src/interfaces/libpq/fe-secure-openssl.o .data 000000000000002a PLy_subtransaction_doc ./src/pl/plpython/plpy_subxactobject.o .data 0000000000000023 PLy_cursor_doc ./src/pl/plpython/plpy_cursorobject.o .data 000000000000001e PLy_result_doc ./src/pl/plpython/plpy_resultobject.o .data 0000000000000018 PLy_plan_doc ./src/pl/plpython/plpy_planobject.o .data 0000000000000010 base_yylloc ./src/interfaces/ecpg/preproc/preproc.o Should also work for other sections. Hm, there's a lot of ecpg in this :( > That megabyte is from a statically allocated statement cache array. > Seems a bit unfriendly to users of ecpglib, given that many apps > would never use the statement cache (AFAICT you have to explicitly > ask for auto-prepare mode to get to that code). > > Doesn't look hard to fix though. Thanks for tackling that. Greetings, Andres Freund
On 16/10/2018 08:30, Andres Freund wrote: > This just reminded me that a couple times I wanted a cast that casts > away const, but otherwise makes sure the type stays the same. I don't > think there's a way to do that in C, but we can write one that verifies > the cast doesn't do something bad if gcc is used: > > #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) > #define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst)(var) > #else > #define unconstify(cst, var) ((cst) (var)) > #endif > > Does anybody besides me see value in adding a cleaned up version of > that? I've had the attached patch lying around. As you can see there, there are a few places where it could be useful, but not too many. The name CONST_CAST() is adapted from C++. Your version with __builtin_types_compatible_p() doesn't work for casting between char * and const char *, so it wouldn't be very useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Andres Freund <andres@anarazel.de> writes: > .data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o > .data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o > .data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o Hmm. I think these can just be marked const, will investigate. The rest of those look to be mostly below the threshold of pain ... regards, tom lane
I wrote: > Andres Freund <andres@anarazel.de> writes: >> .data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o >> .data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o >> .data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o > Hmm. I think these can just be marked const, will investigate. I pushed fixes for these, but curiously, the ibmkanji change did not make any difference to section sizes --- AFAICT, my toolchain already figured out that it could treat that table as const. The dt_common.c changes are a win though. regards, tom lane
On 2018-10-17 21:06:13 +0200, Peter Eisentraut wrote: > On 16/10/2018 08:30, Andres Freund wrote: > > This just reminded me that a couple times I wanted a cast that casts > > away const, but otherwise makes sure the type stays the same. I don't > > think there's a way to do that in C, but we can write one that verifies > > the cast doesn't do something bad if gcc is used: > > > > #if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P) > > #define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"),(cst) (var) > > #else > > #define unconstify(cst, var) ((cst) (var)) > > #endif > > > > Does anybody besides me see value in adding a cleaned up version of > > that? > > I've had the attached patch lying around. Heh. > As you can see there, there are a few places where it could be useful, > but not too many. Yea. > The name CONST_CAST() is adapted from C++. > > Your version with __builtin_types_compatible_p() doesn't work for > casting between char * and const char *, so it wouldn't be very useful. Huh, why wouldn't it work for char *? Seems to work fine for me? > +/* > + * Safer casting -- use this for casting away const or volatile. It ensures > + * that the source and target types are otherwise compatible. > + */ > +#define CONST_CAST(t, x) ((void)((t)0 == (x)), (t)(x)) > + Has the advantage that it probably works for globals, but OTOH, it only works correctly for pointers, and it doesn't reliably trigger warnigns / errors. Greetings, Andres Freund
On 17/10/2018 22:02, Andres Freund wrote: >> Your version with __builtin_types_compatible_p() doesn't work for >> casting between char * and const char *, so it wouldn't be very useful. > Huh, why wouldn't it work for char *? Seems to work fine for me? __builtin_types_compatible_p(const char *, char *) returns false (0) for me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-10-17 23:43:31 +0200, Peter Eisentraut wrote: > On 17/10/2018 22:02, Andres Freund wrote: > >> Your version with __builtin_types_compatible_p() doesn't work for > >> casting between char * and const char *, so it wouldn't be very useful. > > Huh, why wouldn't it work for char *? Seems to work fine for me? > > __builtin_types_compatible_p(const char *, char *) returns false (0) for me. Right, that's why I added a const, inside the macro, to the type specified in the unconstify argument. So typeof() yields a const char *, and the return type is specified as char *, and adding a const in the argument also yields a const char *. I'd merged my version since, I don't feel particularly attached to it though... Greetings, Andres Freund
On 17/10/2018 23:51, Andres Freund wrote: >> __builtin_types_compatible_p(const char *, char *) returns false (0) for me. > > Right, that's why I added a const, inside the macro, to the type > specified in the unconstify argument. So typeof() yields a const char *, > and the return type is specified as char *, and adding a const in the > argument also yields a const char *. Yeah, that works. The C++-inspired version also allowed casting from not-const to const, which we don't really need. I'd perhaps change the signature #define unconstify(underlying_type, var) because the "var" doesn't actually have to be a variable. Attached is my previous patch adapted to your macro. I'm tempted to get rid of the stuff in dfmgr.c and just let the "older platforms" get the warning. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 18/10/2018 22:17, Peter Eisentraut wrote: > On 17/10/2018 23:51, Andres Freund wrote: >>> __builtin_types_compatible_p(const char *, char *) returns false (0) for me. >> >> Right, that's why I added a const, inside the macro, to the type >> specified in the unconstify argument. So typeof() yields a const char *, >> and the return type is specified as char *, and adding a const in the >> argument also yields a const char *. > > Yeah, that works. The C++-inspired version also allowed casting from > not-const to const, which we don't really need. > Attached is my previous patch adapted to your macro. Oh, I forgot to mention, your version doesn't work for this code in pqexpbuffer.c: str->data = (char *) oom_buffer; That's probably not a big deal though. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-10-18 22:20:29 +0200, Peter Eisentraut wrote: > On 18/10/2018 22:17, Peter Eisentraut wrote: > > On 17/10/2018 23:51, Andres Freund wrote: > >>> __builtin_types_compatible_p(const char *, char *) returns false (0) for me. > >> > >> Right, that's why I added a const, inside the macro, to the type > >> specified in the unconstify argument. So typeof() yields a const char *, > >> and the return type is specified as char *, and adding a const in the > >> argument also yields a const char *. > > > > Yeah, that works. The C++-inspired version also allowed casting from > > not-const to const, which we don't really need. > > > Attached is my previous patch adapted to your macro. > > Oh, I forgot to mention, your version doesn't work for this code in > pqexpbuffer.c: > > str->data = (char *) oom_buffer; That kind of seems correct ;). Can easily be done via unconstify(char *, &oom_buffer[0]), if necessary. > That's probably not a big deal though. Greetings, Andres Freund
On 2018-10-18 22:17:55 +0200, Peter Eisentraut wrote: > I'd perhaps change the signature > > #define unconstify(underlying_type, var) > > because the "var" doesn't actually have to be a variable. Hm, so expr, or what would you use? > Attached is my previous patch adapted to your macro. > > I'm tempted to get rid of the stuff in dfmgr.c and just let the "older > platforms" get the warning. Yea, that works for me. Are you planning to apply this? Greetings, Andres Freund
On 18/10/2018 21:31, Andres Freund wrote: > On 2018-10-18 22:17:55 +0200, Peter Eisentraut wrote: >> I'd perhaps change the signature >> >> #define unconstify(underlying_type, var) >> >> because the "var" doesn't actually have to be a variable. > > Hm, so expr, or what would you use? done >> Attached is my previous patch adapted to your macro. >> >> I'm tempted to get rid of the stuff in dfmgr.c and just let the "older >> platforms" get the warning. and done > Yea, that works for me. > > Are you planning to apply this? and done -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services