Thread: max backends checking patch
As of 6.4.2 (and snapshot, I guess), postmaser does not check if the number of backends exceeds MaxBackendId (defined in include/storage/sinvaladt.h). As a result (MaxBackendId+1)th backend gets started but failed in the middle of its initialising process. Typical error would be: NOTICE: SIAssignBackendId: discarding tag 2147430138 Connection databese 'request' failed. FATAL 1: Backend cache invalidation initialization failed Then postmaster decides to re-initialize the shared memory and all running backends are killed. Too bad. Attached patches try to fix the problem. -- Tatsuo Ishii -------------------------------- cut here --------------------------- *** postgresql-6.4.2/src/backend/postmaster/postmaster.c.orig Sun Nov 29 10:52:32 1998 --- postgresql-6.4.2/src/backend/postmaster/postmaster.c Sat Jan 9 18:14:52 1999 *************** *** 238,243 **** --- 238,244 ---- static long PostmasterRandom(void); static void RandomSalt(char *salt); static void SignalChildren(SIGNAL_ARGS); + static int CountChildren(void); #ifdef CYR_RECODE void GetCharSetByHost(char *, int, char *); *************** *** 754,764 **** * by the backend. */ ! if (BackendStartup(port) != STATUS_OK) ! PacketSendError(&port->pktInfo, "Backend startup failed"); ! else ! status = STATUS_ERROR; } /* Close the connection if required. */ --- 755,771 ---- * by the backend. */ ! if (CountChildren() < MaxBackendId) { ! if (BackendStartup(port) != STATUS_OK) ! PacketSendError(&port->pktInfo, "Backend startup failed"); ! else { ! status = STATUS_ERROR; ! } ! } else { ! PacketSendError(&port->pktInfo, ! "There are too many backends"); ! } } /* Close the connection if required. */ *************** *** 1617,1620 **** --- 1624,1655 ---- } return random() ^ random_seed; + } + + /* + * Count up number of chidren processes. + */ + static int + CountChildren(void) + { + Dlelem *curr, + *next; + Backend *bp; + int mypid = getpid(); + int cnt = 0; + + curr = DLGetHead(BackendList); + while (curr) + { + next = DLGetSucc(curr); + bp = (Backend *) DLE_VAL(curr); + + if (bp->pid != mypid) + { + cnt++; + } + + curr = next; + } + return(cnt); }
Tatsuo Ishii wrote: > > As of 6.4.2 (and snapshot, I guess), postmaser does not check if the > number of backends exceeds MaxBackendId (defined in > include/storage/sinvaladt.h). As a result (MaxBackendId+1)th backend > gets started but failed in the middle of its initialising process. > Typical error would be: > > NOTICE: SIAssignBackendId: discarding tag 2147430138 > Connection databese 'request' failed. > FATAL 1: Backend cache invalidation initialization failed > > Then postmaster decides to re-initialize the shared memory and all > running backends are killed. Too bad. > > Attached patches try to fix the problem. Couldn't postmaster just keep # of backends running in some variable, instead of examining BackendList ? > + /* > + * Count up number of chidren processes. > + */ > + static int > + CountChildren(void) > + { > + Dlelem *curr, > + *next; > + Backend *bp; > + int mypid = getpid(); > + int cnt = 0; > + > + curr = DLGetHead(BackendList); > + while (curr) > + { > + next = DLGetSucc(curr); > + bp = (Backend *) DLE_VAL(curr); > + > + if (bp->pid != mypid) > + { > + cnt++; > + } > + > + curr = next; > + } > + return(cnt); > } Vadim
> Couldn't postmaster just keep # of backends running > in some variable, instead of examining BackendList ? Yes, you could do that way. I just want to keep things simple. There are three places in postmaster.c where element is deleted from the BackendList. So you need to insert a code to count down the variable into those three places. Seems a seed of "maintenance problem" in the future IMHO:-) Having a counter inside the dllist module is another idea. If there were many codes in the backend that counting elemnts in the dllist, this would be worth to think about. I'm not sure, though. --- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >> Couldn't postmaster just keep # of backends running >> in some variable, instead of examining BackendList ? > Yes, you could do that way. I just want to keep things simple. > Seems a seed of "maintenance problem" in the future IMHO:-) I agree with Tatsuo. Counting the children once per backend startup is certainly not a performance bottleneck, so there is no reason to add complexity and a source of potential bugs to speed it up. > Having a counter inside the dllist module is another idea. If there > were many codes in the backend that counting elemnts in the dllist, > this would be worth to think about. That would be reasonable if justified by usage --- the tradeoff is the added cost of maintaining the count for every dllist, whether or not it's ever asked for... regards, tom lane