Re: Backends stalled in 'startup' state: index corruption - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Backends stalled in 'startup' state: index corruption
Date:
Msg-id: 5513.1337986962@sss.pgh.pa.us
In response to: Re: Backends stalled in 'startup' state: index corruption (Greg Sabino Mullane <greg@endpoint.com>)
Responses: Re: Backends stalled in 'startup' state: index corruption
           Re: Backends stalled in 'startup' state: index corruption
List: pgsql-hackers
Greg Sabino Mullane <greg@endpoint.com> writes:
>> Yeah, this is proof that what it was doing is the same as what we saw in
>> Jeff's backtrace, ie loading up the system catalog relcache entries the
>> hard way via seqscans on the core catalogs. So the question to be
>> answered is why that's suddenly a big performance bottleneck. It's not
>> a cheap operation of course (that's why we cache the results ;-)) but
>> it shouldn't take minutes either. And, because they are seqscans, it
>> doesn't seem like messed-up indexes should matter.

> FWIW, this appeared to be an all-or-nothing event: either every new backend
> was suffering through this, or none were. They all seemed to clear up
> at the same time as well.

Mostly not surprising. They'd definitely all hit the missing init file at the same time, so the stalling would start consistently for all. And once any one process successfully created a new file, subsequent incoming sessions wouldn't stall. However, the remaining processes trying to compute new init files would still have to complete the process, so I'd expect there to be a diminishing effect --- the ones that were stalling shouldn't all release exactly together. Unless there is some additional effect that's syncing them all. (I wonder for instance if the syncscan logic is kicking in here.)

One interesting question is why there's a thundering herd of new arrivals in the first place. IIRC you said you were using a connection pooler. I wonder if it has a bug^H^H^Hdesign infelicity that makes it drop and reopen all its connections simultaneously.

> Let me know if you think of anything particular I can test while it is
> happening again.

I just noticed something that might explain the persistency of the effect as observed by Jeff. The code that seqscans pg_attribute (in RelationBuildTupleDesc) knows how many rows it's expecting to find for a given catalog, and it falls out of the seqscan loop as soon as it's gotten them all. Now, the rows belonging to core system catalogs are generally right near the front of pg_attribute, being the oldest rows in that catalog, which means that generally this results in not having to seqscan very far even if pg_attribute is large. It strikes me though that there are at least two ways that nice property could get broken, resulting in much seqscan work if pg_attribute is large:

1. Somebody decides to update one of those rows, and it gets dropped in some remote region of the table. The only really plausible reason for this is deciding to fool with the column-specific stats target (attstattarget) of a system catalog. Does that sound like something either of you might have done? You could check it by looking at the ctid columns of the pg_attribute rows for system catalogs, and seeing if any have large block numbers (a query for that is sketched below).

2. If the syncscan logic were to kick in and cause some backend to pick up its seqscan of pg_attribute from a point beyond where some of the target rows are, that backend would have to scan all of pg_attribute, circling back around to the start, before it would find all the rows it seeks. And most likely this effect would lather-rinse-repeat for each catalog it's seeking the pg_attribute entries for. Not only does this mean a much-worse-than-normal startup time for that backend, but any other ones that arrive while the synchronized scan is in progress would be caught in the undertow and likewise spend a long time to get their results.

So point 2 is suddenly looking like a really attractive theory for explaining what happened.
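As for the point-1 check, something along these lines should do (a sketch only; the ctid-to-point cast is just one convenient way to pull out the heap block number, and 16384 is the OID boundary below which the built-in catalogs live):

    -- Sketch: list pg_attribute rows belonging to system catalogs,
    -- highest heap block number first.  Rows far from block 0 would be
    -- ones that have been updated and relocated, per point 1.
    SELECT c.relname,
           a.attname,
           a.ctid,
           (a.ctid::text::point)[0] AS block_number
    FROM   pg_attribute a
           JOIN pg_class c ON c.oid = a.attrelid
    WHERE  a.attrelid < 16384          -- built-in catalogs only
    ORDER  BY block_number DESC
    LIMIT  20;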
As we were just remarking in an adjacent thread, the threshold for this to start happening would be for pg_attribute to get larger than one-fourth of shared_buffers; the syncscan logic doesn't kick in for relations smaller than that. IIRC this is close enough to the numbers Jeff mentioned to make it plausible that it happened to him, and plausible that his new installation has pg_attribute just enough smaller to avoid the scenario. Not sure about Greg's case, but he should be able to tell us the size of pg_attribute and his shared_buffers setting (see the query sketched below) ...

If this is the explanation, then it's easy enough to deal with point 2 --- just disable syncscan for these searches. I don't see an easy answer for problems of ilk #1, other than "don't do that in a database that's been around for awhile".

Another idea we might consider is to prevent the thundering herd effect by not having all the incoming backends try to update pg_internal.init independently. Instead let the first one take a lock while it's doing that, and the rest just queue up on that lock. They'd be stalled anyway, and they are not helping matters by duplicating the first one's effort. This is a rather more invasive change though.

			regards, tom lane
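A rough way to check how pg_attribute compares with that one-quarter-of-shared_buffers threshold (a sketch only; it assumes the default 8kB block size when converting the shared_buffers setting to bytes):

    -- Sketch: compare pg_attribute's size with one quarter of
    -- shared_buffers, the point past which synchronized seqscans can
    -- start to apply to it.  Assumes the default 8kB block size.
    SELECT pg_size_pretty(pg_relation_size('pg_attribute')) AS pg_attribute_size,
           current_setting('shared_buffers')                AS shared_buffers,
           pg_relation_size('pg_attribute') >
               (SELECT setting::bigint * 8192 / 4
                  FROM pg_settings
                 WHERE name = 'shared_buffers')              AS over_syncscan_threshold;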