Thread: WIP: remove use of flat auth file for client authentication
Attached is a patch that removes the use of the flat auth file during client authentication, instead using regular access to the pg_auth catalogs. As previously discussed, this implies pushing the authentication work down to InitPostgres. I didn't yet do anything about the idea of falling back to connecting to "postgres" when the specified target DB doesn't exist, but other than that small change I think it's about ready to go.

The main thing that turned out to be unexpectedly, um, "interesting" was signal handling. What we have been doing during client authentication is to point SIGTERM, SIGALRM, etc to the "authdie" handler, which just does "proc_exit(1)" directly. This will *not* do once we've entered a transaction --- in general you can't expect to just interrupt the backend in a random place and start doing something different. I'm not really sure it was safe even during authentication, but for sure it's not safe during database access.

The solution that's adopted here is to use the regular backend SIGINT, SIGTERM, SIGQUIT, and SIGALRM signal handling, and to turn on ImmediateInterruptOK while we are doing client interactions during authentication, but not while we are doing database accesses during authentication. It's a bit ugly because the signal handlers have to be taught to do something different when ClientAuthInProgress, but I don't see a better way.

Another interesting point is that for this to work, those signal interrupts have to actually be enabled (doh) ... and up to now we have been running InitPostgres with most signals disabled. I suspect that this means some things are actively broken during InitPostgres's initial transaction --- for example, if it happens to try to take a lock that completes a deadlock cycle, the deadlock won't be detected because the lock timeout SIGALRM interrupt will never occur. Another example is that SI inval messaging isn't working during InitPostgres either.
The patch addresses this by moving up PostgresMain's PG_SETMASK(&UnBlockSig); call to before InitPostgres. We might need to back-patch that bit, though I'm hesitant to fool with such a thing in back branches.

Thoughts, objections, better ideas?

regards, tom lane
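The handler arrangement described above can be sketched roughly as follows. This is a simplified illustration, not the actual backend code: the variable names follow the backend's conventions (ClientAuthInProgress, ImmediateInterruptOK, proc_exit), but the handler body here is invented for illustration. The key idea is that during client interactions in authentication it is safe to bail out immediately, while inside a transaction the handler may only set a flag for the main line to notice at a safe point.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>

static volatile bool ClientAuthInProgress = false;
static volatile sig_atomic_t ImmediateInterruptOK = 0;
static volatile sig_atomic_t ProcDiePending = 0;

/* Simplified stand-in for the backend's proc_exit() */
static void proc_exit_sketch(int code)
{
    exit(code);
}

/*
 * Regular SIGTERM handler, taught to behave differently during client
 * authentication: if we are merely blocked waiting on a client
 * interaction (ImmediateInterruptOK set), it is safe to exit at once;
 * if we are doing database access, only set a flag and let the main
 * line act on it at the next safe interrupt-check point.
 */
static void die_sketch(int signo)
{
    (void) signo;
    ProcDiePending = 1;
    if (ClientAuthInProgress && ImmediateInterruptOK)
        proc_exit_sketch(1);
    /* otherwise, defer to the next interrupt check */
}
```

With ImmediateInterruptOK off, raising SIGTERM merely latches ProcDiePending rather than terminating mid-transaction.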
On Sat, Aug 29, 2009 at 6:00 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Attached is a patch that removes the use of the flat auth file during
> client authentication, instead using regular access to the pg_auth
> catalogs. As previously discussed, this implies pushing the
> authentication work down to InitPostgres. I didn't yet do anything
> about the idea of falling back to connecting to "postgres" when the
> specified target DB doesn't exist, but other than that small change
> I think it's about ready to go.

Falling back to connecting to "postgres" seems unnecessarily complex to me.

> Another interesting point is that for this to work, those signal
> interrupts have to actually be enabled (doh) ... and up to now we have
> been running InitPostgres with most signals disabled. I suspect that
> this means some things are actively broken during InitPostgres's initial
> transaction --- for example, if it happens to try to take a lock that
> completes a deadlock cycle, the deadlock won't be detected because the
> lock timeout SIGALRM interrupt will never occur. Another example is
> that SI inval messaging isn't working during InitPostgres either.
> The patch addresses this by moving up PostgresMain's
> PG_SETMASK(&UnBlockSig); call to before InitPostgres. We might need to
> back-patch that bit, though I'm hesitant to fool with such a thing in
> back branches.

The deadlock can only fail to be detected by someone else if the whole InitPostgres thing takes longer than deadlock_timeout, I think. So it doesn't seem very likely. Not sure how likely problems due to missed SI messages are.

> Thoughts, objections, better ideas?
>
> regards, tom lane

--
greg
http://mit.edu/~gsstark/resume.pdf
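The timing argument above rests on the lock-wait SIGALRM machinery: a backend that blocks on a lock arms a one-shot timer for deadlock_timeout, and the SIGALRM handler is what triggers its deadlock check. A rough sketch of that arrangement (simplified; the real backend uses its own timeout infrastructure and CheckDeadLock(), not this code), which also shows why a blocked SIGALRM means the waiter never runs its check:

```c
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

static volatile sig_atomic_t deadlock_check_pending = 0;

static void deadlock_timeout_handler(int signo)
{
    (void) signo;
    /* In the real backend this would run the deadlock checker;
     * here we just record that the check became due. */
    deadlock_check_pending = 1;
}

/*
 * Arm a one-shot timer for deadlock_timeout_ms milliseconds.  If
 * SIGALRM is blocked in the process signal mask (as it was during
 * InitPostgres before the patch), the handler never runs and the
 * waiter never performs its deadlock check.
 */
static void arm_deadlock_timer(int deadlock_timeout_ms)
{
    struct itimerval t = {0};

    signal(SIGALRM, deadlock_timeout_handler);
    t.it_value.tv_sec = deadlock_timeout_ms / 1000;
    t.it_value.tv_usec = (deadlock_timeout_ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &t, NULL);
}
```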
On Sat, 2009-08-29 at 01:00 -0400, Tom Lane wrote:
> Attached is a patch that removes the use of the flat auth file during
> client authentication, instead using regular access to the pg_auth
> catalogs. As previously discussed, this implies pushing the
> authentication work down to InitPostgres. I didn't yet do anything
> about the idea of falling back to connecting to "postgres" when the
> specified target DB doesn't exist, but other than that small change
> I think it's about ready to go.

I get the feeling that part of the inspiration for this is that Hot Standby must maintain this file. If not, I'm curious as to the reasons for doing this. No objections however, just curiosity.

Specifically, should I remove the parts of the HS patch that refresh those files?

> I suspect that
> this means some things are actively broken during InitPostgres's initial
> transaction --- for example, if it happens to try to take a lock that
> completes a deadlock cycle, the deadlock won't be detected because the
> lock timeout SIGALRM interrupt will never occur. Another example is
> that SI inval messaging isn't working during InitPostgres either.

Are we doing anything in the initial transaction that *could* deadlock, or cause an SI inval message?

--
Simon Riggs
www.2ndQuadrant.com
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> I get the feeling that part of the inspiration for this is that Hot
> Standby must maintain this file. If not, I'm curious as to the reasons
> for doing this. No objections however, just curiosity.

The impetus for these changes was the performance complaint that adding new users is extremely expensive once the files get to be large. Solving that kind of a problem typically involves moving away from flat files and to a database. Thankfully, we happen to have a database already built-in. ;)

> Specifically, should I remove the parts of the HS patch that refresh
> those files?

I'd probably wait till it's committed and then rip it out as part of re-basing against CVS. That's just my 2c on it, and I've no clue how invasive the changes to HS are to do that. If it's easy to separate the changes and remove/add them as needed, then perhaps it doesn't matter whether you do that now or wait.

Thanks,
Stephen
Greg Stark <gsstark@mit.edu> writes:
> On Sat, Aug 29, 2009 at 6:00 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> ... I didn't yet do anything
>> about the idea of falling back to connecting to "postgres" when the
>> specified target DB doesn't exist, but other than that small change
>> I think it's about ready to go.

> Falling back to connecting to "postgres" seems unnecessarily complex to me.

If that's the consensus I don't have a problem with it ... it's certainly something we could add later if anyone complains.

>> Another interesting point is that for this to work, those signal
>> interrupts have to actually be enabled (doh) ... and up to now we have
>> been running InitPostgres with most signals disabled. I suspect that
>> this means some things are actively broken during InitPostgres's initial
>> transaction --- for example, if it happens to try to take a lock that
>> completes a deadlock cycle, the deadlock won't be detected because the
>> lock timeout SIGALRM interrupt will never occur. Another example is
>> that SI inval messaging isn't working during InitPostgres either.
>> The patch addresses this by moving up PostgresMain's
>> PG_SETMASK(&UnBlockSig); call to before InitPostgres. We might need to
>> back-patch that bit, though I'm hesitant to fool with such a thing in
>> back branches.

> The deadlock can only fail to be detected by someone else if the whole
> InitPostgres thing takes longer than deadlock_timeout, I think. So it
> doesn't seem very likely. Not sure how likely problems due to missed
> SI messages are.

The problem I was worried about was where InitPostgres tries to take the last lock in a deadlock cycle, and all the other participants have already run their deadlock checks and are just waiting. The InitPostgres transaction needs to run the checker to get out of this state, and it won't because it's got SIGALRM blocked.

I agree that this is not very probable, because InitPostgres takes only pretty weak locks, but I suspect that a failure scenario could be demonstrated. We have seen occasional reports of apparent undetected-deadlock situations, but of course I've got no proof that this is the cause. On the whole, I don't want to risk tinkering with it in the back branches, but I'm happy that it will be fixed going forward.

regards, tom lane
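The scenario Tom describes can be seen in waits-for-graph terms: each earlier waiter ran its deadlock check before the cycle was complete and found nothing, so only the process that adds the final waits-for edge can observe the cycle, and it must run the checker itself. A toy illustration (not the backend's deadlock detector, which walks the real lock table):

```c
#include <stdbool.h>

#define MAXPROCS 8

/* waits_for[i] = index of the process that i is blocked on, or -1 */
static int waits_for[MAXPROCS];

/*
 * Returns true if, starting from process 'start', following waits-for
 * edges leads back to 'start', i.e. the most recently added edge
 * completed a cycle.  Only the process holding that last edge can see
 * it; the earlier waiters already checked and found no cycle.
 */
static bool completes_cycle(int start)
{
    int cur = waits_for[start];
    int steps = 0;

    while (cur != -1 && steps++ < MAXPROCS)
    {
        if (cur == start)
            return true;
        cur = waits_for[cur];
    }
    return false;
}
```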
Simon Riggs <simon@2ndQuadrant.com> writes:
> Specifically, should I remove the parts of the HS patch that refresh
> those files?

Yes. This was the last part that I was afraid might have insurmountable problems. There are some bits yet to do, but they're in the nature of crank-turning, I believe. I think it's a good bet the flat files will be gone before the next commitfest starts.

BTW, one of the to-do bits is to find another way to initialize the XID wrap limit at database start. Right now that's set up as a side-effect of scanning pg_database to create the flat DB file. The idea I had about it was to store the current wrap limit in checkpoint WAL records and use the most recent checkpoint's value as the initial limit. Can you tell me what you're doing about the wrap limit in HS and whether that solution would be an issue for you?

>> I suspect that
>> this means some things are actively broken during InitPostgres's initial
>> transaction --- for example, if it happens to try to take a lock that
>> completes a deadlock cycle, the deadlock won't be detected because the
>> lock timeout SIGALRM interrupt will never occur. Another example is
>> that SI inval messaging isn't working during InitPostgres either.

> Are we doing anything in the initial transaction that *could* deadlock,

Sure --- we have to take locks to examine system catalogs. Pretty weak locks, mind you, but there's certainly a potential there if someone else is doing something like a VACUUM FULL on the catalogs.

> or cause an SI inval message?

I don't think InitPostgres could *cause* an SI inval; I was more worried about whether it would fail to respond to them. The worst case result is probably a failure of the InitPostgres transaction to notice someone else's relfilenode reassignment on a system catalog, which would end up with a "failed to open file" kind of error. Again, the probabilities aren't very high, but they appear nonzero.

regards, tom lane
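The checkpoint-record idea above might look roughly like this. All names here are hypothetical sketches (the struct, its fields, and the init function are invented for illustration, not the actual checkpoint record layout): the point is simply that carrying the wrap limit in each checkpoint lets startup, or a standby, seed its in-memory limit from the most recent checkpoint instead of scanning pg_database, with VACUUM refining it later.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical sketch of extra fields in a checkpoint WAL record:
 * carry the XID wrap limit so that recovery can initialize it from
 * the checkpoint we start from, with no flat-file scan needed.
 */
typedef struct CheckPointSketch
{
    TransactionId nextXid;       /* next free XID at checkpoint time */
    TransactionId oldestXid;     /* cluster-wide oldest datfrozenxid */
    TransactionId xidWrapLimit;  /* XID at which wraparound stop occurs */
} CheckPointSketch;

/* In-memory copy of the limit (hypothetical shared-memory variable) */
static TransactionId shmem_xidWrapLimit;

/*
 * At database start, seed the wrap limit from the checkpoint record
 * we are recovering from; later VACUUMs would update it normally.
 */
static void init_wrap_limit_from_checkpoint(const CheckPointSketch *ckpt)
{
    shmem_xidWrapLimit = ckpt->xidWrapLimit;
}
```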
On Sat, 2009-08-29 at 09:00 -0400, Stephen Frost wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> I get the feeling that part of the inspiration for this is that Hot
>> Standby must maintain this file. If not, I'm curious as to the reasons
>> for doing this. No objections however, just curiosity.
>
> The impetus for these changes was the performance complaint that adding
> new users is extremely expensive once the files get to be large.
> Solving that kind of a problem typically involves moving away from flat
> files and to a database. Thankfully, we happen to have a database
> already built-in. ;)

LOL. Quite lucky, that. :)

--
Simon Riggs
www.2ndQuadrant.com
On Sat, 2009-08-29 at 11:19 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Specifically, should I remove the parts of the HS patch that refresh
>> those files?
>
> Yes. This was the last part that I was afraid might have insurmountable
> problems. There are some bits yet to do but they're in the nature of
> crank-turning, I believe. I think it's a good bet the flat files will
> be gone before the next commitfest starts.

OK, it's gone. Shame really, took a while to make that work. :)

> BTW, one of the to-do bits is to find another way to initialize the
> XID wrap limit at database start. Right now that's set up as a
> side-effect of scanning pg_database to create the flat DB file.
> The idea I had about it was to store the current wrap limit in
> checkpoint WAL records and use the most recent checkpoint's value
> as the initial limit. Can you tell me what you're doing about the
> wrap limit in HS and whether that solution would be an issue for you?

...quakes in boots... We weren't doing anything at all. I guess we will be now; thanks for alerting me.

In general we don't look at the actual xid values themselves, so going past the xid max value itself shouldn't present an issue (that I can see, but I will think longer on this). A problem could occur if a standby query/snapshot lived long enough to see the same xid in different epochs. Realistically that's a long time, and a query would be unlikely to survive that long without having first constrained the flow of cleanup records by holding a snapshot open on the primary. However, it could present a problem to someone, and I agree we must have a clear remedial action.

The standard remedial action is to kill queries that cause conflicts, so we would need to kill queries based upon a wrap test. We could do that as each xid arrives, but it would be better to do it via something rare like checkpoint records, so that works for me.

--
Simon Riggs
www.2ndQuadrant.com
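The "wrap test" mentioned above can lean on the backend's modulo-2^32 XID arithmetic, in which one XID precedes another if the signed 32-bit difference is negative. A sketch (the comparison mirrors the backend's TransactionIdPrecedes logic for normal XIDs, but the standby-side check and its names are hypothetical, and special/frozen XIDs are ignored here):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Circular comparison: true if a precedes b modulo 2^32.  This is the
 * same trick TransactionIdPrecedes uses for normal XIDs; the special
 * bootstrap/frozen XID cases are omitted in this sketch.
 */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical standby-side check, run when a checkpoint record
 * arrives: a query whose snapshot xmin has fallen behind the wrap
 * limit carried in the checkpoint can no longer be answered safely
 * and would be a candidate for cancellation.
 */
static bool snapshot_past_wrap_limit(TransactionId snapshot_xmin,
                                     TransactionId xidWrapLimit)
{
    return xid_precedes(snapshot_xmin, xidWrapLimit);
}
```

Checking only when a checkpoint record arrives, as Simon suggests, keeps the test off the per-record hot path.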