Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: POC: Cleaning up orphaned files using undo logs
Msg-id: CA+TgmoYsUw22=zYcKQrtC_Udi3-Ukw0qZU0zgccFyUcZ6_5FtA@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: POC: Cleaning up orphaned files using undo logs
List: pgsql-hackers
On Thu, Jun 20, 2019 at 2:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, one reason that comes to mind is we don't want to choke the
> system as applying undo can consume CPU and generate a lot of I/O. Is
> that what you have in mind or something else?

Yeah, mainly that, but also things like log spam, and even pressure on
the lock table. If we are trying over and over again to take useless
locks, it can affect other things on the system. The main thing,
however, is the CPU and I/O consumption.

> I see an advantage in having some sort of throttling here, so we can
> have some wait time (say 100ms) between processing requests. Do we
> see any need of a GUC here?

I don't think that is the right approach. As I said in my previous
reply, we need a way of holding off the retry of the same error for a
certain amount of time, probably measured in seconds or tens of
seconds. Introducing a delay before processing every request is an
inferior alternative: if there are a lot of rollbacks, it can cause the
system to lag, and in the case where there's just one rollback that's
failing, it will still produce way too much log spam (and probably
waste CPU time too). Nobody wants 10 failure messages per second in
the log.

> > It seems to me that thinking of this in terms of what the undo worker
> > does and what the undo launcher does is probably not the right
> > approach. We need to think of it more as an integrated system. Instead
> > of storing a failure_count with each request in the error queue, how
> > about storing a next retry time?
>
> I think both failure_count and next_retry_time can work in a similar way.
>
> I think incrementing the next retry time in multiples will be a bit
> tricky. Say the first error occurs at X hours. We can say that
> next_retry_time will be X+10s=Y and error_occurred_at will be X. The
> second time it fails, how will we know that we need to set
> next_retry_time to Y+20s? Maybe we can compute Y-X, add 10s to it, and
> add the result to the current time. Now whenever the worker or
> launcher finds this request, it can check whether the current time is
> greater than or equal to next_retry_time; if so, it can pick that
> request, otherwise it checks the next queue for a request.
>
> The failure_count can also work in a somewhat similar fashion.
> Basically, we can use error_occurred_at and failure_count to compute
> the required time. So, if the error occurred at X hours and the
> failure count is 3, then we can check whether the current time is
> greater than X+(3*10s); if so, we will allow the entry to be
> processed, otherwise it will check the other queues for work.

Meh. Don't get stuck on one particular method of calculating the next
retry time. We want to be able to change that easily if whatever we
try first doesn't work out well. I am not convinced that we need
anything more complex than a fixed retry time, probably controlled by
a GUC (undo_failure_retry_time = 10s?). An escalating time between
retries would be important and advantageous if we expected the sizes
of these queues to grow into the millions, but the current design
seems to be contemplating something more in the tens-of-thousands
range, and I am not sure we're going to need it at that level. We
should try simple things first and then see where we need to make them
more complex.

At some basic level, the queue needs to be ordered by increasing retry
time. You can do that with your design, but you have to recompute the
next retry time from the error_occurred_at and failure_count values
every time you examine an entry.
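To make the contrast concrete, here is a minimal sketch in C. The
names (UndoErrorRequest, compute_next_retry_time,
undo_error_request_cmp) and the exact escalation formula are invented
for illustration; they are not taken from the patch.

    #include <stdint.h>

    typedef int64_t TimestampTz;        /* microseconds, as in PostgreSQL */

    #define USECS_PER_SEC ((int64_t) 1000000)

    /* Hypothetical error-queue entry, not the patch's actual layout. */
    typedef struct UndoErrorRequest
    {
        TimestampTz error_occurred_at;  /* when undo apply last failed */
        int         failure_count;      /* number of failed attempts */
        TimestampTz next_retry_time;    /* sort key, if stored explicitly */
    } UndoErrorRequest;

    /*
     * Recompute the retry time from error_occurred_at and failure_count,
     * which is what keying the queue on those two fields forces you to
     * do every time an entry is examined.  The formula is just one
     * example of a Min(failure_count * 10, 100)-style escalation.
     */
    static TimestampTz
    compute_next_retry_time(const UndoErrorRequest *req)
    {
        int64_t delay_secs = (int64_t) req->failure_count * 10;

        if (delay_secs > 100)
            delay_secs = 100;
        return req->error_occurred_at + delay_secs * USECS_PER_SEC;
    }

    /*
     * With next_retry_time stored explicitly, the code that keeps the
     * queue ordered never needs to know how the value was computed.
     */
    static int
    undo_error_request_cmp(const UndoErrorRequest *a,
                           const UndoErrorRequest *b)
    {
        if (a->next_retry_time < b->next_retry_time)
            return -1;
        if (a->next_retry_time > b->next_retry_time)
            return 1;
        return 0;
    }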
It's almost certainly better to store the next_retry_time explicitly.
That way, if for example we change the logic for computing the
next_retry_time to something really complicated, it doesn't have any
effect on the code that keeps the queue in order -- it just looks at
the computed value. If we end up with something very simple, like
error_occurred_at + constant, it may end up seeming a little silly,
but I think that's a price well worth paying for code maintainability.
If we end up with error_occurred_at + Min(failure_count * 10, 100) or
something of that sort, then we can also store failure_count in each
record, but it will just be part of the payload, not the sort key, so
adding it or removing it won't affect the code that maintains the
queue ordering.

> > I think the error queue needs to be
> > ordered by database_id, then by next_retry_time, and then by order of
> > insertion. (The last part is important because next_retry_time is
> > going to be prone to having ties, and we need to break those ties in
> > the right way.)
>
> I think it makes sense to order requests by next_retry_time,
> error_occurred_at (this will ensure the order of insertion). However,
> I am not sure if there is a need to club the requests w.r.t. database
> id. It can starve the error requests from other databases. Moreover,
> we already have functionality wherein if the undo worker doesn't
> encounter the next request from the same database on which it is
> operating for a certain amount of time, then it will peek ahead (a
> few entries) in each queue to get the request for the same database.
> We don't sort by db_id in the other queues either, so it will be
> consistent for this queue if we just sort by next_retry_time and
> error_occurred_at.

You're misunderstanding my point. We certainly do not wish to always
pick the request from the database with the lowest OID, or anything
like that. However, we do need a worker for a particular database to
find the work pending for that database efficiently.

Peeking ahead a few requests is a version of that, but I'm not sure
it's going to be good enough. Suppose we look ahead 3 requests but
there are 10 databases. Then, if all 10 databases have requests
pending, it is likely that we won't find the next request for our
particular database even though it exists -- the first 3 may easily
all be for other databases. (With a uniform mix of requests, the
chance that none of the next 3 is for our database is (9/10)^3,
roughly 73%.) If you look ahead more requests, that doesn't really fix
it -- it just means you need more databases for the problem to become
likely. And note that this problem happens even if every database
contains a worker. Some of those workers will erroneously think that
they should exit.

I'm not sure exactly what to do about this. My first thought was that
for all of the queues we might need to have a queue per database (or
something equivalent) rather than just one big queue. But that has
problems too: it will mean that a database worker will never exit as
long as there is any work at all to be done in that database, even if
some other database is getting starved. Somehow we need to balance the
efficiency of having a worker for a particular database process many
requests before exiting against the need to ensure fairness across
databases, and it doesn't sound to me like we quite know exactly what
we ought to do there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company