Re: logical changeset generation v6.2 - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: logical changeset generation v6.2
Msg-id: CA+TgmoZOiUiPcKsrgiftEqsqN0bKuZLVrsLyAwkcwv5je3_SQQ@mail.gmail.com
In response to: Re: logical changeset generation v6.2 (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: logical changeset generation v6.2
List: pgsql-hackers
On Fri, Oct 18, 2013 at 2:26 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I know of the following solutions:
> 1) Don't allow VACUUM FULL on catalog tables if wal_level = logical.
> 2) Make VACUUM FULL prevent DDL and then wait till all changestreams
>    have decoded up to the current point.
> 3) Don't delete the old relfilenode for VACUUM/CLUSTERs of system tables
>    if there are live decoding slots around; instead delegate that
>    responsibility to the slot management.
> 4) Store both (cmin, cmax) for catalog tuples.
>
> I basically think only 1) and 4) are realistic. And 1) sucks.
>
> I've developed a prototype for 4) and, aside from currently being incredibly
> ugly, it seems to be the most promising approach by far. My trick to
> store both cmin and cmax is to store cmax in t_hoff managed space when
> wal_level = logical.

In my opinion, (4) is too ugly to consider. I think that if we start playing games like this, we're opening the doors to lots of subtle bugs and future architectural pain that will be with us for many, many years to come. I believe we will bitterly regret any foray into this area.

It has long seemed to me a shame that we don't have some system for allowing old relfilenodes to stick around until they are no longer in use. If we had that, we might be able to allow utilities like CLUSTER or VACUUM FULL to permit concurrent read access to the table. I realize that what people really want is to let those things run while allowing concurrent *write* access to the table, but a bird in the hand is worth two in the bush. What we're really talking about here is applying MVCC to filesystem actions: instead of removing the old relfilenode(s) immediately, we do it when they're no longer referenced by anyone, just as we don't remove old tuples immediately, but rather when they are no longer referenced by anyone.
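To make the "MVCC for filesystem actions" idea concrete, here is a minimal, hypothetical C sketch (not actual PostgreSQL code; the struct and function names are invented for illustration) of deferring the unlink of an old relfilenode until the last reader lets go, analogous to how dead tuples are only reclaimed once no snapshot can see them:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for a relfilenode left behind by a rewrite
 * (CLUSTER / VACUUM FULL). Names are illustrative, not PostgreSQL API. */
typedef struct OldRelFileNode
{
    unsigned relfilenode;   /* on-disk file identifier */
    int      pin_count;     /* readers still scanning the old heap */
    bool     unlinked;      /* has the physical file been removed? */
} OldRelFileNode;

/* A scan pins the old relfilenode before reading it. */
static void
pin_relfilenode(OldRelFileNode *rnode)
{
    rnode->pin_count++;
}

/* When the last reader drops its pin, the file can finally go away --
 * the filesystem analogue of removing a tuple once it is no longer
 * visible to any backend. */
static void
unpin_relfilenode(OldRelFileNode *rnode)
{
    assert(rnode->pin_count > 0);
    if (--rnode->pin_count == 0)
        rnode->unlinked = true;   /* stand-in for unlinking the file */
}
```

The point of the sketch is only the lifetime rule: the rewrite completes immediately, but physical removal is decoupled from it and driven by the last reference going away.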
The details are tricky, though: we can allow write access to the *new* heap just as soon as the rewrite is finished, but anyone who is still looking at the *old* heap can't ever upgrade their AccessShareLock to anything higher, or hilarity will ensue. Also, if they lock some *other* relation and AcceptInvalidationMessages(), their relcache entry for the rewritten relation will get rebuilt, and that's bound to work out poorly.

The net-net here is that I think (3) is an attractive solution, but I don't know that we can make it work in a reasonable amount of time.

I don't think I understand exactly what you have in mind for (2); can you elaborate? I have always thought that having a WaitForDecodingToCatchUp() primitive was a good way of handling changes that were otherwise too difficult to track our way through. I am not sure you're doing that at all right now, which in some sense I guess is fine, but I haven't really understood your aversion to this solution. There are some locking issues to be worked out here, but the problems don't seem altogether intractable.

(1) is basically deciding not to fix the problem. I don't think that's acceptable.

I don't have another idea right at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
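As a rough illustration of what a WaitForDecodingToCatchUp() primitive would have to check, here is a hypothetical C sketch (invented types and names, not the actual PostgreSQL slot machinery) of the catch-up test: the waiting DDL may only proceed once every active decoding slot has confirmed decoding past the WAL position at which the change was made:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* a WAL position, as in PostgreSQL */

/* Hypothetical decoding slot: tracks how far its consumer has decoded. */
typedef struct DecodingSlot
{
    bool       active;
    XLogRecPtr confirmed_lsn;   /* everything up to here has been decoded */
} DecodingSlot;

/* One polling step of a hypothetical WaitForDecodingToCatchUp():
 * returns true once every active slot has decoded past target_lsn.
 * A real implementation would sleep on a latch between checks rather
 * than busy-polling, and would have to sort out the locking issues
 * mentioned above. */
static bool
all_slots_caught_up(const DecodingSlot *slots, int nslots, XLogRecPtr target_lsn)
{
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].active && slots[i].confirmed_lsn < target_lsn)
            return false;
    }
    return true;
}
```

The cost this sketch makes visible is the waiting itself: a single slow or stalled consumer holds up the DDL indefinitely, which is presumably part of the trade-off being debated here.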