Thread: Wiki Page Draft for upcoming release
All, https://wiki.postgresql.org/wiki/20140320UpdateIssues I'm sure my explanation of the data corruption issue is not correct, so please fix it. Thanks! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
I sent a post to -general with a much more detailed brain dump of my current understanding on this topic. The main point I'm addressing here is how to recover from this problem. Since a symptom of the problem is that pg_dump/restore can fail saying that (in some instances) the only viable restore mechanism be pg_dump/restore means that someone so afflicted is going to lose data since their last good dump - if they still have one. However, if the true data table does not actually contain any duplicate data then such a dump/restore cycle (or I would think REINDEX - or DROP/CREATE INDEX chain) should resolve the problem. Thus if there is duplicate data the user needs-to/can identify and remove the offending records so that subsequent actions do not fail with a duplicate key error. If this is true then providing a query (or queries) that can provide the problem records and delete them from the table - along with any staging up that is necessary (like first dropping affected indexes if applicable) - would be good a nice addition. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Wiki-Page-Draft-for-upcoming-release-tp5796494p5796503.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Josh Berkus wrote > All, > > https://wiki.postgresql.org/wiki/20140320UpdateIssues > > I'm sure my explanation of the data corruption issue is not correct, so > please fix it. Thanks! I presume that because there is no way the master could have sent bad table data to the replication slaves that performing the base backup on the slaves is sufficient; i.e., it is not necessary to backup the master and distribute that? I'd either make the change myself or ask this kind of question somewhere else more appropriate but I'm not sure of the proper protocol to follow. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Wiki-Page-Draft-for-upcoming-release-tp5796494p5796505.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 03/17/2014 05:49 PM, Josh Berkus wrote: > All, > > https://wiki.postgresql.org/wiki/20140320UpdateIssues > > I'm sure my explanation of the data corruption issue is not correct, so > please fix it. Thanks! > Anyone? We're going public with this in 36 hours, so I'd love for it to actually be correct. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Mar 18, 2014 at 6:41 PM, Josh Berkus <josh@agliodbs.com> wrote: > > Anyone? We're going public with this in 36 hours, so I'd love for it to > actually be correct. I'll do a first pass now. -- greg
On Tue, Mar 18, 2014 at 7:05 PM, Greg Stark <stark@mit.edu> wrote: > I'll do a first pass now. I fixed the "Causes" section. I haven't touched the other sections which are pretty reasonable. It would be nice to have more suggestions for what people should do other than dump/restore. It would be nice to be able to tell people that if they vacuum, then reindex and check all their foreign key constraints then they should be ok. I think that's almost true except I'm not sure how to tell that they've waited long enough before vacuuming and I'm not 100% sure it'll fix the problem. (Also they could have lost a row which has since had all all it's referencing rows deleted. The database won't be able to help them find that.) -- greg
On 2014-03-18 19:28:53 +0000, Greg Stark wrote: > It would be nice to be able to tell people that if they vacuum, then > reindex and check all their foreign key constraints then they should > be ok. I don't think so: http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE old_type USING (col); is the only real thing to do. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 12:35 PM, Andres Freund wrote: > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > old_type USING (col); is the only real thing to do. Then why wouldn't VACUUM FULL work? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-03-18 12:52:49 -0700, Josh Berkus wrote: > On 03/18/2014 12:35 PM, Andres Freund wrote: > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > > old_type USING (col); is the only real thing to do. > > Then why wouldn't VACUUM FULL work? Please read the referenced message and children. It will use rewriteheap.c which tries to maintain the update chains. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 12:55 PM, Andres Freund wrote: > On 2014-03-18 12:52:49 -0700, Josh Berkus wrote: >> On 03/18/2014 12:35 PM, Andres Freund wrote: >>> I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE >>> old_type USING (col); is the only real thing to do. >> >> Then why wouldn't VACUUM FULL work? > > Please read the referenced message and children. It will use > rewriteheap.c which tries to maintain the update chains. Reading ... hmmm. Well, the ALTER TABLE thing may work, but I'm really hesitant to recommend it as a general solution for our users; it doesn't look repliable. Too bad we don't have a handy corrupted database to practice on. I will note that VACUUM FULL doesn't work either. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-03-18 19:28:53 +0000, Greg Stark wrote: >> It would be nice to be able to tell people that if they vacuum, then >> reindex and check all their foreign key constraints then they should >> be ok. > I don't think so: > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > old_type USING (col); is the only real thing to do. Unfortunately, the operative word in that is "noop". Didn't you see http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us It would have helped before we put in the transform smarts, but now it's too smart. regards, tom lane
On 2014-03-18 16:19:01 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-03-18 19:28:53 +0000, Greg Stark wrote: > >> It would be nice to be able to tell people that if they vacuum, then > >> reindex and check all their foreign key constraints then they should > >> be ok. > > > I don't think so: > > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de > > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > > old_type USING (col); is the only real thing to do. > > Unfortunately, the operative word in that is "noop". Didn't you see > http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us Huh, that somehow never arrived here. Very odd. > It would have helped before we put in the transform smarts, but now > it's too smart. Gna. Hm, what about: USING (COALESCE(columnname, NULL)); That still seems to work. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > On 2014-03-18 16:19:01 -0400, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On 2014-03-18 19:28:53 +0000, Greg Stark wrote: > > >> It would be nice to be able to tell people that if they vacuum, then > > >> reindex and check all their foreign key constraints then they should > > >> be ok. > > > > > I don't think so: > > > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de > > > > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > > > old_type USING (col); is the only real thing to do. > > > > Unfortunately, the operative word in that is "noop". Didn't you see > > http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us > > Huh, that somehow never arrived here. Very odd. 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 => andres@anarazel.de R=dnslookup T=remote_smtp H=mail.anarazel.de [217.115.131.40]X=TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128 QT=1s DT=1s 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 Completed QT=1s -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-03-18 17:34:34 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-03-18 16:19:01 -0400, Tom Lane wrote: > > > Andres Freund <andres@2ndquadrant.com> writes: > > > > On 2014-03-18 19:28:53 +0000, Greg Stark wrote: > > > >> It would be nice to be able to tell people that if they vacuum, then > > > >> reindex and check all their foreign key constraints then they should > > > >> be ok. > > > > > > > I don't think so: > > > > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de > > > > > > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE > > > > old_type USING (col); is the only real thing to do. > > > > > > Unfortunately, the operative word in that is "noop". Didn't you see > > > http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us > > > > Huh, that somehow never arrived here. Very odd. > > 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us > 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us > 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 => andres@anarazel.de R=dnslookup T=remote_smtp H=mail.anarazel.de [217.115.131.40]X=TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128 QT=1s DT=1s > 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 Completed QT=1s Mail that's CC/TOed to me onlist, is automatically marked as read by a sieve script so I don't have to mark it as read twice. It seems something went wrong there for a couple of messages... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 03/18/2014 04:39 PM, Andres Freund wrote: > > Mail that's CC/TOed to me onlist, is automatically marked as read by a > sieve script so I don't have to mark it as read twice. It seems > something went wrong there for a couple of messages... > Why not just turn on eliminatecc on the majordomo server? See <https://lists.postgresql.org/mj/mj_wwwusr> cheers andrew
Josh Berkus <josh@agliodbs.com> writes: > On 03/17/2014 05:49 PM, Josh Berkus wrote: >> https://wiki.postgresql.org/wiki/20140320UpdateIssues > Anyone? We're going public with this in 36 hours, so I'd love for it to > actually be correct. I did a bit more hacking on this page. Could use another look from Alvaro and/or Andres, I'm sure. regards, tom lane
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > On 03/17/2014 05:49 PM, Josh Berkus wrote: > >> https://wiki.postgresql.org/wiki/20140320UpdateIssues > > > Anyone? We're going public with this in 36 hours, so I'd love for it to > > actually be correct. > > I did a bit more hacking on this page. Could use another look from Alvaro > and/or Andres, I'm sure. Edited, mainly to remove mention of FOR NO KEY UPDATE as a possible cause of the problem. I don't know that this can cause an issue, since that lock level conflicts with updates. I wonder about suggesting other versions of ALTER TABLE that can do heap rewrites. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 03/19/2014 10:37 AM, Alvaro Herrera wrote: > Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> On 03/17/2014 05:49 PM, Josh Berkus wrote: >>>> https://wiki.postgresql.org/wiki/20140320UpdateIssues >> >>> Anyone? We're going public with this in 36 hours, so I'd love for it to >>> actually be correct. >> >> I did a bit more hacking on this page. Could use another look from Alvaro >> and/or Andres, I'm sure. > > Edited, mainly to remove mention of FOR NO KEY UPDATE as a possible > cause of the problem. I don't know that this can cause an issue, since > that lock level conflicts with updates. > > I wonder about suggesting other versions of ALTER TABLE that can do > heap rewrites. I don't want to recommend any version of ALTER TABLE until someone actually tests it on a corrupted database. What about simply CREATE TABLE AS SELECT? Presumably that kind of manual cleanup would work, no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > On 03/19/2014 10:37 AM, Alvaro Herrera wrote: >> I wonder about suggesting other versions of ALTER TABLE that can do >> heap rewrites. > I don't want to recommend any version of ALTER TABLE until someone > actually tests it on a corrupted database. A test would be a good idea, yes. But in principle it ought to work. > What about simply CREATE TABLE AS SELECT? Presumably that kind of > manual cleanup would work, no? Well, it would leave you with a whole lot of error-prone manual cleanup to do, like repointing foreign key linkages, remaking indexes, etc. And what's possibly more relevant to this discussion, there's no very strong reason to believe that it'd result in data that's any cleaner than the ALTER TABLE way. Note that if you've already suffered some of the potential indirect consequences, like duplicate/missing keys, then there isn't going to be any automatic fix; you're gonna have to clean up the data by hand. But assuming that that hasn't happened, any seqscan-based data extraction ought to do the trick; and ALTER TABLE (as long as you avoid the no-op transformation pitfall) should be as good as other ways, with a lot less risk of human error than a manual recipe. regards, tom lane