Thread: Wiki Page Draft for upcoming release

Wiki Page Draft for upcoming release

From
Josh Berkus
Date:
All,

https://wiki.postgresql.org/wiki/20140320UpdateIssues

I'm sure my explanation of the data corruption issue is not correct, so
please fix it.  Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Wiki Page Draft for upcoming release

From
David Johnston
Date:
I sent a post to -general with a much more detailed brain dump of my current
understanding on this topic.  The main point I'm addressing here is how to
recover from this problem.

Since a symptom of the problem is that pg_dump/restore can fail saying that
(in some instances) the only viable restore mechanism be pg_dump/restore
means that someone so afflicted is going to lose data since their last good
dump - if they still have one.

However, if the true data table does not actually contain any duplicate data
then such a dump/restore cycle (or I would think REINDEX - or DROP/CREATE
INDEX chain) should resolve the problem.  Thus if there is duplicate data
the user needs-to/can identify and remove the offending records so that
subsequent actions do not fail with a duplicate key error.

If this is true then providing a query (or queries) that can provide the
problem records and delete them from the table - along with any staging up
that is necessary (like first dropping affected indexes if applicable) -
would be good a nice addition.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Wiki-Page-Draft-for-upcoming-release-tp5796494p5796503.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Wiki Page Draft for upcoming release

From
David Johnston
Date:
Josh Berkus wrote
> All,
> 
> https://wiki.postgresql.org/wiki/20140320UpdateIssues
> 
> I'm sure my explanation of the data corruption issue is not correct, so
> please fix it.  Thanks!

I presume that because there is no way the master could have sent bad table
data to the replication slaves that performing the base backup on the slaves
is sufficient; i.e., it is not necessary to backup the master and distribute
that?

I'd either make the change myself or ask this kind of question somewhere
else more appropriate but I'm not sure of the proper protocol to follow.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Wiki-Page-Draft-for-upcoming-release-tp5796494p5796505.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Wiki Page Draft for upcoming release

From
Josh Berkus
Date:
On 03/17/2014 05:49 PM, Josh Berkus wrote:
> All,
> 
> https://wiki.postgresql.org/wiki/20140320UpdateIssues
> 
> I'm sure my explanation of the data corruption issue is not correct, so
> please fix it.  Thanks!
> 

Anyone?  We're going public with this in 36 hours, so I'd love for it to
actually be correct.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Wiki Page Draft for upcoming release

From
Greg Stark
Date:
On Tue, Mar 18, 2014 at 6:41 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
> Anyone?  We're going public with this in 36 hours, so I'd love for it to
> actually be correct.

I'll do a first pass now.


-- 
greg



Re: Wiki Page Draft for upcoming release

From
Greg Stark
Date:
On Tue, Mar 18, 2014 at 7:05 PM, Greg Stark <stark@mit.edu> wrote:
> I'll do a first pass now.

I fixed the "Causes" section. I haven't touched the other sections
which are pretty reasonable. It would be nice to have more suggestions
for what people should do other than dump/restore.

It would be nice to be able to tell people that if they vacuum, then
reindex and check all their foreign key constraints then they should
be ok. I think that's almost true except I'm not sure how to tell that
they've waited long enough before vacuuming and I'm not 100% sure
it'll fix the problem. (Also they could have lost a row which has
since had all all it's referencing rows deleted. The database won't be
able to help them find that.)

-- 
greg



Re: Wiki Page Draft for upcoming release

From
Andres Freund
Date:
On 2014-03-18 19:28:53 +0000, Greg Stark wrote:
> It would be nice to be able to tell people that if they vacuum, then
> reindex and check all their foreign key constraints then they should
> be ok.

I don't think so:
http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de

I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
old_type USING (col); is the only real thing to do.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Wiki Page Draft for upcoming release

From
Josh Berkus
Date:
On 03/18/2014 12:35 PM, Andres Freund wrote:
> I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> old_type USING (col); is the only real thing to do.

Then why wouldn't VACUUM FULL work?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Wiki Page Draft for upcoming release

From
Andres Freund
Date:
On 2014-03-18 12:52:49 -0700, Josh Berkus wrote:
> On 03/18/2014 12:35 PM, Andres Freund wrote:
> > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> > old_type USING (col); is the only real thing to do.
> 
> Then why wouldn't VACUUM FULL work?

Please read the referenced message and children. It will use
rewriteheap.c which tries to maintain the update chains.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Wiki Page Draft for upcoming release

From
Josh Berkus
Date:
On 03/18/2014 12:55 PM, Andres Freund wrote:
> On 2014-03-18 12:52:49 -0700, Josh Berkus wrote:
>> On 03/18/2014 12:35 PM, Andres Freund wrote:
>>> I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
>>> old_type USING (col); is the only real thing to do.
>>
>> Then why wouldn't VACUUM FULL work?
> 
> Please read the referenced message and children. It will use
> rewriteheap.c which tries to maintain the update chains.

Reading ... hmmm.  Well, the ALTER TABLE thing may work, but I'm really
hesitant to recommend it as a general solution for our users; it doesn't
look repliable.  Too bad we don't have a handy corrupted database to
practice on.

I will note that VACUUM FULL doesn't work either.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Wiki Page Draft for upcoming release

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-18 19:28:53 +0000, Greg Stark wrote:
>> It would be nice to be able to tell people that if they vacuum, then
>> reindex and check all their foreign key constraints then they should
>> be ok.

> I don't think so:
> http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de

> I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> old_type USING (col); is the only real thing to do.

Unfortunately, the operative word in that is "noop".  Didn't you see
http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us
It would have helped before we put in the transform smarts, but now
it's too smart.
        regards, tom lane



Re: Wiki Page Draft for upcoming release

From
Andres Freund
Date:
On 2014-03-18 16:19:01 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-03-18 19:28:53 +0000, Greg Stark wrote:
> >> It would be nice to be able to tell people that if they vacuum, then
> >> reindex and check all their foreign key constraints then they should
> >> be ok.
> 
> > I don't think so:
> > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de
> 
> > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> > old_type USING (col); is the only real thing to do.
> 
> Unfortunately, the operative word in that is "noop".  Didn't you see
> http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us

Huh, that somehow never arrived here. Very odd.

> It would have helped before we put in the transform smarts, but now
> it's too smart.

Gna. Hm, what about: USING (COALESCE(columnname, NULL));

That still seems to work.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Wiki Page Draft for upcoming release

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2014-03-18 16:19:01 -0400, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2014-03-18 19:28:53 +0000, Greg Stark wrote:
> > >> It would be nice to be able to tell people that if they vacuum, then
> > >> reindex and check all their foreign key constraints then they should
> > >> be ok.
> > 
> > > I don't think so:
> > > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de
> > 
> > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> > > old_type USING (col); is the only real thing to do.
> > 
> > Unfortunately, the operative word in that is "noop".  Didn't you see
> > http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us
> 
> Huh, that somehow never arrived here. Very odd.

2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org
H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us
 
2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org
H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us
 
2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 => andres@anarazel.de R=dnslookup T=remote_smtp H=mail.anarazel.de
[217.115.131.40]X=TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128 QT=1s DT=1s
 
2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 Completed QT=1s

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Wiki Page Draft for upcoming release

From
Andres Freund
Date:
On 2014-03-18 17:34:34 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-03-18 16:19:01 -0400, Tom Lane wrote:
> > > Andres Freund <andres@2ndquadrant.com> writes:
> > > > On 2014-03-18 19:28:53 +0000, Greg Stark wrote:
> > > >> It would be nice to be able to tell people that if they vacuum, then
> > > >> reindex and check all their foreign key constraints then they should
> > > >> be ok.
> > > 
> > > > I don't think so:
> > > > http://archives.postgresql.org/message-id/20140317233919.GS16438%40awork2.anarazel.de
> > > 
> > > > I still think a rewriting noop ALTER TABLE ... ALTER COLUMN col TYPE
> > > > old_type USING (col); is the only real thing to do.
> > > 
> > > Unfortunately, the operative word in that is "noop".  Didn't you see
> > > http://www.postgresql.org/message-id/11964.1395100860@sss.pgh.pa.us
> > 
> > Huh, that somehow never arrived here. Very odd.
> 
> 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org
H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us
 
> 2014-03-18 00:01:18 [10067] 1WPhT4-0002cN-F3 <= pgsql-hackers-owner+M246803=andres=anarazel.de@postgresql.org
H=localhost(postgresql.org) [127.0.0.1] P=smtp S=3502 id=11964.1395100860@sss.pgh.pa.us
 
> 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 => andres@anarazel.de R=dnslookup T=remote_smtp H=mail.anarazel.de
[217.115.131.40]X=TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128 QT=1s DT=1s
 
> 2014-03-18 00:01:19 [10170] 1WPhT4-0002cN-F3 Completed QT=1s

Mail that's CC/TOed to me onlist, is automatically marked as read by a
sieve script so I don't have to mark it as read twice. It seems
something went wrong there for a couple of messages...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Wiki Page Draft for upcoming release

From
Andrew Dunstan
Date:
On 03/18/2014 04:39 PM, Andres Freund wrote:
>
> Mail that's CC/TOed to me onlist, is automatically marked as read by a
> sieve script so I don't have to mark it as read twice. It seems
> something went wrong there for a couple of messages...
>


Why not just turn on eliminatecc on the majordomo server? See 
<https://lists.postgresql.org/mj/mj_wwwusr>

cheers

andrew




Re: Wiki Page Draft for upcoming release

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 03/17/2014 05:49 PM, Josh Berkus wrote:
>> https://wiki.postgresql.org/wiki/20140320UpdateIssues

> Anyone?  We're going public with this in 36 hours, so I'd love for it to
> actually be correct.

I did a bit more hacking on this page.  Could use another look from Alvaro
and/or Andres, I'm sure.
        regards, tom lane



Re: Wiki Page Draft for upcoming release

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > On 03/17/2014 05:49 PM, Josh Berkus wrote:
> >> https://wiki.postgresql.org/wiki/20140320UpdateIssues
> 
> > Anyone?  We're going public with this in 36 hours, so I'd love for it to
> > actually be correct.
> 
> I did a bit more hacking on this page.  Could use another look from Alvaro
> and/or Andres, I'm sure.

Edited, mainly to remove mention of FOR NO KEY UPDATE as a possible
cause of the problem.  I don't know that this can cause an issue, since
that lock level conflicts with updates.

I wonder about suggesting other versions of ALTER TABLE that can do
heap rewrites.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Wiki Page Draft for upcoming release

From
Josh Berkus
Date:
On 03/19/2014 10:37 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> On 03/17/2014 05:49 PM, Josh Berkus wrote:
>>>> https://wiki.postgresql.org/wiki/20140320UpdateIssues
>>
>>> Anyone?  We're going public with this in 36 hours, so I'd love for it to
>>> actually be correct.
>>
>> I did a bit more hacking on this page.  Could use another look from Alvaro
>> and/or Andres, I'm sure.
> 
> Edited, mainly to remove mention of FOR NO KEY UPDATE as a possible
> cause of the problem.  I don't know that this can cause an issue, since
> that lock level conflicts with updates.
> 
> I wonder about suggesting other versions of ALTER TABLE that can do
> heap rewrites.

I don't want to recommend any version of ALTER TABLE until someone
actually tests it on a corrupted database.

What about simply CREATE TABLE AS SELECT?  Presumably that kind of
manual cleanup would work, no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Wiki Page Draft for upcoming release

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 03/19/2014 10:37 AM, Alvaro Herrera wrote:
>> I wonder about suggesting other versions of ALTER TABLE that can do
>> heap rewrites.

> I don't want to recommend any version of ALTER TABLE until someone
> actually tests it on a corrupted database.

A test would be a good idea, yes.  But in principle it ought to work.

> What about simply CREATE TABLE AS SELECT?  Presumably that kind of
> manual cleanup would work, no?

Well, it would leave you with a whole lot of error-prone manual cleanup
to do, like repointing foreign key linkages, remaking indexes, etc.
And what's possibly more relevant to this discussion, there's no very
strong reason to believe that it'd result in data that's any cleaner than
the ALTER TABLE way.

Note that if you've already suffered some of the potential indirect
consequences, like duplicate/missing keys, then there isn't going to be
any automatic fix; you're gonna have to clean up the data by hand.
But assuming that that hasn't happened, any seqscan-based data extraction
ought to do the trick; and ALTER TABLE (as long as you avoid the no-op
transformation pitfall) should be as good as other ways, with a lot less
risk of human error than a manual recipe.
        regards, tom lane