Re: [BUGS] Breakage with VACUUM ANALYSE + partitions - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions
Date:
Msg-id: CA+TgmobyCtD4xRY4Ee2Jv6-qznMDaLVrMshuirXbdPNjaYVQGA@mail.gmail.com
In response to: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions  (Andres Freund <andres@anarazel.de>)
Responses: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Mon, Apr 25, 2016 at 11:56 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-25 08:55:54 -0400, Robert Haas wrote:
>> Andres, this issue has now been open for more than a month, which is
>> frankly kind of ridiculous given the schedule we're trying to hit for
>> beta.  Do you think it's possible to commit something RSN without
>> compromising the quality of PostgreSQL 9.6?
>
> Frankly I'm getting a bit annoyed here too. I posted a patch Friday,
> directly after getting back from pgconf.us. Saturday I posted a patch
> for an issue which I didn't cause, but which caused issues when testing
> the patch in this thread. Sunday I just worked on some small patch - one
> you did want to see resolved too.  It's now 8.30 Monday morning.  What's
> the point of your message?

I think that the point of my message is exactly what I said in my
message.  This isn't really about the last couple of days.  The issue
was reported on March 20th.  On March 31st, Noah asked you for a plan
to get it fixed by April 7th.  You never replied.  On April 16th, the
issue not having been fixed, he followed up.  You said that you would
fix it next week.  That week is now over, and we're into the following
week.  We have a patch, and that's good, and I have reviewed it and
Thom has tested it, and that's good, too.  But it is not clear whether
you feel confident to commit it or when you might be planning to do
that, so I asked.  Given that this is the open item of longest tenure
and that we're hoping to ship beta soon, why is that out of line?

Fundamentally, the choices for each open item are (1) get it fixed
before beta, (2) revert the commit that caused it, (3) decide it's OK
to ship beta with that issue, or (4) slip beta.  We initially had a
theory that the commit that caused this issue merely revealed an
underlying problem that had existed before, but I no longer really
think that's the case.  That commit introduced a new way to write to
blocks that might have in the meantime been removed, and it failed to
make that safe.  That's not to say that md.c doesn't do some wonky
things, but the pre-existing code in the upper layers coped with the
wonkiness and your new code doesn't, so in effect it's a regression.
And in fact I think it's a regression that can be expected to be a
significant operational problem for people if not fixed, because the
circumstances in which it can happen are not very obscure.  You just
need to hold some pending flush requests in your backend-local queue
while some other process truncates the relation, and boom.  So I am
disinclined to option #3.  I also do not think that we should slip
beta for an issue that was reported this far in advance of the planned
beta date, so I am disinclined to option #4.  That leaves #1 and #2.
I assume you will be pretty darned unhappy if we end up at #2, so I am
trying to figure out if we can achieve #1.  OK?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


