Simon Riggs <simon@2ndQuadrant.com> wrote:
> On 30 July 2015 at 22:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Kevin Grittner wrote:
>>>> If you run `make installcheck` against a cluster with
>>>> default_transaction_isolation = 'repeatable read' you get one
>>>> failure:
>>>
>>> I am tempted to say that we should just disallow to run vacuum
>>> on a table containing a brin index in a transaction-snapshot
>>> transaction.
>>
>> Huh? We don't allow vacuum inside a (user started) transaction
>> now. What new restriction are you proposing?
>
> Forgive me, but I believe there is a confusion here. Nobody is
> changing VACUUM.
>
> The code comment mentioned as "full of it" by Kevin appears to be
> accurate and appropriate.
>
> The code is called by VACUUM and brin_summarize_new_values(). It
> can't be VACUUM,
Reread the thread. The error is being hit on a VACUUM command
(which is not, of course in any explicit transaction).
> as you say and as the comment already says, so it must be the
> function brin_summarize_new_values().
It is, in fact, not brin_summarize_new_values() -- as shown by the
stack trace when the error is thrown. This is why it is such a bad
idea to have any function assume that it knows everywhere it is
being called from, and under what conditions each caller will call
it. (That's distinct from a function having a documented API about
invariants which must be true when calling the function.) A simple
thinko like forgetting that even a VACUUM command starts a
transaction if it is not already in an explicit transaction can
cause confusing problems when the function makes a bad guess at who
"must have" called it. Let the function name itself it if is going
to name something (in a user-facing error message) that is only
visible to those with access to the source code. Better yet, write
an error message that will mean something to those users who don't
have the source code handy.
> The test assumes that the default_transaction_isolation is read
> committed and so the test fails at other levels. If anything, it
> is the test that is at fault.
Yes, the test being wrong is a bigger problem than an arcane and
misleading error message when it causes a failure.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company