Tom Lane said:
> "Michael Brown" <mbrown@fensystems.co.uk> writes:
>> I have put in place a temporary workaround on the production system,
>> which is to insert a
>
>> // Pretend that the cache is always invalid
>> fprintf(stderr, "*** bypassing cache ***\n");
>> goto read_failed;
>
> I don't think this will actually help --- if anything it exposes you to
> the bug more :-(. Given my current theory, there is not anything wrong
> with the init file. The problem is a sort of race condition that would
> be triggered by very high cache-inval traffic during startup of a new
> backend. I looked at the cache inval array in your coredump, and it
> looked like there had been a whole bunch of table deletions happening
> concurrently with the startup --- "whole bunch" meaning
> hundreds if not thousands. Is there anything in your application
> behavior that might encourage a lot of table drops to happen
> concurrently?
If temporary table drops count towards this, then yes. We have a
reporting procedure (in PL/pgSQL) that runs every ten seconds and
generates entries in two reporting tables. To obtain a consistent view
while running under READ COMMITTED, it creates a temporary table
containing both result sets, then splits the data out into the two
reporting tables.
The temporary table is dropped immediately after use, and it's quite
plausible that this could run into hundreds of temporary table creates and
drops in a single transaction.
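For concreteness, the procedure is shaped roughly like the sketch below
(all table, column, and function names here are invented for
illustration; the real procedure is more involved):

    -- Hypothetical sketch of the procedure's shape; names are invented.
    -- The temp table is filled by a single statement, so both result
    -- sets come from one snapshot even under READ COMMITTED.
    CREATE OR REPLACE FUNCTION run_report() RETURNS void AS $$
    BEGIN
        CREATE TEMPORARY TABLE report_snapshot AS
            SELECT kind, payload
            FROM source_data
            WHERE kind IN ('a', 'b');

        INSERT INTO report_a
            SELECT payload FROM report_snapshot WHERE kind = 'a';
        INSERT INTO report_b
            SELECT payload FROM report_snapshot WHERE kind = 'b';

        -- Dropped immediately after use; this is the source of the
        -- create/drop churn described above.
        DROP TABLE report_snapshot;
    END;
    $$ LANGUAGE plpgsql VOLATILE;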
I could fairly easily change this procedure to truncate rather than drop
the temporary table, if that would lessen the exposure to the problem.
Would that be likely to help?
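The change would look roughly like this (again just a sketch with the
same invented names; whether it actually reduces the invalidation
traffic is exactly what I'm unsure of):

    -- Hypothetical TRUNCATE variant: create the temp table once per
    -- session and empty it after each run, avoiding the per-run
    -- DROP TABLE / CREATE TABLE churn.
    CREATE OR REPLACE FUNCTION run_report() RETURNS void AS $$
    BEGIN
        BEGIN
            CREATE TEMPORARY TABLE report_snapshot
                (kind text, payload text);
        EXCEPTION WHEN duplicate_table THEN
            NULL;  -- already created by an earlier call this session
        END;

        INSERT INTO report_snapshot
            SELECT kind, payload FROM source_data
            WHERE kind IN ('a', 'b');

        INSERT INTO report_a
            SELECT payload FROM report_snapshot WHERE kind = 'a';
        INSERT INTO report_b
            SELECT payload FROM report_snapshot WHERE kind = 'b';

        TRUNCATE report_snapshot;  -- was: DROP TABLE report_snapshot
    END;
    $$ LANGUAGE plpgsql VOLATILE;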
(Alternatively, given that the temporary table usage here is quite
inelegant: is there a better way to obtain a consistent database
snapshot across multiple queries, without using SERIALIZABLE, from
inside a PL/pgSQL function that has to be marked VOLATILE?)
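For what it's worth, one possible direction (purely a sketch; it
assumes data-modifying CTEs, which only exist from PostgreSQL 9.1
onwards, and reuses the invented names above) would be to collapse the
whole thing into a single statement, since a single statement runs
under a single snapshot even in READ COMMITTED:

    -- Hypothetical single-statement variant (requires 9.1+ writable
    -- CTEs): the CTE "snap" is evaluated once, and both INSERTs see
    -- the same snapshot because they are part of one statement.
    WITH snap AS (
        SELECT kind, payload FROM source_data WHERE kind IN ('a', 'b')
    ), ins_a AS (
        INSERT INTO report_a
            SELECT payload FROM snap WHERE kind = 'a'
    )
    INSERT INTO report_b
        SELECT payload FROM snap WHERE kind = 'b';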
> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.
Thanks!
Michael