On Wed, 2009-01-28 at 22:19 +0200, Heikki Linnakangas wrote:
> Tom Lane wrote:
...
> > Well, those unexpectedly cancelled queries could have represented
> > critical functionality too. I think this argument calls the entire
> > approach into question. If there is no safe setting for the parameter
> > then we need to find a way to not have the parameter.
>
> We've gone through that already. Different ideas were hashed out around
> September. There are four basic feasible approaches to what to do when an
> incoming WAL record conflicts with a running read-only query:
>
> 1. Kill the query. (max_standby_delay=0)
> 2. Wait for the query to finish before continuing (max_standby_delay=-1)
> 3. Have a feedback loop from standby to master, feeding an OldestXmin to
> the master, preventing it from removing tuples that are still needed in
> the standby.
> 4. Allow the query to continue, knowing that it will return wrong results.
>
> I don't consider 4 to be an option. Option 3 has its own set of
> drawbacks, as a standby can then cause bloat in the master, and in any
> case we're not going to have it in this release. And then there's some
> middle ground, like wait a while and then kill the query
> (max_standby_delay > 0).
>
> I don't see any way around the fact that when a tuple is removed, it's
> gone and can't be accessed by queries. Either you don't remove it, or
> you kill the query.
Actually we came up with a solution to this: use filesystem-level
snapshots (like LVM2+XFS or ZFS), and redirect backends with
long-running queries to a filesystem snapshot mounted at a different
mountpoint.

I don't think Simon has put full support for it in the code yet, but it
is clearly _the_ solution for those who want to eat the cake and have it
too.
>
> I think the max_standby_delay setting is fairly easy to explain. It
> shouldn't be too hard for a DBA to set it correctly.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
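
To make the max_standby_delay cases quoted above concrete (0 = kill the
query, -1 = wait for it to finish, > 0 = wait a while and then kill it),
here is a minimal sketch of the decision. It is only an illustration and
not the actual patch code: the names Action and resolve_conflict are
hypothetical, and treating the delay as a number of seconds is an
assumption, since the thread does not state the unit.

```python
from enum import Enum


class Action(Enum):
    """What WAL replay does about a conflicting read-only query."""
    WAIT = "wait"
    CANCEL_QUERY = "cancel_query"


def resolve_conflict(max_standby_delay: int, waited_seconds: float) -> Action:
    """Hypothetical helper: pick an action for a conflicting WAL record.

    max_standby_delay semantics, as described in the thread:
       0  -> kill the query immediately
      -1  -> wait for the query to finish, however long that takes
      > 0 -> wait up to that long (assumed: seconds), then kill the query
    """
    if max_standby_delay < -1:
        raise ValueError("max_standby_delay must be -1, 0, or positive")
    if max_standby_delay == -1:
        # Replay stalls until the query ends; the standby may fall behind.
        return Action.WAIT
    if waited_seconds >= max_standby_delay:
        # The grace period (possibly zero) is used up: cancel the query.
        return Action.CANCEL_QUERY
    return Action.WAIT
```

For example, resolve_conflict(0, 0.0) cancels the query at once, while
resolve_conflict(-1, t) keeps waiting for any t. Neither extreme is safe
for every workload, which is the trade-off being debated here.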