On 03/06/2021 23:04, Jeff Davis wrote:
> One problem with unlogged tables is that the application has no way to
> tell if they were reset, or they just happen to be empty.
>
> This can be a problem with sharding, where you might have different
> shards of an unlogged table on different servers. If one server
> crashes, you'll be missing only one shard of the data, which may appear
> inconsistent. In that case, you'd like the application (or sharding
> solution) to be able to detect that one shard was lost, and TRUNCATE
> those that remain to get back to a reasonable state.
>
> It would be easy enough for the init fork to have a single page with a
> flag set. That way, when the main fork is replaced with the init fork,
> other code could detect that a reset happened.
I'd suggest using a counter rather than a flag. With a flag, if one
client clears the flag to acknowledge that a truncation happened, others
might miss it. See also ABA problem.
> When detected, depending on a GUC, the behavior could be to auto-
> truncate it (to get the current silent behavior), or refuse to perform
> the operation (except an explicit TRUNCATE), or issue a
> warning/log/notice.
TRUNCATE isn't quite what happens when an unlogged table is
re-initialized. It changes the relfilenode, resets stats, and requires a
more strict lock. So I don't think repurposing TRUNCATE for
re-initializing a table is a good idea. There's also potential for a
race condition, if two connections see that a table needs
re-initialization, and issue "TRUNCATE + INSERT" concurrently. One of
the INSERTs will be lost.
A warning or notice is easy to miss.
> The biggest challenge would be: when should we detect that the reset
> has happened? There might be a lot of entry points. Another idea would
> be to just have a SQL function that the application could call whenever
> it needs to know.
Yeah, a SQL function to get the current "reset counter" would be nice.
- Heikki