Re: Two proposed modifications to the PostgreSQL FDW - Mailing list pgsql-hackers

From Chris Travers
Subject Re: Two proposed modifications to the PostgreSQL FDW
Msg-id CAN-RpxA-hAA5Gwo8dWG9JCeJNH6+ib8G8fpn8Y55nCiEEb_P1g@mail.gmail.com
In response to Re: Two proposed modifications to the PostgreSQL FDW  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers


On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> 1.  INSERTMETHOD=[insert|copy] option on foreign table.
>
> One significant limitation of the PostgreSQL FDW is that it does a prepared
> statement insert on each row written which imposes a per-row latency.  This
> hits environments where there is significant latency or few latency
> guarantees particularly hard, for example, writing to a foreign table that
> might be physically located on another continent.  The idea is that
> INSERTMETHOD would default to insert and therefore have no changes but
> where needed people could specify COPY which would stream the data out.
> Updates would still be unaffected.

That has a *lot* of semantics issues, because you suddenly don't get
synchronous error reports anymore.  I don't think that's OK on a
per-table basis.  If we invented something like this, it IMO should be a
per-statement explicit opt in that'd allow streaming.

I want to push back a bit and justify the per table decision here.

This is primarily useful when two things are present:
1.  There is significant lag between the servers
2.  There are significant bulk operations on the table.

Only when both are present does it make sense to use this option.  If you have a low latency connection, don't use it.  If you are only writing a few rows here and there, don't use it.  If you have a high latency connection *and* you are inserting large numbers of rows at a time, then this is where you run into a problem and the current FDW approach is, frankly, unusable in these cases.  These characteristics follow the server (and hence the table) in the first case and the table only in the second case.  Therefore the characteristics of the foreign table determine whether it makes sense to consider this option.  Doing this on the statement level poses more problems, in my view, than it fixes.
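
To make this concrete, here is a sketch of how the proposed per-table opt-in might look (insertmethod is the proposed option, not an existing postgres_fdw option; the server and table names are made up):

    CREATE FOREIGN TABLE measurements (
        id      bigint,
        payload text
    ) SERVER transatlantic_server
      OPTIONS (insertmethod 'copy');

    -- or, for an existing foreign table:
    ALTER FOREIGN TABLE measurements
      OPTIONS (ADD insertmethod 'copy');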

Moreover, the idea of batching inserts doesn't help either (and would actually be worse in many cases than the current approach), since the current approach already uses a prepared statement which gets called for each row.  Doing a batch insert might save you the per-row round trip, but it adds a great deal of complexity to the overall operation.
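
For reference, this is roughly what the remote side sees today for every single row written through the FDW.  SQL-level PREPARE/EXECUTE is used here purely to illustrate the protocol-level prepared statement postgres_fdw actually uses, and the statement text is invented:

    PREPARE fdw_ins (bigint, text) AS
        INSERT INTO public.measurements (id, payload) VALUES ($1, $2);
    EXECUTE fdw_ins (1, 'first row');   -- one network round trip
    EXECUTE fdw_ins (2, 'second row');  -- another round trip, and so on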

Now, a per-statement opt-in poses a number of questions.  Obviously we'd have to allow this on local tables too, but it would have no effect.  So suddenly you have a semantic construct which *may* or *may not* pose the problems you mention, and in fact may or may not pose the problem on foreign tables, because whether the construct does anything depends on the FDW driver.  I don't see what exactly one buys by shifting the problem in this way, given that when this is useful it will be used on virtually all inserts, and when it is not useful it won't be used at all.

Now, the entire problem is the expectation of synchronous errors on a per-row basis.  That is what causes the problem with inserting a million rows over a transcontinental link (insert row, check error, insert row, check error, etc.).

So on to the question of containing the problems.  I think the best way to contain this is to have one COPY statement per insert planner node, which preserves the basic guarantee that a planner node either succeeds or errors.  For sequences, I would expect that supplying a column list without the relevant sequenced column should behave in a tolerable way (otherwise two concurrent copies would have problems).
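
As a hypothetical illustration of that containment: a single local insert planner node would translate into a single COPY on the remote, with an explicit column list that leaves out the sequenced column so the remote default still fires (object names are invented):

    -- local statement, one insert planner node:
    INSERT INTO measurements (payload)
        SELECT payload FROM staging;

    -- what the FDW would stream to the remote for that node:
    COPY public.measurements (payload) FROM STDIN;
    -- ...rows streamed over the existing connection, then end-of-data...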

So that's the primary defense.  Now let's look at the main alternatives.

1.  You can use a table where you logically replicate inserts, and where you insert then truncate or delete.  This works fine until you have enough WAL traffic that you cannot tolerate a replication outage...
2.  You can write an extension which allows you to COPY a current table into a foreign table.  The problem here is that SPI has a bunch of protections to keep you from copying to stdout, so when I helped with this proof of concept we copied to a temp file on a ramdisk and copied from there, which is frankly quite a bit worse than this approach.

If you want a per statement opt-in, I guess the questions that come to my mind are:

1.  What would this opt-in look like?
2.  How important is it that we avoid breaking current FDW interfaces to implement it?
3.  How would you expect it to behave if directed at a local table or a foreign table on, say, a MySQL db?


> 2.  TWOPHASECOMMIT=[off|on] option

> The second major issue that I see with PostgreSQL's foreign database
> wrappers is the fact that there is no two phase commit which means that a
> single transaction writing to a group of tables has no expectation that all
> backends will commit or rollback together.  With this patch an option would
> be applied to foreign tables such that they could be set to use two phase
> commit.  When this is done, the first write to each backend would register a
> connection with a global transaction handler, and pre-commit and commit
> hooks would be set up to properly process these.
>
> On precommit, a per-global-transaction file would be opened in the data
> directory and prepare statements logged to the file.  On error, we simply
> roll back our local transaction.
>
> On the commit hook, we go through and start to commit the remote global
> transactions.  At this point we make a best effort but track whether or not
> we were successful on all.  If successful on all, we delete the file.  If
> unsuccessful we fire a background worker which re-reads the file and is
> responsible for cleanup.  If global transactions persist, a SQL
> administration function will be made available to restart the cleanup
> process.  On rollback, we do like commit but we roll back all transactions
> in the set.  The file has enough information to determine whether we should
> be committing or rolling back on cleanup.
>
> I would like to push these both for Pg 12.  Is there any feedback on the
> concepts and the problems first?

There's been *substantial* work on this. You should at least read the
discussion & coordinate with the relevant developers.

I suppose I should forward this to them directly also. 

Yeah.  Also, I wrote the transaction manager code for this while helping with a proof of concept for the copy-to-remote extension mentioned above.

There are a few big differences in implementation from the patches you mention, and the disagreement was part of why I thought about going in this direction.

First, discussion of differences in implementation:

1.  I treat the local and remote transactions symmetrically, and I make no assumptions about what might happen between prepare and an attempted local commit (a sketch of the per-phase statements follows this list).
   prepare goes into the precommit hook.
   commit goes into the commit hook, and we never raise errors if it fails (because you cannot roll back at that point).  Instead a warning is raised and cleanup commences.
   rollback goes into the rollback hook, and we never raise errors if it fails (because you are already rolling back).

2.  By treating this as a property of a table rather than a property of a foreign data wrapper or a server, we can better prevent prepared transactions where they have not been enabled.
   This also ensures that we know whether we are guaranteeing two phase commit or not by looking at the table.

3.  By making this opt-in, it avoids a lot of problems with regard to incorrect configuration, etc., since if the DBA says "use two phase commit" but failed to enable prepared transactions on the other side...
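
To sketch what the opt-in and the per-phase traffic might look like (the twophasecommit option and the GID format are part of the proposal, not existing syntax; PREPARE TRANSACTION requires max_prepared_transactions > 0 on the remote):

    ALTER FOREIGN TABLE payments
      OPTIONS (ADD twophasecommit 'on');

    -- precommit hook, sent to every remote touched by the transaction:
    PREPARE TRANSACTION 'fdw_gx_4242_1';

    -- commit hook (a failure here raises a warning, not an error,
    -- and cleanup takes over):
    COMMIT PREPARED 'fdw_gx_4242_1';

    -- rollback hook:
    ROLLBACK PREPARED 'fdw_gx_4242_1';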

On to failure modes:
 1.  It's possible that under high load too many foreign transactions are prepared and things start rolling back instead of committing.  Oh well...
 2.  In the event that a foreign server goes away between prepare and commit, we continue to retry via the background worker.  The background worker is very pessimistic and checks every remote system for the named transaction (a sketch of such a check follows this list).
 3.  Administrator action is required in two specific cases.  In the first case, the controlling server crashes or restarts, along with the cleanup worker, while cleanup is running.  It would be possible to patch the backend to provide a cleanup service of this sort on startup, but that would require some design decisions to be made now.  In the second case, a transaction has been around for a while and is causing vacuum horizon issues, so it needs to be manually resolved.  Maybe the reason here is beyond our control, such as another precommit hook stalling in another extension.
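
A minimal sketch of that check, assuming the global transaction ID was recorded in the per-transaction file (the GID is invented; pg_prepared_xacts is the standard system view):

    -- run by the cleanup worker against each remote server:
    SELECT gid, prepared, owner, database
      FROM pg_prepared_xacts
     WHERE gid = 'fdw_gx_4242_1';

    -- if found, finish it according to the logged decision:
    COMMIT PREPARED 'fdw_gx_4242_1';    -- or ROLLBACK PREPARED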

Part of the reason for this discussion is to try to get feedback not only from the current developers but also those (like Tom) who objected the last time around.

For example, if it is a requirement that we have automatic cleanup in the event of a server restart, then the backend needs to know what to do about these files.  If not, then we leave the whole code in the FDW.  If the backend has to know, we probably need a file header sufficient for the backend to decide that this is a PostgreSQL-FDW global transaction log vs. one that might be added for the Oracle FDW later...  Maybe mentioning the .so and the symbol for the background worker entry point?


See https://commitfest.postgresql.org/19/1574/ and the referenced discussions.

Greetings,

Andres Freund


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin
