Thread: [BUGS] data inconsistency between client and server at pg_ctl stop -m fast
[BUGS] data inconsistency between client and server at pg_ctl stop -m fast
Hello
Postgresql version of psql (PostgreSQL) 9.6.0
System: Red Hat Enterprise Linux Server release 6.8 (Santiago)
Test method:
- The client (used libpq in c++ or also psql in a perl) tries to insert a counter into testtable
(insert was chosen for easy checks, but mostly we use updates and functions)
- Whenever an error is returned to the client, the client tries is to write it again
Problem:
- the psql client receives an error but the record was inserted into the table!
- So the record is finally insertet twice
I could imagine one of several methods how the problem could be solved, may be some (hidden) method already exists.
1) A shutdown method like pg_ctl stop -m smart would be helpful:
a) which does not allow new connections (as it is now)
b) which finishes active transations but does not allow new WRITING transactions (or also no reading transctions)
2) A similar function like pg_cancel_backup would be helpful, which also DISALLOWS to start new "writing" transactions
(or inhibits any new transaction), and which guarantees that the cancelled transactions are reported with error
and the successful transactions are reported with OK to the client.
3) The problem could be overcome, if the would be a command which disallows any new writing (or also reading) transactions.
This command could be called e.g few seconds before a pg_ctl stop -m fast is executed - so all transactions
which will be finished within these few seconds will have correct results for the client.
4) The problem would be solved if the pg_ctl stop -m fast behaves a little bit different:
if a transaction is finished here (example below), the server should send success to the client
and terminate the connection afterwards
if a transaction is really aborted an error shall be sent to the client
pg_ctl stop -m fast was executed at this time:
At test case when about 1000000 records were insertet 29 entries were inserted twice, the first error was at this time:
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> LOG: 00000: duration: 1.308 ms statement: select * from insert_testtable(57121);
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> LOCATION: exec_simple_query, postgres.c:1171
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> FATAL: 57P01: terminating connection due to administrator command
-------------------------------------------------------------------------------------------------------------------------------
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> LOCATION: ProcessInterrupts, postgres.c:2875
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> STATEMENT: select * from insert_testtable(57121);
<2017-01-26 14:21:31.837 CET H:[local] U:postgres P:103530> DEBUG: 00000: shmem_exit(1): 1 before_shmem_exit callbacks to make
At next start of the server (pg_ctl start) the client (psql) inserted the same value (57121) again, as it got an error before.
Readout of the table of this counter:
db1=# select * from testtable where counter = 57121;
counter | insert_time
---------+----------------------------
57121 | 2017-01-26 14:21:31.835887
57121 | 2017-01-26 14:21:35.967155
(2 rows)
The perl program output at this time was:
-------------------------------------------
inserted 57119
insert_testtable
------------------
57120
(1 row)
inserted 57120
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
*** error insert 57121
psql: FATAL: the database system is shutting down
*** error insert 57121
psql: FATAL: the database system is shutting down
*** error insert 57121
....
....
*** error insert 57121
psql: FATAL: the database system is starting up
*** error insert 57121
insert_testtable
------------------
57121
(1 row)
Attachments:
=============
1) postgresql_pgctl_stop_mfast_during_transaction.log ... part where the problem occured
2) database dump (dbname: db1, used table test_table) used for this test (including data content after test)
db1.dump.sql ( pg_dump -p 5445 -U postgres db1 > db1.dump.sql)
3) server config files (postgresql.conf, pg_hba.conf)
4) Perl program - writing incremented counter to testtable: insert_counter.pl
5) script which periodically stops and starts the db server: periodic_start_stop_db.sh
Note: the perl and shell script were executed on database server locally.
Environment:
============
PG_VERSION=9.6
Installed Packages:
version of psql (PostgreSQL) 9.6.0
postgresql96-libs-9.6.0-1PGDG.rhel6.x86_64
postgresql96-contrib-9.6.0-1PGDG.rhel6.x86_64
postgresql96-server-9.6.0-1PGDG.rhel6.x86_64
postgresql96-9.6.0-1PGDG.rhel6.x86_64
Kind regards
Hermann Schauhuber
Sonorys Technology
A-1210 Vienna
Attachment
Re: [BUGS] data inconsistency between client and server at pg_ctlstop -m fast
On Thu, Jan 26, 2017 at 11:16 PM, SCHAUHUBER Hermann <hermann.schauhuber@sonorys.at> wrote: > - The client (used libpq in c++ or also psql in a perl) tries to > insert a counter into testtable > - Whenever an error is returned to the client, the client tries > is to write it again > - the psql client receives an error but the record was inserted > into the table! > > - So the record is finally insertet twice As I understand your complaint, there is a short time between when the request to commit modifications to the database is sent and when the indication of successful commit is received by the client, during which a broken connection leaves the status of the commit in doubt. This much is unavoidable. You further note that `pg_ctl -m fast` has more of a tendency to cause this indeterminate state than is necessary. That may be true. There may be room for a feature enhancement here, although I don't see where it rises to the level of a bug. Such a feature could never eliminate occurrences of the indeterminate state; it could only reduce the frequency of them. Meanwhile, there is a lot you could do to reduce the frequency of this issue. - If you use explicit transactions, the indeterminate state could only occur on a COMMIT request, never on any other request. You could safely assume that an error on any transaction not waiting on a COMMIT request was indeed not committed. - Off-hand, I can't think of where you could see this except when the error level at the client side was FATAL. I think you could safely assume that if the level is anything else that the work was not committed. - You might try performing a graceful termination of client applications before shutting down the database. - I understand that the so-called "smart" shutdown is often impractical for production environments; however where it is used the shutdown cannot cause the problem you describe. - Finally, if you use two-phase commit with a high-quality transaction manager (which independently keeps track of prepared transactions, for retry as needed) you should automatically recover from this condition. What this definitely does not seem to me to be is a bug, which makes this the wrong forum for further discussion of the issue. -- Kevin Grittner -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs