Thread: IO related waits

IO related waits

From
veem v
Date:
Hi, 
One of our applications uses RDS PostgreSQL. In one of our streaming applications (using Flink), which processes hundreds of millions of transactions each day, we are using row-by-row transaction processing for inserting data into the Postgres database, and a commit is performed for each row. We are seeing heavy IO:XactSync wait events during the data load and also high overall response time. 

The architecture team is suggesting enabling async I/O if possible, so that the streaming client will not wait for the commit confirmation from the database. So I want to understand how async I/O can be enabled, and whether there are any downsides to doing this. 

Regards
Veem

Re: IO related waits

From
Christophe Pettus
Date:

> On Sep 16, 2024, at 13:24, veem v <veema0000@gmail.com> wrote:
> Architecture team is suggesting to enable asynch io if possible, so that the streaming client will not wait for the
> commit confirmation from the database. So I want to understand, how asynch io can be enabled and if any downsides of
> doing this?

"Async I/O" has a specific meaning that's not currently applicable to PostgreSQL.  What is available is
"synchronous_commit". This setting is by default on.  When it's on, each commit waits until the associated WAL
informationhas been flushed to disk, and then returns.  If it is turned off, the commit returns more or less
immediately,and the WAL flush happens asynchronously from the commit. 

The upside is that the session can proceed without waiting for the WAL flush.  The downside is that on a server crash,
sometransactions may not have been fully committed to the database, and will be missing when the database restarts.
Thedatabase won't be corrupted (as in, you try to use it and get errors), but it will be "earlier in time" than the
applicationmight expect.  It's pretty common to turn it off for high-ingestion-rate situations, especially where the
applicationcan detect and replay missing transactions on a crash. 
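
For reference, synchronous_commit can be changed without a restart and can be scoped to a role, a session, or a single transaction, so only the ingest workload takes the risk. A minimal sketch (the table and role names below are placeholders, not from this thread):

-- For the current session only (e.g. the ingest connections):
SET synchronous_commit = off;

-- Or for a single transaction:
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO ingest_tab VALUES (1);                   -- placeholder table
COMMIT;

-- Or for just the ingest role:
ALTER ROLE ingest_user SET synchronous_commit = off; -- placeholder role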


Re: IO related waits

From
Adrian Klaver
Date:
On 9/16/24 13:24, veem v wrote:
> Hi,
> One of our application using RDS postgres. In one of our streaming 
> applications(using flink) which processes 100's of millions of 
> transactions each day, we are using row by row transaction processing 
> for inserting data into the postgres database and commit is performed 
> for each row. We are seeing heavy IO:XactSynch wait events during the 
> data load and also high overall response time.
> 
> Architecture team is suggesting to enable asynch io if possible, so that 
> the streaming client will not wait for the commit confirmation from the 
> database. So I want to understand , how asynch io can be enabled and if 
> any downsides of doing this?

Are you referring to this?:

https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/

If not then you will need to be more specific.

> 
> Regards
> Veem

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: IO related waits

From
veem v
Date:


On Tue, 17 Sept 2024 at 03:41, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

Are you referring to this?:

https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/

If not then you will need to be more specific.


Yes, I was referring to this one. So what can be the caveats of this approach, considering the transactions are meant to be ACID compliant, being financial transactions? Additionally, I was not aware of the parameter "synchronous_commit" on the DB side which, when turned off, mimics asynchronous commit.

Would both of these mimic the same asynchronous behaviour and achieve the same thing, i.e. the client data-load throughput will increase because the DB will not wait for the data to be written to the WAL before giving a confirmation back to the client, and the client will not wait for the DB to confirm whether the data has been persisted? Also, since the flushing of the WAL to disk has to happen anyway in the backend (it is just delayed now), can this method cause contention on the database storage side if the data is being ingested from the client faster than it can be written to disk, and can it in some way impact data consistency for the read queries?

Re: IO related waits

From
Greg Sabino Mullane
Date:
On Mon, Sep 16, 2024 at 11:56 PM veem v <veema0000@gmail.com> wrote:
So what can be the caveats in this approach, considering transactions meant to be ACID compliant as financial transactions.

Financial transactions need to be handled with care. Only you know your business requirements, but as Christophe pointed out, disabling synchronous commit means your application may think a particular transaction has completed when it has not. Usually that's a big red flag for financial applications.

we are using row by row transaction processing for inserting data into the postgres database and commit is performed for each row.

This is a better place to optimize. Batch many rows per transaction. Remove unused indexes.
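
For illustration, batching in its simplest form looks something like this (a sketch only; the table and values are made up):

BEGIN;

-- One statement and one commit for many rows instead of one per row;
-- a few hundred to a few thousand rows per batch is a common starting point.
INSERT INTO txn_tab (id, txn_date, amount) VALUES
    (1001, '2024-08-29', 10.00),
    (1002, '2024-08-29', 25.50),
    (1003, '2024-08-29', 99.99);

COMMIT;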

flushing of the WAL to the disk has to happen anyway(just that it will be delayed now), so can this method cause contention in the database storage side if the speed in which the data gets ingested from the client is not getting written to the disk , and if it can someway impact the data consistency for the read queries?

Not quite clear what you are asking here re data consistency. The data will always be consistent, even if synchronous_commit is disabled. The only danger window is on a server crash.

(Keep in mind that RDS is not Postgres, so take tuning recommendations and advice with a grain of salt.)

Cheers,
Greg
 

Re: IO related waits

From
veem v
Date:

On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane <htamfids@gmail.com> wrote:

This is a better place to optimize. Batch many rows per transaction. Remove unused indexes.

flushing of the WAL to the disk has to happen anyway(just that it will be delayed now), so can this method cause contention in the database storage side if the speed in which the data gets ingested from the client is not getting written to the disk , and if it can someway impact the data consistency for the read queries?

Not quite clear what you are asking here re data consistency. The data will always be consistent, even if synchronous_commit is disabled. The only danger window is on a server crash.

(Keep in mind that RDS is not Postgres, so take tuning recommendations and advice with a grain of salt.)



Thank you Greg.

Yes, our Java application was doing row-by-row commits, and we saw that in pg_stat_database the "xact_commit" column was close to the sum of the tup_inserted, tup_updated and tup_deleted columns. We also verified in pg_stat_statements that the "calls" column matched the "rows" column for the INSERT queries, which means we are inserting exactly the same number of rows as the number of DB calls, so it also suggests we are doing row-by-row operations/DMLs.
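
For reference, the checks described above can be reproduced with queries roughly like these (a sketch; the ILIKE filter is just one way to pick out the insert statements):

-- Commits vs. modified tuples for the current database:
SELECT xact_commit,
       tup_inserted + tup_updated + tup_deleted AS tup_modified
FROM   pg_stat_database
WHERE  datname = current_database();

-- Calls vs. rows for the insert statements (needs the pg_stat_statements extension):
SELECT calls, rows, query
FROM   pg_stat_statements
WHERE  query ILIKE 'insert%'
ORDER  BY calls DESC
LIMIT  10;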

We then asked the application team to make the inserts in batches, but the figures in the above two views are not changing much: "xact_commit" stays almost the same, and the "calls" and "rows" columns in pg_stat_statements also stay almost the same. Does this mean the application code is somehow still doing the same row-by-row processing, or are we doing something wrong in the above analysis? 

Another thing we noticed: even after the data load finished, "xact_commit" kept increasing along with "tup_fetched". Does that mean some kind of implicit commit is happening even for the fetch-type queries, which I believe must be SELECT queries? Also, not sure if it's expected, but in Postgres I have seen that unless we put code within a begin/end block, it gets committed by default even when we just run it on the console; it doesn't ask for an explicit commit/rollback, so I'm not sure if that is somehow playing a role here.

Regards
Veem

 
 

Re: IO related waits

From
Adrian Klaver
Date:
On 9/16/24 20:55, veem v wrote:
> 
> 
> On Tue, 17 Sept 2024 at 03:41, Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
> 
>     Are you referring to this?:
> 
>     https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
<https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/>
> 
>     If not then you will need to be more specific.
> 
> 
> Yes, I was referring to this one. So what can be the caveats in this 
> approach, considering transactions meant to be ACID compliant as 
> financial transactions.Additionally I was not aware of the parameter 
> "synchronous_commit" in DB side which will mimic the synchronous commit.
> 
> Would both of these mimic the same asynchronous behaviour and achieves 
> the same, which means the client data load throughput will increase 
> because the DB will not wait for those data to be written to the WAL and 
> give a confirmation back to the client and also the client will not wait 
> for the DB to give a confirmation back on the data to be persisted in 
> the DB or not?. Also, as in the backend the flushing of the WAL to the 
> disk has to happen anyway(just that it will be delayed now), so can this 
> method cause contention in the database storage side if the speed in 
> which the data gets ingested from the client is not getting written to 
> the disk , and if it can someway impact the data consistency for the 
> read queries?

This is not something that I am that familiar with. I suspect though 
this is more complicated than you think. From the link above:

" Prerequisites #

As illustrated in the section above, implementing proper asynchronous 
I/O to a database (or key/value store) requires a client to that 
database that supports asynchronous requests. Many popular databases 
offer such a client.

In the absence of such a client, one can try and turn a synchronous 
client into a limited concurrent client by creating multiple clients and 
handling the synchronous calls with a thread pool. However, this 
approach is usually less efficient than a proper asynchronous client.
"

Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple 
synchronous clients.

On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html

That will return a success signal to the client quicker if 
synchronous_commit is set to off. Though the point of the Flink async 
I/O is not to wait for the response before moving on, so I am not sure 
how much synchronous_commit = off would help.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: IO related waits

From
veem v
Date:

On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple
synchronous clients.

On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html

That will return a success signal to the client quicker if
synchronous_commit is set to off. Though the point of the Flink async
I/O is not to wait for the response before moving on, so I am not sure
how much synchronous_commit = off would help.


 Got it. So their suggestion was to set async I/O at the Flink level but not at the DB level, so that the application will not wait for the commit response from the database. But in that case, won't it overload the DB if the database keeps doing the commits (with synchronous_commit=on) and waiting for the response from its storage for the WAL to be flushed to disk, while the application does not wait for the response to those inserts and keeps flooding the database with more and more incoming insert requests?

Additionally, as I mentioned before, we see in "pg_stat_database" that the "xact_commit" column almost matches the sum of the "tup_inserted", "tup_updated" and "tup_deleted" columns. We also verified in pg_stat_statements that the "calls" column is the same as the "rows" column for the INSERT queries, which means we are inserting exactly the same number of rows as the number of DB calls; doesn't that suggest we are doing row-by-row operations/DMLs?

Also, after seeing the above and asking the application team to do batch commits, we are still seeing similar figures in pg_stat_database and pg_stat_statements. Does that mean we are looking at the wrong stats, or that the application code change has not been done accurately? And even when no inserts are running from the application side, we see "xact_commit" keep increasing along with "tup_fetched"; why is that?

Finally, we see here in Postgres that even if we just write a DML statement it gets committed by default, unless we explicitly put it in a "begin ... end" block. Can that be a difference in how a "commit" gets handled in Postgres vs other databases?


Re: IO related waits

From
Adrian Klaver
Date:
On 9/17/24 12:34, veem v wrote:
> 
> On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
> 
>     Which means you need to on Flink end:
> 
>     1) Use Flink async I/O .
> 
>     2) Find a client that supports async or fake it by using multiple
>     synchronous clients.
> 
>     On Postgres end there is this:
> 
>     https://www.postgresql.org/docs/current/wal-async-commit.html
>     <https://www.postgresql.org/docs/current/wal-async-commit.html>
> 
>     That will return a success signal to the client quicker if
>     synchronous_commit is set to off. Though the point of the Flink async
>     I/O is not to wait for the response before moving on, so I am not sure
>     how much synchronous_commit = off would help.
> 
> 
>   Got it. So it means their suggestion was to set the asynch_io at flink 
> level but not DB level, so that the application will not wait for the 
> commit response from the database. But in that case , won't it overload 
> the DB with more and more requests if database will keep doing the 
> commit ( with synchronous_commit=ON)  and waiting for getting the 
> response back from its storage for the WAL's to be flushed to the disk, 
> while the application will not wait for its response back(for those 
> inserts) and keep flooding the database with more and more incoming 
> Insert requests?

My point is this is a multi-layer cake with layers:

1) Flink async I/O.

2) Database client async/sync

3) Postgres sync status.

That is a lot of moving parts and determining whether it is suitable is 
going to require rigorous testing over a representative data load.


See more below.

> 
> Additionally as I mentioned before, we see that from "pg_stat_database" 
> from the column "xact_commit" , it's almost matching with the sum of 
> "tup_inserted", "tup_updated", "tup_deleted" column. And also we 
> verified in pg_stats_statements the  "calls" column is same as the 
> "rows" column for the INSERT queries, so it means also we are inserting 
> exactly same number of rows as the number of DB calls, so doesn't it 
> suggest that we are doing row by row operations/dmls.
> 
> Also after seeing above and asking application team to do the batch 
> commit ,we are still seeing the similar figures from pg_stat_database 
> and pg_stat_statements, so does it mean that we are looking into wrong 
> stats? or the application code change has not been done accurately? and 
> we see even when no inserts are running from the application side, we do 
> see "xact_commit" keep increasing along with "tup_fetched" , why so?
> 
> Finally we see in postgres here, even if we just write a DML statement 
> it does commit that by default, until we explicitly put it in a 
> "begin... end" block. Can that be the difference between how a "commit" 
> gets handled in postgres vs other databases?

It does if autocommit is set in the client, that is common to other 
databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html


https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
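
For illustration, in Postgres terms the difference is roughly this (a sketch; t_demo is a hypothetical table):

-- With client-side autocommit (the default in psql and in JDBC), each
-- statement is its own transaction and bumps xact_commit by one:
INSERT INTO t_demo VALUES (1);

-- With autocommit disabled, the driver effectively wraps the work in an
-- explicit transaction, so many statements share a single commit:
BEGIN;
INSERT INTO t_demo VALUES (2);
INSERT INTO t_demo VALUES (3);
COMMIT;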

You probably need to take a closer look at the client/driver you are 
using and the code that is interacting with it.

In fact I would say you need to review the entire data transfer process 
to see if there are performance gains that can be obtained without 
adding an entirely new async component.

> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: IO related waits

From
veem v
Date:

On Wed, 18 Sept 2024 at 05:07, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 9/17/24 12:34, veem v wrote:
>

It does if autocommit is set in the client, that is common to other
databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16

You probably need to take a closer look at the client/driver you are
using and the code that interacting with it.

In fact I would say you need to review the entire data transfer process
to see if there are performance gains that can be obtained without
adding an entirely new async component.



You were spot on. When we turned off "auto commit" we started seeing a smaller number of commits, in line with the number of batches. 

However, we also started seeing deadlock issues. We have foreign key relationships between the tables, and during the batch we insert into the parent first and then into the child, but this happens from multiple sessions for different batches. So why do we see the error below, given that in each batch we ensure we first insert into the parent and then into the child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 10443 waits for ShareLock on transaction 220972157; blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked by process 10443.
  Hint: See server log for query details.
  Where: while inserting index tuple (88736,28) in relation "TAB1_p2024_08_29"

Re: IO related waits

From
veem v
Date:


On Thu, 19 Sept 2024 at 02:01, veem v <veema0000@gmail.com> wrote:

On Wed, 18 Sept 2024 at 05:07, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 9/17/24 12:34, veem v wrote:
>

It does if autocommit is set in the client, that is common to other
databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16

You probably need to take a closer look at the client/driver you are
using and the code that interacting with it.

In fact I would say you need to review the entire data transfer process
to see if there are performance gains that can be obtained without
adding an entirely new async component.



You were spot on. When we turned off the "auto commit" we started seeing less number of commits as per the number of batches. 

However we also started seeing deadlock issues. We have foreign key relationships between the tables and during the batch we do insert into the parent first and then to the child , but this does happen from multiple sessions for different batches. So why do we see below error, as we ensure in each batch we first insert into parent and then into the child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 10443 waits for ShareLock on transaction 220972157; blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked by process 10443.
  Hint: See server log for query details.
  Where: while inserting index tuple (88736,28) in relation "TAB1_p2024_08_29"

As we were able to get hold of one session, we saw that "insert into <parent partition table>" was blocked by "insert into <child partition table>", and the "insert into <child partition table>" was experiencing a "client read" wait event. We are still unable to understand why it's happening and how to fix it.
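
As an aside, one way to see who is blocking whom while the sessions are stuck is a query along these lines (a sketch, assuming PostgreSQL 9.6 or later where pg_blocking_pids() is available):

SELECT blocked.pid              AS blocked_pid,
       blocked.wait_event_type,
       blocked.wait_event,
       blocked.query            AS blocked_query,
       blocking.pid             AS blocking_pid,
       blocking.query           AS blocking_query
FROM   pg_stat_activity AS blocked
JOIN   LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN   pg_stat_activity AS blocking ON blocking.pid = b.pid;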

Re: IO related waits

From
Adrian Klaver
Date:

On 9/18/24 1:40 PM, veem v wrote:
> 

>     You were spot on. When we turned off the "auto commit" we started
>     seeing less number of commits as per the number of batches.
> 
>     However we also started seeing deadlock issues. We have foreign key
>     relationships between the tables and during the batch we do insert
>     into the parent first and then to the child , but this does happen
>     from multiple sessions for different batches. So why do we see below
>     error, as we ensure in each batch we first insert into parent and
>     then into the child tables?
> 
>     caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
>        Detail: Process 10443 waits for ShareLock on transaction
>     220972157; blocked by process 10454.
>     Process 10454 waits for ShareLock on transaction 220972155; blocked
>     by process 10443.
>        Hint: See server log for query details.
>        Where: while inserting index tuple (88736,28) in relation
>     "TAB1_p2024_08_29"
> 
> 
> As we are able to get hold of one session, we see "insert into <parent 
> partition table>" was blocked by "insert into <child partition table>". 
> And the "insert into <child partition table> " was experiencing a 
> "client read" wait event. Still unable to understand why it's happening 
> and how to fix it?



This needs clarification.

1) To be clear when you refer to parent and child that is:
                 FK
parent_tbl.fld <--> child_tbl.fld_fk

not parent and child tables in partitioning scheme?

2) What are the table schemas?

3) What is the code that is generating the error?


Overall it looks like this process needs a top to bottom audit to map 
out what is actually being done versus what needs to be done.




-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: IO related waits

From
veem v
Date:

On Thu, 19 Sept 2024 at 03:02, Adrian Klaver <adrian.klaver@aklaver.com> wrote:


This needs clarification.

1) To be clear when you refer to parent and child that is:
                 FK
parent_tbl.fld <--> child_tbl.fld_fk

not parent and child tables in partitioning scheme?

2) What are the table schemas?

3) What is the code that is generating the error?


Overall it looks like this process needs a top to bottom audit to map
out what is actually being done versus what needs to be done.



Yes, the tables actually have a parent/child table relationship, not parent and child tables in a partitioning scheme. The PK and FK are on the columns (ID, part_date). The table is daily range partitioned on the column part_date.

The error we are seeing in the logs is below:

deadlock detected
2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:  Process 14537 waits for ShareLock on transaction 220975629; blocked by process 14548.
Process 14548 waits for ShareLock on transaction 220975630; blocked by process 14537.
Process 14537: INSERT INTO TRANDB.PART_TAB (ID, part_date....)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
Process 14548: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:HINT:  See server log for query details.
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:CONTEXT:  while inserting index tuple (88814,39) in relation "PART_TAB_p2024_08_29"

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.EXCEP_TAB (...)
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  
2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:  current transaction is aborted, commands ignored until end of transaction block

*********

2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:ERROR:  deadlock detected
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:DETAIL:  Process 17456 waits for ShareLock on transaction 220978890; blocked by process 17458.
Process 17458 waits for ShareLock on transaction 220978889; blocked by process 17456.
Process 17456: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
Process 17458: INSERT INTO TRANDB.PART_TAB (ID, part_date, ..)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:HINT:  See server log for query details.
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:CONTEXT:  while inserting index tuple (88875,13) in relation "PART_TAB_p2024_08_29"
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:STATEMENT:  INSERT INTO TRANDB.PART_TAB (ID, part_date,..)  VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:  could not receive data from client: Connection reset by peer
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:  disconnection: session time: 0:08:37.154 user=USER1 database=TRANDB host=XXXXX port=58778

Re: IO related waits

From
Greg Sabino Mullane
Date:
On Thu, Sep 19, 2024 at 5:17 AM veem v <veema0000@gmail.com> wrote:
2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:  Process 14537 waits for ShareLock on transaction 220975629; blocked by process 14548.

You need to find out exactly what commands, and in what order, all these processes are doing. Deadlocks can be avoided by rearranging your application logic.
 
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.EXCEP_TAB (...)
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  
2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:  current transaction is aborted, commands ignored until end of transaction block

Fix your application. It should be checking that each command completed and not just blindly pushing on to the next statement while ignoring the error.
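
For background, once any statement inside a transaction fails, Postgres refuses further commands until the transaction is rolled back; a savepoint lets the application recover mid-transaction. A minimal sketch (t_demo is a hypothetical table):

CREATE TABLE IF NOT EXISTS t_demo (id int PRIMARY KEY);

BEGIN;
INSERT INTO t_demo VALUES (1);

SAVEPOINT before_insert;
INSERT INTO t_demo VALUES (1);          -- fails: duplicate key
-- every later statement now gets "current transaction is aborted"
-- until we roll back, either the whole transaction or to the savepoint:
ROLLBACK TO SAVEPOINT before_insert;

INSERT INTO t_demo VALUES (2);          -- accepted again
COMMIT;                                 -- commits rows 1 and 2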

This is really difficult to diagnose from afar with only snippets of logs and half-complete descriptions of your business logic. Pull everyone involved into a room with a whiteboard, and produce a document describing exactly what your application does, and how it is doing it. Switch from reactive to proactive.

Cheers,
Greg

Re: IO related waits

From
veem v
Date:


On Thu, 19 Sept 2024 at 17:54, Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Thu, Sep 19, 2024 at 5:17 AM veem v <veema0000@gmail.com> wrote:
2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:  Process 14537 waits for ShareLock on transaction 220975629; blocked by process 14548.

You need to find out exactly what commands, and in what order, all these processes are doing. Deadlocks can be avoided by rearranging your application logic.
 
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.EXCEP_TAB (...)
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  
2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:  current transaction is aborted, commands ignored until end of transaction block

Fix your application. It should be checking that each command completed and not just blindly pushing on to the next statement while ignoring the error.

This is really difficult to diagnose from afar with only snippets of logs and half-complete descriptions of your business logic. Pull everyone involved into a room with a whiteboard, and produce a document describing exactly what your application does, and how it is doing it. Switch from reactive to proactive.



Thank you Greg.

I was thinking there might be some oddity or difference in behaviour here in Postgres as compared to others, because I have seen deadlocks due to UPDATEs but never deadlocks with INSERT queries in other databases (though here we have "insert on conflict do nothing"). But I am now thinking that since we have foreign keys and primary keys, if the same PK gets inserted from multiple sessions then one will wait if the other has not been committed, and that might be creating a situation of locking first and subsequently a deadlock. 

We are also doing batch inserts from multiple sessions, but each session first inserts into the parent and then into the child table for the related PK and FK, and they should not overlap across sessions. So I will check if there is a loophole there.

Another thing we encountered: if the session hits any error (maybe a deadlock etc.), it does not execute any further statements and errors out with "current transaction is aborted, commands ignored until end of transaction block". It seems it needs an explicit "rollback" and will not roll back by default, which I was expecting it to do.

Regards
Veem

Re: IO related waits

From
Adrian Klaver
Date:
On 9/19/24 05:24, Greg Sabino Mullane wrote:
> On Thu, Sep 19, 2024 at 5:17 AM veem v <veema0000@gmail.com 

> This is really difficult to diagnose from afar with only snippets of 
> logs and half-complete descriptions of your business logic. Pull 
> everyone involved into a room with a whiteboard, and produce a document 
> describing exactly what your application does, and how it is doing it. 
> Switch from reactive to proactive.

+1

> 
> Cheers,
> Greg
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: IO related waits

From
veem v
Date:

On Thu, 19 Sept, 2024, 8:40 pm Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 9/19/24 05:24, Greg Sabino Mullane wrote:
> On Thu, Sep 19, 2024 at 5:17 AM veem v <veema0000@gmail.com

> This is really difficult to diagnose from afar with only snippets of
> logs and half-complete descriptions of your business logic. Pull
> everyone involved into a room with a whiteboard, and produce a document
> describing exactly what your application does, and how it is doing it.
> Switch from reactive to proactive.


I was able to reproduce this deadlock as below. Now my question is: this is a legitimate scenario in which the same ID can get inserted from multiple sessions, and in such cases that row is expected to be skipped (thus "on conflict do nothing" is used). But as we see, it breaks the code with a deadlock error during race conditions when a lot of parallel threads are operating. So how should we handle this scenario? Will setting the "lock_timeout" parameter at the session level help us here?

Create table t1(id numeric primary key);

Session 1:
begin;
insert into t1 values (1) on conflict (id) do nothing;

Session 2:
begin;
insert into t1 values (2) on conflict (id) do nothing;

Session 1:
insert into t1 values (2) on conflict (id) do nothing;

This hangs, waiting for session 2 to commit or roll back its transaction.

Session 2:
insert into t1 values (1) on conflict (id) do nothing;

Deadlock detected, and this session's transaction is aborted.

Regards
Veem

 

Re: IO related waits

From
Tom Lane
Date:
veem v <veema0000@gmail.com> writes:
> Able to reproduce this deadlock graph as below.  Now my question is , this
> is a legitimate scenario in which the same ID can get inserted from
> multiple sessions and in such cases it's expected to skip that (thus "On
> conflict Do nothing" is used) row. But as we see it's breaking the code
> with deadlock error during race conditions where a lot of parallel threads
> are operating. So how should we handle this scenario?

Do you have to batch multiple insertions into a single transaction?
If so, can you arrange to order them consistently across transactions
(eg, sort by primary key before inserting)?

            regards, tom lane



Re: IO related waits

From
Ron Johnson
Date:
On Fri, Sep 20, 2024 at 4:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
veem v <veema0000@gmail.com> writes:
> Able to reproduce this deadlock graph as below.  Now my question is , this
> is a legitimate scenario in which the same ID can get inserted from
> multiple sessions and in such cases it's expected to skip that (thus "On
> conflict Do nothing" is used) row. But as we see it's breaking the code
> with deadlock error during race conditions where a lot of parallel threads
> are operating. So how should we handle this scenario?

Do you have to batch multiple insertions into a single transaction?
If so, can you arrange to order them consistently across transactions
(eg, sort by primary key before inserting)?
 
That's exactly what I did back in the day.  Because of database buffering, sorting the data file at the OS level made the job 3x as fast as when the input data was random.
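
For what it's worth, if a batch can be staged first (arrays, a temp table, or sorting in the client before binding), a consistent key order per transaction might look roughly like this on the SQL side (all table and column names here are hypothetical, not from this thread):

BEGIN;

-- Parent rows of this batch in primary-key order, so concurrent batches
-- take row/index locks in the same order:
INSERT INTO parent_tab (id, part_date)
SELECT id, part_date
FROM   staging_parent
ORDER  BY id, part_date
ON CONFLICT (id, part_date) DO NOTHING;

-- Then the child rows, ordered the same way:
INSERT INTO child_tab (id, part_date)
SELECT id, part_date
FROM   staging_child
ORDER  BY id, part_date
ON CONFLICT (id, part_date) DO NOTHING;

COMMIT;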

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

Re: IO related waits

From
Adrian Klaver
Date:

On 9/20/24 1:01 PM, veem v wrote:
> 
> On Thu, 19 Sept, 2024, 8:40 pm Adrian Klaver, <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 9/19/24 05:24, Greg Sabino Mullane wrote:
>      > On Thu, Sep 19, 2024 at 5:17 AM veem v <veema0000@gmail.com
>     <mailto:veema0000@gmail.com>
> 
>      > This is really difficult to diagnose from afar with only snippets of
>      > logs and half-complete descriptions of your business logic. Pull
>      > everyone involved into a room with a whiteboard, and produce a
>     document
>      > describing exactly what your application does, and how it is
>     doing it.
>      > Switch from reactive to proactive.
> 
> 
> Able to reproduce this deadlock graph as below.  Now my question is , 
> this is a legitimate scenario in which the same ID can get inserted from 
> multiple sessions and in such cases it's expected to skip that (thus "On 
> conflict Do nothing" is used) row. But as we see it's breaking the code

Yeah, as I see it that would not work with concurrent uncommitted 
sessions as it would be unresolved whether a conflict actually exists 
until at least one of the sessions completes.

> with deadlock error during race conditions where a lot of parallel 
> threads are operating. So how should we handle this scenario? Will 
> setting the "lock_timeout" parameter at session level will help us 
> anyway here?

Serializable transaction?:

https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE

Or change the application code to not have this:

"... legitimate scenario in which the same ID can get inserted from 
multiple sessions ..."
-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: IO related waits

From
veem v
Date:


On Sat, 21 Sept 2024 at 03:47, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> On 9/20/24 1:01 PM, veem v wrote:
> > Able to reproduce this deadlock graph as below.  Now my question is ,
> > this is a legitimate scenario in which the same ID can get inserted from
> > multiple sessions and in such cases it's expected to skip that (thus "On
> > conflict Do nothing" is used) row. But as we see it's breaking the code
>
> Yeah, as I see it that would not work with concurrent uncommitted sessions
> as it would be unresolved whether a conflict actually exists until at least
> one of the sessions completes.
>
> > with deadlock error during race conditions where a lot of parallel
> > threads are operating. So how should we handle this scenario? Will
> > setting the "lock_timeout" parameter at session level will help us
> > anyway here?
>
> Serializable transaction?:
>
> https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE

Doesn't help here, at least not directly. It would help indirectly
because isolation level serializable makes it very probable that
serialization errors occur. So an application designed for serializable
would have some kind of retry logic already in place.

So that leads us to another solution:

Retry each batch (possibly after reducing the batch size) until it
succeeds.


Actually, here the application is using Kafka and Flink streaming, and is reusing existing code in which it was doing row-by-row commits, which has now been changed to batch commits, i.e. the commit point is shifted from row-by-row to batch. There are multiple sessions spawning at the same time to process the incoming messages 24/7. And, as described in another ongoing thread, in the past we did not see much difference between "batch commit" and "batch insert" performance; we only realized that the row-by-row commit has worse performance.

Now, in such a scenario, when the messages are coming from a streaming framework using Kafka + Flink and the inserts still happen row by row (just with the commit shifted to batch), I don't see any way to sort the ID columns in this streaming process so that they won't overlap across sessions.

In such a situation, the only way seems to be to replay the messages for which the deadlock error happens, as I think during a deadlock error one session's transaction gets aborted by Postgres; perhaps we can save those messages in some exception table and then replay them? 

Re: IO related waits

From
Adrian Klaver
Date:
On 9/21/24 02:36, veem v wrote:
> 
> 

> 
> Actually here the application is using kafka and  flink stream and is 
> using one of existing code in which it was doing row by row commit which 
> is now changed to Batch commit i.e. the commit point is shifted from row 
> by row to batch now. There are multiple sessions spawning at the same 
> time to process the incoming messages 24/7. And also as described in 
> another ongoing thread and also we saw in the past we did not see much 
> difference between "batch commit" and "Batch insert" performance. We 
> only realized the row by row commit is having worse performance.

The bottom line is that to solve this a cost is going to have to be paid 
somewhere. Previously it was done with autocommit in the form of slow 
insert performance. You improved the speed of the inserts by wrapping 
multiple inserts in transactions, and that led you to this problem, where 
open transactions across sessions are leading to deadlock issues due to 
the same id being inserted in concurrent open sessions. Per my and Greg 
Sabino Mullane's comments the solution is going to need planning. Right 
now you are playing a round of Whack-a-Mole by making ad-hoc changes to 
portions of the process without reference to the process as a whole. At 
some point the parallel duplicate ids (records) need to be straightened 
out into a serial application of data. You and the other people involved 
need to come up with a coherent view of the process as a whole with the 
goal of achieving that. Then you can start planning on where that cost is 
best paid: 1) in the data before the streaming, 2) in the streaming 
process itself, 3) in the database, or 4) spread out across 1-3.

> 
> Now, in such a scenario when the messages are coming from a streaming 
> framework using kafka +flink and also the insert is happening using row 
> by row only (but just the commit is shifted to batch), i don't see any 
> way to sort the ID columns in this streaming process, so that they 
> won't overlap across session.
> 
> In such a situation , the only way seems to have the messages replayed 
> for which the deadlock error happens , as I think during a deadlock 
> error, one session gets terminated by postgres and that messages perhap 
> we can save in some exception table and then replay?
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: IO related waits

From
Greg Sabino Mullane
Date:
You may be able to solve this with advisory locks. In particular, transaction-level advisory locks with the "try-pass/fail" variant. Here, "123" is a unique number used by your app, related to this particular table. You also need to force read committed mode, as the advisory locks go away after the transaction ends, so at that point, we switch from advisory locks to the ON CONFLICT clause, which can only work smoothly if other processes can see the new row right away.

drop table if exists t1;
create table t1(id int primary key);

-- Session 1:
begin transaction isolation level read committed;

-- inserts one row:
insert into t1 select 1 where pg_try_advisory_xact_lock(123,1)
  on conflict(id) do nothing;

-- Session 2:
begin transaction isolation level read committed;

-- inserts one row:
insert into t1 select 2 where pg_try_advisory_xact_lock(123,2)
  on conflict(id) do nothing;

-- Session 1:

-- silently 'fails' because no lock is granted, so inserts zero rows:
insert into t1 select 2 where pg_try_advisory_xact_lock(123,2)
  on conflict(id) do nothing;

-- Session 2:

-- silently 'fails' because no lock is granted, so inserts zero rows:
insert into t1 select 1 where pg_try_advisory_xact_lock(123,1)
  on conflict(id) do nothing;

-- inserts one row:
insert into t1 select 3 where pg_try_advisory_xact_lock(123,3)
  on conflict(id) do nothing;

commit; -- lock on 2 and 3 goes away

-- Session 1:

-- silently fails because of the on conflict clause
insert into t1 select 3 where pg_try_advisory_xact_lock(123,3)
  on conflict(id) do nothing;


Cheers,
Greg

Re: IO related waits

From
rob stone
Date:
Hello,

On Sat, 2024-09-21 at 10:20 -0700, Adrian Klaver wrote:
> On 9/21/24 02:36, veem v wrote:
> >
> >
>
> >
> > Actually here the application is using kafka and  flink stream and
> > is
> > using one of existing code in which it was doing row by row commit
> > which
> > is now changed to Batch commit i.e. the commit point is shifted
> > from row
> > by row to batch now. There are multiple sessions spawning at the
> > same
> > time to process the incoming messages 24/7. And also as described
> > in
> > another ongoing thread and also we saw in the past we did not see
> > much
> > difference between "batch commit" and "Batch insert" performance.
> > We
> > only realized the row by row commit is having worse performance.
>
> The bottom line is that to solve this a cost is going to have to be
> paid
> somewhere. Previously  it was done with autocommit in the form of
> slow
> insert performance. You improved the speed of the inserts by wrapping
> multiple inserts in transactions and that led you to this problem,
> where
> open transactions across sessions is leading to deadlock issues due
> to
> the same id being inserted in concurrent open sessions. Per my and
> Greg
> Sabino Mullane comments the solution is going to need planning. Right
> now you are playing a round of Whack-a-Mole by making ad-hoc changes
> of
> portions of the process without reference to the process as a
> whole.At
> some point the parallel duplicate ids(records) need to be
> straightened
> out into a serial application of data. You and the other people
> involved
> need to come up with a coherent view of the process as whole with a
> goal
> to achieving that. Then you can start planning on where that cost is
> best paid: 1) In the data before the streaming. 2) In the streaming
> process itself 3) In the database or 4) Spread out across 1-4.
>

You are using Apache's flink to pump data into your database. It seems
to me that you have this occurring:-

pipelineA --> sessionA
pipelineB --> sessionB
       etc.

You haven't said if the SQL code doing the inserts is coming out of
flink or if it is your own code.
If it is your own code make sure you are handling SQLException events
properly.

If there is a flink mailing list or user group, you'll probably find
more help with other flink users.

Follow Adrian's advice.

HTH,
Rob