Re: IO related waits - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: IO related waits |
Date | |
Msg-id | fd6afa3b-f5db-4632-8e25-678ef66703d5@aklaver.com |
In response to | Re: IO related waits (veem v <veema0000@gmail.com>) |
Responses | Re: IO related waits |
List | pgsql-general |
On 9/17/24 12:34, veem v wrote:
>
> On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> > Which means you need to, on the Flink end:
> >
> > 1) Use Flink async I/O.
> >
> > 2) Find a client that supports async, or fake it by using multiple
> > synchronous clients.
> >
> > On the Postgres end there is this:
> >
> > https://www.postgresql.org/docs/current/wal-async-commit.html
> >
> > That will return a success signal to the client quicker if
> > synchronous_commit is set to off. Though the point of the Flink async
> > I/O is not to wait for the response before moving on, so I am not sure
> > how much synchronous_commit = off would help.
>
> Got it. So their suggestion was to set async I/O at the Flink level but
> not at the DB level, so that the application will not wait for the
> commit response from the database. But in that case, won't it overload
> the DB with more and more requests, if the database keeps doing the
> commit (with synchronous_commit=on) and waiting for the response from
> its storage confirming the WAL has been flushed to disk, while the
> application does not wait for the response to those inserts and keeps
> flooding the database with more and more incoming INSERT requests?

My point is that this is a multi-layer cake with layers:

1) Flink async I/O
2) Database client async/sync
3) Postgres sync status

That is a lot of moving parts, and determining whether it is suitable is
going to require rigorous testing over a representative data load.

See more below.

> Additionally, as I mentioned before, we see in "pg_stat_database" that
> the column "xact_commit" almost matches the sum of the "tup_inserted",
> "tup_updated", and "tup_deleted" columns.
> We also verified in pg_stat_statements that the "calls" column is the
> same as the "rows" column for the INSERT queries, so we are inserting
> exactly the same number of rows as the number of DB calls. Doesn't that
> suggest that we are doing row-by-row operations/DMLs?
>
> Also, after seeing the above and asking the application team to do
> batch commits, we are still seeing similar figures in pg_stat_database
> and pg_stat_statements. Does that mean we are looking at the wrong
> stats, or that the application code change has not been done
> accurately? And even when no inserts are running from the application
> side, we see "xact_commit" keep increasing along with "tup_fetched" --
> why so?
>
> Finally, we see that Postgres commits a DML statement by default,
> unless we explicitly put it in a "begin ... end" block. Can that be the
> difference between how a "commit" gets handled in Postgres vs other
> databases?

It does if autocommit is set in the client; that is common to other
databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16

You probably need to take a closer look at the client/driver you are
using and the code that is interacting with it. In fact I would say you
need to review the entire data transfer process to see if there are
performance gains that can be obtained without adding an entirely new
async component.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
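The open question in the thread is whether the application is issuing one commit per row. One client-side remedy is to collapse N single-row statements into a single multi-row INSERT executed in one call and one commit. A minimal sketch follows; the `events` table, its columns, and the psycopg2-style `%s` placeholders are illustrative assumptions, not details from the thread:

```python
# Sketch only: build one parameterized multi-row INSERT instead of N
# single-row statements. Table and column names here are hypothetical.

def build_batched_insert(table, columns, rows):
    """Return (sql, params) for a single multi-row INSERT statement."""
    # One "(%s, %s, ...)" group per row, all in one VALUES list.
    placeholders = ", ".join(
        "(" + ", ".join(["%s"] * len(columns)) + ")" for _ in rows
    )
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), placeholders
    )
    # Flatten the row tuples into one parameter list for the driver.
    params = [value for row in rows for value in row]
    return sql, params

rows = [(1, "a"), (2, "b"), (3, "c")]
sql, params = build_batched_insert("events", ("id", "val"), rows)
print(sql)
# INSERT INTO events (id, val) VALUES (%s, %s), (%s, %s), (%s, %s)
```

Executed through a single driver call inside one transaction, this turns N commits (and, with synchronous_commit=on, N WAL-flush waits) into one, which should also show up as "calls" no longer tracking "rows" in pg_stat_statements.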