Thread: xmin and very high number of concurrent transactions
I was asked this question in one of my demos, and it was an interesting one. We update xmin for new inserts with the current txid. Now, in a highly concurrent scenario where more than 2000 concurrent users are trying to insert new data, will updating the xmin value be a bottleneck? I know we should use pooling solutions to reduce concurrent connections, but let's assume we have enough resources to take care of spawning a new process for each new connection.

Regards,
Vijay
On 3/12/19 12:19 PM, Vijaykumar Jain wrote:
> we update xmin for new inserts with the current txid.

Why?

-- 
Adrian Klaver
adrian.klaver@aklaver.com
No, I don't mean that we end users do it; Postgres does it (?) via the xmin and xmax fields from inherited tables :) if that is what you meant by "why". Or are you asking whether Postgres updates those rows at all, and I am wrong to assume it does?

Since the values need to be atomic, consider the analogy below. Assume I (Postgres) am a person giving out tokens to people (connections/transactions) in a queue. If there is a single line (sequential), it is easy for me to give each of them one token, incrementing the value each time. But if there are thousands of users in parallel lines and I am the only person handing out tokens, I will operate sequentially, and everyone else is "blocked" for some time before they get a token with the required value. So with thousands of users, could that "delay" hurt performance, because I need to keep track of the token value to know which value to give to the next person?

I do not know if I am explaining it correctly; pardon my analogy.

Regards,
Vijay

On Wed, Mar 13, 2019 at 1:10 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 3/12/19 12:19 PM, Vijaykumar Jain wrote:
> > we update xmin for new inserts with the current txid.
>
> Why?
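The token analogy in the message above can be modelled in a few lines. This is a toy sketch, not PostgreSQL code: `TokenDispenser` and `hand_out` are hypothetical names, and a plain Python `threading.Lock` stands in for whatever locking PostgreSQL actually uses. It shows that a single lock-protected counter gives every caller a unique value even under heavy concurrency, and that callers only wait for the duration of a read-and-increment.

```python
import threading


class TokenDispenser:
    """Toy model of the analogy: one counter, one lock, many callers.

    Not PostgreSQL's implementation; it only shows that a lock-protected
    counter hands out unique, gap-free values while callers queue on the lock.
    """

    def __init__(self, first: int = 1) -> None:
        self._next = first
        self._lock = threading.Lock()

    def next_token(self) -> int:
        # Only one caller at a time may read and increment the counter.
        with self._lock:
            token = self._next
            self._next += 1
            return token


def hand_out(dispenser: TokenDispenser, clients: int, per_client: int) -> list[int]:
    """Let `clients` threads each fetch `per_client` tokens concurrently."""
    results: list[int] = []
    results_lock = threading.Lock()

    def worker() -> None:
        mine = [dispenser.next_token() for _ in range(per_client)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    tokens = hand_out(TokenDispenser(), clients=200, per_client=50)
    # Every token is unique even though 200 threads asked at once.
    print(len(tokens), len(set(tokens)))
```

Because the critical section here is only a read and an increment, each caller holds the lock very briefly; whether that brief serialization ever becomes a bottleneck is what the rest of the thread discusses.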
On 3/12/19 1:02 PM, Vijaykumar Jain wrote:
> no i mean not we end users, postgres does it (?) via the xmin and xmax
> fields from inherited tables :) if that is what you wanted in a why
> or are you asking, does postgres even update those rows and i am wrong
> assuming it that way?

Not sure where the inherited tables come in? See below for more info:

https://www.postgresql.org/docs/11/storage-page-layout.html

AFAIK xmin and xmax are just set as part of the insert or delete operations, so there is no updating involved. I would say the impact on performance would come from the overhead of each connection rather than from maintaining xmin/xmax.

-- 
Adrian Klaver
adrian.klaver@aklaver.com
I may have misunderstood the documentation or your question, but my understanding is that xmin is not updated; it is only set on insert (and also on update, but for Postgres updates are also inserts, since updates are executed as delete/insert). From https://www.postgresql.org/docs/10/ddl-system-columns.html:

> xmin
> The identity (transaction ID) of the inserting transaction for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)

Therefore I assume there are no actual updates of xmin values.

Stefan

On 12.03.2019 20:19, Vijaykumar Jain wrote:
> we update xmin for new inserts with the current txid.
> now in a very high concurrent scenario where there are more than 2000
> concurrent users trying to insert new data,
> will updating xmin value be a bottleneck?
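Stefan's reading of the docs (xmin is written once, when a row version is created, and an update creates a new version) can be illustrated with a toy in-memory model. `RowVersion` and `ToyHeap` are hypothetical names. The sketch assumes, as described in the thread, that a delete only stamps xmax on the existing version and an update is a delete plus an insert; it ignores the rest of the real tuple header (hint bits, the ctid link to the newer version) and all visibility checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One physical version of a logical row (toy model, not the real tuple header)."""
    data: dict
    xmin: int                   # xid of the transaction that created this version
    xmax: Optional[int] = None  # xid of the transaction that deleted/replaced it


class ToyHeap:
    def __init__(self) -> None:
        self.versions: list[RowVersion] = []

    def insert(self, xid: int, data: dict) -> RowVersion:
        # xmin is written exactly once, when the version is created.
        row = RowVersion(data=dict(data), xmin=xid)
        self.versions.append(row)
        return row

    def delete(self, xid: int, row: RowVersion) -> None:
        # A delete only stamps xmax on the existing version; xmin is untouched.
        row.xmax = xid

    def update(self, xid: int, row: RowVersion, changes: dict) -> RowVersion:
        # An update = mark the old version deleted + insert a new version.
        self.delete(xid, row)
        return self.insert(xid, {**row.data, **changes})
```

For example, inserting a row in transaction 100 and updating it in transaction 105 leaves two versions: the old one keeps xmin 100 and gains xmax 105, and the new one starts with xmin 105. No existing xmin is ever rewritten.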
Vijaykumar Jain wrote:
> now in a very high concurrent scenario where there are more than 2000
> concurrent users trying to insert new data,
> will updating xmin value be a bottleneck?

You can read the function GetNewTransactionId in src/backend/access/transam/varsup.c for details.

Transaction ID creation is serialized with a "light-weight lock", so it could potentially be a bottleneck.

Often that is dwarfed by the I/O requirements from many concurrent commits, but if most of your transactions are rolled back or you use "synchronous_commit = off", I can imagine that it could matter.

It is not a matter of how many clients there are, but of how often a new writing transaction is started.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
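Laurenz's last point, that what matters is the rate of new writing transactions rather than the number of clients, follows from transaction IDs being assigned lazily: PostgreSQL gives a transaction a real xid only when it first needs one (typically on its first write), so read-only transactions never touch the serialized counter. The sketch below models only that idea; `XidCounter`, `Transaction` and `simulate` are hypothetical, and subtransactions, wraparound and the actual light-weight lock are left out.

```python
import threading
from typing import Optional


class XidCounter:
    """Hypothetical stand-in for the shared next-xid counter and its lock."""

    def __init__(self) -> None:
        self._next = 1
        self._lock = threading.Lock()
        self.acquisitions = 0  # how many times anyone had to take the lock

    def assign(self) -> int:
        with self._lock:
            self.acquisitions += 1
            xid = self._next
            self._next += 1
            return xid


class Transaction:
    def __init__(self, counter: XidCounter) -> None:
        self._counter = counter
        self.xid: Optional[int] = None  # no xid until the first write

    def read(self) -> None:
        # Reads never touch the shared counter in this model.
        pass

    def write(self) -> int:
        # The xid is drawn lazily, on the first write only.
        if self.xid is None:
            self.xid = self._counter.assign()
        return self.xid


def simulate(n_transactions: int, writes_every: int) -> int:
    """Run n transactions; every `writes_every`-th one performs 3 writes.

    Returns how many times the shared counter's lock was taken.
    """
    counter = XidCounter()
    for i in range(n_transactions):
        tx = Transaction(counter)
        tx.read()
        if i % writes_every == 0:
            for _ in range(3):
                tx.write()
    return counter.acquisitions
```

For example, `simulate(1000, 10)` runs 1000 transactions but takes the lock only 100 times, because only every tenth transaction writes and each writing transaction draws its xid once no matter how many rows it changes.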
On Wed, Mar 13, 2019 at 9:50 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> You can read the function GetNewTransactionId in
> src/backend/access/transam/varsup.c for details.
>
> Transaction ID creation is serialized with a "light-weight lock",
> so it could potentially be a bottleneck.

Also I think that GetSnapshotData() would become the major bottleneck well before GetNewTransactionId() becomes problematic, especially with such a high number of active backends.
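The point about GetSnapshotData() can be made concrete with a simplified snapshot model. As I understand it, in the versions discussed here each snapshot is built by walking the shared array of all backends, so its cost grows with the number of connections, while handing out an xid is a constant-time increment; and every transaction needs snapshots, read-only ones included. The sketch is hypothetical (`Snapshot`, `take_snapshot` and `xid_is_visible` are not PostgreSQL functions) and simplifies xmax to "the next xid to be handed out".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xmin: int        # oldest xid still running when the snapshot was taken
    xmax: int        # next xid to be handed out; anything >= is invisible
    xip: frozenset   # xids in progress at snapshot time


def take_snapshot(backend_xids: list[Optional[int]], next_xid: int) -> tuple[Snapshot, int]:
    """Build a snapshot by scanning every backend slot (simplified model).

    Returns the snapshot and the number of slots examined, to make the
    O(number of backends) cost visible.
    """
    running = []
    examined = 0
    for xid in backend_xids:  # one pass over *all* backends, idle ones too
        examined += 1
        if xid is not None:
            running.append(xid)
    snap = Snapshot(
        xmin=min(running, default=next_xid),
        xmax=next_xid,
        xip=frozenset(running),
    )
    return snap, examined


def xid_is_visible(snap: Snapshot, xid: int, committed: set[int]) -> bool:
    """Are changes made by `xid` visible to a holder of this snapshot?"""
    if xid >= snap.xmax or xid in snap.xip:
        return False  # started after the snapshot, or still running then
    return xid in committed  # finished before the snapshot: visible only if committed
```

With 2000 backend slots, each `take_snapshot` call examines all 2000 slots even if only two of them are running a transaction; that per-snapshot cost, paid by every transaction, is what grows with connection count.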
Thank you everyone for responding.
Appreciate your help.
Looks like I need to understand the concepts in a little more detail to be able to ask the right questions, but at least now I can look at the relevant docs.
Vijay

On Wed, 13 Mar 2019 at 2:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Also I think that GetSnapshotData() would be the major bottleneck way
> before GetNewTransactionId() becomes problematic. Especially with
> such a high number of active backends.