Thread: Read write performance check
Hello All,
It's an Aurora PostgreSQL database in AWS. We have a couple of custom tables with roughly 100 columns each, with the required indexes and constraints created on them, representing a certain business use case. Currently there is no data in them. The requirement is to test read and write query performance for these sample use cases/tables. We need to evaluate both row-by-row and batch read/write performance, as well as performance with concurrent reads/writes from multiple threads/sessions.
As I understand it, the following needs to be done:
1) For the write performance test, rows need to be inserted from multiple sessions at the same time, with random values appropriate to the data types (character, numeric, and date columns). This needs to be tested for both row-by-row inserts and batched inserts.
2) For the read performance test, the table first has to be populated with those ~100 million rows. Then it will be queried both row by row and in batches from concurrent sessions.
I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to understand whether the above can be achieved with a simple procedure or whether a major effort will be needed. I also saw a few blogs suggesting pgbench for concurrency tests. Will that still work against Aurora PostgreSQL from an IntelliJ client worksheet?
Regards
Veem
Hi Veem,

On Tue, Dec 19, 2023 at 7:36 AM veem v <veema0000@gmail.com> wrote:
> 1) For the write performance test, rows need to be inserted from multiple sessions at the same time, with random values appropriate to the data types (character, numeric, and date columns). This needs to be tested for both row-by-row inserts and batched inserts.
>
> 2) For the read performance test, the table first has to be populated with those ~100 million rows. Then it will be queried both row by row and in batches from concurrent sessions.
>
> I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to understand whether the above can be achieved with a simple procedure or whether a major effort will be needed. I also saw a few blogs suggesting pgbench for concurrency tests.

Yes, you are right: pgbench with a customized script is what you are looking for.

> Will that still work against Aurora PostgreSQL from an IntelliJ client worksheet?

pgbench works with Aurora just as it does with regular PostgreSQL; it is a command-line client that simply connects to the database. An IDE worksheet will not help you much here, but you can run pgbench from any machine that has it installed.

Best regards,
Ilya

--
Ilya Kosmodemiansky
CEO, Founder
Data Egret GmbH
Your remote PostgreSQL DBA team
T.: +49 6821 919 3297
ik@dataegret.com
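For illustration, a custom pgbench script is just a file of SQL (plus optional \set meta-commands) that pgbench replays. A minimal, untested sketch with placeholder table and column names would be a file such as write_row.sql containing:

INSERT INTO base_table (column1, column2, column3, column4)   -- placeholder table/columns
VALUES (md5(random()::text), floor(random() * 100)::int, random() * 1000, now());

It can then be run from any machine with pgbench installed, pointed at the Aurora endpoint, e.g. pgbench -h <aurora-endpoint> -U <user> -d <db> -f write_row.sql -c 10 -T 60 (endpoint, credentials, and file name are placeholders).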
Thank you for the confirmation.
So first we need to populate the base tables with the necessary data (say 100 million rows), with the required skew, using random functions to generate variation in the values of the different data types. Then, for the row-by-row write/read test, we can traverse the table in a cursor loop, and for the batch write/insert test we can traverse it in a bulk-collect style loop. Something like below, which can then be wrapped into a procedure, passed to pgbench, and executed from there. Please correct me if I'm wrong.
Also, can you please guide me on how a batched insert (say with a batch size of ~1000) can be written?
-- Row by row write
FOR i IN 1..total_rows LOOP
data_row := (SELECT
md5(random()::TEXT),
floor(random() * 100)::INT,
random() * 1000::NUMERIC,
NOW()::TIMESTAMP
);
INSERT INTO BASE_TABLE(column1, column2, column3, column4)
VALUES (data_row.column1, data_row.column2, data_row.column3, data_row.column4);
END LOOP;
--Row by row read
BEGIN
FOR i IN 1..total_rows LOOP
-- Row by row read
SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
END LOOP;
END;
-- Batch read
BEGIN
-- Batch read
OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
CLOSE data_set;
END;
On 12/19/23 12:14, veem v wrote:
> --Row by row read
> BEGIN
> FOR i IN 1..total_rows LOOP
> -- Row by row read
> SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
> END LOOP;
> END;

This row by row is guaranteed to be slow if there's no index on the 100M rows.

> -- Batch read
> BEGIN
> -- Batch read
> OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
> CLOSE data_set;
> END;

Does this batch read in the entire 100M row table? And some suspicious syntax.

PS: Notice that top posting is frowned upon on this list.
On 2023-12-20 00:44:48 +0530, veem v wrote:
> So first we need to populate the base tables with the necessary data (say 100
> million rows), with the required skew, using random functions to generate
> variation in the values of the different data types. Then, for the row-by-row
> write/read test, we can traverse the table in a cursor loop, and for the batch
> write/insert test we can traverse it in a bulk-collect style loop. Something
> like below, which can then be wrapped into a procedure, passed to pgbench, and
> executed from there. Please correct me if I'm wrong.

One important point to consider for benchmarks is that your benchmark
has to be similar to the real application to be useful. If your real
application runs on a different node and connects to the database over
the network, a benchmark running within a stored procedure may not be
very indicative of real performance.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |    challenge!"
Thank you.
Yes, actually we are trying to see what maximum TPS we are able to reach with both the row-by-row and the batch read/write tests. Afterwards, this figure may be compared with other databases with similar setups.
So I wanted to understand from the experts here whether this approach is fine, or whether some other approach is advisable?
I agree that the network will play a role in a real-world app, but here we mainly want to see the database's capability, as the network will play a similar role across all databases. Do you suggest some other approach to achieve this objective?
As Rob mentioned, the syntax you posted is not correct. You need to process or read a certain batch of rows at a time, say 1,000 or 10,000, not all 100M in one shot.
But again, your use case seems a common one, considering you want to compare read and write performance on multiple databases with a similar table structure for your use case. In that case you may want to use test scripts that others must already have written, rather than reinventing the wheel.
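For example, a batched insert of 1,000 rows per statement and a read that walks the table in 1,000-row chunks could look roughly like this (an untested sketch; base_table, its columns, and the id key are placeholders for your actual ~100-column table):

-- Batched insert: 1,000 rows per statement
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(random() * 100)::int,
       random() * 1000,
       now()
FROM generate_series(1, 1000);

-- Batched read: fetch the next 1,000 rows by key, not the whole table
SELECT *
FROM base_table
WHERE id > :last_id   -- :last_id = highest id seen in the previous chunk
ORDER BY id
LIMIT 1000;

Repeating the second query with an increasing :last_id reads the 100M rows in manageable chunks instead of one huge result set.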
Thank you.
It would really be helpful if such test scripts or similar setups are already available. Can someone please point me to some docs, blogs, or sample scripts on this?
Can someone please guide me on whether any standard scripts are available for doing such read/write performance tests, or point me to any available docs?
As I mentioned, your scenario looks like a generic one, but I don't have any sample scripts/docs to share, sorry about that. Other people may suggest sample scripts if they have any, or you could post on the performance list in case someone has done similar work in the past.
But as I see it, the performance test scripts will look like 1) a row-by-row insert/select in a cursor loop, as you were showing earlier in this thread, and 2) another one with batch/bulk inserts, and then calling them through pgbench (or any other scheduler) to create the concurrency.
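As a rough, untested sketch of what those two pgbench script files could contain (placeholder table/column names, and an id column assumed to hold values 1..100M):

-- row_by_row.sql: one single-row insert and one single-row lookup per transaction
\set key random(1, 100000000)
INSERT INTO base_table (column1, column2, column3, column4)
VALUES (md5(random()::text), floor(random() * 100)::int, random() * 1000, now());
SELECT * FROM base_table WHERE id = :key;

-- batch.sql: 1,000 rows per insert statement
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text), floor(random() * 100)::int, random() * 1000, now()
FROM generate_series(1, 1000);

Concurrency then comes from pgbench itself, e.g. pgbench -h <endpoint> -U <user> -d <db> -f row_by_row.sql -c 20 -j 4 -T 300 for 20 concurrent sessions over 5 minutes; swap in batch.sql for the batched run and compare the reported TPS.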
On Thu, Dec 21, 2023 at 8:31 AM veem v <veema0000@gmail.com> wrote:
> Can someone please guide me on whether any standard scripts are available for doing such read/write performance tests, or point me to any available docs?
...
Veem, first things first... "Top posting" is when you reply at the top of the email. Notice how I replied at the bottom (and clearly trimmed the context).
This is the style we prefer here.
Second, since you are new to PostgreSQL, let me recommend some reading. Cybertec has articles on performance (in the Tom Kyte style).
Also read the "Don't Do This" wiki, and finally have a look at the pgbench and psql documentation. Specifically, look at GENERATE_SERIES() for generating test data; the Cybertec articles touch on that as well, and a rough sketch follows the links below. Regardless, reading the docs is insightful.
Links:
https://www.cybertec-postgresql.com/en/postgresql-hash-index-performance/
https://wiki.postgresql.org/wiki/Don't_Do_This
https://www.postgresql.org/docs/current/app-psql.html
https://www.postgresql.org/docs/current/pgbench.html
https://www.postgresql.org/docs/16/functions-srf.html
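For illustration, bulk-populating test rows with GENERATE_SERIES() can look roughly like this (an untested sketch; base_table and its columns are placeholders, and you would extend the select list to cover all ~100 columns and whatever skew you need):

INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),                          -- random text
       floor(random() * 100)::int,                   -- random integer
       random() * 1000,                              -- random numeric
       now() - (random() * interval '365 days')      -- random timestamp within the last year
FROM generate_series(1, 100000000) AS g(i);

In practice it is usually gentler to run this in chunks (say 1,000,000 rows per statement) so individual transactions stay a reasonable size.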
HTH,
Kirk Out!
Also, read the "Don't Do That" wiki, and finally, have a look at pgbench and psql documentation. And specifically look at GENERATE_SERIES(),
but the Cybertec articles will touch on that. Regardless... Reading the docs is insightful.
Links:
https://www.cybertec-postgresql.com/en/postgresql-hash-index-performance/
https://wiki.postgresql.org/wiki/Don't_Do_This
https://www.postgresql.org/docs/current/app-psql.html
https://www.postgresql.org/docs/current/pgbench.html
https://www.postgresql.org/docs/16/functions-srf.html
HTH,
Kirk Out!
Veem
You should also be familiar with Aurora PostgreSQL's storage architecture, which is very different from regular PostgreSQL's (see https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html).
Aurora uses remote storage, which means that if your read workload does not fit into PostgreSQL's shared buffers, performance will be very different from what you would see with local SSD storage.
On writes, Aurora writes six copies across three availability zones for high durability and availability, so having enough network bandwidth is a factor as well.
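A quick way to gauge whether the read workload is being served from shared buffers is the cache hit ratio from the standard statistics views, which should work the same way on Aurora, e.g.:

SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();

If that percentage drops sharply as the data set grows past shared_buffers, you are measuring Aurora's storage layer as much as PostgreSQL itself.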
Ken