Thread: Parallel queries for a web application | performance testing
Hello,
I will have a web application with PostgreSQL 8.4+ as the backend. At any given time there will be a maximum of 1000 parallel web users interacting with the database (read/write).
I wish to do performance testing of 1000 simultaneous reads/writes to the database.
I can write a simple Unix script on the Postgres server and have the parallel updates fired off, for example with an ampersand at the end. Example:
echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =0;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid1=$!
echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =2;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid2=$!
echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =4;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid3=$!
.......
......
My question is:
Am I losing something by firing these queries directly on the server, or should I look at firing the queries from different IP addresses (as would happen in a web application)? Would the way Postgres opens sockets / allocates buffers etc. differ between the two approaches, so that I would get unrealistic results from a Unix script run on the server?
It would be a very tedious exercise to have 1000 different machines (IP addresses), each firing a query at the same time. But at the same time I want to be absolutely sure my test would give the same result in production (the latency requirements for reads/writes are very, very low).
I am not interested in the network time; just the database read/write time.
Thanks for any tips !
-Bala
Balkrishna Sharma <b_ki@hotmail.com> wrote:

> I wish to do performance testing of 1000 simultaneous read/write
> to the database.

You should definitely be using a connection pool of some sort. Both your throughput and response time will be better that way. You'll want to test with different pool sizes, but I've found that a size which allows the number of active queries in PostgreSQL to be somewhere around (number_of_cores * 2) + effective_spindle_count tends to be near optimal.

> My question is:
> Am I losing something by firing these queries directly off the
> server and should I look at firing the queries from different IP
> address (as it would happen in a web application).

If you run the client side of your test on the database server, the CPU time used by the client will probably distort your results. I would try using one separate machine to generate the requests, but monitor to make sure that the client machine isn't hitting some bottleneck (like CPU time). If the client is the limiting factor, you may need to use more than one client machine. No need to use 1000 different client machines. :-)

-Kevin
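To put rough numbers on that formula: on a hypothetical box with 8 cores and 4 effective spindles, (8 * 2) + 4 = 20 active backends. A minimal pgbouncer setup along those lines could look like the sketch below (all paths, names and credentials are placeholders; double-check the settings against the pgbouncer documentation for your version):

# Hypothetical sizing: 8 cores, 4 effective spindles => (8*2)+4 = 20.
# All paths, database names and credentials below are placeholders.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
test1 = host=127.0.0.1 port=5432 dbname=test1

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; roughly (2 * cores) + spindles, per the rule of thumb above
default_pool_size = 20
; the 1000 web users queue here instead of piling up inside Postgres
max_client_conn = 1000
EOF
pgbouncer -d /etc/pgbouncer/pgbouncer.ini

The application (or the benchmark) then connects to port 6432 instead of 5432, and pgbouncer keeps the number of active Postgres backends near the sweet spot while the remaining clients wait in its queue.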
Balkrishna Sharma <b_ki@hotmail.com> writes:

> I will have a web application having postgres 8.4+ as backend. At any
> given time, there will be max of 1000 parallel web-users interacting
> with the database (read/write)
> I wish to do performance testing of 1000 simultaneous read/write to
> the database.

Have a look at Tsung, and either benchmark only the PostgreSQL side of things, or the HTTP side of things directly: that will run your application code against PostgreSQL.

  http://tsung.erlang-projects.org/

And as Kevin said, consider using a connection pool, such as pgbouncer. Once you have set up the benchmark with Tsung, adding pgbouncer and comparing the results will be easy.

Regards,
--
dim
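For reference, a Tsung scenario that drives the PostgreSQL protocol directly (the ts_pgsql session type) is sketched below. The element and attribute names follow the pgsql examples in the Tsung manual, so verify them against the DTD shipped with your Tsung version; the host, credentials, arrival rate and DTD path are placeholders:

# Rough sketch only; check element/attribute names against your Tsung version.
cat > pgsql-bench.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <clients>
    <client host="localhost" use_controller_vm="true"/>
  </clients>
  <servers>
    <server host="dbhost" port="5432" type="tcp"/>
  </servers>
  <load>
    <arrivalphase phase="1" duration="10" unit="minute">
      <users arrivalrate="50" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name="emp_update" probability="100" type="ts_pgsql">
      <request><pgsql type="connect" database="test1" username="bench"/></request>
      <request><pgsql type="authenticate" password="secret"/></request>
      <request><pgsql type="sql">UPDATE "DAPP".emp_data SET f1 = 123 WHERE emp_id = 0;</pgsql></request>
      <request><pgsql type="close"/></request>
    </session>
  </sessions>
</tsung>
EOF
tsung -f pgsql-bench.xml start

Running the same scenario once against Postgres directly and once through pgbouncer gives the comparison mentioned above.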
On Wed, 16 Jun 2010, Balkrishna Sharma wrote:

> Hello, I will have a web application having postgres 8.4+ as backend.
> At any given time, there will be max of 1000 parallel web-users
> interacting with the database (read/write). I wish to do performance
> testing of 1000 simultaneous read/write to the database.

When you set up a server that has high throughput requirements, the last thing you want to do is use it in a manner that cripples its throughput. Don't try and have 1000 parallel Postgres backends - it will process those queries slower than the optimal setup. You should aim to have approximately ((2 * cpu core count) + effective spindle count) backends, as that is the point at which throughput is the greatest. You can use pgbouncer to achieve this.

> I can do a simple unix script on the postgres server and have parallel
> updates fired for example with an ampersand at the end. Example:
>
> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =0;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid1=$!
> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =2;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid2=$!
> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =4;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid3=$!
> .............

Don't do that. The overhead of starting up an echo, a psql, and a grep will limit the rate at which these queries can be fired at Postgres, and consume quite a lot of CPU. Use a proper benchmarking tool, possibly on a different server.

Also, you should be using a different username to "postgres" - that one is kind of reserved for superuser operations.

Matthew

--
People who love sausages, respect the law, and work with IT standards
shouldn't watch any of them being made.  -- Peter Gutmann
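If pgbench is available, a custom script gives roughly the same test as the shell loop quoted above without forking an echo/psql/grep per query, and it can be run from a separate client box with -h. A sketch, assuming the table from the original post; note that \setrandom is the pre-9.6 syntax (newer pgbench versions use \set emp_id random(0, 999) instead):

# Custom transaction script; the 0-999 range for emp_id is a placeholder.
cat > emp_update.sql <<'EOF'
\setrandom emp_id 0 999
UPDATE "DAPP".emp_data SET f1 = 123 WHERE emp_id = :emp_id;
EOF

# 20 concurrent connections (sized to the pool, not to 1000 users),
# 5 minutes, no vacuum of pgbench's own tables, run from a separate machine.
pgbench -h dbhost -U bench -n -c 20 -T 300 -f emp_update.sql test1

pgbench reports transactions per second and, with -l, per-transaction latencies, which covers the "database time only" measurement without the per-process startup overhead.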
> When you set up a server that has high throughput requirements, the
> last thing you want to do is use it in a manner that cripples its
> throughput. Don't try and have 1000 parallel Postgres backends - it
> will process those queries slower than the optimal setup. You should
> aim to have approximately ((2 * cpu core count) + effective spindle
> count) backends, as that is the point at which throughput is the
> greatest. You can use pgbouncer to achieve this.

The same is true of a web server: 1000 active PHP interpreters (each eating several megabytes or more) are not ideal for performance!

For PHP, I like lighttpd with php-fastcgi: the webserver proxies requests to a small pool of PHP processes, which are only busy while generating the page. Once the page is generated, the webserver handles all the (slow) IO to the client.

An interesting side effect is that the number of database connections is limited to the number of PHP processes in the pool, so you don't even need a Postgres connection pooler (unless you have lots of PHP boxes)...
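For what it's worth, the relevant knobs in lighttpd's mod_fastcgi look roughly like the fragment below; max-procs times PHP_FCGI_CHILDREN is the size of the PHP pool, and therefore the ceiling on database connections from that web box. Paths and pool sizes are placeholders for illustration only:

# Fragment appended to the lighttpd configuration; paths/sizes are placeholders.
cat >> /etc/lighttpd/lighttpd.conf <<'EOF'
server.modules += ( "mod_fastcgi" )

fastcgi.server = ( ".php" => ((
    "bin-path"        => "/usr/bin/php-cgi",
    "socket"          => "/var/run/lighttpd/php.sock",
    "max-procs"       => 4,
    "bin-environment" => ( "PHP_FCGI_CHILDREN" => "4" )
)))
EOF
# 4 processes * 4 children = at most 16 PHP workers,
# hence at most 16 database connections from this web box.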
"Pierre C" <lists@peufeu.com> writes: > The same is true of a web server : 1000 active php interpreters (each eating > several megabytes or more) are not ideal for performance ! > > For php, I like lighttpd with php-fastcgi : the webserver proxies requests > to a small pool of php processes, which are only busy while generating the > page. Once the page is generated the webserver handles all (slow) IO to the > client. I use haproxy for that, it handles requests queues very effectively. -- dim