Thread: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
I have a PostgreSQL database on a very low-resource Xen virtual machine, 48 MB RAM. When two queries run at the same time, it takes longer to complete than if run in sequence. Is there perhaps a way to install something like a query sequencer, which would process queries in a FIFO manner, one at a time? Even if a new query comes in before the last one running is finished, it would not give the new query to the server before the one running now finishes. That would greatly improve performance.

Any tips in general for running PostgreSQL on such a low-resource machine?

I have:

shared_buffers = 5MB
work_mem = 1024kB

Are these good values, or could perhaps changing something improve it a bit? Any other parameters to look at?

--
Miernik
http://miernik.name/
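One way to approximate such a FIFO sequencer without touching the server is to serialize the clients themselves, for example with flock(1) from util-linux. A minimal sketch, not from the thread: the lock path is arbitrary, and echo/sleep stand in for real psql calls.

```shell
# Serialize "queries" through a single lock file: each background job
# blocks on flock until the previous holder releases it, so the work
# runs one at a time. Replace the echo/sleep with a real psql call.
LOCK=/tmp/query.lock
OUT=/tmp/query.out
: > "$OUT"
for i in 1 2 3; do
  (
    flock 9                      # wait for an exclusive lock on fd 9
    echo "start $i" >> "$OUT"
    sleep 0.1                    # stand-in for the actual query
    echo "end $i" >> "$OUT"
  ) 9>"$LOCK" &
done
wait
cat "$OUT"
```

Note that flock gives mutual exclusion but no strict fairness guarantee, so the order jobs acquire the lock is not a true FIFO queue; it does, however, ensure only one query runs at a time.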
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
"Scott Marlowe"
Date:
On Wed, Jul 23, 2008 at 9:21 AM, Miernik <public@public.miernik.name> wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, even if a new query comes before the last one
> running is finished, it would not give the new query to the server
> before the one running now finishes? That would greatly improve
> performance.
>
> Any tips in general for running PostgreSQL on such a low-resource machine?
>
> I have:
>
> shared_buffers = 5MB
> work_mem = 1024kB
>
> are these good values, or could perhaps changing something improve it a
> bit? Any other parameters to look at?

Well, you're basically working with a really limited machine there. I'd raise shared_buffers by 1 MB at a time and see if that helps. But you're looking at a very narrow problem that most people won't ever run into. Why such an incredibly limited virtual machine? Even my cell phone came with 256 MB built in two years ago.
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Alvaro Herrera
Date:
Miernik wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, even if a new query comes before the last one
> running is finished, it would not give the new query to the server
> before the one running now finishes? That would greatly improve
> performance.

One idea I just had: use a connection pooler (say, pgpool) and allow only one connection slot. If the pooler can be configured to block new connections until the slot is free, this would do what you want. (I don't know whether poolers allow you to do this.)

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
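For reference, pgbouncer (which comes up later in the thread) can express this single-slot idea directly in its configuration. A sketch only; the database name "mydb", addresses, and auth settings are placeholders:

```ini
; pgbouncer.ini (sketch)
[databases]
mydb = host=127.0.0.1 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
pool_mode = transaction   ; release the server connection after each transaction
default_pool_size = 1     ; at most one server connection: queries queue up
max_client_conn = 20      ; clients may connect, but they wait their turn
```

With a pool size of 1 the pooler effectively becomes the query sequencer asked about above: clients connect freely, but their transactions are fed to the server one at a time.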
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
Scott Marlowe <scott.marlowe@gmail.com> wrote:
> won't ever run into. Why such an incredibly limited virtual machine?
> Even my cell phone came with 256 meg built in two years ago.

Because I don't want to spend too much money on the machine rent, and a 48 MB RAM Xen is about all I can get with a budget of $100 per year. Well, there are a few providers which will give me a 128 MB Xen for that money, but will the difference in performance be worth the hassle of switching providers? My current provider gives me almost perfect reliability for that $100, and I don't know how the providers which give more RAM for the same money perform; maybe they are often down or something. And spending more than $100 yearly on this would really be overkill. My thing runs fine, only a bit slow, but reasonable. I just want to find out if I could maybe make it better with a little tweaking.

Can I expect it to work at least three times faster on 128 MB RAM? Getting 256 MB would certainly cost too much. Or maybe there are some providers which can give me a much more performant PostgreSQL server with at least several GB of storage for, well... not more than $50 per year (because I must still rent another server to run an SMTP server and a few other small things).

My DB has several tables with like 100,000 to 1 million rows each, running sorts, joins, updates etc. on them several times per hour. About 10,000 inserts and selects each hour; the whole DB takes 1.5 GB on disk now, 500 MB dumped.

If I could shorten the time it takes to run each query by a factor of 3, that's something worth going for.

--
Miernik
http://miernik.name/
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
"Scott Marlowe"
Date:
On Wed, Jul 23, 2008 at 2:32 PM, Miernik <public@public.miernik.name> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> won't ever run into. Why such an incredibly limited virtual machine?
>> Even my cell phone came with 256 meg built in two years ago.
>
> Because I don't want to spend too much money on the machine rent, and a
> 48 MB RAM Xen is about all I can get with a budget of $100 per year.
> Well, there are a few providers which will give me a 128 MB Xen for that
> money, but will the difference in performance be worth the hassle of
> switching providers? My current provider gives me almost perfect
> reliability for that $100, and I don't know how the providers which
> give more RAM for the same money perform; maybe they are often down or
> something. And spending more than $100 yearly on this would really be
> overkill. My thing runs fine, only a bit slow, but reasonable. I just
> want to find out if I could maybe make it better with a little tweaking.
> Can I expect it to work at least three times faster on 128 MB RAM?
> Getting 256 MB would certainly cost too much. Or maybe there are some
> providers which can give me a much more performant PostgreSQL server
> with at least several GB of storage for, well... not more than $50 per
> year (because I must still rent another server to run an SMTP server
> and a few other small things).
>
> My DB has several tables with like 100,000 to 1 million rows each,
> running sorts, joins, updates etc. on them several times per hour.
> About 10,000 inserts and selects each hour; the whole DB takes 1.5 GB
> on disk now, 500 MB dumped.
>
> If I could shorten the time it takes to run each query by a factor of 3,
> that's something worth going for.

Well, my guess is that by running under Xen you're already sacrificing quite a bit of performance, and running it with only 48 MB of RAM is making it even worse. But if your budget is $100 a year, you're probably stuck with such a setup. I would see if you can get a trial Xen at the other hosts with 128 MB or more of memory and compare.
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Well, my guess is that by running under Xen you're already sacrificing
> quite a bit of performance, and running it with only 48 megs of RAM is
> making it even worse. But if your budget is $100 a year, I guess
> you're probably stuck with such a setup. I would see if you can get a
> trial Xen at the other hosts with 128M or more of memory and compare.

Is running in a 48 MB Xen any different in terms of performance from running on a hardware machine with 48 MB RAM? I see that I am not the only one with such requirements; one guy even set up a site listing all VPS providers which host for under $7 per month:

http://www.lowendbox.com/virtual-server-comparison/

You can see that the most you can get for that money is a 128 MB OpenVZ. I wonder if running PostgreSQL on OpenVZ is any different from running it on Xen in terms of performance.

Oh, here is something more:

http://vpsempire.com/action.php?do=vpslite

256 MB for $7.45 per month
512 MB for $11.95 per month

However, it doesn't say which virtualization software is used, so I don't really know what it is.

--
Miernik
http://miernik.name/
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Albert Cervera Areny
Date:
On Wednesday 23 July 2008, Miernik wrote:
> I have a PostgreSQL database on a very low-resource Xen virtual machine,
> 48 MB RAM. When two queries run at the same time, it takes longer to
> complete than if run in sequence. Is there perhaps a way to install
> something like a query sequencer, which would process queries in a FIFO
> manner, one at a time, even if a new query comes before the last one
> running is finished, it would not give the new query to the server
> before the one running now finishes? That would greatly improve
> performance.

You didn't mention your PostgreSQL version. Since 8.3 there are "synchronized scans", which greatly improve performance if concurrent queries have to do a sequential scan on the same table. Of course, if the queries don't hit the same table, there'll be no improvement in performance...

> Any tips in general for running PostgreSQL on such a low-resource machine?
>
> I have:
>
> shared_buffers = 5MB
> work_mem = 1024kB
>
> are these good values, or could perhaps changing something improve it a
> bit? Any other parameters to look at?
>
> --
> Miernik
> http://miernik.name/
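As a hedged aside not in the original mail: the 8.3 behaviour described here is governed by a server setting, on by default, which can be checked or pinned in postgresql.conf:

```ini
# postgresql.conf, 8.3+: let concurrent sequential scans of the same
# table piggy-back on one scan's reads (the default is on)
synchronize_seqscans = on
```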
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Richard Huxton
Date:
Miernik wrote:
> Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> won't ever run into. Why such an incredibly limited virtual machine?
>> Even my cell phone came with 256 meg built in two years ago.
>
> Because I don't want to spend too much money on the machine rent, and a
> 48 MB RAM Xen is about all I can get with a budget of 100$ per year.
[snip]
> My DB has several tables with like 100000 to 1 million rows each,
> running sorts, joins, updates etc on them several times per hour.
> About 10000 inserts and selects each hour, the whole DB takes 1.5 GB on
> disk now, 500 MB dumped.
>
> If I could shorten the time it takes to run each query by a factor of 3
> that's something worth going for.

Firstly, congratulations on providing quite a large database on such a limited system. I think most people on such plans have tables with a few hundred to a thousand rows in them, not a million. Many of the people here are used to budgets a hundred or a thousand times yours, so bear in mind you're as much an expert as them :-)

If you're going to get the most out of this, you'll want to set up your own Xen virtual machine on a local system so you can test changes. You'll be trading your time against the budget, so bear that in mind. If you know other small organisations locally in a similar position, perhaps consider sharing a physical machine and managing Xen yourselves - that can be cheaper.

Changes:

First step is to make sure you're running version 8.3 - there are some useful improvements there that reduce the size of shorter text fields, as well as the synchronised scans Albert mentions. Second step is to turn off any other processes you don't need. Tune down the number of consoles, apache processes, mail processes etc. Normally not worth the trouble, but getting another couple of MB is worthwhile in your case. Might be worth turning off autovacuum and running a manual vacuum full overnight if your database is mostly reads.

Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro said) and setting them to allow only one connection in the pool. I know that pgbouncer offers per-transaction connection sharing, which will make this more practical. Even so, it will help if your application can co-operate by closing the connection as soon as possible.

--
Richard Huxton
Archonet Ltd
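The overnight-vacuum suggestion can be scheduled with cron. A sketch only: the database name "mydb" and the 03:00 slot are placeholders, while vacuumdb itself ships with PostgreSQL.

```ini
# crontab entry (sketch): nightly VACUUM FULL + ANALYZE in the quiet hours
0 3 * * * vacuumdb --full --analyze --quiet mydb
```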
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
Richard Huxton <dev@archonet.com> wrote:
> Firstly, congratulations on providing quite a large database on such a
> limited system. I think most people on such plans have tables with a
> few hundred to a thousand rows in them, not a million. Many of the
> people here are used to budgets a hundred or a thousand times of
> yours, so bear in mind you're as much an expert as them :-)

Well, I proved that it can work reasonably well, and I am fine-tuning the system step by step, so it can work better.

> If you're going to get the most out of this, you'll want to set up
> your own Xen virtual machine on a local system so you can test
> changes.

Good idea.

> If you know other small organisations locally in a similar position
> perhaps consider sharing a physical machine and managing Xen
> yourselves - that can be cheaper.

Well, maybe, but it's also a lot of hassle; I'm not sure it's worth it, just looking to get the most out of the existing system.

> First step is to make sure you're running version 8.3 - there are some
> useful improvements there that reduce the size of shorter text fields,
> as well as the synchronised scans Albert mentions.

I am running 8.3.3.

> Second step is to turn off any other processes you don't need.
> Tune down the number of consoles, apache processes, mail processes
> etc. Normally not worth the trouble, but getting another couple of MB
> is worthwhile in your case.

There is no apache, but lighttpd. Right now:

root@polica:~# free
             total       used       free     shared    buffers     cached
Mem:         49344      47840       1504          0          4      23924
-/+ buffers/cache:      23912      25432
Swap:       257000       9028     247972
root@polica:~#

> Might be worth turning off autovacuum and running a manual vacuum full
> overnight if your database is mostly reads.

I run autovacuum, and the database has a lot of updates all the time, also TRUNCATEing tables and refilling them, usually one or two INSERTs/UPDATEs per second.

> Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro
> said) and set them to allow only one connection in the pool. I know
> that pgbouncer offers per-transaction connection sharing which will
> make this more practical. Even so, it will help if your application
> can co-operate by closing the connection as soon as possible.

I just installed pgpool2 and whoaaa! Everything is like about 3 times faster! My application is a set of bash scripts using psql -c "UPDATE ...". I plan to rewrite it in Python; not sure if it would improve performance, but it will at least be a "cleaner" implementation.

In /etc/pgpool.conf I used:

# number of pre-forked child process
num_init_children = 1

# Number of connection pools allowed for a child process
max_pool = 1

Wanted to install pgbouncer, but it is currently broken in Debian. And why is it in contrib and not in main (speaking of Debian location)?

--
Miernik
http://miernik.name/
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Richard Huxton
Date:
Miernik wrote:
>> Might be worth turning off autovacuum and running a manual vacuum full
>> overnight if your database is mostly reads.
>
> I run autovacuum, and the database has a lot of updates all the time,
> also TRUNCATEing tables and refilling them, usually one or two
> INSERTs/UPDATEs per second.

OK

>> Finally, I think it's worth looking at pgpool or pgbouncer (as Alvaro
>> said) and set them to allow only one connection in the pool. I know
>> that pgbouncer offers per-transaction connection sharing which will
>> make this more practical. Even so, it will help if your application
>> can co-operate by closing the connection as soon as possible.
>
> I just installed pgpool2 and whoaaa! Everything is like about 3 times
> faster! My application is a set of bash scripts using psql -c "UPDATE ...".

Probably spending most of their time setting up a new connection, then clearing it down again.

> I plan to rewrite it in Python, not sure if it would improve
> performance, but it will at least be a "cleaner" implementation.

Careful of introducing any more overheads, though. If libraries end up using another 2.5 MB of RAM, then that's 10% of your disk cache gone.

> In /etc/pgpool.conf I used:
>
> # number of pre-forked child process
> num_init_children = 1
>
> # Number of connection pools allowed for a child process
> max_pool = 1

Might need to increase that to 2 or 3.

> Wanted to install pgbouncer, but it is currently broken in Debian. And
> why is it in contrib and not in main (speaking of Debian location)?

Not well known enough on the Debian side of the fence? It's simple enough to install from source, though. Takes about one minute.

--
Richard Huxton
Archonet Ltd
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
Richard Huxton <dev@archonet.com> wrote:
>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>> faster! My application is a set of bash scripts using psql -c "UPDATE ...".
>
> Probably spending most of their time setting up a new connection, then
> clearing it down again.

If I do it in Python, it could do all queries in the same connection, so it should be faster? Besides that, 'psql' is written in perl, so it's also heavy; by not using psql I get rid of the perl library in RAM. Also, the script uses wget to poll some external data sources a lot, also needlessly opening a new connection to the webserver each time, so I want to make the script keep the http connection alive, which means I must get rid of wget. Maybe I should write some parts in C?

BTW, doesn't there exist any tool that does what "psql -c" does, but is written in plain C, not perl? I was looking for such a psql replacement, but couldn't find any.

>> # Number of connection pools allowed for a child process
>> max_pool = 1
>
> Might need to increase that to 2 or 3.

Why? The website says:

    max_pool

    The maximum number of cached connections in pgpool-II children
    processes. pgpool-II reuses the cached connection if an incoming
    connection is connecting to the same database by the same username.

But all my connections are to the same database with the same username, and I only ever want my application to make one connection to the database at a time, so why would I want/need 2 or 3 in max_pool?

> Not well known enough on the Debian side of the fence? It's simple
> enough to install from source though. Takes about one minute.

But is there any advantage for me compared to pgpool2, which works really nicely? In some parts, like doing some count(*) stuff, it now does things in about one second which took a few minutes to finish before (if the other part of the scripts was doing something else on the database at the same time).

--
Miernik
http://miernik.name/
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Theo Kramer
Date:
On 31 Jul 2008, at 10:29AM, Miernik wrote:
> Richard Huxton <dev@archonet.com> wrote:
>>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>>> faster! My application is a set of bash scripts using psql -c "UPDATE ...".
>>
>> Probably spending most of their time setting up a new connection, then
>> clearing it down again.
>
> If I do it in Python it could do all queries in the same connection, so
> it should be faster? Besides that 'psql' is written in perl, so it's
> also heavy; by not using psql I get rid of the perl library in RAM.
> Also the script uses wget to poll some external data sources a lot,
> also needlessly opening new connections to the webserver, so I want to
> make the script keep the http connection, which means I must get rid of
> wget. Maybe I should write some parts in C?
>
> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.

file `which psql`
/usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

--
Regards
Theo
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Richard Huxton
Date:
Miernik wrote:
> Richard Huxton <dev@archonet.com> wrote:
>>> I just installed pgpool2 and whoaaa! Everything is like about 3 times
>>> faster! My application is a set of bash scripts using psql -c "UPDATE ...".
>>
>> Probably spending most of their time setting up a new connection, then
>> clearing it down again.
>
> If I do it in Python it could do all queries in the same connection, so
> it should be faster? Besides that 'psql' is written in perl, so it's
> also heavy; by not using psql I get rid of the perl library in RAM.

Nope - "C" all through.

> Also the script uses wget to poll some external data sources a lot,
> also needlessly opening new connections to the webserver, so I want to
> make the script keep the http connection, which means I must get rid of
> wget. Maybe I should write some parts in C?
>
> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.

Well, ECPG lets you embed SQL directly in your "C".

>>> # Number of connection pools allowed for a child process
>>> max_pool = 1
>>
>> Might need to increase that to 2 or 3.
>
> Why? The website says:
>
>     max_pool
>
>     The maximum number of cached connections in pgpool-II children
>     processes. pgpool-II reuses the cached connection if an incoming
>     connection is connecting to the same database by the same username.
>
> But all my connections are to the same database and the same username,
> and I only ever want my application to do 1 connection to the database
> at a time, so why would I want/need 2 or 3 in max_pool?

From the subject line of your question: "how to fix problem when two queries run at the same time..." Of course, if you don't actually want to run two simultaneous queries, then max_pool=1 is what you want.

>> Not well known enough on the Debian side of the fence? It's simple
>> enough to install from source though. Takes about one minute.
>
> But is there any advantage for me compared to pgpool2, which works
> really nicely?

Can't say. Given your limited RAM, it's probably worth looking at both and seeing which leaves you more memory. Your main concern has got to be to reduce wasted RAM.

> In some parts, like doing some count(*) stuff, it now does
> things in about one second, which took a few minutes to finish before
> (if the other part of the scripts was doing something else on the
> database at the same time).

That will be because you're only running one query, I'd have thought. Two queries might be sending you into swap.

--
Richard Huxton
Archonet Ltd
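Whether concurrent queries really are pushing the box into swap can be checked cheaply from /proc/meminfo (Linux-only). A sketch: if this number grows while two queries run together, the slowdown is swap thrashing rather than CPU contention.

```shell
# Report current swap usage in kB by subtracting SwapFree from SwapTotal
# in /proc/meminfo; run it before and during the concurrent queries.
awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print "swap used (kB): " t-f}' /proc/meminfo
```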
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Miernik
Date:
Theo Kramer <theo@flame.co.za> wrote:
> file `which psql`
> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
> stripped

miernik@tarnica:~$ file `which psql`
/usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
/usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable
miernik@tarnica:~$

--
Miernik
http://miernik.name/
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Richard Huxton
Date:
Miernik wrote:
> Theo Kramer <theo@flame.co.za> wrote:
>> file `which psql`
>> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
>> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
>> stripped
>
> miernik@tarnica:~$ file `which psql`
> /usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
> miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
> /usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable

That's not psql, though - that's Debian's wrapper around it, which lets you install multiple versions of PostgreSQL on the same machine. Might be worth bypassing it and calling the real psql directly.

--
Richard Huxton
Archonet Ltd
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Theo Kramer
Date:
On 31 Jul 2008, at 11:17AM, Miernik wrote:
> Theo Kramer <theo@flame.co.za> wrote:
>> file `which psql`
>> /usr/bin/psql: ELF 32-bit LSB executable, Intel 80386, version 1
>> (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
>> stripped
>
> miernik@tarnica:~$ file `which psql`
> /usr/bin/psql: symbolic link to `../share/postgresql-common/pg_wrapper'
> miernik@tarnica:~$ file /usr/share/postgresql-common/pg_wrapper
> /usr/share/postgresql-common/pg_wrapper: a /usr/bin/perl -w script text executable
> miernik@tarnica:~$

Hmm - looks like you are on Debian or a Debian derivative. However, pg_wrapper is not psql; it invokes psql, which is written in C. Once psql is invoked, pg_wrapper drops away.

--
Regards
Theo
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Craig Ringer
Date:
Miernik wrote:
> BTW, doesn't there exist any tool that does what "psql -c" does, but is
> written in plain C, not perl? I was looking for such a psql replacement,
> but couldn't find any.

As others have noted, psql is written in C, and you're using a wrapper. Assuming you're on Debian or similar, you should be able to invoke the real psql with:

/usr/lib/postgresql/8.3/bin/psql

psql is a C executable that uses libpq directly, and is really rather low-overhead and fast.

As for how to issue multiple commands in one statement in a shell script: one way is to use a here document instead of "-c". E.g.:

psql <<__END__
UPDATE blah SET thingy = 7 WHERE otherthingy = 4;
DELETE FROM sometable WHERE criterion = -1;
__END__

You can of course wrap statements in explicit transaction BEGIN/COMMIT, etc., as appropriate.

As you start doing more complex things, and especially once you start wanting both good performance *and* good error handling, you'll want to move away from SQL scripts with psql and toward perl, python, or similar, so you can use their native interfaces to libpq.

--
Craig Ringer
Re: how to fix problem when two queries run at the same time, it takes longer to complete than if run in sequence
From
Steve Crawford
Date:
> Wanted to install pgbouncer, but it is broken currently in Debian. And
> why is it in contrib and not in main (speaking of Debian location)?

pgbouncer has worked very well for us. It wasn't available in the default repos for Ubuntu Server when I did my original setup, but installing from source is quite easy. Running heavy-load benchmarks on a web app (high-rate simple queries), I saw about a 10-fold improvement just by using pgbouncer.

Cheers,
Steve