Thread: PostgreSQL is slow with a larger table even though it is in RAM
Dear Friends,
I have a table with 32 lakh records in it. The table size is nearly 700 MB, and my machine has 1 GB + 256 MB of RAM. I created a tablespace in RAM, and then created this table in that RAM.

So now everything is in RAM. If I do a count(*) on this table it returns 327600 in 3 seconds. Why is it taking 3 seconds? I am sure that no disk I/O is happening (using vmstat I confirmed that no disk I/O is happening, and swap is not used either).

Any idea on this?

I searched a lot in newsgroups but couldn't find anything relevant (everywhere they are talking about disk access speed, and here I don't want to worry about disk access).

If required I will give more information on this.
On Tue, Mar 25, 2008 at 2:09 PM, jose javier parra sanchez <jojapasa@gmail.com> wrote:
It's been said zillions of times on the mailing list: using a select count(*) in Postgres is slow, and probably will be slow for a long time. So that function is not a good way to measure performance.
Yes, but if the data were on the HDD then we could say this... but now the data is in RAM.
On Tue, Mar 25, 2008 at 02:05:20PM +0530, sathiya psql wrote:
> Any Idea on this ???

Yes: don't use count(*). If you want a whole-table row count, use triggers to store the count. count(*) will be slow regardless of whether the table is in RAM or on HDD.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
hubert depesz lubaczewski wrote:
> On Tue, Mar 25, 2008 at 02:05:20PM +0530, sathiya psql wrote:
>> Any Idea on this ???
>
> yes. dont use count(*).
>
> if you want whole-table row count, use triggers to store the count.
>
> it will be slow regardless of whether it's in ram or on hdd.

In other words, if you're having performance problems, please provide EXPLAIN ANALYZE output from a more useful query that does real work, rather than something like count(*). COUNT(*) can be slow due to some MVCC limitations; it's been discussed frequently here, so you should search the archives for information.

-- Craig Ringer
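The trigger-maintained count that depesz and Craig suggest could be sketched roughly like this. This is only an illustration, not code from the thread; `mytable`, the counter table, and the trigger names are all placeholder names:

```sql
-- A one-row counter table kept in sync by triggers (sketch; all names
-- are hypothetical). Works on PostgreSQL 8.1+.
CREATE TABLE mytable_count (n bigint NOT NULL);
INSERT INTO mytable_count VALUES ((SELECT count(*) FROM mytable));

CREATE OR REPLACE FUNCTION mytable_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE mytable_count SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE mytable_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_count_maintain
    AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- The whole-table count is now a single-row read:
SELECT n FROM mytable_count;
```

Note the trade-off: every insert and delete now pays for an update of the counter row, and concurrent writers will serialize on it, so this only pays off when the exact count is read far more often than rows are written.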
sathiya psql escribió:
> So now everything is in RAM, if i do a count(*) on this table it returns
> 327600 in 3 seconds, why it is taking 3 seconds ????? because am sure that
> no Disk I/O is happening.

It has to scan every page and examine visibility for every record. Even if there's no I/O involved, there's a lot of work to do. I am not sure if with your hardware it is expected for it to take 3 seconds, though. Do you see high CPU usage during that period?

--
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
In response to "sathiya psql" <sathiya.psql@gmail.com>:
> So now everything is in RAM, if i do a count(*) on this table it returns
> 327600 in 3 seconds, why it is taking 3 seconds ?????

Yes. It takes your hardware about 3 seconds to read through 700M of RAM.

Keep in mind that you're not just reading RAM. You're pushing system requests through the VFS layer of your operating system, which is treating the RAM like a disk (with cylinder groups and inodes and blocks, etc.), so you have all that processing overhead as well. What filesystem did you format the RAM disk with?

Why are you doing this? If you have enough RAM to store the table, why not just allocate it to shared buffers?

--
Bill Moran, Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com  Phone: 412-422-3463x4023
> Yes. It takes your hardware about 3 seconds to read through 700M of ram.
>
> Keep in mind that you're not just reading RAM. You're pushing system
> requests through the VFS layer of your operating system, which is treating
> the RAM like a disk (with cylinder groups and inodes and blocks, etc) so
> you have all that processing overhead as well. What filesystem did you
> format the RAM disk with?
tmpfs
> Why are you doing this? If you have enough RAM to store the table, why
> not just allocate it to shared buffers?

Just allocating shared buffers would still read from the HDD into RAM the first time; it was to eliminate that.

Are you saying it will surely take 3 seconds if I have 50 lakh records?
In response to "sathiya psql" <sathiya.psql@gmail.com>:
> > What filesystem did you format the RAM disk with?
>
> tmpfs

I'm not an expert, but according to Wikipedia: "tmpfs (previously known as shmfs) distinguishes itself from the Linux ramdisk device by allocating memory dynamically and by allowing less-used pages to be moved onto swap space." Both dynamic allocation and swapping are potential problems, but I don't know how to tell you to determine whether they're issues or not.

> > Why are you doing this? If you have enough RAM to store the table, why
> > not just allocate it to shared buffers?
>
> just allocating will make read from hdd to RAM at first time, to eliminate
> that

PostgreSQL is still going to copy the data from your RAM disk into shared buffers before working with it, so you still have that overhead. All you're escaping is the time involved in physical disk activity, which is what shared_buffers are designed to avoid.

> are you saying it will take 3 seconds surely if i have 50 lakh record

No. That is dependent on your hardware and other factors. You are trying to use the system in a non-standard configuration; if it doesn't work that way, don't be surprised.

Also, what are your expectations? Honestly, I don't see any problems with the results you're getting; they're about what I would expect. Are you trying to compare PostgreSQL to MySQL/MyISAM?

More directly, what is your purpose in starting this email conversation? What are you hoping to accomplish?

--
Bill Moran, Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com  Phone: 412-422-3463x4023
Hello Sathiya,
1st: you should not use a ramdisk for this, it will slow things down as compared to simply having the table on disk. Scanning it the first time when on disk will load it into the OS IO cache, after which you will get memory speed.
2nd: you should expect the “SELECT COUNT(*)” to run at a maximum of about 350 – 600 MB/s (depending on PG version and CPU speed). It is CPU speed limited to that rate of counting rows no matter how fast your IO is.
So, for your 700 MB table, you should expect a COUNT(*) to run in about 1-2 seconds best case. This will approximate the speed at which other queries can run against the table.
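That estimate is just the table size divided by the counting rate. Using Luke's own 350-600 MB/s range and the 700 MB table size from the thread, the arithmetic can be checked with a trivial query (runnable on any PostgreSQL instance, no table needed):

```sql
-- Expected sequential count(*) time: table size / counting rate,
-- using the 350-600 MB/s range quoted above.
SELECT 700.0 / 600 AS best_case_seconds,   -- about 1.17 s
       700.0 / 350 AS worst_case_seconds;  -- 2.0 s
```

The observed 3 seconds is within sight of that window, which is why the result is unsurprising rather than a sign of something broken.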
- Luke
On 3/25/08 1:35 AM, "sathiya psql" <sathiya.psql@gmail.com> wrote:
> Dear Friends,
> I have a table with 32 lakh record in it. Table size is nearly 700 MB, and
> my machine had a 1 GB + 256 MB RAM, i had created the table space in RAM,
> and then created this table in this RAM.
> So now everything is in RAM, if i do a count(*) on this table it returns
> 327600 in 3 seconds, why it is taking 3 seconds ????? because am sure that
> no Disk I/O is happening. ( using vmstat i had confirmed, no disk I/O is
> happening, swap is also not used )
> Any Idea on this ???
> If required i will give more information on this.
> 1st: you should not use a ramdisk for this, it will slow things down as
> compared to simply having the table on disk. Scanning it the first time
> when on disk will load it into the OS IO cache, after which you will get
> memory speed.
Absolutely. After getting some replies, I dropped the table from the ramdisk and started keeping it on the disk itself.
> 2nd: you should expect the "SELECT COUNT(*)" to run at a maximum of about
> 350 - 600 MB/s (depending on PG version and CPU speed). It is CPU speed
> limited to that rate of counting rows no matter how fast your IO is.
I am using 8.1, on a Pentium dual-core.
> So, for your 700 MB table, you should expect a COUNT(*) to run in about
> 1-2 seconds best case. This will approximate the speed at which other
> queries can run against the table.
OK, count(*) per se, but other queries are also taking much time...
OK, I'll do more experiments and I'll be back...

Many thanks for all of your replies, guys.
sathiya.psql@gmail.com ("sathiya psql") writes:
> On Tue, Mar 25, 2008 at 2:09 PM, jose javier parra sanchez wrote:
> > It's been said zillions of times on the maillist. Using a select
> > count(*) in postgres is slow, and probably will be slow for a long
> > time. So that function is not a good way to measure perfomance.
>
> Yes, but if the data is in HDD then we can say this...
> but now the data is in RAM

Even if the data all is in RAM, it will still take some definitely non-zero time to examine all of the pages, looking for tuples, and then to determine which of those tuples are visible from the perspective of your DB connection. If 500MB of relevant data is sitting on disk, then it will take whatever time it takes to pull it from disk; if it is in memory, there is still work to be done...

--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://linuxdatabases.info/info/finances.html
Rules of the Evil Overlord #76. "If the hero runs up to my roof, I will not run up after him and struggle with him in an attempt to push him over the edge. I will also not engage him at the edge of a cliff. (In the middle of a rope-bridge over a river of molten lava is not even worth considering.)" <http://www.eviloverlord.com/>
On Tue, Mar 25, 2008 at 3:35 AM, sathiya psql <sathiya.psql@gmail.com> wrote:
> So now everything is in RAM, if i do a count(*) on this table it returns
> 327600 in 3 seconds, why it is taking 3 seconds ?????

Two things:

- Are you VACUUMing regularly? It could be that you have a lot of dead rows and the table is spread out over a lot of pages of mostly dead space. That would cause *very* slow seq scans.

- What is your shared_buffers set to? If it's really low then postgres could be constantly swapping from ram-disk to memory. Not much would be cached, and performance would suffer.

FWIW, I did a select count(*) on a table with just over 300000 rows, and it only took 0.28 sec.

Peter
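Both of Peter's checks can be run directly in psql. A sketch, assuming the table is called `mytable` (a placeholder name, not from the thread); everything used here exists in 8.1:

```sql
-- 1. Check for dead-row bloat: VACUUM VERBOSE reports removed and
--    nonremovable row versions, and how many pages the table occupies.
VACUUM VERBOSE mytable;

-- Compare page count against the live-tuple estimate; a table that is
-- mostly dead space shows far more pages than its row count warrants.
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'mytable';

-- 2. Check how much cache PostgreSQL itself has to work with.
SHOW shared_buffers;
```

If relpages stays large after repeated VACUUMs, in 8.1 a VACUUM FULL or CLUSTER is what actually returns the space and shrinks the scan.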
So your table is about 80 MB in size, or perhaps 120 MB if it fits in shared_buffers. You can check it using "SELECT pg_size_pretty(pg_relation_size('mytable'))".
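Luke's size check can be widened a little: pg_relation_size measures the heap alone, while pg_total_relation_size (also available from 8.1) includes indexes and TOAST, which is the number that matters for whether the whole table fits in cache. A sketch, with `mytable` again a placeholder name:

```sql
-- Heap size vs. total on-disk footprint including indexes and TOAST.
SELECT pg_size_pretty(pg_relation_size('mytable'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('mytable')) AS total_size;
```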
- Luke
On 3/26/08 4:48 PM, "Peter Koczan" <pjkoczan@gmail.com> wrote:
> FWIW, I did a select count(*) on a table with just over 300000 rows,
> and it only took 0.28 sec.