Thread: getting count for a specific query
Per a thread a while back, the discussion was along the lines of serving data up to the web quickly.
Our app currently pulls a bunch of data to several query pages.
I have tried and not gotten the queries to return as fast as they do now, which is a huge disappointment since the new hardware is twice as powerful as our current production box. I can get pretty close on some things, but others are just plain slower.
My idea is to use LIMIT and OFFSET to return just the first 50 records; if they hit next, I can set the offset.
My understanding was this gets slower as you move further into the data, but we have several options to modify the search, and I do not believe our clients will page very far into a dataset.
One problem I think I will have, though, is that they currently get a count of the records matching their request, and I would like to keep that as a display field.
So given a table of associates, my default query will be something like:
select * from tblassoc where clientnum = 'WAZ' and isdeleted is false
The user could add conditions like: and firstname like 'Joel'
Currently it returns all records, with a count and a display of the records you're viewing, like 1-50 of 470, next page 51-100, etc.
Is there a fast way to get the count? Will this concept fly? Also, I am getting heat that my search is now case-sensitive. What is the best way to get a case-insensitive search? I could use ~* or perhaps do an UPPER(firstname) in the select, etc.?
Thanks for any ideas here. I have tried playing with various settings and have not seen my times change much; I will pursue this on the performance mailing list.
Joel Fradkin
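
For reference, a minimal sketch of the paging Joel describes, using the column names from his post; the ORDER BY column and the ILIKE spelling of the case-insensitive filter are illustrative assumptions, not part of the original:

    -- page 1 (the "next" button re-runs this with OFFSET 50, 100, ...)
    SELECT *
      FROM tblassoc
     WHERE clientnum = 'WAZ'
       AND isdeleted IS FALSE
       AND firstname ILIKE 'Joel'     -- case-insensitive form of his LIKE 'Joel' filter
     ORDER BY firstname               -- some stable ordering is assumed
     LIMIT 50 OFFSET 0;

    -- the "1-50 of 470" total still needs a separate count over the same condition
    SELECT count(*)
      FROM tblassoc
     WHERE clientnum = 'WAZ'
       AND isdeleted IS FALSE
       AND firstname ILIKE 'Joel';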
On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
> My understanding was this gets slower as you move further into the data, but we have several options to modify the search, and I do not believe our clients will page very far into a dataset.

It gets slower because when you do an offset of 50, you have to pass through the first 50 before picking up the ones you want. If you offset 100, you scan through the first 100. &c. If you don't want to pay that, you need to use a cursor, but this causes the problem that you have to keep your cursor open across page views, which is a tricky issue on the Web.

> Currently it returns all records with a count and a display of the records you're viewing like 1-50 of 470, next page is 51-100 etc.
>
> Is there a fast way to get the count?

Not really, no. You have to perform a count() to get it, which is possibly expensive. One way to do it, though, is to do

    SELECT count(*) FROM tablename WHERE condition LIMIT n;

or something like that. Assuming the condition is reasonably limited (i.e. it's not going to cost you a fortune to run this), you'll get the right number back if the number is < n or else you'll get n. If you have n, your application can say "viewing 1-50 of at least n records". This is something you see from time to time in this sort of application.

> getting heat that my search is now case sensitive. What is the best way to get a case insensitive search? I could use ~* or perhaps do an UPPER(firstname) in the select etc?

The upper() (or lower() -- whatever) strategy is what I'd use. In any case, you want to make sure you put functional indexes on all such columns, because otherwise you'll never get an index scan.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-garde will probably become the textbook definition of Postmodernism. --Brad Holland
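
As a sketch of that functional-index suggestion (the index name is made up; the query compares lower(firstname) so the planner can consider the expression index):

    -- expression index so a case-insensitive match can use an index scan
    CREATE INDEX tblassoc_lower_firstname_idx ON tblassoc (lower(firstname));

    SELECT *
      FROM tblassoc
     WHERE clientnum = 'WAZ'
       AND isdeleted IS FALSE
       AND lower(firstname) = lower('Joel');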
On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
> On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
> > Is there a fast way to get the count?
>
> Not really, no. You have to perform a count() to get it, which is possibly expensive. One way to do it, though, is to do
>
>     SELECT count(*) FROM tablename WHERE condition LIMIT n;
>
> or something like that. Assuming the condition is reasonably limited (i.e. it's not going to cost you a fortune to run this), you'll get the right number back if the number is < n or else you'll get n.

come again?

    test=# select count(*) from a;
     count
    -------
         3
    (1 row)

    test=# select count(*) from a limit 2;
     count
    -------
         3
    (1 row)

the LIMIT clause limits the number of rows returned by the select, in this case 1 row.

maybe you mean something like:

    test=# select count(*) from (select * from a limit 2) as foo;
     count
    -------
         2
    (1 row)

gnari
On Fri, 2005-04-08 at 09:29 -0400, Joel Fradkin wrote:
> Our app currently pulls a bunch of data to several query pages.
>
> My idea is to use the limit and offset to return just the first 50 records, if they hit next I can set the offset.
>
> My understanding was this gets slower as you move further into the data, but we have several options to modify the search, and I do not believe our clients will page very far into a dataset.

you might reduce the performance loss if your dataset is ordered by a UNIQUE index.

    select * from mytable where somecondition
    ORDER by uniquecol limit 50;

and next:

    select * from mytable where somecondition AND uniquecol>?
    ORDER by uniquecol limit 50 OFFSET 50;

where the ? is a placeholder for the last value returned by the previous query.

if your unique index is a multi-column one, the method is slightly more complicated, but the same idea.

gnari
On Fri, Apr 08, 2005 at 04:17:45PM +0000, Ragnar Hafstað wrote:
> On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
> >
> >     SELECT count(*) FROM tablename WHERE condition LIMIT n;
>
> the LIMIT clause limits the number of rows returned by the select, in this case 1 row.
>
> maybe you mean something like:
>
>     test=# select count(*) from (select * from a limit 2) as foo;

Yes, that was stupid of me. That's what I meant, though.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
When my information changes, I alter my conclusions. What do you do, sir? --attr. John Maynard Keynes
Ragnar Hafstað <gnari@simnet.is> writes:
> you might reduce the performance loss if your dataset is ordered by a UNIQUE index.
>
>     select * from mytable where somecondition
>     ORDER by uniquecol limit 50;
>
> and next:
>
>     select * from mytable where somecondition AND uniquecol>?
>     ORDER by uniquecol limit 50 OFFSET 50;
>
> where the ? is placeholder for last value returned by last query.

Uh, you don't want the OFFSET there do you? But otherwise, yeah, this is a popular solution for paging through some rows. Doesn't really help with the problem of counting the total dataset though ...

regards, tom lane
> > select * from mytable where somecondition AND uniquecol>?
> > ORDER by uniquecol limit 50 OFFSET 50;
> >
> > where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you? But otherwise, yeah, this is a popular solution for paging through some rows. Doesn't really help with the problem of counting the total dataset though ...

In the past I've done an EXPLAIN and parsed the plan to see what PostgreSQL estimated for the number of rows matching <somecondition>. If less than $threshold, I would do a count(*). If more than $threshold, I would display to the user "approx N records". This seemed to be good enough for most cases.

--
On Fri, 2005-04-08 at 12:32 -0400, Tom Lane wrote:
> Ragnar Hafstað <gnari@simnet.is> writes:
> > you might reduce the performance loss if your dataset is ordered by a UNIQUE index.
> >
> >     select * from mytable where somecondition
> >     ORDER by uniquecol limit 50;
> >
> > and next:
> >
> >     select * from mytable where somecondition AND uniquecol>?
> >     ORDER by uniquecol limit 50 OFFSET 50;
> >
> > where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you?

ooops! of course not. the uniquecol>? is meant to REPLACE the OFFSET.

gnari
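
Putting the correction together, a sketch of the keyset scheme with no OFFSET at all (the table, condition, and the value 12345 are placeholders; 12345 stands for whatever uniquecol value ended the previous page):

    -- first page
    SELECT *
      FROM mytable
     WHERE somecondition
     ORDER BY uniquecol
     LIMIT 50;

    -- next page: restart just past the last key seen, instead of skipping rows
    SELECT *
      FROM mytable
     WHERE somecondition
       AND uniquecol > 12345
     ORDER BY uniquecol
     LIMIT 50;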
Thanks all.

I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.

Are there any other database engines that provide better performance? (We just spent 2 months moving to postgres and it is not live yet, but if I am going to get results back slower than my 2 proc box running MSSQL with 2 gig and 2 processors, I can't see any reason to move to it.) The Postgres is on a 4 proc Dell with 8 gigs of memory. I thought I could analyze our queries and our config to optimize.

Joel Fradkin
On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
> Thanks all.
> I might have to add a button to do the count on command so they don't get
> the hit.
> I would want it to return the count of the condition, not the currently
> displayed number of rows.
>
> Is there any other database engines that provide better performance?
> (We just spent 2 months moving to postgres and it is not live yet, but if I am
> going to get results back slower than my 2 proc box running MSSQL in 2 gig
> and 2 processors I can't see any reason to move to it)
> The Postgres is on a 4 proc Dell with 8 gigs of memory.
> I thought I could analyze our queries and our config to optimize.
Judging postgresql on one single data point (count(*) performance) is
quite unfair. Unless your system only operates on static data and is
used to mostly do things like counting, in which case, why are you using
a database?
PostgreSQL is a great fit for certain loads, and a poor fit for others.
Are you going to have lots of people updating the database WHILE the
select count(*) queries are running? Are you going to be doing other,
more interesting things than simply counting? If so, you really should
build a test case that emulates what you're really going to be doing
with the system.
I've found that the poor performance of aggregates in PostgreSQL is
generally more than made up for by the outstanding behaviour it exhibits
when under heavy parallel load.
Note that the basic design of PostgreSQL's MVCC system is such that
without using some kind of trigger to maintain pre-calculated aggregate
information, it will NEVER be as fast as most other databases at doing
aggregates across large chunks of your data.
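
A very rough sketch of the trigger-maintained count Scott alludes to follows; every name here is invented, a real version would also have to seed the counts table and handle UPDATEs, and the single counter row per client becomes a contention point under concurrent writes:

    CREATE TABLE tblassoc_counts (
        clientnum varchar(10) PRIMARY KEY,
        live_rows bigint NOT NULL
    );

    CREATE OR REPLACE FUNCTION assoc_count_trig() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE tblassoc_counts SET live_rows = live_rows + 1
             WHERE clientnum = NEW.clientnum;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE tblassoc_counts SET live_rows = live_rows - 1
             WHERE clientnum = OLD.clientnum;
        END IF;
        RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER assoc_count_maint
        AFTER INSERT OR DELETE ON tblassoc
        FOR EACH ROW EXECUTE PROCEDURE assoc_count_trig();

    -- the "470" for the default screen then becomes a single-row lookup:
    SELECT live_rows FROM tblassoc_counts WHERE clientnum = 'WAZ';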
Believe me, I just spent two months converting our app; I do not wish to give up on that work. We do a great deal more than count. Specifically, many of our queries run much slower on postgres. As mentioned, I purchased a 4 proc box with 8 gigs of memory for this upgrade (Dell may have been a poor choice based on comments I have received). Even when I could see a query like select * from tblassoc where clientnum = 'WAZ' using indexed joins on location and jobtitle, it is still taking 22 seconds to run compared to the 9 seconds on MSSQL on a 2 proc 2 gig box. I got one of my queries to run faster using a page cost of .2, but then the assoc query was running 50 seconds, so I adjusted to a cost of 2 (tried 1.2, 2, 3, and 4 and did not see huge changes in the assoc query, except that it did not like .2).

I have placed a call to commandprompt.com and am going to pay for some support to see if they have anything meaningful to add.

It could be something with my hardware, my hardware config, or my postgres config. I am just not sure. I know I have worked diligently to try to learn all I can, and I used to think I was kinda smart. I set up the data on 4 10k SCSI drives in a powervault and my WAL on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.

I can tell you I am very happy to have this forum, as I could not have gotten to the point I am at without the many useful comments from folks on the list. I greatly appreciate everyone who has helped. But truth is, if I can't get it to work better than I have, I may have to ditch the effort and bite the 70K bullet. It's compounded by using 3 developers' time for two months to yield an answer that my boss may just fire me for. I figured since my first test showed I could get data faster on the postgres box that I could, with enough study, get all our data to go faster, but I am afraid I have not been very successful.

My failure is not a reflection on postgres; as you mentioned, it is definitely great at some things. I have 90 some views, not to mention as many stored procedures, that have been converted. I wrote an app to move the data and it works great. But if it is too slow I just will not be able to use it for production.

Joel

> Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?
>
> PostgreSQL is a great fit for certain loads, and a poor fit for others. Are you going to have lots of people updating the database WHILE the select count(*) queries are running? Are you going to be doing other, more interesting things than simply counting? If so, you really should build a test case that emulates what you're really going to be doing with the system.
>
> I've found that the poor performance of aggregates in PostgreSQL is generally more than made up for by the outstanding behaviour it exhibits when under heavy parallel load.
>
> Note that the basic design of PostgreSQL's MVCC system is such that without using some kind of trigger to maintain pre-calculated aggregate information, it will NEVER be as fast as most other databases at doing aggregates across large chunks of your data.
Bob Henkel <luckyratfoot@gmail.com> writes:
> From a simple/high level perspective why is this? That is, why can't PostgreSQL do aggregates as well across large chunks of data. I'm assuming it's extremely complicated. Otherwise the folks around here would have churned out a fix in a month or less and made this issue a past story.

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are:

* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, a count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard.

* MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformation to an indexable query by hand; for instance, instead of MAX(col) do

    SELECT col FROM tab ORDER BY col DESC LIMIT 1;

There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list.

regards, tom lane
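
For instance, the hand-done min/max rewrite might look like this (column and index names are invented):

    -- an index the ORDER BY ... LIMIT 1 form can use
    CREATE INDEX tblassoc_hiredate_idx ON tblassoc (hiredate);

    -- instead of:  SELECT max(hiredate) FROM tblassoc;
    SELECT hiredate
      FROM tblassoc
     WHERE hiredate IS NOT NULL   -- mimic max(), which ignores NULLs
     ORDER BY hiredate DESC
     LIMIT 1;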
On Fri, Apr 08, 2005 at 03:23:25PM -0400, Joel Fradkin wrote:
> Believe me I just spent two months converting our app, I do not wish to give
> our queries run much slower on postgres. As mentioned I purchased a 4 proc

I suspect you want the -performance list. And it'd be real handy to get some EXPLAIN ANALYSE results for the offending queries in order to help you (where "handy" is read as "necessary").

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Information security isn't a technological problem. It's an economics problem. --Bruce Schneier
As always thanks Tom, I will definitely look at what I can do.

Since it is a count of matched-condition records I may not have a way around it. I don't think my clients would like me to approximate, as it is a count of their records. What I plan on doing, assuming I can get all my other problems fixed (as mentioned, I am going to try and get paid help to see if I goofed it up somewhere), is make the count a button, so they don't wait every time but can choose to wait if need be. Maybe I can store the last count (with the day it was counted) for the generic search it defaults to, and just have them do a count on demand if they have a specific query. Our screens have several criteria fields in each application.

Joel Fradkin

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 08, 2005 2:28 PM
To: Bob Henkel
Cc: Scott Marlowe; Joel Fradkin; Andrew Sullivan; pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific query

Bob Henkel <luckyratfoot@gmail.com> writes:
> From a simple/high level perspective why is this? That is, why can't PostgreSQL do aggregates as well across large chunks of data. I'm assuming it's extremely complicated. Otherwise the folks around here would have churned out a fix in a month or less and made this issue a past story.

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are:

* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, a count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard.

* MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformation to an indexable query by hand; for instance, instead of MAX(col) do

    SELECT col FROM tab ORDER BY col DESC LIMIT 1;

There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list.

regards, tom lane
Have you posted your postgresql config files for the folks here to review? I can't say I can help you with that, because I too can only read the docs and go from there. But for specific situations you need specific configs. I would think you can get more out of postgresql with some time and help from the people around here. Though count(*) looks like it may be slow.
Quoting Scott Marlowe <smarlowe@g2switchworks.com>:
> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
> > I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.
>
> Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:
http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

However, the gist of this person's problem is that an adhoc query, NOT just a 'select count(*) from table', can take remarkably long. Again, the problem is that PG can't just scan an index.

----------------------

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise.

default_statistics_target is cranked up to 200 on all such tables, and pg_autovacuum is running. (If there were anything to improve, it would be refining the thresholds on this.)

If the "(cost...rows=nnnn" string returns a number higher than the QUERY row limit, the user is derailed ("That's not specific enough to answer immediately; do you want an emailed report?").

Otherwise, it runs EXPLAIN ANALYZE, which is still faster than the query itself. If the "(actual...rows=nnnn...)" is higher than the RESULT row limit (PAGE limit), it then runs the query with the PAGE rows offset and limit --- and happily, practically everything that query needs is now in shared_buffers. The count from the EXPLAIN ANALYZE is displayed in the web page.

--
"Dreams come true, not free." -- S.Sondheim, ITW
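
A sketch of what that gating looks like in practice; the plan text shown is made up, and the application simply pulls the rows= figure out of the EXPLAIN output:

    EXPLAIN SELECT * FROM tblassoc WHERE clientnum = 'WAZ' AND isdeleted IS FALSE;
    --                     QUERY PLAN (illustrative output)
    -- Seq Scan on tblassoc  (cost=0.00..1042.50 rows=470 width=120)
    --   Filter: ((clientnum = 'WAZ') AND (isdeleted IS FALSE))
    --
    -- rows=470 is the planner's estimate: if it exceeds the query threshold,
    -- refuse or queue the request; otherwise run EXPLAIN ANALYZE (or count(*))
    -- and display the actual row count.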
I have asked specific questions and paid attention to the various threads on configuration.
I will take my config files and post them on the performance list; that is a good suggestion (personally I have more faith in this forum than in a paid consultant, but at this point I am willing to try both).
Thanks again.
The count thing I can get around using stored results and on demand counting, but some of my statistical reporting is just a must have.
I enclosed one of my views, I realize to get help I should also include tables and indexes etc, and maybe I will do that.
It is just there are so many of them. This one in particular did not run at all when I first got my data loaded.
I ended up adding a few indexes and not sure what else and got it to run faster on postgres.
Now it is running horribly, so I am back to the drawing board; I change one thing and something else breaks.
I am just frustrated, maybe Monday I will have better strength to figure it all out.
Joel Fradkin
Attachment
I will also look at doing it the way you describe; they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but I think I am brain dead today).

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
jfradkin@wazagua.com
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology & advanced tools.
C 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc
This email message is for the use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments.

-----Original Message-----
From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Mischa Sandberg
Sent: Friday, April 08, 2005 2:40 PM
To: Scott Marlowe
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific query

Quoting Scott Marlowe <smarlowe@g2switchworks.com>:
> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
> > I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.
>
> Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:
http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

However, the gist of this person's problem is that an adhoc query, NOT just a 'select count(*) from table', can take remarkably long. Again, the problem is that PG can't just scan an index.

----------------------

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise.

default_statistics_target is cranked up to 200 on all such tables, and pg_autovacuum is running. (If there were anything to improve, it would be refining the thresholds on this.)

If the "(cost...rows=nnnn" string returns a number higher than the QUERY row limit, the user is derailed ("That's not specific enough to answer immediately; do you want an emailed report?").

Otherwise, it runs EXPLAIN ANALYZE, which is still faster than the query itself. If the "(actual...rows=nnnn...)" is higher than the RESULT row limit (PAGE limit), it then runs the query with the PAGE rows offset and limit --- and happily, practically everything that query needs is now in shared_buffers. The count from the EXPLAIN ANALYZE is displayed in the web page.

--
"Dreams come true, not free." -- S.Sondheim, ITW
On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.

Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On Fri, 2005-04-08 at 15:23, Vivek Khera wrote:
> On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
>
> Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the dell's just are not known for speedy I/O.

Note that there are several different RAID controllers you can get with a DELL. I had good luck with the PERC 4C (AMI MegaRAID based) at my last job. In a dual 2400 MHz machine with 2 gigs ram, it handily outran a 4-way (about 1200 MHz CPUs) windows / MSSQL box with 4 gigs of ram at most tasks, especially inserts / updates. The windows machine had the more generic PERC 3i type controller in it.
On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:
> Note that there are several different RAID controllers you can get with a DELL. I had good luck with the PERC 4C (AMI MegaRAID based) at my

I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On Apr 8, 2005, at 4:35 PM, Bob Henkel wrote:
> desktop SATA drive with no RAID? I'm by any means as knowledgeable about I/O setup as many of you are but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/speced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

"Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.

I have a Dell SATA RAID (adaptec based) on the office server and it is OK, though not something I'd buy again.

I have various PERC 3 and PERC 4 RAID controllers on my servers (SCSI based) and they suck under heavy I/O load.

I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On Fri, 2005-04-08 at 15:35, Bob Henkel wrote:
> On Apr 8, 2005 3:23 PM, Vivek Khera <vivek@khera.org> wrote:
> > On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
> >
> > Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the dell's just are not known for speedy I/O.
> >
> > Vivek Khera, Ph.D.
> > +1-301-869-4449 x806
>
> But that is relative I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? I'm by any means as knowledgeable about I/O setup as many of you are but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/speced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

My experience with the 3i controllers (see my earlier post) was that my old Pentium Pro 200x2 machine with 512 meg ram and a generic Ultra Wide SCSI card and a half dozen drives running software RAID 5 was faster. Seriously. So was my P-II-350 desktop with the same controller, and an older dual P-III-750 with only UltraSCSI running in a RAID-1 mirror set.

The 3i is REALLY slow (or at least WAS slow) under linux.
On Fri, 2005-04-08 at 15:36, Vivek Khera wrote:
> On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:
> > Note that there are several different RAID controllers you can get with a DELL. I had good luck with the PERC 4C (AMI MegaRAID based) at my
>
> I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Were you using battery backed cache with write-back enabled on the 4C? The 3C is also megaraid based, but it's a pretty old design and not very fast.
Interesting... Maybe after a weekend of rest Joel can try it on a different setup, even if that different setup is just a power user's development machine, to see if he gets the same or worse timing results. It would be wonderful if it magically sped up.
On Fri, 2005-04-08 at 15:41, Vivek Khera wrote:
> On Apr 8, 2005, at 4:35 PM, Bob Henkel wrote:
> > desktop SATA drive with no RAID? I'm by any means as knowledgeable about I/O setup as many of you are but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/speced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?
>
> "Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.
>
> I have a Dell SATA RAID (adaptec based) on the office server and it is OK, though not something I'd buy again.
>
> I have various PERC 3 and PERC 4 RAID controllers on my servers (SCSI based) and they suck under heavy I/O load.
>
> I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

It would be nice to be able to put a stock AMI MegaRAID in one and see.

Do you run your 2650s with hyperthreading on? I found that slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.
I turned off hyperthreading (I saw on the list that it did not help on Linux). I am using a pretty lightweight windows box: an Optiplex with IDE, 750-meg internal, 2.4 GHz CPU. My desktop has 2 gig, so it might not be a bad idea to try it locally (I have it installed), but methinks it's not totally a hardware issue for us.

Joel Fradkin

> Do you run your 2650s with hyperthreading on? I found that slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.
> Since it is a count of matched condition records I may not have a way around.

What you could do is cache the search results (just caching the ids of the rows to display is enough and uses little space) in a cache table, numbering them with your sort order using a temporary sequence, so that you can:

    SELECT ... FROM cache WHERE row_position BETWEEN page_no*per_page AND (page_no+1)*per_page-1

and to get the count:

    SELECT row_position FROM cache ORDER BY row_position DESC LIMIT 1

Add a session_id referencing your sessions table with an ON DELETE CASCADE, and the cache will be auto-purged when sessions expire.
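
A sketch of that cache-table idea with invented names (the sessions table, session 42, associateid as the cached row id, and the sort column are all assumptions):

    CREATE TABLE search_cache (
        session_id   integer REFERENCES sessions(session_id) ON DELETE CASCADE,
        row_position integer NOT NULL,
        assoc_id     integer NOT NULL,
        PRIMARY KEY (session_id, row_position)
    );

    CREATE TEMPORARY SEQUENCE cache_pos;

    -- run the expensive search once, numbering rows in display order
    INSERT INTO search_cache (session_id, row_position, assoc_id)
    SELECT 42, nextval('cache_pos'), associateid
      FROM (SELECT associateid
              FROM tblassoc
             WHERE clientnum = 'WAZ' AND isdeleted IS FALSE
             ORDER BY firstname) AS sorted;

    -- page 2 (rows 51-100) comes straight out of the cache
    SELECT assoc_id FROM search_cache
     WHERE session_id = 42 AND row_position BETWEEN 51 AND 100
     ORDER BY row_position;

    -- the total count is simply the highest position cached
    SELECT row_position FROM search_cache
     WHERE session_id = 42
     ORDER BY row_position DESC LIMIT 1;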
Please run this disk throughput test on your system:
http://boutiquenumerique.com/pf/multi_io.py

It just spawns N threads which will write a lot of data simultaneously to the disk, then measures the total time. Same for read. Modify the parameters in the source... it's set to generate 10G of files in the current directory and re-read them, all with 8 threads.

How much I/O do you get?

Also hdparm -t /dev/hd? would be interesting.

On Fri, 08 Apr 2005 21:51:02 +0200, Joel Fradkin <jfradkin@wazagua.com> wrote:
> I will also look at doing it the way you describe, they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but think I am brain dead today).
>
> Joel Fradkin
> Wazagua, Inc.
> 2520 Trailmate Dr
> Sarasota, Florida 34243
> Tel. 941-753-7111 ext 305
> jfradkin@wazagua.com
> www.wazagua.com
> Powered by Wazagua
> Providing you with the latest Web-based technology & advanced tools.
> C 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc
> This email message is for the use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments.
>
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> On Behalf Of Mischa Sandberg
> Sent: Friday, April 08, 2005 2:40 PM
> To: Scott Marlowe
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] getting count for a specific query
>
> Quoting Scott Marlowe <smarlowe@g2switchworks.com>:
>
>> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
>> > I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.
>>
>> Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?
>
> For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:
> http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php
>
> However, the gist of this person's problem is that an adhoc query, NOT just a 'select count(*) from table', can take remarkably long. Again, the problem is that PG can't just scan an index.
> ----------------------
> One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise.
>
> default_statistics_target is cranked up to 200 on all such tables, and pg_autovacuum is running. (If there were anything to improve, it would be refining the thresholds on this).
>
> If the "(cost...rows=nnnn" string returns a number higher than the QUERY row limit, the user is derailed ("That's not specific enough to answer immediately; do you want an emailed report?").
>
> Otherwise, it runs EXPLAIN ANALYZE, which is still faster than the query itself. If the "(actual...rows=nnnn...)" is higher than the RESULT row limit (PAGE limit), it then runs the query, with the PAGE rows offset and limit --- and happily, practically everything that query needs is now in shared_buffers. The count from the EXPLAIN ANALYZE is displayed in the web page.
On Apr 8, 2005, at 3:37 PM, Joel Fradkin wrote:
> I don't think my clients would like me to aprox as it is a count of
> their
> records. What I plan on doing assuming I can get all my other problems
> fixed
> (as mentioned I am going to try and get paid help to see if I goofed
> it up
> some where) is make the count a button, so they don't wait everytime,
> but
> can choose to wait if need be, maybe I can store the last count with a
> count
> on day for the generic search it defaults to, and just have them do a
> count
> on demand if they have a specific query. Our screens have several
> criteria
> fields in each application.
Here is an interface idea I'm working on for displaying query results
in PostgreSQL. Maybe it will work for you if your connection method
does not prevent you from using cursors. I create a cursor and then
fetch the first 1000 rows. The status display has 4 paging buttons,
something like this:
|< < rows 1 - 1000 of ? > >|
The user can hit the "next" button to get the next 1000. If less than
1000 are fetched the ? is replaced with the actual count. They can
press the "last" button to move to the end of the cursor and get the
actual count if they need it. So here the initial query should be fast,
the user can get the count if they need it, and you don't have to
re-query using limit and offset.
The problem I'm looking into now (which I just posted on the general
list) is I don't see a way to get the table and column information from
a cursor. If I fetch from a cursor, the table OID and column number
values are 0 in the row description. If I execute the same query
directly without a cursor, the row description has the correct values
for table OID and column number.
John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
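
Roughly, in SQL terms, the paging John describes could look like the sketch below; the cursor name, ordering, and 1000-row page are assumptions, and keeping the cursor alive between web requests (WITH HOLD or a held-open transaction) is the part left unsolved here:

    BEGIN;
    DECLARE assoc_cur SCROLL CURSOR FOR
        SELECT * FROM tblassoc
         WHERE clientnum = 'WAZ' AND isdeleted IS FALSE
         ORDER BY firstname;

    FETCH FORWARD 1000 FROM assoc_cur;   -- rows 1-1000; fewer rows back means that IS the count
    FETCH FORWARD 1000 FROM assoc_cur;   -- "next" button: rows 1001-2000

    MOVE FORWARD ALL IN assoc_cur;       -- "last" button: the MOVE command tag reports how many
                                         -- rows were skipped, which yields the total count
    COMMIT;                              -- closes the cursor unless it was declared WITH HOLD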
Oracle Forms uses a similar method to the one you described, and it works just fine. It will say Record 1 of ? (but I think the developer can set the amount of records cached, so that if you set it to 10 and queried 5 records it would say Record 1 of 5, because it would be under the cache amount). Forms also offers a button that says get hit count. So if you really need to know the record count you can get it without moving off the current record.
On Apr 9, 2005, at 11:43 AM, Bob Henkel wrote:
> Forms also offers a button that says get hit count. So if you really need to know the record count you can get it without moving off the current record.

That's a good idea too. Maybe in my interface you could click on the ? to get the count without changing the rows you are viewing.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
On Apr 8, 2005, at 4:50 PM, Scott Marlowe wrote:
> Do you run your 2650s with hyperthreading on? I found that slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.

Yep. Turned off as per various recommendations on this list. The RAID card on this box is a PERC 3/DC. It is a very big disappointment. The Opteron based generic system totally outperforms this Dell box.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On Tue, 2005-04-12 at 14:29, Vivek Khera wrote:
> On Apr 8, 2005, at 4:50 PM, Scott Marlowe wrote:
> > Do you run your 2650s with hyperthreading on? I found that slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.
>
> Yep. Turned off as per various recommendations on this list. The RAID card on this box is a PERC 3/DC. It is a very big disappointment. The Opteron based generic system totally outperforms this Dell box.

How much memory is in the box? I've heard horror stories about performance with >2 gigs of ram, which is why I made them order mine with 2 gigs. Does the 3/DC have battery backed cache set to write back?
On Apr 12, 2005, at 4:23 PM, Scott Marlowe wrote:
> How much memory is in the box? I've heard horror stories about performance with >2 gigs of ram, which is why I made them order mine with 2 gigs. Does the 3/DC have battery backed cache set to write back?

4GB RAM and battery backed cache set to write-back mode. FreeBSD 4.11.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
On Tue, 2005-04-12 at 15:32, Vivek Khera wrote:
> On Apr 12, 2005, at 4:23 PM, Scott Marlowe wrote:
> > How much memory is in the box? I've heard horror stories about performance with >2 gigs of ram, which is why I made them order mine with 2 gigs. Does the 3/DC have battery backed cache set to write back?
>
> 4GB RAM and battery backed cache set to write-back mode. FreeBSD 4.11.

If you've got the time, try running it with only 2 gigs and compare the speed. I never really got a chance to run mine with >2 gigs, but I know that I read plenty of posts at the time that the chipset in the 2650 was REALLY slow at using memory over 2 gig.