Thread: name search query speed
I have about 5M names stored in my DB. Currently the searches are very quick unless they are on a very common last name, e.g. SMITH. The index is always used, but I still hit 10-20 seconds on a SMITH or JONES search, and I average about 6 searches a second and max out at about 30/s. Any suggestions on how I could arrange things to make this search quicker? I have 4 GB of memory on a RAID 5 with 3 drives. I'm hoping that I can increase this speed without a hardware upgrade.

thanx,
-jj-

--
You probably wouldn't worry about what people think of you if you could know how seldom they do.
    -- Olin Miller.
I'm not sure what the answer is, but maybe I can help. Would clustering the name index make this faster? I thought that would bunch up the pages so the names were more or less in order, which would improve search time. Just a guess, though.

Ken

----- Original Message -----
From: "Jeremiah Jahn" <jeremiah@cs.earlham.edu>
To: "postgres performance" <pgsql-performance@postgresql.org>
Sent: Thursday, March 03, 2005 11:38 AM
Subject: [PERFORM] name search query speed

> I have about 5M names stored in my DB. Currently the searches are very
> quick unless they are on a very common last name, e.g. SMITH. [...]
Yes, it does. I forgot to mention that I also have clustering on that table by my name_field index. My bad.

On Thu, 2005-03-03 at 12:00 -0500, Ken Egervari wrote:
> I'm not sure what the answer is, but maybe I can help. Would clustering the
> name index make this faster? I thought that would bunch up the pages so the
> names were more or less in order, which would improve search time. Just a
> guess, though. [...]
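For reference, the clustering setup being discussed looks something like this (the table name `people` is an assumption for illustration; only the `name_field` index name comes from the thread, and the real schema wasn't posted):

```sql
-- Physically reorder the table rows to match the name index.
-- CLUSTER is a one-time reorder, not maintained automatically:
-- it must be re-run after large batches of inserts/updates.
-- (This is the PostgreSQL 8.0-era syntax: CLUSTER indexname ON tablename.)
CLUSTER name_field ON people;

-- Refresh planner statistics afterwards so plans reflect the new layout.
ANALYZE people;
```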
Jeremiah,

> I have about 5M names stored in my DB. Currently the searches are very
> quick unless they are on a very common last name, e.g. SMITH. [...]

First off, see http://www.powerpostgresql.com/PerfList about your configuration settings.

The problem you're running into with SMITH is that if your query is going to return a substantial number of rows (variable, but generally anything over 5% of the table and 1000 rows), PostgreSQL is not able to make effective use of an index. This makes it fall back on a sequential scan, and based on your execution times, I'd guess that the table is a bit too large to fit in memory.

AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on your database, if you're still having problems, post an EXPLAIN ANALYZE of the query to this list.

--
Josh Berkus
Aglio Database Solutions
San Francisco
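The requested diagnostics would look roughly like this (the table and column names are placeholders, since the actual query was never posted to the list):

```sql
-- Reclaim dead tuples and refresh statistics first.
VACUUM ANALYZE people;

-- Then capture the real plan with actual row counts and timings
-- for one of the slow common-surname searches.
EXPLAIN ANALYZE
SELECT * FROM people WHERE last_name = 'SMITH';
```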
Jeremiah Jahn wrote:

> I have about 5M names stored in my DB. Currently the searches are very
> quick unless they are on a very common last name, e.g. SMITH. [...]

It sounds like the problem is just that you have a lot of rows that need to be returned. Can you just put a limit on the query? Then change the client app to recognize when the limit is reached, and either give a link to more results, refine the query, or something like that.

John
=:->
Hi, Jeremiah,

Jeremiah Jahn schrieb:

> Yes, it does. I forgot to mention that I also have clustering on that
> table by my name_field index. My bad.

Fine. Did you run ANALYZE and CLUSTER on the table after every large bunch of insertions / updates?

Markus

--
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:

> It sounds like the problem is just that you have a lot of rows that need
> to be returned. Can you just put a limit on the query? And then change
> the client app to recognize when the limit is reached, and either give a
> link to more results, or refine query, or something like that.

Not really; about 2% of the returned rows are thrown away for security reasons, based on the current user, the security groups they belong to, and different flags in the data itself. So the count is generated on the fly, since it's needed for pagination in the app, which reports the total number of hits but only displays 40 of them. If anyone knows a way to determine the total number of matches without iterating through them using JDBC, I'm all ears, as this would save me huge amounts of time and limit/offset would become an option.

--
"A power so great, it can only be used for Good or Evil!"
                  -- Firesign Theatre, "The Giant Rat of Summatra"
On Thu, 2005-03-03 at 09:44 -0800, Josh Berkus wrote:

> First off, see http://www.powerpostgresql.com/PerfList about your
> configuration settings. [...]
>
> AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on
> your database, if you're still having problems, post an EXPLAIN ANALYZE of the
> query to this list.

i.e. throw more hardware at it. All of the other things on the list, except for effective_cache_size, have already been done. I bumped that up from the default to 2600000. We'll see if that makes a difference.

thanx,
-jj-
> -----Original Message-----
> From: Jeremiah Jahn [mailto:jeremiah@cs.earlham.edu]
> Sent: Thursday, March 03, 2005 2:15 PM
> To: John A Meinel
> Cc: postgres performance
> Subject: Re: [PERFORM] name search query speed
>
> [...]
> So the count is generated on the fly, since it's needed for
> pagination in the app, which reports the total number of hits
> but only displays 40 of them. If anyone knows a way to
> determine the total number of matches without iterating
> through them using JDBC, I'm all ears [...]

Is there a reason you can't do a count(field) query first? If so, you can get the number of records returned by setting absolute(-1) and getting the row number.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
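The first suggestion is simply a separate aggregate query run before the paged fetch, along these lines (table, column, and filter names are placeholders, since the real query wasn't posted):

```sql
-- Pass 1: run the same search once as an aggregate to get the total
-- for the pagination display.
SELECT count(*) FROM people WHERE last_name = 'SMITH';

-- Pass 2: fetch only the page actually being displayed (40 rows here).
SELECT * FROM people
WHERE last_name = 'SMITH'
ORDER BY first_name
LIMIT 40 OFFSET 0;
```

The trade-off raised later in the thread applies: between the two statements the table can change, so the count and the page may disagree unless both run under a sufficiently strict isolation level.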
Doesn't that cause two queries? I used to do it that way, and I cut my time substantially by counting in-line. Even though the results were cached, it still took more time. Also, since the table is constantly being updated, the returned total would not always match the number of results of the second query.

On Thu, 2005-03-03 at 14:26 -0600, Dave Held wrote:

> Is there a reason you can't do a count(field) query first? If
> so, you can get the number of records returned by setting
> absolute(-1) and getting the row number.
Jeremiah Jahn wrote:

> Not really; about 2% of the returned rows are thrown away for security
> reasons, based on the current user, the security groups they belong to,
> and different flags in the data itself. So the count is generated on the
> fly, since it's needed for pagination in the app, which reports the total
> number of hits but only displays 40 of them. [...]

Well, what is wrong with "select count(*) from <the query I would have done>"?

Are you saying 2% are thrown away, or only 2% are kept? Is this being done on the client side? Is there a way to incorporate the security info into the database, so that the query actually returns only the rows you care about? That seems like it would be a decent way to speed it up, if you can restrict the number of rows that it needs to look at.

There are other alternatives, such as materialized views or temp tables, where you select into the temp table the rows that the user would request, and then you generate limit/offset from that. The first query would be a little slow, since it would get all the rows, but all subsequent accesses for that user could be really fast.

The other possibility is to do "limit 200", and then in your list of pages you could have: 1, 2, 3, 4, 5, ... This means that you don't have to worry about getting 10,000 entries, which probably isn't really useful for the user anyway, and you can still break things into 40-entry pages, just 200 entries at a time.

John
=:->
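The temp-table idea sketched above might look like this (the table, columns, and the security predicate are all assumptions for illustration; the thread never shows the real schema or security rules):

```sql
-- Materialize the rows this user is allowed to see, once per search.
CREATE TEMP TABLE smith_results AS
  SELECT * FROM people
  WHERE last_name = 'SMITH'
    AND security_group = 'public';  -- stand-in for the real security filter

-- The pagination total is now a cheap count over the small temp table...
SELECT count(*) FROM smith_results;

-- ...and every page is a fast limit/offset against it (page 3 here).
SELECT * FROM smith_results
ORDER BY first_name
LIMIT 40 OFFSET 80;
```

Because both the count and the pages read the same snapshot, this also sidesteps the mismatch between the count query and the page query that was raised earlier.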
Hi, Jeremiah,

Jeremiah Jahn schrieb:

> Doesn't that cause two queries? I used to do it that way, and I cut my
> time substantially by counting in-line. Even though the results were
> cached, it still took more time.

This sounds rather strange.

> Also, since the table is constantly being updated, the returned total
> would not always match the number of results of the second query.

Did you run both queries in the same transaction, with the transaction isolation level set to serializable? If so, you've found a serious bug in the PostgreSQL transaction engine.

Markus
Jeremiah Jahn wrote:

> I have about 5M names stored in my DB. Currently the searches are very
> quick unless they are on a very common last name, e.g. SMITH. [...]

If it's just "SMITH", the only fix is to throw more hardware at the problem. I've got my own database of medical providers & facilities in the millions, and any time somebody tries to search for MEDICAL FACILITY, it takes forever. I've tried every optimization possible, but when you have 500K records with the word "MEDICAL" in them, what can you do? You've got to check all 500K records to see if they match your criteria.

For multi-word searches, what I've found does work is to periodically generate stats on word frequencies and use those stats to search for the least common words first. For example, if somebody enters "ALTABATES MEDICAL HOSPITAL", I can get the ~50 providers with ALTABATES in the name and then do a 2nd and 3rd pass to filter against MEDICAL and HOSPITAL.
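A rough sketch of the frequency-ordered approach described above (every name here is hypothetical: the poster doesn't show their schema, so `word_stats` stands in for the periodically regenerated frequency table):

```sql
-- Assumed stats table, rebuilt periodically from the name data:
--   word_stats(word text, freq bigint)

-- Step 1: rank the user's search terms rarest-first. With the numbers
-- given above, ALTABATES (~50 rows) sorts ahead of MEDICAL (~500K rows).
SELECT word
FROM word_stats
WHERE word IN ('ALTABATES', 'MEDICAL', 'HOSPITAL')
ORDER BY freq ASC;

-- Step 2: filter on the rarest word first, so the later passes only
-- have to re-check the small ALTABATES candidate set.
SELECT *
FROM providers
WHERE name LIKE '%ALTABATES%'   -- cheap first pass, ~50 candidates
  AND name LIKE '%MEDICAL%'     -- 2nd pass over those candidates
  AND name LIKE '%HOSPITAL%';   -- 3rd pass
```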
Jeremiah Jahn wrote:

> I have about 5M names stored in my DB. Currently the searches are very
> quick unless they are on a very common last name, e.g. SMITH. [...]

Is there a chance you could benefit from indices spanning multiple columns? Maybe the user who searches for SMITH knows more than the last name, i.e. a first name, or a location (zip code, name of city, etc.)?
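For example, a composite index along these lines (column names are assumed, since the schema wasn't posted) would let a SMITH search be narrowed by first name within a single index scan:

```sql
-- Multicolumn index: equality on the leading column (last_name) plus
-- a condition on the second column (first_name) can both be satisfied
-- from one index scan, instead of wading through every SMITH row.
CREATE INDEX idx_people_last_first ON people (last_name, first_name);

SELECT * FROM people
WHERE last_name = 'SMITH'
  AND first_name = 'JOHN';
```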