Thread: name search query speed

name search query speed

From
Jeremiah Jahn
Date:
I have about 5M names stored on my DB. Currently the searches are very
quick unless, they are on a very common last name ie. SMITH. The Index
is always used, but I still hit 10-20 seconds on a SMITH or Jones
search, and I average about 6 searches a second and max out at about
30/s. Any suggestions on how I could arrange things to make this search
quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
can increase this speed w/o a HW upgrade.

thanx,
-jj-



--
You probably wouldn't worry about what people think of you if you could
know how seldom they do.
                -- Olin Miller.


Re: name search query speed

From
"Ken Egervari"
Date:
I'm not sure what the answer is but maybe I can help?  Would clustering the
name index make this faster? I thought that would bunch up the pages so the
names were more or less in order, which would improve search time.  Just a
guess though.

Ken

----- Original Message -----
From: "Jeremiah Jahn" <jeremiah@cs.earlham.edu>
To: "postgres performance" <pgsql-performance@postgresql.org>
Sent: Thursday, March 03, 2005 11:38 AM
Subject: [PERFORM] name search query speed


>I have about 5M names stored on my DB. Currently the searches are very
> quick unless, they are on a very common last name ie. SMITH. The Index
> is always used, but I still hit 10-20 seconds on a SMITH or Jones
> search, and I average about 6 searches a second and max out at about
> 30/s. Any suggestions on how I could arrange things to make this search
> quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> can increase this speed w/o a HW upgrade.
>
> thanx,
> -jj-
>
>
>
> --
> You probably wouldn't worry about what people think of you if you could
> know how seldom they do.
>                -- Olin Miller.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>


Re: name search query speed

From
Jeremiah Jahn
Date:
yes, it does. I forgot to mention, that I also have clustering on that
table by my name_field index. My Bad.

On Thu, 2005-03-03 at 12:00 -0500, Ken Egervari wrote:
> I'm not sure what the answer is but maybe I can help?  Would clustering the
> name index make this faster? I thought that would bunch up the pages so the
> names were more or less in order, which would improve search time.  Just a
> guess though.
>
> Ken
>
> ----- Original Message -----
> From: "Jeremiah Jahn" <jeremiah@cs.earlham.edu>
> To: "postgres performance" <pgsql-performance@postgresql.org>
> Sent: Thursday, March 03, 2005 11:38 AM
> Subject: [PERFORM] name search query speed
>
>
> >I have about 5M names stored on my DB. Currently the searches are very
> > quick unless, they are on a very common last name ie. SMITH. The Index
> > is always used, but I still hit 10-20 seconds on a SMITH or Jones
> > search, and I average about 6 searches a second and max out at about
> > 30/s. Any suggestions on how I could arrange things to make this search
> > quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> > can increase this speed w/o a HW upgrade.
> >
> > thanx,
> > -jj-
> >
> >
> >
> > --
> > You probably wouldn't worry about what people think of you if you could
> > know how seldom they do.
> >                -- Olin Miller.
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> >               http://archives.postgresql.org
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
--
You probably wouldn't worry about what people think of you if you could
know how seldom they do.
                -- Olin Miller.


Re: name search query speed

From
Josh Berkus
Date:
Jeremiah,

> I have about 5M names stored on my DB. Currently the searches are very
> quick unless, they are on a very common last name ie. SMITH. The Index
> is always used, but I still hit 10-20 seconds on a SMITH or Jones
> search, and I average about 6 searches a second and max out at about
> 30/s. Any suggestions on how I could arrange things to make this search
> quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> can increase this speed w/o a HW upgrade.

First off, see http://www.powerpostgresql.com/PerfList about your
configuration settings.

The problem you're running into with SMITH is that, if your query is going to
return a substantial number of rows (variable, but generally anything over 5%
of the table and 1000 rows) is not able to make effective use of an index.
This makes it fall back on a sequential scan, and based on you execution
time, I'd guess that the table is a bit too large to fit in memory.

AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on
your database, if you're still having problems post an EXPLAIN ANALYZE of the
query to this list.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: name search query speed

From
John A Meinel
Date:
Jeremiah Jahn wrote:

>I have about 5M names stored on my DB. Currently the searches are very
>quick unless, they are on a very common last name ie. SMITH. The Index
>is always used, but I still hit 10-20 seconds on a SMITH or Jones
>search, and I average about 6 searches a second and max out at about
>30/s. Any suggestions on how I could arrange things to make this search
>quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
>can increase this speed w/o a HW upgrade.
>
>thanx,
>-jj-
>
>
>
>
>
It sounds like the problem is just that you have a lot of rows that need
to be returned. Can you just put a limit on the query? And then change
the client app to recognize when the limit is reached, and either give a
link to more results, or refine query, or something like that.

John
=:->


Attachment

Re: name search query speed

From
Markus Schaber
Date:
Hi, Jeremiah,

Jeremiah Jahn schrieb:
> yes, it does. I forgot to mention, that I also have clustering on that
> table by my name_field index. My Bad.

Fine. Did you run ANALYZE and CLUSTER on the table after every large
bunch of insertions / updates?

Markus


--
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/
                                      http://nosoftwarepatents.org/

Re: name search query speed

From
Jeremiah Jahn
Date:
On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:
> Jeremiah Jahn wrote:
>
> >I have about 5M names stored on my DB. Currently the searches are very
> >quick unless, they are on a very common last name ie. SMITH. The Index
> >is always used, but I still hit 10-20 seconds on a SMITH or Jones
> >search, and I average about 6 searches a second and max out at about
> >30/s. Any suggestions on how I could arrange things to make this search
> >quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> >can increase this speed w/o a HW upgrade.
> >
> >thanx,
> >-jj-
> >
> >
> >
> >
> >
> It sounds like the problem is just that you have a lot of rows that need
> to be returned. Can you just put a limit on the query? And then change
> the client app to recognize when the limit is reached, and either give a
> link to more results, or refine query, or something like that.
Not really, about 2% of the returned rows are thrown away for security
reasons based on the current user, security groups they belong to and
different flags in the data itself. So the count for this is generated
on the fly needed for pagination in the app which expresses the total
number of finds, but only displays 40 of them. If any one knows a way to
determine the total number of matches without needing to iterate through
them using jdbc, I'm all ears as this would save me huge amounts of time
and limit/offset would become an option.

>
> John
> =:->
>
--
"A power so great, it can only be used for Good or Evil!"
                -- Firesign Theatre, "The Giant Rat of Summatra"


Re: name search query speed

From
Jeremiah Jahn
Date:
On Thu, 2005-03-03 at 09:44 -0800, Josh Berkus wrote:
> Jeremiah,
>
> > I have about 5M names stored on my DB. Currently the searches are very
> > quick unless, they are on a very common last name ie. SMITH. The Index
> > is always used, but I still hit 10-20 seconds on a SMITH or Jones
> > search, and I average about 6 searches a second and max out at about
> > 30/s. Any suggestions on how I could arrange things to make this search
> > quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> > can increase this speed w/o a HW upgrade.
>
> First off, see http://www.powerpostgresql.com/PerfList about your
> configuration settings.
>
> The problem you're running into with SMITH is that, if your query is going to
> return a substantial number of rows (variable, but generally anything over 5%
> of the table and 1000 rows) is not able to make effective use of an index.
> This makes it fall back on a sequential scan, and based on you execution
> time, I'd guess that the table is a bit too large to fit in memory.
>
> AFTER you've made the configuration changes above, AND run VACUUM ANALYZE on
> your database, if you're still having problems post an EXPLAIN ANALYZE of the
> query to this list.
>

ie. throw more hardware at it. All of the other things on the list,
except for effective_cache_size have always been done. I bumped it up
from the default to 2600000. Will see if that makes a difference.

thanx,
-jj-


--
"A power so great, it can only be used for Good or Evil!"
                -- Firesign Theatre, "The Giant Rat of Summatra"


Re: name search query speed

From
"Dave Held"
Date:
> -----Original Message-----
> From: Jeremiah Jahn [mailto:jeremiah@cs.earlham.edu]
> Sent: Thursday, March 03, 2005 2:15 PM
> To: John A Meinel
> Cc: postgres performance
> Subject: Re: [PERFORM] name search query speed
>
> [...]
> So the count for this is generated on the fly needed for
> pagination in the app which expresses the total number of
> finds, but only displays 40 of them. If any one knows a way
> to determine the total number of matches without needing to
> iterate through them using jdbc, I'm all ears as this would
> save me huge amounts of time and limit/offset would become
> an option.

Is there a reason you can't do a count(field) query first?  If
so, you can get the number of records returned by setting
absolute(-1) and getting the row number.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

Re: name search query speed

From
Jeremiah Jahn
Date:
doesn't that cause two queries? I used to do it that way and cut my time
substantially by counting in-line. Even though the results were cached
it still took more time. Also since the tables is constantly be updated
the returned total would not always match the number of results on the
second query.

On Thu, 2005-03-03 at 14:26 -0600, Dave Held wrote:
> > -----Original Message-----
> > From: Jeremiah Jahn [mailto:jeremiah@cs.earlham.edu]
> > Sent: Thursday, March 03, 2005 2:15 PM
> > To: John A Meinel
> > Cc: postgres performance
> > Subject: Re: [PERFORM] name search query speed
> >
> > [...]
> > So the count for this is generated on the fly needed for
> > pagination in the app which expresses the total number of
> > finds, but only displays 40 of them. If any one knows a way
> > to determine the total number of matches without needing to
> > iterate through them using jdbc, I'm all ears as this would
> > save me huge amounts of time and limit/offset would become
> > an option.
>
> Is there a reason you can't do a count(field) query first?  If
> so, you can get the number of records returned by setting
> absolute(-1) and getting the row number.
>
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
--
"A power so great, it can only be used for Good or Evil!"
                -- Firesign Theatre, "The Giant Rat of Summatra"


Re: name search query speed

From
John Arbash Meinel
Date:
Jeremiah Jahn wrote:

>On Thu, 2005-03-03 at 11:46 -0600, John A Meinel wrote:
>
>
...

>Not really, about 2% of the returned rows are thrown away for security
>reasons based on the current user, security groups they belong to and
>different flags in the data itself. So the count for this is generated
>on the fly needed for pagination in the app which expresses the total
>number of finds, but only displays 40 of them. If any one knows a way to
>determine the total number of matches without needing to iterate through
>them using jdbc, I'm all ears as this would save me huge amounts of time
>and limit/offset would become an option.
>
>
>
Well, what is wrong with "select count(*) from <the query I would have
done>"?
Are you saying 2% are thrown away, or only 2% are kept?
Is this being done at the client side? Is there a way to incorporate the
security info into the database, so that the query actually only returns
the rows you care about? That seems like it would be a decent way to
speed it up, if you can restrict the number of rows that it needs to
look at.

There are other alternatives, such as materialized views, or temp
tables, where you select into the temp table the rows that the user
would request, and then you generate limit/offset from that. The first
query would be a little slow, since it would get all the rows, but all
subsequent accesses for that user could be really fast.

The other possibility is to do "limit 200", and then in your list of
pages, you could have:
1, 2, 3, 4, 5, ...
This means that you don't have to worry about getting 10,000 entries,
which probably isn't really useful for the user anyway, and you can
still break things into 40 entry pages, just 200 entries at a time.
John
=:->

>>John
>>=:->
>>
>>
>>


Attachment

Re: name search query speed

From
Markus Schaber
Date:
Hi, Jeremiah,

Jeremiah Jahn schrieb:
> doesn't that cause two queries? I used to do it that way and cut my time
> substantially by counting in-line. Even though the results were cached
> it still took more time.

This sounds rather strange.

> Also since the tables is constantly be updated
> the returned total would not always match the number of results on the
> second query.

Did you run both queries in the same transaction, with transaction
isolation level set to serializable? If yes, you found a serious bug in
PostgreSQL transaction engine.

Markus

--
Markus Schaber | Dipl. Informatiker | Software Development GIS

Fight against software patents in EU! http://ffii.org/
                                      http://nosoftwarepatents.org/

Re: name search query speed

From
William Yu
Date:
Jeremiah Jahn wrote:
> I have about 5M names stored on my DB. Currently the searches are very
> quick unless, they are on a very common last name ie. SMITH. The Index
> is always used, but I still hit 10-20 seconds on a SMITH or Jones
> search, and I average about 6 searches a second and max out at about
> 30/s. Any suggestions on how I could arrange things to make this search
> quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> can increase this speed w/o a HW upgrade.

If it's just "SMITH", the only fix is to throw more hardware at the
problem. I've got my own database of medical providers & facilities in
the millions and anytime somebody tries to search for MEDICAL FACILITY,
it takes forever. I've tried every optimization possible but when you
have 500K records with the word "MEDICAL" in it, what can you do? You've
got to check all 500K records to see if it matches your criteria.

For multi-word searches, what I've found does work is to periodically
generate stats on work frequencies and use those stats to search the
least common words first. For example, if somebody enters "ALTABATES
MEDICAL HOSPITAL", I can get the ~50 providers with ALTABATES in the
name and then do a 2nd and 3rd pass to filter against MEDICAL and HOSPITAL.

Re: name search query speed

From
stig erikson
Date:
Jeremiah Jahn wrote:
> I have about 5M names stored on my DB. Currently the searches are very
> quick unless, they are on a very common last name ie. SMITH. The Index
> is always used, but I still hit 10-20 seconds on a SMITH or Jones
> search, and I average about 6 searches a second and max out at about
> 30/s. Any suggestions on how I could arrange things to make this search
> quicker? I have 4gb of mem on a raid 5 w/ 3 drives. I'm hoping that I
> can increase this speed w/o a HW upgrade.
>
> thanx,
> -jj-
>

is there a chance you could benefit from indices spanning over multiple columns?
maybe the user that searches for SMITH knows more then the last name, ie first
name, location (zip code, name of city, etc.)?