Re: Query results caching? - Mailing list pgsql-general

From Ben-Nes Yonatan
Subject Re: Query results caching?
Date
Msg-id 430A515B.6080307@canaan.co.il
Whole thread Raw
In response to Re: Query results caching?  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Query results caching?  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-general
> On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
>
>>I think that I was misunderstood, Ill make an example:
>>Lets say that im making the following query for the first time on the
>>"motorcycles" table which got an index on the "manufacturer" field:
>>
>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>manufacturer='suzuki';
>>... Total runtime: 3139.587 ms
>>
>>Now im doing the same query again and i get a much faster result (cause
>>of the "caching"): Total runtime: 332.53 ms
>>
>>After both of those queries I drop the index and query the table again
>>with the exact same query as before and now I receive: Total runtime:
>>216834.871 ms
>>
>>And for my last check I run the exact same query again (without creating
>>the INDEX back again) and I get quite similar result to my third query:
>>Total runtime: 209218.01 ms
>>
>>
>>My problem is that (maybe I just dont understand something basic
>>here...) the last 2 (also the second query but I dont care about that)
>>queries were using the "cache" that was created after the first query
>>(which had an INDEX) so none of them actually showed me what will happen
>>if a client will do such a search (without an INDEX) for the first time.
>>
>>I want to delete that "caching" after I do the first 2 queries so my
>>next queries will show me "real life results".
>
>

Ok I tried to handle both of your replies cause I got them at 2 seperate
emails.

Dann Corbit wrote:
 > These results are all what I would expect.  When you delete the index,
 > the query will be forced to do a table scan (to examine every single
 > record in the table one by one).  If the table is non-trivial it is
 > unlikely that either the OS or the database will cache the whole thing
 > in memory.  However, when you query a small record set, then it is
 > likely to be retained in RAM which is literally thousands of times
 > faster than disk.

Didnt know that, good to know though doesnt assure me...
What if I drop the INDEX but create a diffrent INDEX which also make the
process alot faster then without an INDEX but slower/faster then the one
before, will it wont use the former "caching"?

 > If a search is to be made on a frequent basis, you should create an
 > index.
 > The query results above show you why.

Obvious :)

 > Think about this for a minute.  The real life results you want are
 > very fast results.  For that reason, you should try to model the
 > customer queries as nearly as possible.  If you have a canned
 > application like order entry, then the real parameterized query set
 > will probably be quite small in real life.  If you are creating a
 > server for ad-hoc queries then it will be far more difficult to model
 > in real life.
 >
 > What is the real purpose of the application that you are writing?
 >
 > Will users be using a pre-programmed front end, or will they be typing
 > in queries free-form for whatever their heart desires?

Ok ill try to describe the system as short & precise as possible (its
also passed midnight here :)).
Each day I receive about 4 million rows of data (products) which I
insert into table1 (after I delete all of the previous data it had),
along it I receive for every row about another 15 keywords which I
insert into table2 (where as in table1 I delete all of the previous data
it had also), this process is a fact that I cant change.
Now the users of the site can search for data from table1 by typing
whichever (and up to 4) words as they want at a text field (search input
string) and the server should display the correct results by querying
table1 & join table2 for its keywords.
I succeded to do it quite fast but when I tried to ORDER BY my results
its times jumped up drastically (2-3 seconds for a query... and thats
after the caching..).
I can't allow a situation where a user will search with a keyword which
wasnt 'cached' before and because of that he will wait 15 seconds for a
result.

Jim C. Nasby wrote:
> Emptying the cache will not show real-life results. You are always going
> to have some stuff cached, even if you get a query for something new. In
> this case (since you'll obviously want those indexes there), after some
> amount of time you will have most (if not all) of the non-leaf index
> pages cached, since they take a fairly small amount of memory and are
> frequently accessed. This makes index traversal *much* faster than your
> initial case shows, even if you query on something different each time.
> Testing with a completely empty cache just isn't that realistic.

As far as I understand it at my situation where all of the data is
deleted and inserted each day from the start (INDEX will get lost with
it..) & the endless variety of possible keywords search's & the immense
size of the tables, the following reason wont last.. or am I wrong here?



Because of all of that I want to be able to see how much time a query
takes when its the first time its being run..... or I'm wrong again and
failing to understand something?

Again everyone THANKS ALOT its really amazing the help that I receive
from you!
Ben-Nes Yonatan

pgsql-general by date:

Previous
From: Ben-Nes Yonatan
Date:
Subject: Re: Query results caching?
Next
From: "Dann Corbit"
Date:
Subject: Re: Query results caching?