Re: Query results caching? - Mailing list pgsql-general

From Dann Corbit
Subject Re: Query results caching?
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154757CFF6@postal.corporate.connx.com
Whole thread Raw
In response to Query results caching?  (Ben-Nes Yonatan <da@canaan.co.il>)
List pgsql-general
> -----Original Message-----
> From: Ben-Nes Yonatan [mailto:da@canaan.co.il]
> Sent: Monday, August 22, 2005 1:14 PM
> To: Sean Davis; Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query results caching?
>
> Sean Davis wrote:
> > On 8/22/05 1:59 PM, "Dann Corbit" <DCorbit@connx.com> wrote:
> >
> >
> >>
> >>>-----Original Message-----
> >>>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> >>>owner@postgresql.org] On Behalf Of Ben-Nes Yonatan
> >>>Sent: Monday, August 22, 2005 9:03 AM
> >>>To: pgsql-general@postgresql.org
> >>>Subject: [GENERAL] Query results caching?
> >>>
> >>>Hi all,
> >>>
> >>>I dont know how its called but I noticed that when I query the db
for
> >>>the first time it give me the result slower then the next times ill
> >>>repeat the same exact query, I figure that its some kind of caching
so
> >>>henceforth the title of the mail :)
> >>
> >>The operating system and the database will both percolate frequently
> >>used information from disk into memory.  Particularly if they are
SELECT
> >>queries, they will get faster and faster.
> >>
> >>
> >>>Anyway I would want to be able to delete that "caching" after every
> >>>query test that I run, cause I want to see the real time results
for
> >>
> >>my
> >>
> >>>queries (its for a searching option for users so it will vary
alot).
> >>
> >>Those are the real times for your queries.
> >>
> >>
> >>>Is it possible to do it manually each time or maybe only from the
> >>>configuration?
> >>
> >>You will have to query a different table each time.
> >
> >
> > Just to extend this notion a bit, if you want to test your
application
> > speed, you may want to generate "real-world" input to determine the
> actual
> > behavior/speed under real conditions.  As Dann pointed out, the
results
> for
> > timings are "real" in that if the user generated the queries as you
did,
> the
> > timing results would be (nearly) the same as for you.  It seems that
> your
> > concern is that the user will not generate the same type of input
that
> you
> > did (that it will vary more), so the best solution may be to
actually
> > generate some test queries that actually conform to what you think
the
> user
> > input will look like.
> >
> > Sean
> >
> I think that I was misunderstood, Ill make an example:
> Lets say that im making the following query for the first time on the
> "motorcycles" table which got an index on the "manufacturer" field:
>
> EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
> manufacturer='suzuki';
> ... Total runtime: 3139.587 ms
>
> Now im doing the same query again and i get a much faster result
(cause
> of the "caching"): Total runtime: 332.53 ms
>
> After both of those queries I drop the index and query the table again
> with the exact same query as before and now I receive: Total runtime:
> 216834.871 ms
>
> And for my last check I run the exact same query again (without
creating
> the INDEX back again) and I get quite similar result to my third
query:
> Total runtime: 209218.01 ms

These results are all what I would expect.  When you delete the index,
the query will be forced to do a table scan (to examine every single
record in the table one by one).  If the table is non-trivial it is
unlikely that either the OS or the database will cache the whole thing
in memory.  However, when you query a small record set, then it is
likely to be retained in RAM which is literally thousands of times
faster than disk.

> My problem is that (maybe I just dont understand something basic
> here...) the last 2 (also the second query but I dont care about that)
> queries were using the "cache" that was created after the first query
> (which had an INDEX) so none of them actually showed me what will
happen
> if a client will do such a search (without an INDEX) for the first
time.

If a search is to be made on a frequent basis, you should create an
index.
The query results above show you why.

> I want to delete that "caching" after I do the first 2 queries so my
> next queries will show me "real life results".

Think about this for a minute.  The real life results you want are very
fast results.  For that reason, you should try to model the customer
queries as nearly as possible.  If you have a canned application like
order entry, then the real parameterized query set will probably be
quite small in real life.  If you are creating a server for ad-hoc
queries then it will be far more difficult to model in real life.

What is the real purpose of the application that you are writing?

Will users be using a pre-programmed front end, or will they be typing
in queries free-form for whatever their heart desires?


> Thanks alot again,
> Yonatan

pgsql-general by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Query results caching?
Next
From: "A. Kretschmer"
Date:
Subject: Re: Query results caching?