Home > mailing lists
Re: Query results caching? - Mailing list pgsql-general

From	Ben-Nes Yonatan
Subject	Re: Query results caching?
Date	August 25, 2005 13:13:53
Msg-id	430D8DA5.2040704@canaan.co.il Whole thread Raw
In response to	Re: Query results caching? ("Dann Corbit" <DCorbit@connx.com>)
List	pgsql-general
Tree view
Dann Corbit wrote:

>>-----Original Message-----
>>From: Ben-Nes Yonatan [mailto:da@canaan.co.il]
>>Sent: Monday, August 22, 2005 3:28 PM
>>To: Jim C. Nasby; Sean Davis; Dann Corbit
>>Cc: pgsql-general@postgresql.org
>>Subject: Re: [GENERAL] Query results caching?
>>
>>
>>
>>>On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote:
>>>
>>>
>>>
>>>>I think that I was misunderstood, Ill make an example:
>>>>Lets say that im making the following query for the first time on
>>>>
>>>>
>the
>
>
>>>>"motorcycles" table which got an index on the "manufacturer" field:
>>>>
>>>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE
>>>>manufacturer='suzuki';
>>>>... Total runtime: 3139.587 ms
>>>>
>>>>Now im doing the same query again and i get a much faster result
>>>>
>>>>
>(cause
>
>
>>>>of the "caching"): Total runtime: 332.53 ms
>>>>
>>>>After both of those queries I drop the index and query the table
>>>>
>>>>
>again
>
>
>>>>with the exact same query as before and now I receive: Total
>>>>
>>>>
>runtime:
>
>
>>>>216834.871 ms
>>>>
>>>>And for my last check I run the exact same query again (without
>>>>
>>>>
>creating
>
>
>>>>the INDEX back again) and I get quite similar result to my third
>>>>
>>>>
>query:
>
>
>>>>Total runtime: 209218.01 ms
>>>>
>>>>
>>>>My problem is that (maybe I just dont understand something basic
>>>>here...) the last 2 (also the second query but I dont care about
>>>>
>>>>
>that)
>
>
>>>>queries were using the "cache" that was created after the first
>>>>
>>>>
>query
>
>
>>>>(which had an INDEX) so none of them actually showed me what will
>>>>
>>>>
>happen
>
>
>>>>if a client will do such a search (without an INDEX) for the first
>>>>
>>>>
>time.
>
>
>>>>I want to delete that "caching" after I do the first 2 queries so my
>>>>next queries will show me "real life results".
>>>>
>>>>
>>>
>>>
>>Ok I tried to handle both of your replies cause I got them at 2
>>
>>
>seperate
>
>
>>emails.
>>
>>Dann Corbit wrote:
>> > These results are all what I would expect.  When you delete the
>>
>>
>index,
>
>
>> > the query will be forced to do a table scan (to examine every
>>
>>
>single
>
>
>> > record in the table one by one).  If the table is non-trivial it is
>> > unlikely that either the OS or the database will cache the whole
>>
>>
>thing
>
>
>> > in memory.  However, when you query a small record set, then it is
>> > likely to be retained in RAM which is literally thousands of times
>> > faster than disk.
>>
>>Didnt know that, good to know though doesnt assure me...
>>What if I drop the INDEX but create a diffrent INDEX which also make
>>
>>
>the
>
>
>>process alot faster then without an INDEX but slower/faster then the
>>
>>
>one
>
>
>>before, will it wont use the former "caching"?
>>
>>
>
>You can add several indexes to a single table.
>If you do some statistics on the query patterns, you can find what
>indexes are needed to make the queries as fast as possible.
>
>
>
>> > If a search is to be made on a frequent basis, you should create an
>> > index.
>> > The query results above show you why.
>>
>>Obvious :)
>>
>> > Think about this for a minute.  The real life results you want are
>> > very fast results.  For that reason, you should try to model the
>> > customer queries as nearly as possible.  If you have a canned
>> > application like order entry, then the real parameterized query set
>> > will probably be quite small in real life.  If you are creating a
>> > server for ad-hoc queries then it will be far more difficult to
>>
>>
>model
>
>
>> > in real life.
>> >
>> > What is the real purpose of the application that you are writing?
>> >
>> > Will users be using a pre-programmed front end, or will they be
>>
>>
>typing
>
>
>> > in queries free-form for whatever their heart desires?
>>
>>Ok ill try to describe the system as short & precise as possible (its
>>also passed midnight here :)).
>>Each day I receive about 4 million rows of data (products) which I
>>insert into table1 (after I delete all of the previous data it had),
>>along it I receive for every row about another 15 keywords which I
>>insert into table2 (where as in table1 I delete all of the previous
>>
>>
>data
>
>
>>it had also), this process is a fact that I cant change.
>>
>>
>
>If the data arrives on a daily basis, and is not updated until the next
>day, I suggest creating a lot of indexes, and cluster on the index used
>most frequently.
>
>What exactly are the 15 keywords in the second table for?
>Are they column names?
>Are they categories for the first table?
>Why is the second table necessary at all?
>
>
Now clustering was unknown to me when I received this email from you...
THANKS!!! I created 4 replicas of my table ordered by the diffrent order
that I want to allow my users to use, yep its quite alot of GB but I
dont care about it as long its working fast, and damn its flying! less
then 100 ms and thats on a weak server which will be replaced soon!.

>
>
>>Now the users of the site can search for data from table1 by typing
>>whichever (and up to 4) words as they want at a text field (search
>>
>>
>input
>
>
>>string) and the server should display the correct results by querying
>>table1 & join table2 for its keywords.
>>
>>
>
>Can you give the exact table definitions for the two tables, and also
>the most likely queries you are going to receive?
>
>When the users type in keywords -- can these keywords be applied against
>any column in the table or only against a single column or against a
>small set of columns or something else?
>
>
Well yea its working on a single column and that column got indexed so
its flying.

>
>
>>I succeded to do it quite fast but when I tried to ORDER BY my results
>>its times jumped up drastically (2-3 seconds for a query... and thats
>>after the caching..).
>>
>>
>
>Order by will complicate quite a bit.  I have not tried it on
>PostgreSQL, but if you know the result set is small, a technique is to
>select into a temp table and then order by on the temp table.  It works
>well on other database systems (caveat: it has been a while since I
>worked as a DBA and I have not worked as a DBA on PostgreSQL).
>
>
That cant work cause it wont order all of the results by the desired
column but only the returned results so when a user will want to see
more results he can see results which were supposed to be displayed
before his previous display.

>
>
>>I can't allow a situation where a user will search with a keyword
>>
>>
>which
>
>
>>wasnt 'cached' before and because of that he will wait 15 seconds for
>>
>>
>a
>
>
>>result.
>>
>>
>
>You might try throwing hardware at it.  A 4 CPU AMD 64 machine with
>Ultra 320 striped SCSI disk array and a few gigabytes of ram will
>perform admirably.
>
>You might want more than is possible.  If you have 4 million rows, and
>each row is 1K, then that is 4 GB.  If your users do a query that has
>not been performed yet and you have to do a table scan, then you cannot
>expect some kind of sub-second response times because it won't be
>physically possible on any system.
>
>If you know what most queries may look like or if you only have a few
>character columns so that you can make an index on each of them and if
>you can put a unique clustered index on the most important (frequently
>used) item, then you can get the majority of your queries to run very
>quickly, and only on rare occasions will the query be slow.
>
>If I have a 4 million row table, with long rows and big varchar columns
>and I run a query on a column like this:
>
>    SELECT * FROM inventory WHERE product LIKE '%Table%'
>
>It isn't going to be fast on any system with any database.
>[snip]
>
>
Well a better hardware will soon be working but anyway your idea about
clustering solved my problem.

Thanks alot again! :)
Ben-Nes Yonatan
Canaan Surfing ltd.
http://www.canaan.net.il
pgsql-general by date:
From: Ben-Nes Yonatan
Date: 25 August 2005, 13:12:48
Subject: Re: Query results caching?
From: Clodoaldo Pinto
Date: 25 August 2005, 13:17:06
Subject: Problem upgrading from 8.0.1 to 8.0.3
Re: Query results caching? - Mailing list pgsql-general

Previous

Next