Re: eWeek Poll: Which database is most critical to your - Mailing list pgsql-hackers
From: Dann Corbit
Subject: Re: eWeek Poll: Which database is most critical to your
Date:
Msg-id: D90A5A6C612A39408103E6ECDD77B82906F3F3@voyager.corporate.connx.com
Responses: Re: eWeek Poll: Which database is most critical to your
List: pgsql-hackers
-----Original Message-----
From: Neil Conway [mailto:nconway@klamath.dyndns.org]
Sent: Tuesday, February 26, 2002 3:42 PM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] eWeek Poll: Which database is most critical to your

On Tue, 2002-02-26 at 18:20, Dann Corbit wrote:
> I don't see how it will do any good. There is no "prepare" in
> PostgreSQL, and therefore you will simply be re-executing the queries
> every time anyway. Also, parameter markers only work in embedded SQL,
> and that is a single-tasking system.

Perhaps I wasn't clear. The feature I'm proposing is this:

When processing SELECT queries, but before any real work has been done,
look up the query in a hash table. If it already exists, return the
cached result. If it doesn't exist, execute the query and cache the
result in the hash table. Optionally, we could not cache the query
immediately, but just increment a "frequency" counter stored in the
hash table. If the counter goes above a certain constant, we decide
that the query is worth caching, so we cache the full result in the
hash table.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Statistical tools are a good idea, because they can tell us where
indexes should be added. However, you cannot simply return the result
of the previous query, because the contents may have changed since the
last time it was executed. It is simply invalid to do that. If some
other system is doing that, then it isn't a relational database.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

When processing INSERTs, UPDATEs and DELETEs, check whether the query
would affect any of the tables for which we are maintaining this cache.
If so, flush the cache. This ensures that we will never return invalid
results. We could perhaps be fancy and keep stats on which columns our
cached queries use and which columns the modifying query will affect,
but that is unlikely to be an overall win.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
How do you know whether or not someone has affected the row that you
are reading? If you do not know, then every single update, insert or
delete will mean that you have to refresh. And not only that, you will
also have to track it. For sure, it will make the whole system run more
slowly rather than faster.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

HOWEVER -- I don't see this feature as something that will appeal to,
say, 75% of PgSQL users. If the table in question is being modified on
a regular basis, or if a wide variety of queries are being issued, this
cache isn't a good idea. Nevertheless, I think there are certainly some
situations in which this cache is useful -- and furthermore, these
kinds of "mostly read-only" situations are often where MySQL is chosen
over PostgreSQL.

Anyway, just putting this on the table -- if the consensus is that this
isn't a very worthwhile feature, I won't bother with it.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Very likely, it is only my limited understanding not really grasping
what it is that you are trying to do.
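If I follow the proposal correctly, the mechanism amounts to roughly
the sketch below (plain C, nothing to do with the actual backend; every
name and constant in it is invented for illustration):

/* Rough sketch of the proposed cache: look the query text up in a hash
 * table, count how often it recurs, start caching the result once it
 * crosses a threshold, and flush everything whenever a write arrives. */
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS     256
#define CACHE_THRESHOLD 4       /* cache a query after it is seen this often */

typedef struct CacheEntry {
    char   *query_text;         /* the exact SELECT text used as the key  */
    int     frequency;          /* how many times we have seen this query */
    char   *result;             /* cached result, NULL until it is "hot"  */
    struct CacheEntry *next;
} CacheEntry;

static CacheEntry *cache[CACHE_SLOTS];

static unsigned hash_query(const char *q)
{
    unsigned h = 0;
    while (*q)
        h = h * 31 + (unsigned char) *q++;
    return h % CACHE_SLOTS;
}

/* Returns the cached result if we have one, otherwise NULL; the caller
 * then runs the query normally and calls cache_store() afterwards. */
char *cache_lookup(const char *query_text)
{
    CacheEntry *e;

    for (e = cache[hash_query(query_text)]; e != NULL; e = e->next)
    {
        if (strcmp(e->query_text, query_text) == 0)
        {
            e->frequency++;
            return e->result;   /* may still be NULL if not yet "hot" */
        }
    }
    return NULL;
}

/* Remember a freshly computed result, but only once the query has been
 * seen often enough to be worth the memory. */
void cache_store(const char *query_text, const char *result)
{
    unsigned    slot = hash_query(query_text);
    CacheEntry *e;

    for (e = cache[slot]; e != NULL; e = e->next)
    {
        if (strcmp(e->query_text, query_text) == 0)
        {
            if (e->frequency >= CACHE_THRESHOLD && e->result == NULL)
                e->result = strdup(result);
            return;
        }
    }
    e = malloc(sizeof(CacheEntry));
    e->query_text = strdup(query_text);
    e->frequency = 1;
    e->result = NULL;
    e->next = cache[slot];
    cache[slot] = e;
}

/* Called for every INSERT/UPDATE/DELETE: the simplest policy that is
 * still correct is to throw the whole cache away. */
void cache_flush_all(void)
{
    int i;
    CacheEntry *e, *next;

    for (i = 0; i < CACHE_SLOTS; i++)
    {
        for (e = cache[i]; e != NULL; e = next)
        {
            next = e->next;
            free(e->query_text);
            free(e->result);
            free(e);
        }
        cache[i] = NULL;
    }
}

The flush-on-write step is the part that worries me: it has to run for
every INSERT, UPDATE and DELETE.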
Even so, I don't think it really helps even for read-only queries,
unless it is exactly the same query, with the same parameter markers
and everything, that was issued before. That is very unusual. Normally,
you won't have the entire query hard-wired, but will allow the customer
to do some sort of filtering of the data.

I do (very much) like your idea to gather statistics. Again, this must
have the complete query parsed so that you know what the parameter
markers are. Otherwise, these will all count as completely different
queries:

SELECT * FROM employee WHERE ssn = '777-99-1234'
SELECT * FROM employee WHERE ssn = '999-77-1234'
SELECT * FROM employee WHERE ssn = '797-97-1234'
SELECT * FROM employee WHERE ssn = '979-79-1234'
SELECT * FROM employee WHERE ssn = '779-99-1234'

when really only one query exists:

SELECT * FROM employee WHERE ssn = ?

If you do not parameterize the queries, your statistics will be
misleading at best and outright lies at worst. For example, suppose
that the query above appears 300,000 times, but with different SSNs,
and then someone runs this twice:

SELECT * FROM reports WHERE type = 'q'

The query that ran twice will look like the most frequent one, despite
the fact that the really frequent query runs 150,000 times as often.
(See the toy sketch of the parameterization step at the end of this
message.)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
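Something along these lines would be enough for the counting side (a
toy example in plain C, not a real SQL lexer; it ignores escapes,
comments, casts, and would mangle identifiers that contain digits):

/* Replace literal constants with parameter markers before counting,
 * so that all five SSN lookups above collapse into the single key
 *   SELECT * FROM employee WHERE ssn = ?                          */
#include <ctype.h>
#include <stdio.h>

void normalize_query(const char *in, char *out, size_t outlen)
{
    size_t o = 0;

    while (*in && o + 2 < outlen)
    {
        if (*in == '\'')                       /* string literal -> ? */
        {
            out[o++] = '?';
            in++;
            while (*in && *in != '\'')
                in++;
            if (*in == '\'')
                in++;
        }
        else if (isdigit((unsigned char) *in)) /* numeric literal -> ? */
        {
            out[o++] = '?';
            while (isdigit((unsigned char) *in) || *in == '.')
                in++;
        }
        else
            out[o++] = *in++;
    }
    out[o] = '\0';
}

int main(void)
{
    char buf[256];

    normalize_query("SELECT * FROM employee WHERE ssn = '777-99-1234'",
                    buf, sizeof buf);
    printf("%s\n", buf);   /* SELECT * FROM employee WHERE ssn = ? */
    return 0;
}

With keys normalized like that, the frequency counts would actually
point at the queries (and indexes) that matter.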