Re: eWeek Poll: Which database is most critical to - Mailing list pgsql-hackers

From F Harvell
Subject Re: eWeek Poll: Which database is most critical to
Date
Msg-id 200202271451.g1REp4d04916@odin.fts.net
Whole thread Raw
In response to Re: eWeek Poll: Which database is most critical to your  ("Dann Corbit" <DCorbit@connx.com>)
Responses Re: eWeek Poll: Which database is most critical to
List pgsql-hackers
On Tue, 26 Feb 2002 15:20:17 PST, "Dann Corbit" wrote:
> -----Original Message-----
> From: Neil Conway [mailto:nconway@klamath.dyndns.org]
> Sent: Tuesday, February 26, 2002 3:04 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] eWeek Poll: Which database is most critical to
> your
> 
> 
> On Tue, 2002-02-26 at 15:30, Zak Greant wrote:
> > Good Day All,
> > 
> > eWeek has posted a poll that asks which database server is most
> critical
> > to your organization.
> 
> The article mentions a MySQL feature which apparently improved
> performance considerably:
> 
> //
> MySQL 4.0.1's new, extremely fast query cache is also quite notable, as
> no other database we tested had this feature. If the text of an incoming
> query has a byte-for-byte match with a cached query, MySQL can retrieve
> the results directly from the cache without compiling the query, getting
> locks or doing index accesses. This query caching will be effective only
> for tables with few updates because any table updates that clear the
> cache to guarantee correct results are always returned.
> //
> 
> My guess is that it would be relatively simple to implement. Any
> comments on this?
> 
> If I implemented this, any chance this would make it into the tree? Of
> course, it would be:
> 
>     - disabled by default
>     - enabled on a table-by-table basis (maybe an ALTER TABLE command)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>
> I don't see how it will do any good.  There is no "prepare" in
> Postgresql
> and therefore you will simply be reexecuting the queries every time any
> way.  Also, parameter markers only work in embedded SQL and that is a 
> single tasking system.
> 
> I think it would be a major piece of work to do anything useful along
> those lines.
> 
> If you look at how DB/2 works, you will see that they store prepared
> statements.  Another alternative would be to keep some point in the
> parser marked and somehow jump to that point, but you would have to
> be able to save a parse tree somewhere and also recognize the query.
> 
> Here is where problems come in...
> -- Someone wants blue and blue-green, etc shirts that are backordered
> SELECT shirt, color, backorder_qty FROM garments WHERE color like
> "BLUE%"
> 
> Now, another query comes along:
> 
> -- Someone else wants reddish, etc shirts that are backordered:
> SELECT shirt, color, backorder_qty FROM garments WHERE color like "RED%"
> 
> It's the same query with different data.  Without parameter markers you
> will never know it.  And yet this is exactly the sort of caching that is
> useful.
> 
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> <<
 While the cache that MySQL implemented may be of limited value for
dynamic queries, it would be very useful for many, many queries.  For
example, many applications (especially the ones that I write), there
are literally hundreds of selects against static "lookup" tables that
maintain acceptable values for new data inputs (i.e., validated using
referential integrity).  Each and every one of those selects would
receive an improvement.
 Also, it is relatively easy to "parameterize" the select. After all,
it is a structured query.  Just by parsing it, you can identify the
"parameters".  As a matter of fact, I'm pretty certain that this is
already done by the optimizer when breaking the SQL down into the code
to perform the query.  The trick is to cache the "parameterized"
version of the query.  This is likely at or near the end of the
optimizer processing.  Also, you would want to capture the query plan
in the cache.  The query plan is not going to be interested at all in
the literal value of the parameters and therefore will be the same for
any query of the same form.
 For example, from above:

SELECT shirt, color, backorder_qty FROM garments WHERE color like 
'BLUE%'
 should become something on the order of:

SELECT shirt, color, backorder_qty FROM garments WHERE color like 
'{param0}%'
 The next query:

SELECT shirt, color, backorder_qty FROM garments WHERE color like 
'RED%'
 should also be "parameterized" on the order of:

SELECT shirt, color, backorder_qty FROM garments WHERE color like 
'{param0}%'
 A lookup into the query hash would match and therefore _at least_
the same query plan can be used.
 The commercial database "Cache" by Intersystems uses an approach
similar to this.  The performance of that database is phenomenal.  (Of
course, there is more going on in the internals of that system than
just the query cache.)





pgsql-hackers by date:

Previous
From: Jean-Paul ARGUDO
Date:
Subject: Oracle vs PostgreSQL in real life
Next
From: Tom Lane
Date:
Subject: Re: COPY incorrectly uses null instead of an empty string in last field