Thread: Wishlist for 7.4: Plan stability

Wishlist for 7.4: Plan stability

From
Greg Stark
Date:
Ok, someone else posted their data warehousing wishlist, I want to post my
single item OLP wishlist: Plan stability. This is the natural conclusion of
having prepared queries.

This is one area where it would be possible to totally beat Oracle's
implementation by a huge margin. And it's something that's critical to web
sites and other applications that handle many short queries and need to
reliably provide rapid response.

What I want to do would actually go way beyond what Oracle provides. I have a
clear Idea what I would want to make the database truly manageable under heavy
OLP load.

I would want a queries to be a first class object in the postgres world. There
could be a system table that lists all the prepared queries the database knows
about. Every query's current plan and performance statistics could be linked
from there. This would allow a sysadmin to at least have a clue what queries
are running and how without having to open up the application source. 

Then I would want an acl system to restrict under what circumstances postgres
allows new queries to be added, new plans to be analyzed, and old queries to
be removed.

While in a development server or a DSS server you want any user to be able to
create new queries and you want the database to optimize every query using the
most up to date information, in an OLP server that's not what you want. It's
more important to be consistently fast than it is to be as fast as possible.

Ideally I would expect every query to be manually loaded when a new version of
the application is loaded. A DBA could check at that point every plan and
ensure that they're all reasonable. Then the user that the web server connects
as would be prohibited from running any new queries or generating any new
plans for existing queries. 

The DBA could go to sleep at night confident that the database isn't suddenly
going to hit some formerly unreached section of code or decide to "optimize" a
query differently and suddenly come to a crashing halt.

It would also mean that the query written by the new junior programmer you
just hired can't slip by QA and thrash the query cache of your database by
inserting constants directly into the query.

And finally, it would also mean that the query written by your other junior
programmer that inserts unchecked data can't become a gaping security hazard
because when the hacker submits subqueries in the form the resulting query is
rejected.

-- 
greg



Re: Wishlist for 7.4: Plan stability

From
bpalmer
Date:
Is someone keeping a "hopeful" todo list?

- Brandon


----------------------------------------------------------------------------c: 917-697-8665
              h: 201-798-4983b. palmer,  bpalmer@crimelabs.net           pgp:crimelabs.net/bpalmer.pgp5
 



Re: Wishlist for 7.4: Plan stability

From
Rod Taylor
Date:
On Sat, 2002-11-30 at 21:22, bpalmer wrote:
> Is someone keeping a "hopeful" todo list?

Nearly every one of the items brought up could / should be on the
standard todo list.

http://developer.postgresql.org/todo.php

--
Rod Taylor <rbt@rbt.ca>

Re: Wishlist for 7.4: Plan stability

From
Bruce Momjian
Date:
Rod Taylor wrote:
-- Start of PGP signed section.
> On Sat, 2002-11-30 at 21:22, bpalmer wrote:
> > Is someone keeping a "hopeful" todo list?
> 
> Nearly every one of the items brought up could / should be on the
> standard todo list.
> 
> http://developer.postgresql.org/todo.php

Does anyone have additions for the list.  I didn't see anything
discussed that jumped out at me, though I didn't see "Conquer the world"
on there.  ;-)

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Wishlist for 7.4: Plan stability

From
Rod Taylor
Date:
On Sat, 2002-11-30 at 23:02, Bruce Momjian wrote:
> Rod Taylor wrote:
> -- Start of PGP signed section.
> > On Sat, 2002-11-30 at 21:22, bpalmer wrote:
> > > Is someone keeping a "hopeful" todo list?
> >
> > Nearly every one of the items brought up could / should be on the
> > standard todo list.
> >
> > http://developer.postgresql.org/todo.php
>
> Does anyone have additions for the list.  I didn't see anything
> discussed that jumped out at me, though I didn't see "Conquer the world"
> on there.  ;-)

I didn't see anything that wasn't on either the TODO list or in the
'Unsupported Features' page in the docs aside from the ltree/XML items
and MODIFY COLUMN support.

--
Rod Taylor <rbt@rbt.ca>

Re: Wishlist for 7.4: Plan stability

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Ok, someone else posted their data warehousing wishlist, I want to post my
> single item OLP wishlist: Plan stability.

That seems to me to translate to "I want the system to fail to react to
changes in data statistics and all other variables relevant to query
planning".

You can pretty much get that by never doing [VACUUM] ANALYZE, but I'm
quite lost as to why it's really a good idea.
        regards, tom lane


Re: 7.4 To Do

From
Paul Ramsey
Date:
We recently finished some evaluations of Oracle, and one of the things
which jumped out at me, PostgreSQL booster that I am, was that in 9i
Oracle has finally surpassed PostgreSQL in some elements of
object-relational technology. Among the things you can do are:

- Create new compound object types on the SQL command line. So for
example, the "imaginary numbers" example in the PostgreSLQ manual could
all be done on the command line. 
- Address components of objects using dot-notation. (select
employee.salary from employees)
- Index objects based on their components (create index blah_idx on
employees (employee.last_name))

In combination with ARRAY data types, and references, you can do pretty
fancy things without ever creating a C library.

PostgreSQL seems to have most of the underpinning already. I even did
some experimenting with the "tables as a datatype" stuff. However, it's
definately incomplete. In the tables-as-datatype example, you cannot get
the objects to reconstruct themselves, you have to do it by hand. And
while oid's make handy reference holders, there does not seem to be any
particular performance benefit to doing object-oriented modelling over
relational modelling in PostgreSQL at this point. By contrast, OORDBMS
like Informix can be blazingly fast when used with the proper OO model.
(Oracle performance enhancement for OO models is mixed at best right
now.)

Bruce Momjian wrote:

> Does anyone have additions for the list.  I didn't see anything
> discussed that jumped out at me, though I didn't see "Conquer the world"
> on there.  ;-)


Re: 7.4 To Do

From
Tom Lane
Date:
Paul Ramsey <pramsey@refractions.net> writes:
> Oracle has finally surpassed PostgreSQL in some elements of
> object-relational technology. Among the things you can do are:

> - Address components of objects using dot-notation. (select
> employee.salary from employees)

Cool.  How do they resolve the conflict against schema notation
(ie, is employee a table reference or a schema name here)?
        regards, tom lane


Re: 7.4 To Do

From
snpe
Date:
On Sunday 01 December 2002 05:03, Tom Lane wrote:
> Paul Ramsey <pramsey@refractions.net> writes:
> > Oracle has finally surpassed PostgreSQL in some elements of
> > object-relational technology. Among the things you can do are:
> >
> > - Address components of objects using dot-notation. (select
> > employee.salary from employees)
>
> Cool.  How do they resolve the conflict against schema notation
> (ie, is employee a table reference or a schema name here)?
>

There are simple name resolution rules

regards
Haris Peco


Re: Wishlist for 7.4: Plan stability

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Ok, someone else posted their data warehousing wishlist, I want to post my
> > single item OLP wishlist: Plan stability.
> 
> That seems to me to translate to "I want the system to fail to react to
> changes in data statistics and all other variables relevant to query
> planning".
> 
> You can pretty much get that by never doing [VACUUM] ANALYZE, but I'm
> quite lost as to why it's really a good idea.

Well, first of all there's no guarantee that the genetic algorithm will
actually produce the same plan twice, but that's a side issue.

The main issue is that you *do* want to vacuum and analyze the database
regularly to get good performance, but you don't want the database
spontaneously changing its behaviour without testing and verifying the new
behaviour personally. Not if it's a high availability production server.

I'm thinking it should require a specific privilege that can be separately
access controlled to parse a new query that didn't already exist in the query
table.

Then for a production server I would expect the DBA to arrange for vacuum
analyze to run regularly during off-peak hours. Have a job test all the
queries and report any changed optimiser behaviour. Then have a DBA sanity
check and test the performance of any new query plans before allowing them to
go into production.

But the threat of the optimiser changing behaviour from running analyze isn't
even the main threat I see this addressing. The threat of untested queries
entering production from new code being pushed live is far greater. I've seen
web sites go down more often from new queries with bad performance that were
missed in testing more often than any other source. 

And I've seen security holes caused by applications that allow untrusted users
to slip unexpected sql syntax into queries more often than any other reason.

Really it boils down to one point: there's really no reason to assume a user
should be able to execute any new query he feels like. Creating a new query
should be privileged operation just like creating a new table or new database.

For some systems such as development systems it of course makes sense for
users to be able to create new queries on the fly. For DSS systems too it's
pretty much assumed. 

But for OLTP systems it's very unlikely that a new query should suddenly be
necessary. These systems spend their days running the same queries millions of
times per day. They need to return results within milliseconds. Any new query
should be assumed to be a bug or a security breach and reported as an
immediate error. not cause the database to valiantly attempt to figure out how
best to handle the unexpected query.


-- 
greg



Re: Wishlist for 7.4: Plan stability

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Really it boils down to one point: there's really no reason to assume a user
> should be able to execute any new query he feels like. Creating a new query
> should be privileged operation just like creating a new table or new database.

This is an interesting view of what a database should be like, but it
has very little to do with my idea of a database ;-).  I think you want
some sort of middleware layer to keep your users away from the database.
I do not agree that the DB itself ought to contain such draconian
restrictions.
        regards, tom lane


Re: Wishlist for 7.4: Plan stability

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Really it boils down to one point: there's really no reason to assume a user
> > should be able to execute any new query he feels like. Creating a new query
> > should be privileged operation just like creating a new table or new database.
> 
> This is an interesting view of what a database should be like, but it
> has very little to do with my idea of a database ;-).  I think you want
> some sort of middleware layer to keep your users away from the database.

This is how people attempt to tackle this problem with Oracle or other
databases, but it's an assbackwards design. It leads to a lot of pain and
awkwardness and only partially solves the problems. A good design would be
elegant and result in a much more manageable system.

What I'm really asking for is lower level control of when the query parser and
the optimizer runs. That would allow an admin or middleware to control when
new queries are parsed and optimized.

It could also allow an admin to peek at the existing queries and see what
plans are currently in the system, rather than run explain himself and say
"well this is what it would do if i ran it now, which might be the same thing
that ran earlier and caused this performance problem but there's no way to
know for sure"

> I do not agree that the DB itself ought to contain such draconian
> restrictions.

Note that the restriction I'm proposing be available is of the form "stop the
system from doing something for me". This is the kind of feature that's
impossible to graft on cleanly in a higher layer.

-- 
greg