Re: 8.2 features status - Mailing list pgsql-hackers

From Rick Gigger
Subject Re: 8.2 features status
Msg-id D3009F10-74C3-4A9B-9C92-BB73759392A1@alpinenetworking.com
In response to Re: 8.2 features status  (David Fetter <david@fetter.org>)
List pgsql-hackers
If people are going to start listing features they want, here are some
things I think would be nice.  I have no idea, though, whether they would
be useful to anyone else:

1) Hierarchical / recursive queries.  I realize this has just been
discussed at length, but since there was some question as to whether
there's demand for it, I am weighing in that I think there is.  I have
to deal with hierarchy tables all the time, and I have several standard
methods of dealing with them depending on the data set / format.  But
they all suck.  I've just gotten used to the workarounds since there is
nothing else.  If you are not hearing the screams, I think it's because
it has become a fact of life for most people (unless you're using
Oracle) that you've just got to work around it.  Everyone already has
some code to do this, and they've already done it everywhere it needs
to be done.  And as long as you're a little bit clever you can always
work around it without taking a big performance hit.  But it would sure
be nice to have next time I have to deal with a tree table.

2) PITR on a per-database basis.  I think this would be nice, but I'm
guessing that the work involved is big and that few people really care
about or need it, so it will probably never happen.

3) A further refinement of PITR where some sort of daemon ships small
log segments as they are created, so that the hot standby doesn't have
to be updated in 16MB increments or wait for some timeout to occur.  It
could always have up-to-the-minute data.

4) All the Greenplum Bizgres MPP goodness.  In reality (and I don't
know if Bizgres MPP can actually do this) I'd like to have a cluster of
cheap boxes.  I'd like to install postgres on all of them and configure
them so that they automatically partition and mirror each table: each
piece of data is always on two boxes, and large tables and indexes get
divided up intelligently.  Sort of like RAID 10 at the database level.
This way any one box could die and I would be fine.  Enormous queries
could be handled efficiently, and I could scale up by just dropping in
new hardware.

Maybe Greenplum has done this.  Maybe we will get their changes soon
enough, maybe not.  Maybe this sort of functionality will never
happen.  My guess is that all the little bits and pieces of this will
trickle in over the next several years, and this sort of setup will be
slowly converged on as lots of little things come together.
Tablespaces and constraint exclusion come to mind here as things that
could eventually evolve to contribute to a larger solution.

5) Somehow make it so I NEVER HAVE TO THINK ABOUT OR DEAL WITH VACUUM
AGAIN.  Once I get everything set up right, everything works great, but
if there's one thing I think everyone would love, it would be getting
postgres to the point where you don't even need to ship vacuumdb,
because there's no way the user could outsmart postgres's attempts to
do garbage collection on its own.

6) Genuine updatable views, such that you just add an updatable
keyword when you create the view and it's automagically updatable.
I'm guessing that we'll get something like that, but its real magic
will be throwing an error to tell you when you try to make a view
updatable and it can't figure out how to make the rules properly.

7) Allow some way to extract the data files from a single database
and insert them into another database cluster.  In many cases it
would be a lot faster to copy the data files across the network than
to dump, copy the dump file, and reload.

8) Some sort of standard "hooks" to be used for replication.  I guess
when the replication people all get their heads together and tell the
core developers what they all need, something like this could evolve.
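Item 1 mentions several standard workarounds for tree tables without
spelling them out.  One common approach, sketched here purely as an
illustration (not the author's code), is to fetch the whole adjacency
list with one plain SELECT and walk it in the application.  The
(id, parent_id) row layout and the descendants() name are assumptions
made for the example; a recursive query would let the server do this
traversal itself.

```python
from collections import defaultdict, deque


def descendants(rows, root_id):
    """Return the ids of all descendants of root_id in breadth-first order.

    rows is an iterable of (id, parent_id) tuples, i.e. a hypothetical
    adjacency-list "tree table" fetched with a single plain SELECT.
    """
    # Build a parent -> children index once, in memory.
    children = defaultdict(list)
    for node_id, parent_id in rows:
        children[parent_id].append(node_id)

    result = []
    seen = {root_id}  # guards against cycles in badly maintained data
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                result.append(child)
                queue.append(child)
    return result
```

With rows [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, 3), (7, None)],
descendants(rows, 1) yields [2, 3, 4, 6, 5].  The cost is one round
trip, but the entire table has to fit comfortably in client memory.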
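The idea in item 4 of mirroring every table across two boxes, like
RAID 10, comes down to a placement rule: every partition gets a primary
box and a mirror box, and the two must differ.  Below is a toy sketch of
such a rule using hypothetical names (place_mirrored,
survives_single_failure, the box names); it is an assumption for
illustration, not how any real MPP product distributes data.

```python
def place_mirrored(num_partitions, nodes):
    """Map each partition to a (primary, mirror) pair of distinct nodes.

    Partition p goes to nodes[p % n] and nodes[(p + 1) % n], so every
    partition lives on exactly two boxes and load spreads evenly.
    """
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes to mirror data")
    return {p: (nodes[p % n], nodes[(p + 1) % n]) for p in range(num_partitions)}


def survives_single_failure(placement, nodes):
    """True if every partition keeps a live copy after any one node dies."""
    for dead in nodes:
        for copies in placement.values():
            if all(node == dead for node in copies):
                return False
    return True
```

For example, place_mirrored(6, ["box-a", "box-b", "box-c"]) puts
partition 2 on ("box-c", "box-a").  Plain modulo placement reshuffles
most partitions whenever a box is added, so a system meant to grow by
dropping in new hardware would likely want something like consistent
hashing instead.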

Like I said, postgres more than satisfies my "needs".  I am  
especially happy when you factor in the cost of the software (free),  
and the quality of the community support (excellent).

And you can definitely say that the "missing" list is shrinking.  But  
I think of it like this.  There are tiers of database functionality  
that different people need:
A) Correct me if I'm wrong, but as great as postgres is, there are
still people out there who MUST HAVE Oracle or DB2 to get done what
they need to get done.  They just do things that the others can't.
They may be expensive.  They may suck to use and administer, but the
simple fact is that they have features that people need that are not
offered in less expensive databases.
B) Very, very powerful databases that lack the biggest, most
complicated "enterprise" features.
C) Lightweight databases for taking care of the basic need to store
data and query it with SQL (some would call these "toy" databases).
D) Databases which are experimental, unreliable, or have other limits
that make them impractical compared with the other options.

I would say that with version 7.0 postgres moved from D to C (please
don't get offended if this is way off base; I never used 6.x, but I
heard it was prone to crashes and data corruption, and of course there
was that pesky row size limit).  It then proceeded to move up within
tier C to become the best of its class and push up into tier B.
With 8.0 it was firmly in tier B.  It was fast, efficient, and powerful,
and it began adding lots of really, really big features like PITR,
savepoints, tablespaces, etc.  Add-ons like Slony also allowed it to
be used in places where it otherwise wouldn't have measured up.

Now there are only a few features left in the B range, so there are
tons of situations that postgres can handle now that were out of its
reach just a few years ago.  Once those features are all done, there
will still be some very big, very difficult features on the table
that, once completed, will begin to remove any advantage that the
really big guys have.  I'm thinking especially of #4 above here.  But
they will definitely take a while.

I may have tons of details wrong here, but my point is that I think
postgres isn't just taking stuff off a big to-do list, but rather is
pushing itself upwards and is now in a position to start working on
some very hard problems that, once completed, will put it into a very
elite class of database systems.  The "missing" list for tier B type
problems is shrinking down to almost nothing, and items from the tier
A missing list are starting to come into view.

Maybe I'm way off base here, but that's how I see it.  Postgres has
come a long, long way, but the problems ahead are bigger and meaner
than the ones behind.


On Aug 4, 2006, at 12:02 AM, David Fetter wrote:

> On Fri, Aug 04, 2006 at 12:37:10AM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> To me new things are like PITR, Win32, savepoints, two-phase
>>> commit, partitioned tables, tablespaces.  These are from 8.0 and
>>> 8.1.  What is there in 8.2 like that?
>>
>> [ shrug... ]  Five out of your six items have no basis in the SQL
>> spec.  So it's not clear to me what your definition of "major
>> feature" is, unless maybe it's "anything except what we did for
>> 8.2".  Can you enumerate ten things you would consider comparable to
>> the above features that aren't done yet?
>
> First, I'd like to say people are doing a fantastic job here.  Kudos!
>
> One huge thing missing from the "done" list is that crucial bit of
> infrastructure and process that has shortened feedback loops--hence
> the beta period--by weeks if not months: the build farm.  It's now
> smoothly integrated into the development process, and as a
> consequence, we can realistically have a release each year. :)
>
> As far as big missing features go, here's a short list:
>
> * Splitting queries among CPUs--possibly even among machines--for OLAP
>   loads
>
> * In-place upgrades (pg_upgrade)
>
> * Several varieties of replication, which I believe we as a project
>   will eventually endorse and ship
>
> * CALL
>
> * WITH RECURSIVE
>
> * MERGE
>
> * Windowing functions
>
> * On-the-fly in-line calls out to PL/your_choice without needing to
>   issue DDL
>
> * Wild-eyed feral bits of the SQL standard like SQL/MED and SQL/XML
>
> But all that leaves out the oldest, most honored Postgres tradition:
>
>     Breaking New Ground.
>
> We're definitely not done yet. :)
>
> Cheers,
> D
> -- 
> David Fetter <david@fetter.org> http://fetter.org/
> phone: +1 415 235 3778        AIM: dfetter666
>                               Skype: davidfetter
>
> Remember to vote!
>