Thread: Sponsoring enterprise features

From
James Rogers
Date:
Hi folks,

Is there any pre-existing protocol for a company to pay for specific
features to be added to PostgreSQL?

I've gotten full executive buy-in to the idea that it would be far
cheaper to sponsor and pay for people to develop the enterprise features
we need in Postgres than to do an Oracle migration to get those same
features that we need (which would cost unholy amounts of money that we
don't want to spend for our installation). All that said, I don't know
if this is a feasible plan, or what the makeup is of the developers
currently working on Postgres.  As a practical matter, we do not have
the time or people to take on this project in-house.

Our company is interested in sponsoring a push to get enterprise-level
scalability features into PostgreSQL, things like partitioning and
organized heaps. As a practical business matter, Oracle is an option but
one of last resort that we (and I) would prefer to avoid if at all
possible.  We see an obvious long-term benefit in making Postgres do
what we need it to do rather than buying gobs of Oracle licenses.

Are other people/companies already doing this, either officially or
unofficially, and what is the general protocol for going about doing
this?

Cheers,

-James Rogers
jamesr@best.com




Re: Sponsoring enterprise features

From
Josh Berkus
Date:
Mr. Rogers,

> Is there any pre-existing protocol for a company to pay for specific
> features to be added to PostgreSQL?

> Are other people/companies already doing this, either officially or
> unofficially, and what is the general protocol for going about doing
> this?

Other companies are doing this, and we often depend on corporate support for
new major features.

The general approach is to hire or contract a major contributor to make the
changes you want and win approval in the community for them.

Probably a few companies and/or individuals have already contacted you.   If
not, you are welcome to contact me.

--
-Josh Berkus
Aglio Database Solutions
San Francisco



Re: Sponsoring enterprise features

From
Rod Taylor
Date:
On Tue, 2003-11-18 at 14:33, James Rogers wrote:
> Hi folks,
> 
> Is there any pre-existing protocol for a company to pay for specific
> features to be added to PostgreSQL?

There are several people who do this type of work (Neil, Joe, David, the
folks at Command Prompt Inc., etc.).

Personally, I think the best way is simply to make a post on -hackers
with a description of what you want to accomplish with a call for
estimates and proposals. Ensure you require the feature to be applied to
the main code line so it will be maintained for future releases.

I say a description of what you want to accomplish because certain
features are not as useful on PostgreSQL as they are on other databases
(data partitioning being one of them, due to the ability to use partial
indexes) so you may not achieve what you are expecting.



Re: Sponsoring enterprise features

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> Personally, I think the best way is simply to make a post on -hackers
> with a description of what you want to accomplish with a call for
> estimates and proposals. ...
> I say a description of what you want to accomplish because certain
> features are not as useful on PostgreSQL as they are other databases
> (data partitioning being one of them, due to the ability to use partial
> indexes) so you may not achieve what you are expecting.

Right.  You can in any case get a great deal of free advice by starting
a pghackers discussion ;-)

It should be noted that "because Oracle does it that way" is a
guaranteed nonstarter as a rationale for any Postgres feature proposal.
There are enough differences between Postgres and Oracle that you will
need to do significant investigation before assuming that an Oracle-
based feature design is appropriate for Postgres.  Aside from technical
differences, we have fundamentally different priorities --- one of which
is simplicity of administration.  You'll get no buyin on proposals that
tend to create Oracle-like difficulties of installation and tuning.
        regards, tom lane


Re: Sponsoring enterprise features

From
James Rogers
Date:
On Thu, 2003-11-20 at 22:20, Tom Lane wrote:
> It should be noted that "because Oracle does it that way" is a
> guaranteed nonstarter as a rationale for any Postgres feature proposal.


A method of doing something is not a "feature"; making something
possible that couldn't be done before is a "feature".  I don't really
care how Oracle does something, though I am cognizant of *why* Oracle
does something.  s/Oracle/DB2/, and little changes.


> There are enough differences between Postgres and Oracle that you will
> need to do significant investigation before assuming that an Oracle-
> based feature design is appropriate for Postgres.  Aside from technical
> differences, we have fundamentally different priorities --- one of which
> is simplicity of administration.  You'll get no buyin on proposals that
> tend to create Oracle-like difficulties of installation and tuning.


I'm not sure what Oracle has to do with any of this.  If I wanted to use
Oracle, I would buy Oracle.  The thing is, I'm intimately familiar with
Oracle and there are a lot of things I despise about Oracle as a
consequence of this familiarity.  The features I'm talking about can be
added to any reasonable database engine, and are generically supported
features (or "enterprise" add-ons) in virtually all large commercial
databases.

As I stated previously, I/we are interested in adding features for
managing very large tables and working sets, and making Postgres scale
in general for these kinds of databases (currently, it does not).  These
kinds of features will be important for enterprise users, particularly
ones interested in migrating from Oracle/DB2/SQLServer/etc, and would be
invisible to people that don't need them.  This is a matter of adding
important functionality that can be supported by any reasonable database
engine.  In a nutshell, the features on my short list are all about heap
management (e.g. partitioning).  This is really important when databases
reach a certain size, but something for which Postgres has almost no
support.

From a large-scale enterprise database standpoint, heap management is
almost as important a capability as replication.  Replication is being
aggressively worked on; heap management is not, and so we are interested
in making sure this part gets developed.  When PostgreSQL has this,
there will be little reason for anyone to use the big commercial
database-du-jour.  I don't care how it's implemented specifically, just
as long as it is in there, and there is no technical reason that it
couldn't be implemented per previous discussions.

I've gotten the green light (and many responses from people interested
in doing it) to start writing up RFQs for specific features, which I
will post to the pg-hackers list.  It is all stuff previously determined
to be doable within the current PostgreSQL framework, and just requiring
some work that my company is willing to help pay for.

Cheers,

-James Rogers
jamesr@best.com





Re: Sponsoring enterprise features

From
Josh Berkus
Date:
James,

> I'm not sure what Oracle has to do with any of this.  If I wanted to use
> Oracle, I would buy Oracle. 

Good.  Your original post appeared to propose carbon-copying a number
of features from Oracle -- I didn't necessarily read it that way, but several
other people did, including some of the developers you will want to recruit.
You are not the first person to put forward some of these ideas, and your
predecessors tended to follow the line "but if Oracle does it that way, it
must be good."   So it's a knee-jerk reaction thing.

AFAIK, Tom was just warning you that your feature proposals need to be backed 
by arguments/evidence that they will actually improve performance or expand 
capabilities for PostgreSQL in some meaningful way.   You seem prepared to do 
that, so we shouldn't have any disagreements in that way.

> In a nutshell, the features on my short list are all about heap
> management (e.g. partitioning).  This is really important when databases
> reach a certain size, but something for which Postgres has almost no
> support.  

heap management == table partitioning?

I'm a little unclear, personally, about what can be accomplished through table 
partitioning that we can't currently do through partial indexes and inherited 
tables, especially after Gavin finishes his tablespaces patch (btw, Gavin 
could use sponsorship on that one, I think).  Can you make your case to 
me/the list?   So far, the only arguments we've gotten on this list have been 
of the "Oracle does it that way" variety so it'd be interesting to see 
something concrete.

Now, query partitioning is something I think everyone is interested in, and 
would very much like to see someone implement.  

> I've gotten the green light (and many responses from people interested
> in doing it) to start writing up RFQs for specific features, which I
> will post to the pg-hackers list.  It is all stuff previously determined
> to be doable within the current PostgreSQL framework, and just requiring
> some work that my company is willing to help pay for.

Cool.   I look forward to seeing the fruits of this effort.

From my perspective, the other "oracle-killer" that we're missing includes 
some isolated but difficult improvements in the Query Planner necessary to 
pass the TPC benchmarks.   I'd be happy to discuss those as well, if you 
like.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Sponsoring enterprise features

From
Greg Stark
Date:
Josh Berkus <josh@agliodbs.com> writes:

> I'm a little unclear, personally, about what can be accomplished through table 
> partitioning that we can't currently do through partial indexes and inherited 
> tables, especially after Gavin finishes his tablespaces patch (btw, Gavin 
> could use sponsorship on that one, I think).  Can you make your case to 
> me/the list?   So far, the only arguments we've gotten on this list have been 
> of the "Oracle does it that way" variety so it'd be interesting to see 
> something concrete.

Well probably everyone who wants it is saying things of the form "they were
useful with Oracle because...". Which isn't the same thing as "Oracle does it
this way". I don't particularly care how partitioned tables are *implemented*,
only the net effect. You can think of them as an abstraction over inherited
tables that let the database guarantee your data integrity and offer query
optimizations in a way it cannot if you build it by hand.  

I know for us they were an absolute godsend. The main advantages over a single
monolithic table even with partial indexes are:

1) Being able to load and unload parts of the table quickly.

   Adding and removing a partition is basically a DDL operation, not DML. It
   doesn't have to visit every tuple and mark it deleted or added. It just has
   to add or remove the entire partition to the structure.

   Partitioned tables are frequently used for aging out old data. The common
   example is of having a partition per month and keeping 3-12 months of data.
   We had a more extreme case where we had one partition per day and kept 21
   days of data. When we implemented partitioned tables the time to archive
   and delete the old data went from taking most of the night and killing
   production performance to effectively instantaneous and we were able to run
   it at peak time.

2) Being able to do a sequential scan of a partition.

   Sequential scans are faster than index scans. Sometimes much faster.
   Partial indexes are nice but when they cover 10-20% of your table scanning
   them is much slower than a sequential scan of a partition.
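
The difference between the two ways of aging out data can be sketched in a
few lines of Python. This is only a toy model under stated assumptions: the
names PartitionedTable, delete_day and build_rows are hypothetical, rows are
plain dicts, and nothing here corresponds to an actual PostgreSQL or Oracle
interface. It only shows that dropping a partition unlinks a whole container,
while a delete on a monolithic table has to look at every row.

```python
# Toy model only: names such as PartitionedTable are hypothetical and are not
# a PostgreSQL API. Rows are plain dicts; a "partition" is a list of rows.

class PartitionedTable:
    """Table stored as one list of rows per day."""

    def __init__(self):
        self.partitions = {}  # day -> list of rows

    def insert(self, row):
        self.partitions.setdefault(row["day"], []).append(row)

    def drop_partition(self, day):
        # DDL-style removal: unlink the whole partition in one step,
        # without looking at any individual row.
        return self.partitions.pop(day)

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


def delete_day(rows, day):
    """DML-style removal from a monolithic table: every row is visited."""
    visited = 0
    kept = []
    for row in rows:
        visited += 1
        if row["day"] != day:
            kept.append(row)
    return kept, visited


def build_rows(days=21, rows_per_day=1000):
    """Build sample data: 21 days of rows, mirroring the 21-day case above."""
    return [{"day": d, "value": i} for d in range(days) for i in range(rows_per_day)]


if __name__ == "__main__":
    rows = build_rows()
    kept, visited = delete_day(rows, day=0)
    print("monolithic delete visited", visited, "rows")

    table = PartitionedTable()
    for row in rows:
        table.insert(row)
    dropped = table.drop_partition(0)
    print("partition drop removed", len(dropped), "rows in one step")
```

In a real database the cost of the row-by-row path also includes index
maintenance and later cleanup of dead tuples, which this sketch does not
model.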
 


As for inherited tables. Well, I would expect a partitioned tables scheme to
be implemented using inherited tables or just using views. You could jury-rig
it today using these tools, it would just be very awkward and fragile. The
original Oracle implementation in Oracle 7 was implemented much the same way
using views. They were a complete hack and required lots of manual tweaking
though.


The point of partitioned tables is

a) The database ensures tuples go into the correct partition. If you used a
   manually constructed view or inherited tables you would always run the risk
   of inserting into the wrong partition which would break your data
   integrity.

b) The database automatically optimizes queries to query the correct
   partitions. It detects clauses in the query much like partial indexes so
   you don't have to tweak every query by hand and the database can skip
   clauses that match the partition clause exactly. Also this is a prime
   opportunity for the database to introduce parallel queries because each
   partition can be accessed independently.
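
Points a) and b) can be illustrated with a small Python sketch. It is a toy
model, not a design proposal: RangePartitionedTable and its methods are
hypothetical names, keys are assumed to be integers, and partitions are
assumed to be contiguous ranges. The table itself routes each row to its
partition, and a range query scans only the partitions whose range can
overlap the predicate.

```python
import bisect

# Toy model only: RangePartitionedTable is a hypothetical name, not a
# PostgreSQL feature. Keys are integers; partition i holds keys below
# bounds[i] and at or above bounds[i - 1] (the first has no lower limit).

class RangePartitionedTable:
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.partitions = [[] for _ in self.bounds]

    def _index_for(self, key):
        idx = bisect.bisect_right(self.bounds, key)
        if idx == len(self.bounds):
            raise ValueError(f"no partition accepts key {key}")
        return idx

    def insert(self, key, payload):
        # The table, not the caller, chooses the partition.
        self.partitions[self._index_for(key)].append((key, payload))

    def select_range(self, lo, hi):
        """Return rows with lo <= key < hi and the partitions that were scanned."""
        first = bisect.bisect_right(self.bounds, lo)
        last = min(bisect.bisect_left(self.bounds, hi), len(self.bounds) - 1)
        scanned = list(range(first, last + 1))
        rows = [
            row
            for idx in scanned
            for row in self.partitions[idx]
            if lo <= row[0] < hi
        ]
        return rows, scanned


if __name__ == "__main__":
    table = RangePartitionedTable([100, 200, 300])
    for key in range(300):
        table.insert(key, None)
    rows, scanned = table.select_range(120, 180)
    print(len(rows), "rows found by scanning partitions", scanned)
```

Because each entry in the scanned list is an independent container, each
could in principle be handed to a separate worker, which is the opening for
parallel queries mentioned in b). The sketch does not attempt that.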
 


-- 
greg



Re: Sponsoring enterprise features

From
Alvaro Herrera
Date:
On Sat, Nov 22, 2003 at 11:54:45AM -0800, Josh Berkus wrote:

> > In a nutshell, the features on my short list are all about heap
> > management (e.g. partitioning).  This is really important when databases
> > reach a certain size, but something for which Postgres has almost no
> > support.  
> 
> heap management == table partitioning?

It was just an example.  Another example could be a B-Tree sorted heap
(like an index, but with the whole tuple at the leaf pages instead of a
pointer to the heap).  There was a thread about this some time ago.
It seemed a good idea ...
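
As a rough illustration of the idea, here is a Python sketch that keeps whole
rows in key order. It uses a sorted list in place of a real B-Tree, and
SortedHeap is a hypothetical name, so it shows only the access pattern: a
range scan reads one contiguous run of rows, with no separate hop from an
index entry to a heap page.

```python
import bisect

# Toy model only: a sorted list stands in for the B-Tree, and SortedHeap is a
# hypothetical name. Whole rows live at the "leaf" positions, in key order.

class SortedHeap:
    def __init__(self):
        self._keys = []
        self._rows = []

    def insert(self, key, row):
        # Keep keys and rows aligned and sorted by key.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def range_scan(self, lo, hi):
        """Return rows with lo <= key <= hi as one contiguous slice."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._rows[start:end]


if __name__ == "__main__":
    heap = SortedHeap()
    for key in (5, 1, 3, 2, 4):
        heap.insert(key, {"id": key})
    print(heap.range_scan(2, 4))
```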

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Para tener más hay que desear menos"


Re: Sponsoring enterprise features

From
"Zeugswetter Andreas SB SD"
Date:
The main needs that partitioning is useful for:
- partition elimination for queries (e.g. seq scans only scan relevant partitions)
- deleting/detaching huge parts of a table in seconds
- attaching huge parts to a table in seconds (that may have been loaded with
    a fast loading utility (e.g. loading without index, prebuilding indexes,
    attaching table + prebuilt partitioned indexes))
- achieving [heap and index] per page data locality (for better cache rates)
- allowing partial restores (for defect disks) while the rest of the db is still online
- in pg, allowing partial vacuums (only partitions that see changes)

People needing those features usually have data with more than 10-50 Gb per
partition.

> I'm a little unclear, personally, about what can be accomplished through table
> partitioning that we can't currently do through partial indexes and inherited
> tables, especially after Gavin finishes his tablespaces patch

Well, sure the goal needs to be to make use of what already exists,
but a few things are still missing, e.g.:
- unique indexes, that span the hierarchy (and do not contain the partitioning column[s])
- partition elimination (imho we should use check constraints for that)
- physical backups :-)
- tablespaces ?

Note, that these would all be useful for non partitioning
use cases also.

Andreas


Re: Sponsoring enterprise features

From
Hannu Krosing
Date:
Zeugswetter Andreas SB SD wrote on Mon, 24.11.2003 at 13:16:
> Main needs partitioning is useful for:
> - partition elimination for queries (e.g. seq scans only scan relevant partitions)
> - deleting/detaching huge parts of a table in seconds
> - attaching huge parts to a table in seconds (that may have been loaded with
>     a fast loading utility (e.g. loading without index, prebuilding indexes,
>     attaching table + prebuilt partitioned indexes)) 
> - achieving [heap and index] per page data locality (for better cache rates)
> - allowing partial restores (for defect disks) while the rest of the db is still online
> - in pg, allowing partial vacuums (only partitions that see changes)
> 
> People needing those features usually have data with more than 10-50 Gb per 
> partition. 
> 
> > I'm a little unclear, personally, about what can be accomplished through table 
> > partitioning that we can't currently do through partial indexes and inherited 
> > tables, especially after Gavin finishes his tablespaces patch

Partial indexes don't solve the basic clustering problem (need for
reading too many pages).
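
A small worked example of the page-count argument, as a Python sketch. The
numbers are assumptions chosen for illustration: 100 rows per page, 100,000
rows, ten distinct months, and rows arriving interleaved so that every page
holds a mix of months (the worst case for clustering). The function names are
hypothetical.

```python
ROWS_PER_PAGE = 100  # assumed page capacity, for illustration only


def pages_touched_monolithic(n_rows, n_months, wanted_month):
    """Pages an index scan must visit when matching rows are scattered."""
    pages = set()
    for row_id in range(n_rows):
        if row_id % n_months == wanted_month:
            pages.add(row_id // ROWS_PER_PAGE)
    return len(pages)


def pages_touched_partitioned(n_rows, n_months, wanted_month):
    """Pages to scan when the matching rows sit together in one partition."""
    matching = sum(1 for r in range(n_rows) if r % n_months == wanted_month)
    return -(-matching // ROWS_PER_PAGE)  # ceiling division


if __name__ == "__main__":
    print("monolithic:", pages_touched_monolithic(100_000, 10, 3), "pages")
    print("partitioned:", pages_touched_partitioned(100_000, 10, 3), "pages")
```

Under these assumptions the index finds the same 10,000 rows either way, but
in the monolithic layout they are spread over all 1,000 pages, while a
partition holding only that month occupies 100 pages.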

Using inherited tables for this somehow seems conceptually wrong, though
this may solve the clustering problem. Also, indexes don't currently
cover child tables, but this may be thought of as an implementation
deficiency.

But some mechanisms used for inheritance could likely be used as basis
for partitioning.

> Well, sure the goal needs to be to make use of what already exists,
> but a few things are still missing, e.g.:
>  - unique indexes, that span the hierarchy (and do not contain the partitioning column[s])

We kind of have these - a unique index can nicely be built over a table
stored in several 1GB files. Now we must just make sure that seqscans
know about the gaps if files are not a full 1GB in size.

BTW, does VACUUM FULL shrink individual files or will it move tuples
between files when shrinking the storage ?

Are interfaces to sparse files portable enough that we could use these ?

>  - partition elimination (imho we should use check constraints for that)

or partial indexes

or smart clustering which can make use of several partitions depending
on the position of the tuple in the index(es) used for clustering.

>  - physical backups :-)

Do you mean read-only partitions which can come and go?

>  - tablespaces ?

Tablespaces should be an orthogonal feature, mainly about efficient
storage. We might want to spread partitions of the same table over
several tablespaces.

> Note, that these would all be useful for non partitioning 
> use cases also.

True.

---------------
Hannu