On Fri, 2007-09-07 at 13:52 -0700, Avery Payne wrote:
> So I've been seeing/hearing all of the hoopla over vertical databases
> (column stores), and how they'll not only slice bread but also make
> toast, etc. I've done some quick searches for past articles on
> "C-Store", "Vertica", "Column Store", and "Vertical Database", and have
> seen little discussion on this.
I looked at doing this a while back, for similar reasons.
ISTM we would be able to do this fairly well if we implemented
Index-only columns. i.e. columns that don't exist in the heap, only in
an index.
Taken to the extreme, all columns could be removed from the heap and
placed in an index(es). Only the visibility information would remain on
the heap.
Syntax for this would be an ALTER TABLE SET STORAGE command, with a new
type of storage definition that will only be accepted if an index
already has been defined on the table which includes the specified
column. Doing this per column would be a big win over vertical databases
since AFAIK they *have* to do this to *every* column, even if it is not
beneficial to do so.
Every existing index plan works immediately. The main annoyance is
retrieving a column value that doesn't exist on the heap. That would
require a new kind of plan that involves preparing the index(es) by
sorting them on htid and then doing a left merge join with the main
heap. By now, most people will be screaming at their monitors "what an
idiot, thats gonna REALLY suck". True, but this is the same thing that
column-oriented databases have to do also, so it would be taking the
best that vertical databases have to offer and accepting the
consequences. There are some other plan possibilities also, apart from
this basic value retrieval, but they would require further index API
changes; I'm not certain of those as being primary use cases, however.
Vertical databases honestly do have their uses and there are many kinds
of marketing query that have complex where clauses yet only simple
select clauses. There are a number of industry-specific database
products that have utilised this technique to good effect for a number
of years.
So ISTM the main changes would be executor changes to allow retrieving
column values of index-only columns when they are required, and to
modify the insert/update tuple code somewhat. I thought maybe we can
call it COAST, Column-oriented attribute storage technique, :-)
-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com