Thread: A Silly Idea for Vertically-Oriented Databases

A Silly Idea for Vertically-Oriented Databases

From
Avery Payne
Date:
Be forewarned - this is probably a very long post, and I'm just a mere 
mortal (ie. admin) who doesn't write copious amounts of C code.  Take 
the following posts and suggestions with a grain of salt.

So I've been seeing/hearing all of the hoopla over vertical databases 
(column stores), and how they'll not only slice bread but also make 
toast, etc.  I've done some quick searches for past articles on 
"C-Store", "Vertica", "Column Store", and "Vertical Database", and have 
seen little discussion on this.  And then a funny thought occurs to me - 
when I look at the directory structure and file layout of a PostgreSQL 
database, I see that each OID corresponds to a table, which corresponds 
to (generally) a single file. Then I have a second funny thought - what 
if there was a low-friction, low-cost-of-implementation way to bring 
similar advantages to PostgreSQL without major alterations, recoding, 
etc?  Finally it occurs to me that PostgreSQL already does something 
similar but it could do it so much better, with only one language change 
and minor changes to the storage layout.  So here's my plum-crazy 
proposal (and I've made some before - see 
http://archives.postgresql.org/pgsql-odbc/2006-10/msg00040.php - and 
they not only made it into production, but they are in active use by me 
on a weekly basis - Thanks Hiroshi!!!), bear with me...

Make one small, very tiny syntactic change to "CREATE TABLE" that 
includes a new keyword, "COLUMN-STORE" or something similar.  I don't 
care where it appears as long as it's after the "CREATE TABLE".  You 
would not have to break any existing SQL conventions, PostgreSQL would 
continue to be SQL compliant, and given the odd wording, I highly doubt 
that the folks who work on SQL keywords will end up using it at any 
point in time.  If adding COLUMN-STORE is objectionable because it will 
"cloud the compliance of the language" then simply move the 
functionality into the table space functionality.  In hindsight, it 
might even belong there instead.  So, instead of specifying it by 
table,  create a table space that has an attribute "Column Storage" set 
as active.  When inactive, it uses the traditional "one-file-per-table" 
layout.

Make each table column capable of receiving an OID.  This will be 
critical for the following steps...

If a table is created with "COLUMN-STORE" as an option, then it will 
continue to behave in the same way it always has, but the storage will 
be different.  Each column in the table will be represented by a single 
file, with the file name being (naturally) the OID.  
INSERT/UPDATE/DELETE would function as it always has, etc. Nothing would 
change.  Except how the data is stored.  The existing TOAST mechanisms 
continue to work - because the engine would treat each file as a 
single-column table! 

One additional "column" would be added to the store, an invisible one 
that not only tracks the OID for the "rows" in this type of setup, but 
also the state of the row.  Let's call this the "Control Column".  Given 
that the metadata size for the row would be fixed/constant, we won't 
have to worry about what is in the other "columns" and "rows", they can 
be any size.  BTW, the "Control Column" would be just another column 
from the storage engine's point of view.  It just happens to be one that 
no-one can see, other than the database (and maybe the administrator).

When you go to VACUUM a table, you would treat each column as a 
single-row table, so if a row is a candidate for a VACUUM reclamation, 
then it will adjust each "column" an equal amount.  Under no 
circumstances would you have columns "out of sync", so if a record goes, 
it means each adjacent column goes with it.  This sounds disk-intensive 
at first, until you realize that the admin will have made a contentious 
decision to use this format, and understands the advantages/drawbacks to 
this method.  So it takes a little longer to VACUUM, I don't care, 
because as an admin I will have specified this layout for a reason - to 
do OLAP, not OLTP.  Which means, I rarely VACUUM it.  Add to this the 
high efficiency you would gain by packing more records into buffers per 
read, and most of the losses you take in re-reading data would really 
not amount to as big a loss as you might think.

DELETE would simply mark a row off as deleted in the "Control Column".  
If the storage engine needed to reclaim a row, it would not have to look 
any further than the "control column" to find an empty spot where it 
could overwrite data.

INSERT/UPDATE continue to work as they always have.  The storage engine 
would perceive each "column" as a single-column table, meaning that the 
existing TOAST mechanisms continue to work!   Nothing needs to change 
there.  The real change would be that the table's columns would be 
"split up" into individual updates, and the "Control Column" would be 
used to keep all of the records in sync.

Why bother with this?  Because, when you are said and done, you will 
find yourself with a rough equivelent of a column-store database, with 
all of the OLAP goodness that people are looking for.  You have little 
if any impact on the admin/users perception, other than a flag was 
checked somewhere and forgotten about in the database.  From the storage 
engine's perspective, you have many many many small 1-column tables to 
take care of, and they all update at the same "place" at the same "time" 
to keep the records in sync when you recompose a row.  TOAST and large 
object storage works the same as before, nothing changes, and that's as 
it should be.

All with what would be (hopefully) a minor change to the storage 
backend.  We're not talking about brain surgery on existing, tested 
code, but rather, a new feature that uses existing features in-place.  
As TOAST improves so does this feature.  As caching improves, so does 
the feature again.  And so on.

I've been in a bit of a hurry to blurt all of this out, and I'm sure 
that I've forgotten something along the way, so if you find something 
missing, please be patient- I had to write all of this in about 20 
minutes or less and I didn't have alot of time.


Re: A Silly Idea for Vertically-Oriented Databases

From
Josh Berkus
Date:
Avery,

> Make one small, very tiny syntactic change to "CREATE TABLE" that
> includes a new keyword, "COLUMN-STORE" or something similar. 

If someone writes the rest of the code, I doubt the syntax will be the 
holdup.  But writing an efficient C-store table mechanism is much harder 
than I think you think it is; Vertica worked on it for a year and failed, 
and Paraccel took two years to succeed.   FYI, Paraccel is based on 
Postgres.

So, put up a pgfoundry project and start hacking a c-store table; I'm sure 
you;ll get interest if you can make something work.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: A Silly Idea for Vertically-Oriented Databases

From
Avery Payne
Date:
<pre><font><font face="Arial, Helvetica, sans-serif">Avery,

<my ramblings snipped>

>If someone writes the rest of the code, I doubt the syntax will be the 
>holdup.  But writing an efficient C-store table mechanism is much harder 
>than I think you think it is; Vertica worked on it for a year and failed, 
>and Paraccel took two years to succeed.   FYI, Paraccel is based on 
>Postgres.

>So, put up a pgfoundry project and start hacking a c-store table; I'm sure 
>you;ll get interest if you can make something work.

>-- 
>--Josh

Well, I did say it was a *crazy* idea. :-)

Given that I would be starting from the ground-floor, learning not only
the innards of PostgreSQL but also C coding as well, I would probably
have to overcome near-insurmountable odds to make this project take off.  Still,
if I was crazy enough to think it, maybe I'll be crazy enough to 
try for it. ;-)

Just ignore my 2nd posting, I was trying to clarify some of the 
ramblings I was typing.
</font></font></pre>

Re: A Silly Idea for Vertically-Oriented Databases

From
"Josh Tolley"
Date:
On 9/7/07, Avery Payne <apayne@pcfruit.com> wrote:
>
>  Avery,
>
> <my ramblings snipped>
>
> >If someone writes the rest of the code, I doubt the syntax will be the
> >holdup. But writing an efficient C-store table mechanism is much harder
> >than I think you think it is; Vertica worked on it for a year and failed,
> >and Paraccel took two years to succeed. FYI, Paraccel is based on
> >Postgres.
>
> >So, put up a pgfoundry project and start hacking a c-store table; I'm sure
> >you;ll get interest if you can make something work.
>
> >--
> >--Josh
>
> Well, I did say it was a *crazy* idea. :-)
>
> Given that I would be starting from the ground-floor, learning not only
> the innards of PostgreSQL but also C coding as well, I would probably
> have to overcome near-insurmountable odds to make this project take off.
> Still,
> if I was crazy enough to think it, maybe I'll be crazy enough to
> try for it. ;-)
>
> Just ignore my 2nd posting, I was trying to clarify some of the
> ramblings I was typing.

For whatever it's worth, I was reading about the same things today and
came up with the same basic idea, without the same level of
implementation details you've talked about, Avery. And it sounds
really neat. Hard, but neat, and potentially worth it if, say,
Paraccel doesn't open source their stuff first :)

But I don't know the PostgreSQL internals either, though I've been
known to throw together some C code now and again. So in short,
there's interest. Whether there's collective skill/free
time/motivation/insanity/etc. enough to make something useful of the
interest is another question altogether. :)

- Josh/eggyknap


Re: A Silly Idea for Vertically-Oriented Databases

From
Simon Riggs
Date:
On Fri, 2007-09-07 at 13:52 -0700, Avery Payne wrote:

> So I've been seeing/hearing all of the hoopla over vertical databases 
> (column stores), and how they'll not only slice bread but also make 
> toast, etc.  I've done some quick searches for past articles on 
> "C-Store", "Vertica", "Column Store", and "Vertical Database", and have 
> seen little discussion on this. 

I looked at doing this a while back, for similar reasons.

ISTM we would be able to do this fairly well if we implemented
Index-only columns. i.e. columns that don't exist in the heap, only in
an index. 

Taken to the extreme, all columns could be removed from the heap and
placed in an index(es). Only the visibility information would remain on
the heap.

Syntax for this would be an ALTER TABLE SET STORAGE command, with a new
type of storage definition that will only be accepted if an index
already has been defined on the table which includes the specified
column. Doing this per column would be a big win over vertical databases
since AFAIK they *have* to do this to *every* column, even if it is not
beneficial to do so.

Every existing index plan works immediately. The main annoyance is
retrieving a column value that doesn't exist on the heap. That would
require a new kind of plan that involves preparing the index(es) by
sorting them on htid and then doing a left merge join with the main
heap. By now, most people will be screaming at their monitors "what an
idiot, thats gonna REALLY suck". True, but this is the same thing that
column-oriented databases have to do also, so it would be taking the
best that vertical databases have to offer and accepting the
consequences. There are some other plan possibilities also, apart from
this basic value retrieval, but they would require further index API
changes; I'm not certain of those as being primary use cases, however.

Vertical databases honestly do have their uses and there are many kinds
of marketing query that have complex where clauses yet only simple
select clauses. There are a number of industry-specific database
products that have utilised this technique to good effect for a number
of years.

So ISTM the main changes would be executor changes to allow retrieving
column values of index-only columns when they are required, and to
modify the insert/update tuple code somewhat. I thought maybe we can
call it COAST, Column-oriented attribute storage technique, :-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: A Silly Idea for Vertically-Oriented Databases

From
Mark Mielke
Date:
Simon Riggs wrote:
> ISTM we would be able to do this fairly well if we implemented
> Index-only columns. i.e. columns that don't exist in the heap, only in
> an index. 
>
> Taken to the extreme, all columns could be removed from the heap and
> placed in an index(es). Only the visibility information would remain on
> the heap.
>   
Wouldn't the extreme be - even the visibility information is in the 
index? :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>


Re: A Silly Idea for Vertically-Oriented Databases

From
Alvaro Herrera
Date:
Mark Mielke wrote:
> Simon Riggs wrote:
>> ISTM we would be able to do this fairly well if we implemented
>> Index-only columns. i.e. columns that don't exist in the heap, only in
>> an index. 
>> Taken to the extreme, all columns could be removed from the heap and
>> placed in an index(es). Only the visibility information would remain on
>> the heap.
>>   
> Wouldn't the extreme be - even the visibility information is in the index? 

Maybe we could extract the visibility information and put it in a
storage separate from the heap.  A seqscan would be a bit slower (have
to check the heap and the visibility storage) but some index scans could
be faster (can check the index and visibility storage, skipping the
heap), and updates would be faster too (append into the heap, update
visibility storage).  This would allow something closer to index-only
scans.  Of course, the key is to allow efficient lookup of visibility
info given a TID.

Has this been explored before?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: A Silly Idea for Vertically-Oriented Databases

From
Avery Payne
Date:
<div id="pgContentWrap"><div id="txtArchives"><pre><font face="Arial, Helvetica, sans-serif">>ISTM we would be able
todo this fairly well if we implemented
 
>Index-only columns. i.e. columns that don't exist in the heap, only in
>an index. 

>Taken to the extreme, all columns could be removed from the heap and
>placed in an index(es). Only the visibility information would remain on
>the heap.

So, let me understand this correctly - you want to index the columns and
use the index to reconstruct the data?  Some kind of "implicit" reconstruction?

>Doing this per column would be a big win over vertical databases
>since AFAIK they *have* to do this to *every* column, even if it is not
>beneficial to do so.</font><font><font face="Arial, Helvetica, sans-serif"></font></font>
<font face="Arial, Helvetica, sans-serif">
<snip></font><font><font><font><font face="Arial, Helvetica, sans-serif">

I was thinking about something a little more crude - each column being a free-standing
table, but being "viewed" by the client as a single entity, a kind of "data federation".
The idea was that the existing storage mechanism wouldn't be altered, and
with a little slight-of-hand, we could extend the mechanism without hampering things
like clustering, future extensions, optimizations, etc.  No changes to MVCC would be
needed, because it would above and through it.

The idea was that for a "wide" table you target only a few columns, so you could
sequential read without the penalty of having to seek to a new offset for each record.
Instead of processing all the columns, you process a single column, which means less
data to read for the same number of records.  That gain starts to slope off
when you specify more and more columns from the table.
</font></font></font></font><font><font face="Arial, Helvetica, sans-serif">
</font></font><font face="Arial, Helvetica, sans-serif">Throw in selective indexing (similar to what you were talking
about)and suddenly we can reclaim
 
some of that lost speed.  We'll make a kind of "compressed index", where the key turns into
a hash that points to (is attached to?) a bucket, and the bucket contains the offset of all the
records that relate to that key.  Other tricks can be employed, such as run-length encoding
entire ranges of offsets, etc. to keep this really really small.  Really small = faster and faster
to read, using more CPU than I/O.  And I/O is still more of an issue than CPU at this point.

Then again, if you ditch the column altogether and use a "compressed index" to reconstruct
data implicitly, now we're close to what you were talking about (assuming I understand you
correctly and also assuming that PostgreSQL doesn't already do this with indexes).  So, if the
column is indexed, then maybe split it off into a "compressed index", and if not, keep it in the main
table outright?

I guess I really need to think about this a bit more before I start delving into code.

<Vertical DB market discussion - snipped>

>I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-)

I like it. :-)<a href="http://www.2ndQuadrant.com" rel="nofollow"></a>  I just wish I would have read this before
applyingfor a project name
 
at pgfoundry, the current proposal is given as "pg-cstore".
</font></pre></div></div>

Re: A Silly Idea for Vertically-Oriented Databases

From
Alvaro Herrera
Date:
Avery Payne wrote:

> >I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-)
> 
> I like it. :-)<a rel="nofollow" href="http://www.2ndQuadrant.com"></a>  I just wish I would have read this before
applyingfor a project name
 
> at pgfoundry, the current proposal is given as "pg-cstore".

You can email the pgfoundry admins at gforge-admins@pgfoundry.org and
ask it to be cancelled instead of approved, and resubmit with the other
name.

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)


Re: A Silly Idea for Vertically-Oriented Databases

From
Hannu Krosing
Date:
Ühel kenal päeval, E, 2007-09-10 kell 11:01, kirjutas Alvaro Herrera:
> Mark Mielke wrote:
> > Simon Riggs wrote:
> >> ISTM we would be able to do this fairly well if we implemented
> >> Index-only columns. i.e. columns that don't exist in the heap, only in
> >> an index. 
> >> Taken to the extreme, all columns could be removed from the heap and
> >> placed in an index(es). Only the visibility information would remain on
> >> the heap.
> >>   
> > Wouldn't the extreme be - even the visibility information is in the index? 
> 
> Maybe we could extract the visibility information and put it in a
> storage separate from the heap.  A seqscan would be a bit slower (have
> to check the heap and the visibility storage) but some index scans could
> be faster (can check the index and visibility storage, skipping the
> heap), and updates would be faster too (append into the heap, update
> visibility storage).  This would allow something closer to index-only
> scans.  Of course, the key is to allow efficient lookup of visibility
> info given a TID.
> 
> Has this been explored before?

I wrote to this list about a vague plan for doing it some months ago.

And I don't think that even seqscans need to be slower, as at least for
typical Data Warehouse application the visibility info can be compressed
a lot by even a simple RLE compression and thus you just do the sequscan
like this

while 1:   get next visible range   read in the visible range, without checking each tuple

I suspect that this can be made much more cache-friendly than current
scheme.

(A Guess: I even go as far as to claim that separate visibility heap
_may_ be more cahce friendly even uncompressed, so if we have every
second tuple deleted, it will still be faster (at least for bigger
tuples) to look up visibility in a dense visibility array and then
access just the visible tuples. Here even separate heap may be not
needed, just rearranging the heap structure to have visibility info in
the beginning of page together with tuple pointers may be a win )

Another bonus of separate visibility heap is that it can be maintained
separately, that is it can be shuffled around, CID's cleaned up,
compressed, etc. much more effectively than current one, which, among
other things will make VACUUM much more efficient, as neither
all-visible or all-dead pages need to be visited at all.

It can probably be made to serve as both DSM and VACUUM map also. And it
can be used for finding "best" insert spot for CLUSTERing-preserving
inserts/updates.


On subject of vertical or column-per-heap storage I see the biggest
obstacle to be te use of TID-s as tuple identifiers, as the "real" TIDs
(pageno32:tupleno:16) will be different for each column. So if we have a
table with a 1 kb text column, we will also have to store just 8 ints
per page in the int column if we want to preserve unique TID->tuple
mapping.

What we could do of course is start defining TID as just a 48-bit tuple
number and then have some rules for finding the PAGE:NR "tid" from that

it would be easy for fixed-size columns 
(PAGE:NR) = (TID/items_per_page:TID mod items_per_page)
but more complicated for VARLEN fields.

All the ideas I've had sofar for solving this have quite  a lot of
downsides. 

--------------
Hannu