Re: WORM and Read Only Tables (v0.1) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: WORM and Read Only Tables (v0.1)
Date
Msg-id 1197448449.4255.1527.camel@ebony.site
In response to WORM and Read Only Tables (v0.1)  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: WORM and Read Only Tables (v0.1)  ("Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at>)
List pgsql-hackers
On Tue, 2007-12-11 at 20:30 -0800, Josh Berkus wrote:
> Simon,
> 
> > Use Case: VLDB with tons of (now) read only data, some not. Data needs
> > to be accessible, but data itself is rarely touched, allowing storage
> > costs to be minimised via a "storage hierarchy" of progressively cheaper
> > storage.
> 
> There's actually 2 cases to optimize for:
> 1) write-once-read-many (WORM)
> 2) write-once-read-seldom (WORS)
> 
> The 2nd case is becoming extremely popular due to the presence of 
> government-mandated records databases.  For example, I'm currently working on 
> one call completion records database which will hold 75TB of data, of which 
> we expect less than 1% to *ever* be queried.

Well, that's exactly the use case I'm writing for. I called that an
archival data store in my post on VLDB Features.

WORM is a type of storage that might be used, so it would be somewhat
confusing to use it as the name of a specific use case.

Getting partitioning/read-only right will allow 70+TB of that to sit on
tape or similar, which with compression might be reduced to perhaps 20TB.
I don't want to promise any particular compression ratio, but it will make
a substantial difference, as I'm sure you realise.

> One of the other things I'd like to note is that for WORM, conventional 
> storage is never going to approach column-store DBs for general performance.  
> So, should we be working on incremental improvements like the ones you 
> propose, or should we be working on integrating a c-store into PostgreSQL on 
> a per-table basis?

What I'm saying is that there are some features that all VLDBs need. Even
if we had a column store DB, we would still need partitioning, or the
data structures would become unmanageable. Partitioning can also allow
the planner to avoid de-archiving/spinning up data, which helps reduce
storage costs.
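
To make the planner point concrete, here is a rough sketch in Python
rather than anything resembling actual planner code. It assumes
range-partitioned data where each partition is labelled with a storage
tier; the Partition class, the tier names and the calls_* partition
names are all made up for illustration. The idea is only that a query
restricted to a date range needs to touch, and possibly de-archive, just
the partitions whose ranges overlap it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    name: str
    start: date  # inclusive lower bound of the partition's range
    end: date    # exclusive upper bound of the partition's range
    tier: str    # hypothetical label: "disk" or "archive"


def partitions_for_range(partitions, lo, hi):
    """Return the partitions whose [start, end) overlaps the query range [lo, hi)."""
    return [p for p in partitions if p.start < hi and lo < p.end]


def archived_names(partitions):
    """Names of the partitions that would have to be de-archived before reading."""
    return [p.name for p in partitions if p.tier == "archive"]


# Hypothetical yearly partitions; only the newest one is kept on disk.
PARTITIONS = [
    Partition("calls_2005", date(2005, 1, 1), date(2006, 1, 1), "archive"),
    Partition("calls_2006", date(2006, 1, 1), date(2007, 1, 1), "archive"),
    Partition("calls_2007", date(2007, 1, 1), date(2008, 1, 1), "disk"),
]

# A query on June 2007 overlaps only the on-disk partition.
recent = partitions_for_range(PARTITIONS, date(2007, 6, 1), date(2007, 7, 1))

# A query spanning the 2006/2007 boundary needs one archived partition.
boundary = partitions_for_range(PARTITIONS, date(2006, 12, 1), date(2007, 2, 1))
```

In this toy setup the June 2007 query never touches archived storage,
while the boundary query would need calls_2006 restored first. Without
the range bounds to exclude partitions, every query would have to
consider all of them.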

Radical can be good, but it can also take more time. I dare say it would
also be harder for the community to accept. So I look for worthwhile
change in acceptably sized chunks.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com


