Re: Raw devices vs. Filesystems - Mailing list pgsql-admin

From Murthy Kambhampaty
Subject Re: Raw devices vs. Filesystems
Date
Msg-id 3554BFA5FAE4B14FB5459D36E4F4E36203D956@io.goeci.com
In response to Raw devices vs. Filesystems  ("Jaime Casanova" <el_vigia_ec@hotmail.com>)
List pgsql-admin
On Wednesday, April 07, 2004 1:26 AM Tom Lane wrote:

>
> But to get back to the point of this discussion: to allow PG
> to use raw devices instead of filesystems, we'd first have to do a ton of
> portability work
...

[The following is said in a low, tentative voice :) ]

I wonder whether writing the postgresql data structures as HDF5 data structures (http://hdf.ncsa.uiuc.edu/whatishdf5.html)
within a single HDF5 file (perhaps the WAL files would still reside elsewhere) would improve performance while letting
HDF5 handle portability and other useful features, and so be a better solution than relying on filesystem features.

HDF5 actually provides an added portability advantage that postgresql does not currently enjoy:
"a completely portable file format, so that a file can be written on any system and read on any other"
(See http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf).
The HDF5 "distribution" includes tools for dumping data structures, etc., so if you're hooked on filesystem-level
operations, you still have the ability to inspect postgresql data structures within the HDF5 file, i.e., "outside postgresql".
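
To make the portability point concrete, here is a minimal Python sketch of the general principle. It is NOT HDF5's
actual on-disk format: the record layout and the function names are made up for illustration. It only shows why a
format that pins down byte order and field sizes can be read on any machine, while a dump of native in-memory
structures depends on the machine that wrote it.

```python
import struct
import sys

# Hypothetical record layout, chosen only for illustration:
# a 4-byte unsigned row id followed by an 8-byte float.
PORTABLE_FMT = "<Id"   # "<" = little-endian, standard sizes, no padding
NATIVE_FMT = "@Id"     # "@" = this machine's byte order, sizes and alignment


def write_record(row_id, value):
    """Serialize one record with an explicit, machine-independent layout."""
    return struct.pack(PORTABLE_FMT, row_id, value)


def read_record(buf):
    """Decode a record produced by write_record() on any machine."""
    return struct.unpack(PORTABLE_FMT, buf)


portable = write_record(42, 3.5)
native = struct.pack(NATIVE_FMT, 42, 3.5)

print("host byte order:", sys.byteorder)
print("portable bytes :", portable.hex(), "length", len(portable))
print("native bytes   :", native.hex(), "length", len(native))
print("round trip     :", read_record(portable))
```

The portable encoding is always 12 bytes in the same order, whatever the host. The native encoding may differ in
byte order and padding from one architecture to another, which is the kind of dependency a self-describing format
is meant to remove.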

HDF5 is also designed for clustered/grid computing systems:
"The HDF5 format and library provide a powerful means of organizing and accessing data in a manner that allows
scientists to share, process, and manipulate data in today's heterogeneous and quickly-evolving high-performance
computational environment, including the emerging computational GRIDs."
(http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf, p. 3).
So, the main purpose of this post is to suggest that HDF5's design moves a postgresql version built on an HDF5 datastore
that much closer to being ready for cluster-computing environments, with respect to the datastore (there's still the
shared memory, etc., that needs to be addressed, but ...).

We're playing with HDF5 from Python (see the pytables project) for our "analytics" work, but that requires moving data
out of postgresql. I suspect that an SQL interface to HDF5 data structures using postgresql would be a lot more
convenient, and that postgresql would gain multiple benefits from having all its data structures in a single HDF5 file.
OTOH, maybe we analytics types are better off with Python over HDF5, and "postgresql on HDF5" is not a net win for
postgresql. Still, there seems to be a great advantage to having rich data structures to operate on rather than just
"files", and to allowing the HDF5 library to deal with portability, I/O efficiency, and clustering.

Hope my $0.02 worth was.

Cheers,
    Murthy
