Re: alternative back-end block formats - Mailing list pgsql-hackers

From Christian Convey
Subject Re: alternative back-end block formats
Msg-id CAPfS4ZzwxnQuYjEBnmd0eiYW3t85o4YOvGXfqK=AcNOgKc77rQ@mail.gmail.com
In response to Re: alternative back-end block formats  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: alternative back-end block formats  (Cédric Villemain <cedric@2ndquadrant.com>)
List pgsql-hackers
Hi Craig,

On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 01/21/2014 07:43 PM, Christian Convey wrote:
> > Hi all,
> >
> > I'm playing around with Postgres, and I thought it might be fun to
> > experiment with alternative formats for relation blocks, to see if I can
> > get smaller files and/or faster server performance.
>
> It's not clear how you'd do this without massively rewriting the guts of Pg.
>
> Per the docs on internal structure, Pg has a block header, then tuples
> within the blocks, each with a tuple header and list of Datum values for
> the tuple. Each Datum has a generic Datum header (handling varlena vs
> fixed length values etc) then a type-specific on-disk representation
> controlled by the type output function for that type.

I'm still in the process of getting familiar with the pg backend code, so I don't have a concrete plan yet.  However, I'm working on the assumption that some set of macros and functions encapsulates the page layout.  

If/when I tackle this, I expect to add a layer of indirection somewhere around that boundary, so that some non-catalog tables, whose schemas meet certain simplifying assumptions, are read and modified using specialized code.
 
I'd rather not get into the specific optimizations I'd like to try yet, because I haven't fully studied the code and don't want to put my foot in my mouth.

> What concrete problem do you mean to tackle? What idea do you want to
> explore or implement?

My real motivation is that I'd like to get more familiar with the pg backend codebase, and tilting at this windmill seemed like an interesting way to accomplish that.

If I were focused on solving a real-world problem, I'd say that this lays the groundwork for table-schema-specific storage optimizations and optimized record-filtering code.  But I'd only make that argument if I planned to (a) perform a careful study with statistically significant benchmarks, and/or (b) produce a merge-worthy patch.  At this point I have no intention of doing either.  My main goal really is just to have fun with the code.


> > Does anyone know if this has been done before with Postgres?  I would
> > have assumed yes, but I'm not finding anything in Google about people
> > having done this.
>
> AFAIK (and I don't know much in this area) the storage manager isn't
> very pluggable compared to the rest of Pg.

Thanks for the warning.  Duly noted.

Kind regards,
Christian
