Re: [PATCH 16/16] current version of the design document - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH 16/16] current version of the design document
Date
Msg-id 201206131803.15123.andres@2ndquadrant.com
In response to Re: [PATCH 16/16] current version of the design document  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
Hi,

On Wednesday, June 13, 2012 05:39:36 PM Merlin Moncure wrote:
> On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Let's take the case where I have N small-ish schema identical database
> >> shards that I want to aggregate into a single warehouse -- something
> >> that HS/SR currently can't do.
> >> There's a lot of ways to do that obviously but assuming the warehouse
> >> would have to have a unique schema, could it be done in your
> >> architecture?
> > 
> > Not sure what you mean by the warehouse having a unique schema? It has
> > the same schema as the OLTP counterparts? That would obviously be the
> > easy case if you take care and guarantee uniqueness of keys upfront.
> > That basically would be trivial ;)
> 
> by unique I meant 'not the same as the shards' -- presumably this
> would mean one of
> a) each shard's data would be in a private schema folder
> or
> b) you'd have one set of tables but decorated with an extra shard
> identifying column that would have to be present in all keys to get around
> uniqueness issues
I think it would have to be a), with N of those logical import processes
running, one per shard. We really need an identical TupleDesc to do the
decoding.
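
To make option a) concrete, here is a minimal Python sketch. Every name in it
is hypothetical and nothing comes from the patch itself. It only shows the
routing idea: each shard's changes land in a table with the same name and
column layout, inside a schema private to that shard, so identical keys from
different shards cannot collide.

```python
# Hypothetical sketch of option a): one private schema per shard.
# shard_schema, Warehouse and apply_insert are illustrative names only.

def shard_schema(shard_id: int) -> str:
    """Name of the private schema that holds one shard's tables."""
    return f"shard_{shard_id}"


class Warehouse:
    def __init__(self):
        # (schema, table) -> {primary key -> row}
        self.tables = {}

    def apply_insert(self, shard_id, table, key, row):
        # Route the change by shard; the table layout itself is unchanged.
        target = (shard_schema(shard_id), table)
        self.tables.setdefault(target, {})[key] = row


wh = Warehouse()
# The same primary key arrives from two different shards.
wh.apply_insert(1, "orders", 42, {"id": 42, "total": 10})
wh.apply_insert(2, "orders", 42, {"id": 42, "total": 99})
```

Both rows survive because they live under different (schema, table) targets,
which is why no extra shard column is needed in the keys.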

> > It gets a bit more complex if you need to transform the data for the
> > warehouse. I don't plan to put in work to make that possible without some
> > C coding (filling out the callbacks and doing the work in there). It
> > shouldn't need much though.
> > 
> > Does that answer your question?
> yes.  Do you envision it would be possible to wrap the ApplyCache
> callbacks in a library that could be exposed as an extension?  For
> example, a library that would stick the replication data into a queue
> that a userland (non C) process could walk, transform, etc?   I know
> that's vague -- my general thrust here is that I find the
> transformation features particularly interesting and I'm wondering how
> much C coding would be needed to access them in the long term.
I can definitely imagine the callbacks calling some wrapper around a
higher-level language. Not sure how that fits into an extension (if you mean
it as in CREATE EXTENSION) though. I don't think you will be able to start
the replication process from inside a normal backend. I imagine something
like specifying a shared object + parameters in the config or such.
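
A rough Python sketch of the queue idea from the question, under loud
assumptions: on_change stands in for whatever the C-level callback would do,
and none of these names or signatures are the ApplyCache API. The callback
only serializes and enqueues; a separate userland step walks the queue and
transforms each change.

```python
# Hypothetical sketch: the callback only enqueues, a consumer transforms.
# This is not the ApplyCache API; all names are made up for illustration.
import json
import queue

changes = queue.Queue()


def on_change(action, table, row):
    """Stand-in for a C-level callback: serialize the change and enqueue it."""
    changes.put(json.dumps({"action": action, "table": table, "row": row}))


def drain(transform):
    """Userland side: walk the queue and apply a transformation to each change."""
    out = []
    while not changes.empty():
        out.append(transform(json.loads(changes.get())))
    return out


on_change("INSERT", "orders", {"id": 1, "total": 10})
on_change("DELETE", "orders", {"id": 1})
result = drain(lambda c: (c["action"].lower(), c["table"], c["row"]["id"]))
```

Keeping the callback this thin is the point of the design: the C side stays
small, and the transformation logic lives wherever the consumer runs.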

Andres
--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

