Thread: Strategy for materialisation and centralisation of data

Strategy for materialisation and centralisation of data

From
Rory Campbell-Lange
Date:
We have quite a few databases of type a and many of type b in a cluster.
Both a and b types are fairly complex and are different solutions to a
similar problem domain. All the databases are very read-centric, and all
database interaction is currently through plpgsql with no materialised
data.

Some organisations have several type a and many type b databases, and
need to query these in a homogeneous manner. We presently do this with
many middleware requests or pl/proxy. An a or b type database belongs to
0 or 1 organisations.

Making a and b generally the same would be a very big project.
Consequently I'm discussing materialising a subset of data in a common
format between the two database types and shipping that data to
organisation databases. This would have the benefit of providing a
common data interface and speeding up queries for all database types.
Users would have faster queries, and it would be a big time saver for
our development team, who presently have to deal with three quite
different data APIs.

Presently I've been thinking of using triggers or materialized views in
each database to materialise data into a "matview" schema which is then
shipped via logical replication to an organisation database when
required. New columns in the matview schema tables would ensure replica
identity uniqueness and allow the data to be safely stored in common
tables in the organisation database.

A few issues I foresee with this approach include:

* requiring two to three times current storage for materialisation
  (the cluster is currently ~250GB)

* having to have many logical replication slots
  (we sometimes suffer from pl/proxy connection storms)

Commentary gratefully received,
Rory






Re: Strategy for materialisation and centralisation of data

From
Karsten Hilbert
Date:
On Thu, May 21, 2020 at 03:35:59PM +0100, Rory Campbell-Lange wrote:


> We have quite a few databases of type a and many of type b in a cluster.
> Both a and b types are fairly complex and are different solutions to a
> similar problem domain. All the databases are very read-centric, and all
> database interaction is currently through plpgsql with no materialised
> data.
>
> Some organisations have several type a and many type b databases, and
> need to query these in a homogeneous manner. We presently do this with
> many middleware requests or pl/proxy. An a or b type database belongs to
> 0 or 1 organisations.

Might postgres_fdw help in any way ?

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Strategy for materialisation and centralisation of data

From
Rory Campbell-Lange
Date:
On 21/05/20, Karsten Hilbert (Karsten.Hilbert@gmx.net) wrote:
> On Thu, May 21, 2020 at 03:35:59PM +0100, Rory Campbell-Lange wrote:
> 
> > We have quite a few databases of type a and many of type b in a cluster.
> > Both a and b types are fairly complex and are different solutions to a
> > similar problem domain. All the databases are very read-centric, and all
> > database interaction is currently through plpgsql with no materialised
> > data.
> >
> > Some organisations have several type a and many type b databases, and
> > need to query these in a homogeneous manner. We presently do this with
> > many middleware requests or pl/proxy. An a or b type database belongs to
> > 0 or 1 organisations.
> 
> Might postgres_fdw help in any way ?

Thanks for the suggestion. As I noted we are already using pl/proxy and
it works well, although we are soaking up a lot of connections with it.
From my reading of the postgres_fdw docs it is much more featureful than
pl/proxy but it is likely to have the same connection characteristics.

The main issues we're trying to solve is standardising data access
through (I think) materialisation and centralisation.

Rory



Re: Strategy for materialisation and centralisation of data

From
Karsten Hilbert
Date:
On Thu, May 21, 2020 at 09:52:02PM +0100, Rory Campbell-Lange wrote:

> > Might postgres_fdw help in any way ?
>
> Thanks for the suggestion. As I noted we are already using pl/proxy and
> it works well, although we are soaking up a lot of connections with it.
> >From my reading of the postgres_fdw docs it is much more featureful than
> pl/proxy but it is likely to have the same connection characteristics.
>
> The main issues we're trying to solve is standardising data access
> through (I think) materialisation and centralisation.

While plausible I still would not be entirely surprised if
testing were to reveal something different.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Strategy for materialisation and centralisation of data

From
Michael Stephenson
Date:
You might find Materialize interesting:


On Thu, May 21, 2020 at 10:36 AM Rory Campbell-Lange <rory@campbell-lange.net> wrote:
We have quite a few databases of type a and many of type b in a cluster.
Both a and b types are fairly complex and are different solutions to a
similar problem domain. All the databases are very read-centric, and all
database interaction is currently through plpgsql with no materialised
data.

Some organisations have several type a and many type b databases, and
need to query these in a homogeneous manner. We presently do this with
many middleware requests or pl/proxy. An a or b type database belongs to
0 or 1 organisations.

Making a and b generally the same would be a very big project.
Consequently I'm discussing materialising a subset of data in a common
format between the two database types and shipping that data to
organisation databases. This would have the benefit of providing a
common data interface and speeding up queries for all database types.
Users would have faster queries, and it would be a big time saver for
our development team, who presently have to deal with three quite
different data APIs.

Presently I've been thinking of using triggers or materialized views in
each database to materialise data into a "matview" schema which is then
shipped via logical replication to an organisation database when
required. New columns in the matview schema tables would ensure replica
identity uniqueness and allow the data to be safely stored in common
tables in the organisation database.

A few issues I foresee with this approach include:

* requiring two to three times current storage for materialisation
  (the cluster is currently ~250GB)

* having to have many logical replication slots
  (we sometimes suffer from pl/proxy connection storms)

Commentary gratefully received,
Rory





Re: Strategy for materialisation and centralisation of data

From
Rory Campbell-Lange
Date:
On 21/05/20, Michael Stephenson (domehead100@gmail.com) wrote:
> On Thu, May 21, 2020 at 10:36 AM Rory Campbell-Lange wrote

> > Presently I've been thinking of using triggers or materialized views in
> > each database to materialise data into a "matview" schema which is then
> > shipped via logical replication to an organisation database when
> > required. New columns in the matview schema tables would ensure replica
> > identity uniqueness and allow the data to be safely stored in common
> > tables in the organisation database.

> You might find Materialize interesting:
> 
> https://materialize.io/

Hi Michael. Thanks for the pointer. Materialize.io seems cool but we
don't deal with streaming data, or am I missing something?

Cheers
Rory