
Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From	Sumanta Mukherjee
Subject	Re: I'd like to discuss scaleout at PGCon
Date	June 22, 2020 04:53:33
Msg-id	CAMSJAirGtacYGkdUV=0nEYt11LAtb8_v99cOVR3mBKB8LB2N0A@mail.gmail.com
In response to	RE: I'd like to discuss scaleout at PGCon ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses	Re: I'd like to discuss scaleout at PGCon Re: I'd like to discuss scaleout at PGCon
List	pgsql-hackers


Hi,

I read through the Symfoware paper, and it is a nice technique. I am not sure where Hyder is used commercially, but given that it came out of Microsoft Research, some Microsoft products might already be using it or some of its concepts.

With Regards,

Sumanta Mukherjee.

EnterpriseDB: http://www.enterprisedb.com

On Wed, Jun 17, 2020 at 9:38 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:

Hello,

It seems you didn't include pgsql-hackers.

From: Sumanta Mukherjee <sumanta.mukherjee@enterprisedb.com>
> I saw the presentation and it is great, except that for both SD (shared disk) and SN (shared nothing) it is unclear whether storage and compute are explicitly separated. As I understand it, separating storage and compute would have some cost advantages. The following two works (refs below) discuss the usefulness of this technique for scale-out, so it would be interesting to see whether the proposed SN architecture could be modified to take advantage of it and reap the gain.

Thanks. Separation of compute and storage is certainly worth considering. Unlike in the old days, when shared storage built on slow HDDs and FC-SAN was considered a bottleneck, we can now expect high-speed shared storage thanks to flash memory, NVMe-oF, and RDMA.

> 1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A Transactional Record Manager for Shared Flash. In CIDR 2011.

This is interesting. I'll look into it. Do you know whether there's any product based on Hyder? OTOH, adopting Hyder's approach in Postgres seems to require drastic changes: OCC, a log-structured database, etc. I'd like to hear how feasible those are. Still, its scale-out capability without the need for data or application partitioning is appealing.

To explore another possibility that would have more affinity with the current Postgres, let me introduce our proprietary product called Symfoware. It's not based on Postgres.

It has shared-nothing scale-out functionality with full ACID guarantees, based on 2PC, conventional 2PL locking, and distributed deadlock resolution. Despite being shared nothing, it stores all the database files and transaction logs on shared storage.
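For readers less familiar with two-phase commit, the generic coordinator logic can be sketched in Python as below. This is a textbook illustration, not Symfoware's implementation: `Participant`, `Vote`, and `two_phase_commit` are hypothetical names, and durable logging, timeouts, and coordinator crash recovery are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    """Hypothetical DB instance taking part in one distributed transaction."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> Vote:
        # Phase 1: a real participant would flush its changes durably
        # before voting YES, so it can still commit after a crash.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit only if every participant votes YES in phase 1."""
    votes = [p.prepare() for p in participants]
    # A real coordinator would durably log its decision here, before phase 2.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    nodes = [Participant("db1"), Participant("db2", can_commit=False)]
    print(two_phase_commit(nodes))  # False: one NO vote aborts everyone
```

The key property is that a single NO vote (or, in practice, a timeout) forces every participant to abort, which is what lets a shared-nothing cluster keep atomicity across instances.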

The database is divided into "log groups." Each log group has one transaction log and multiple tablespaces (Symfoware calls a tablespace a "database space").

Each DB instance in the cluster owns multiple log groups and handles reads/writes to the data in the log groups it owns. When a DB instance fails, other surviving DB instances take over the log groups of the failed instance, recover the data using each log group's transaction log, and accept reads/writes to the data in those log groups. The DBA configures which DB instance initially owns which log groups and which DB instances are candidates to take over which log groups.
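The ownership and takeover rules above can be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not Symfoware's actual code: `LogGroup`, `Cluster`, and `fail()` are hypothetical names, takeover simply picks the first surviving instance from a DBA-configured candidate list, and the log replay from shared storage is only noted in a comment.

```python
from dataclasses import dataclass


@dataclass
class LogGroup:
    """Hypothetical log group: one transaction log plus several database spaces."""
    name: str
    owner: str             # DB instance currently serving reads/writes
    candidates: list[str]  # DBA-configured takeover order (assumption)


@dataclass
class Cluster:
    groups: dict[str, LogGroup]
    alive: set[str]

    def owner_of(self, group: str) -> str:
        return self.groups[group].owner

    def fail(self, instance: str) -> dict[str, str]:
        """Mark an instance as failed and hand each of its log groups to the
        first surviving candidate. Returns {log group: new owner}."""
        self.alive.discard(instance)
        moved: dict[str, str] = {}
        for g in self.groups.values():
            if g.owner != instance:
                continue
            new_owner = next((c for c in g.candidates if c in self.alive), None)
            if new_owner is None:
                raise RuntimeError(f"no surviving candidate for {g.name}")
            # In the described design, the new owner would first recover the
            # group by replaying its transaction log from shared storage, and
            # only then accept reads/writes. That step is not modeled here.
            g.owner = new_owner
            moved[g.name] = new_owner
        return moved


if __name__ == "__main__":
    cluster = Cluster(
        groups={
            "lg1": LogGroup("lg1", "db1", ["db2", "db3"]),
            "lg2": LogGroup("lg2", "db2", ["db3", "db1"]),
            "lg3": LogGroup("lg3", "db1", ["db3", "db2"]),
        },
        alive={"db1", "db2", "db3"},
    )
    print(cluster.fail("db1"))  # {'lg1': 'db2', 'lg3': 'db3'}
```

Because every instance owns some log groups in normal operation, no server sits idle as a pure standby; the candidate lists only matter at failover time. Note that this still requires the data to be partitioned into log groups up front.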

This way, no server is idle as a standby. All DB instances work hard to process read-write transactions. This "no idle server for HA" is one of the things Oracle RAC users want in terms of cost.

However, unlike Hyder, it still requires data and application partitioning. Can anyone think of a way to eliminate partitioning? Data and application partitioning is what Oracle RAC users want to avoid or cannot tolerate.

Ref: Introduction of the Symfoware shared nothing scale-out called "load share."
https://pdfs.semanticscholar.org/8b60/163593931cebc58e9f637cfb501500230adc.pdf

Regards
Takayuki Tsunakawa

--- below is Sumanta's original mail ---
From: Sumanta Mukherjee <sumanta.mukherjee@enterprisedb.com>
Sent: Wednesday, June 17, 2020 5:34 PM
To: Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com>
Cc: Bruce Momjian <bruce@momjian.us>; Merlin Moncure <mmoncure@gmail.com>; Robert Haas <robertmhaas@gmail.com>; maumau307@gmail.com
Subject: Re: I'd like to discuss scaleout at PGCon

Hello,

I saw the presentation and it is great, except that for both SD (shared disk) and SN (shared nothing) it is unclear whether storage and compute are explicitly separated. As I understand it, separating storage and compute would have some cost advantages. The following two works (refs below) discuss the usefulness of this technique for scale-out, so it would be interesting to see whether the proposed SN architecture could be modified to take advantage of it and reap the gain.

1. Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder - A Transactional Record Manager for Shared Flash. In CIDR 2011.

2. Dhruba Borthakur. 2017. The Birth of RocksDB-Cloud. http://rocksdb.blogspot.com/2017/05/the-birth-of-rocksdb-cloud.html.

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com
