Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: I'd like to discuss scaleout at PGCon
Date	June 7, 2018 08:09:09
Msg-id	CAFjFpRepQesCjbQqQBfZKYb88a3M7Uy6VB+NmoUP7cb7QCqu4g@mail.gmail.com
In response to	Re: I'd like to discuss scaleout at PGCon (Alvaro Herrera <alvherre@2ndquadrant.com>)
List	pgsql-hackers

On Wed, Jun 6, 2018 at 11:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2018-Jun-06, Ashutosh Bapat wrote:
>
>> On Tue, Jun 5, 2018 at 10:04 PM, MauMau <maumau307@gmail.com> wrote:
>> > From: Ashutosh Bapat
>> >> In order to normalize parse trees, we need to at least replace
>> >> various OIDs in parse-tree with something that the foreign server
>> >> will understand correctly like table name on the foreign table
>> >> pointed to by local foreign table OR (schema qualified) function
>> >> names and so on.
>> >
>> > Yes, that's the drawback of each node in the cluster having
>> > different OIDs for the same object.  That applies to XL, too.
>>
>> Keeping OIDs same across the nodes would require extra communication
>> between nodes to keep track of next OID, dropped OIDs etc. We need to
>> weigh the time spent in that communication and the time saved during
>> parsing.
>
> We already have the ability to give objects some predetermined OID, for
> pg_upgrade.

True, but that is only for a database that is not in use. We are
talking about a database in operation. Assigning a predetermined OID
is just one piece of the bigger picture, and possibly the smallest.

>
> Maybe an easy (hah) thing to do is use 2PC for DDL, agree on an OID
> that's free on every node, then create the object in all servers at the
> same time.  We currently use the system-wide OID generator to assign the
> OID, but seems an easy thing to change (much harder is to prevent
> concurrent creation of objects using the arranged OID; maybe can reuse
> speculative tokens in btrees for this).  Doing this imposes a cost at
> DDL-execution-time only, which seems much better than imposing the cost
> of translating name to OID on every server for every query.

This works only if we assume that all the nodes are always up. If a
few nodes are down, the remaining nodes need to determine the OID and
communicate it to the failed nodes when they come back up. That's
easier said than done. The moment we design something like that, we
have to deal with the split-brain problem: two sets of nodes, each
thinking the other set is down, will keep assigning OIDs that they
think are OK and will only see the conflicts later, when they
communicate the assigned OIDs to each other.
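
To make the split-brain case concrete, here is a toy sketch in Python.
It is not PostgreSQL code: the Node class, agree_on_oid(), and the
starting OID are all hypothetical names and values chosen for
illustration, and the "agreement" is just a loop over the nodes one
side can reach, standing in for whatever 2PC-style protocol would be
used in practice.

```python
class Node:
    """Hypothetical cluster node that remembers which OIDs it has in use."""

    def __init__(self, name, used_oids):
        self.name = name
        self.used = set(used_oids)


def agree_on_oid(reachable_nodes, start=16384):
    """Reserve the lowest OID >= start that is free on every reachable node.

    Only the nodes passed in are consulted; a node that is down or
    partitioned away gets no say. 'start' is an illustrative value.
    """
    oid = start
    while any(oid in node.used for node in reachable_nodes):
        oid += 1
    for node in reachable_nodes:
        node.used.add(oid)
    return oid


def simulate_split_brain():
    """Two halves of a partitioned cluster each create one new object."""
    a, b, c, d = (Node(name, {16384}) for name in "ABCD")
    # {A, B} and {C, D} each believe the other set is down.
    oid_left = agree_on_oid([a, b])    # e.g. a new table on one side
    oid_right = agree_on_oid([c, d])   # a different object on the other
    return oid_left, oid_right


if __name__ == "__main__":
    left, right = simulate_split_brain()
    print("conflict" if left == right else "no conflict", left, right)
```

Both halves pick the same OID for two different objects, and nothing
in the scheme notices until the partition heals and the assignments
are exchanged.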

Not that we cannot implement something like this, but it is a lot of
work. We will need to be careful to identify the cases where the
scheme will fail and plug all the holes.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
