Re: I'd like to discuss scaleout at PGCon - Mailing list pgsql-hackers

From: Ashutosh Bapat
Subject: Re: I'd like to discuss scaleout at PGCon
Msg-id: CAFjFpRepQesCjbQqQBfZKYb88a3M7Uy6VB+NmoUP7cb7QCqu4g@mail.gmail.com
In response to: Re: I'd like to discuss scaleout at PGCon (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers

On Wed, Jun 6, 2018 at 11:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2018-Jun-06, Ashutosh Bapat wrote:
>
>> On Tue, Jun 5, 2018 at 10:04 PM, MauMau <maumau307@gmail.com> wrote:
>> > From: Ashutosh Bapat
>> >> In order to normalize parse trees, we need to at least replace
>> >> various OIDs in parse-tree with something that the foreign server
>> >> will understand correctly like table name on the foreign table
>> >> pointed to by local foreign table OR (schema qualified) function
>> >> names and so on.
>> >
>> > Yes, that's the drawback of each node in the cluster having
>> > different OIDs for the same object.  That applies to XL, too.
>>
>> Keeping OIDs same across the nodes would require extra communication
>> between nodes to keep track of next OID, dropped OIDs etc. We need to
>> weigh the time spent in that communication and the time saved during
>> parsing.
>
> We already have the ability to give objects some predetermined OID, for
> pg_upgrade.

True, but that only applies to a database that is not in operation. We
are talking about a running database. Assigning a predetermined OID is
just one piece, and possibly the smallest one, of the bigger picture.

>
> Maybe an easy (hah) thing to do is use 2PC for DDL, agree on a OID
> that's free on every node, then create the object in all servers at the
> same time.  We currently use the system-wide OID generator to assign the
> OID, but seems an easy thing to change (much harder is to prevent
> concurrent creation of objects using the arranged OID; maybe can reuse
> speculative tokens in btrees for this).  Doing this imposes a cost at
> DDL-execution-time only, which seems much better than imposing the cost
> of translating name to OID on every server for every query.

This works only if we assume that all the nodes are always up. If a few
nodes are down, the remaining nodes need to determine the OID and
communicate it to the failed nodes when they come back up. That's easier
said than done. The moment we design something like that, we have to
deal with the split-brain problem: two sets of nodes, each of which
thinks the other set is down, will keep assigning OIDs that they think
are OK, and will only see the conflicts later, when they exchange the
assigned OIDs.
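
To make the split-brain case concrete, here is a minimal toy simulation in
Python. It is only a sketch under stated assumptions, not PostgreSQL code:
the Node class, create_object() and find_conflicts() are hypothetical names
invented for illustration, and "agreement" is modelled as simply picking the
lowest OID that is free on every node the coordinator can reach.

```python
# Toy model of cluster-wide OID agreement; all names here are hypothetical.

class Node:
    def __init__(self, name, next_oid):
        self.name = name
        self.next_oid = next_oid   # next OID this node believes is free
        self.catalog = {}          # oid -> object name


def create_object(reachable, obj_name):
    """Pick an OID that is free on every reachable node and create the
    object on all of them. Unreachable nodes are simply not consulted."""
    oid = max(n.next_oid for n in reachable)
    while any(oid in n.catalog for n in reachable):
        oid += 1
    for n in reachable:
        n.catalog[oid] = obj_name
        n.next_oid = oid + 1
    return oid


def find_conflicts(side_a, side_b):
    """After reconnecting, report OIDs bound to different objects."""
    a, b = side_a[0].catalog, side_b[0].catalog
    return {oid: (a[oid], b[oid])
            for oid in a.keys() & b.keys() if a[oid] != b[oid]}


# Four nodes, all starting from the same (arbitrary) OID counter.
nodes = [Node(f"n{i}", 16384) for i in range(4)]
create_object(nodes, "t_before")        # every node is up: no problem

# Network partition: each half thinks the other half is down.
left, right = nodes[:2], nodes[2:]
oid_left = create_object(left, "t_left")
oid_right = create_object(right, "t_right")

# Both halves chose the same OID for different objects.
conflicts = find_conflicts(left, right)
print(oid_left, oid_right, conflicts)
```

In this toy run both halves hand out the same OID to different objects, and
the clash is only visible once the two sides compare catalogs. A real design
would need some quorum or fencing rule so that at most one side can assign
OIDs during a partition.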

Not that we cannot implement something like this, but it is a lot of
work. We will need to be careful to identify the cases where the
scheme will fail and plug all the holes.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company