Re: Weird procedure question - Mailing list pgsql-general

From digimer
Subject Re: Weird procedure question
Date
Msg-id d9220a5e-cb04-ef4d-d4a7-cce5f22c9124@alteeve.ca
Whole thread Raw
In response to Re: Weird procedure question  (digimer <lists@alteeve.ca>)
List pgsql-general
On 2018-09-25 6:41 p.m., digimer wrote:
> On 2018-09-25 6:22 p.m., Tim Cross wrote:
>> digimer <lists@alteeve.ca> writes:
>>
>>> Hi all,
>>>
>>>    I've got an interesting use case that I am stuck on. It's a bit of a
>>> complicated environment, but I'll try to keep it simple.
>>>
>>>    In short; I have a history schema that has tables that match the
>>> public schema, plus one 'history_id' column that has a simple
>>> sequential
>>> bigserial value. Then I have a procedure and trigger that, on UPDATE or
>>> INSERT, copies the data to history. Example use case is that I can
>>> UPDATE a sensor value in the public table and it's also INSERTs the
>>> data
>>> into history. So public shows just the most recent values, but I can
>>> see
>>> changes over time in the history schema.
>>>
>>>    I have built my system to support writing to one or more DBs. I keep
>>> a list of connected DBs and send INSERT/UPDATE calls to a method that
>>> then runs the UPDATE/INSERT against all connected databases, as a form
>>> of redundancy. This all works fine.
>>>
>>>    The problem I've hit is that the 'history_id' differs between the
>>> various databases. So I want to switch this to 'history_uuid' and use
>>> UUIDs instead of bigserial.
>>>
>>>    Now the question;
>>>
>>>    Can I tell a produce to use a specific UUID?
>>>
>>>    The idea is to generate a UUID for 'history_uuid' so that I have
>>> consistency across databases. Of course, if an UPDATE will change
>>> multiple rows, then I'll need to predefine multiple UUIDs. This is
>>> where
>>> things start to get really complicated I think... Maybe I could pass an
>>> array of UUIDs? I don't care if I find out which UUID was used for
>>> which
>>> record, just that the same UUID was used for the same record when the
>>> procedure is (re)run on other DBs.
>>>
>>>    The databases are not clustered, on purpose. I've been trying to
>>> handle all the HA stuff in my application for various reasons.
>>>
>>> If it helps, here is an example pair of tables, the procedure and the
>>> trigger I currently use;
>>>
>>> ====
>>> CREATE TABLE host_variable (
>>>    host_variable_uuid uuid not null primary key,
>>>    host_variable_host_uuid uuid not null,
>>>    host_variable_name text not null,
>>>    host_variable_value text not null,
>>>    modified_date timestamp with time zone not null
>>> );
>>> ALTER TABLE host_variable OWNER TO admin;
>>>
>>> CREATE TABLE history.host_variable (
>>>    history_id bigserial,
>>>    host_variable_uuid uuid,
>>>    host_variable_host_uuid uuid,
>>>    host_variable_name text,
>>>    host_variable_value text,
>>>    modified_date timestamp with time zone not null
>>> );
>>> ALTER TABLE history.host_variable OWNER TO admin;
>>>
>>> CREATE FUNCTION history_host_variable() RETURNS trigger
>>> AS $$
>>> DECLARE
>>>    history_host_variable RECORD;
>>> BEGIN
>>>    SELECT INTO history_host_variable * FROM host_variable WHERE
>>> host_uuid = new.host_uuid;
>>>    INSERT INTO history.host_variable
>>>    (host_variable_uuid,
>>>    host_variable_host_uuid,
>>>    host_variable_name,
>>>    host_variable_value,
>>>    modified_date)
>>>    VALUES
>>>    (history_host_variable.host_variable_uuid,
>>>    history_host_variable.host_variable_host_uuid,
>>>    history_host_variable.host_variable_name,
>>>    history_host_variable.host_variable_value,
>>>    history_host_variable.modified_date);
>>>    RETURN NULL;
>>> END;
>>> $$
>>> LANGUAGE plpgsql;
>>> ALTER FUNCTION history_host_variable() OWNER TO admin;
>>>
>>> CREATE TRIGGER trigger_host_variable
>>>    AFTER INSERT OR UPDATE ON host_variable
>>>    FOR EACH ROW EXECUTE PROCEDURE history_host_variable();
>>> ====
>>>
>>>    I know this might sound odd, but I didn't want to complicate things
>>> with how my system works. However, if it would help solve the problem,
>>> I'm happy to dig into more detail.
>>>
>>>    Thanks!
>> I think James has probably given you the input you need - basically,
>> don't allow the system to automatically set the modified time - make
>> that parameter to your function or set that value before the copy to the
>> history tables - content would then be the same, so uuid v3 should work.
>>
>> However, I do think you have another big problem lurking in the
>> shadows. What happens if any of your connected databases are unavailable
>> or unreachable for a period of time? I suspect your going to run into
>> update anomalies and depending on your setup/environment, possibly even
>> partitioning problems (depending on number of clients and typology
>> etc). These are well known problems in distributed or replication
>> systems.
>>
>> You appear to be implementing a 'poor mans' replication system. There
>> are lots of complex issues to deal with and I wonder why you want to
>> take them on when PG has already got well tested and robust solutions
>> for this that would simplify your architecture and avoid the need to
>> re-implement functionality which already exists?
>>
>> regards,
>>
>> Tim
>>
> Hi Tim,
>
>   Last I checked, pgsql couldn't handle this;
>
> Two DBs up, getting data.
> DB1 goes down, DB2 continues to collect data.
> DB2 goes down
> DB1 comes back up, starts collecting data.
> DB2 comes back up, now I need to move data in both directions (DB1 has
> data 2 doesn't and vice-versa).
>
>   I've created a way to resolve this in my application and it's worked
> for some time (obviously, in my application only. It's not a general
> purpose system nor is it intended to be).
>
>   For the record, I realized I was looking for a complex solution to a
> simple problem. I do create the 'modified_date' value in my app, and I
> just needed to refresh it between UPDATEs/INSERTs on the same column
> so that no two records in the history table have the same
> 'modified_date'. With that, my resync works again.
>
> Cheers,
>
> digimer

I should mention, I used two DBs in the example, but it could be 3 or
more, and I need to resync in as many directions at once as needed. So
it's a "poor person's" N-way replication cluster, as you guessed. I'd
love to use existing tools if they actually exist.

digimer



pgsql-general by date:

Previous
From: digimer
Date:
Subject: Re: Weird procedure question
Next
From: Christopher Browne
Date:
Subject: Re: Weird procedure question