Re: Weird procedure question - Mailing list pgsql-general
From:           digimer
Subject:        Re: Weird procedure question
Date:
Msg-id:         d9220a5e-cb04-ef4d-d4a7-cce5f22c9124@alteeve.ca
In response to: Re: Weird procedure question  (digimer <lists@alteeve.ca>)
List:           pgsql-general
On 2018-09-25 6:41 p.m., digimer wrote:
> On 2018-09-25 6:22 p.m., Tim Cross wrote:
>> digimer <lists@alteeve.ca> writes:
>>
>>> Hi all,
>>>
>>> I've got an interesting use case that I am stuck on. It's a bit of a
>>> complicated environment, but I'll try to keep it simple.
>>>
>>> In short: I have a history schema that has tables that match the
>>> public schema, plus one 'history_id' column that has a simple
>>> sequential bigserial value. Then I have a procedure and trigger that,
>>> on UPDATE or INSERT, copies the data to history. An example use case
>>> is that I can UPDATE a sensor value in the public table and it also
>>> INSERTs the data into history. So public shows just the most recent
>>> values, but I can see changes over time in the history schema.
>>>
>>> I have built my system to support writing to one or more DBs. I keep
>>> a list of connected DBs and send INSERT/UPDATE calls to a method that
>>> then runs the UPDATE/INSERT against all connected databases, as a
>>> form of redundancy. This all works fine.
>>>
>>> The problem I've hit is that the 'history_id' differs between the
>>> various databases. So I want to switch this to 'history_uuid' and use
>>> UUIDs instead of bigserial.
>>>
>>> Now the question:
>>>
>>> Can I tell a procedure to use a specific UUID?
>>>
>>> The idea is to generate a UUID for 'history_uuid' so that I have
>>> consistency across databases. Of course, if an UPDATE will change
>>> multiple rows, then I'll need to predefine multiple UUIDs. This is
>>> where things start to get really complicated, I think... Maybe I
>>> could pass an array of UUIDs? I don't care if I find out which UUID
>>> was used for which record, just that the same UUID was used for the
>>> same record when the procedure is (re)run on other DBs.
>>>
>>> The databases are not clustered, on purpose. I've been trying to
>>> handle all the HA stuff in my application for various reasons.
>>>
>>> If it helps, here is an example pair of tables, the procedure and the
>>> trigger I currently use;
>>>
>>> ====
>>> CREATE TABLE host_variable (
>>>     host_variable_uuid         uuid                      not null  primary key,
>>>     host_variable_host_uuid    uuid                      not null,
>>>     host_variable_name         text                      not null,
>>>     host_variable_value        text                      not null,
>>>     modified_date              timestamp with time zone  not null
>>> );
>>> ALTER TABLE host_variable OWNER TO admin;
>>>
>>> CREATE TABLE history.host_variable (
>>>     history_id                 bigserial,
>>>     host_variable_uuid         uuid,
>>>     host_variable_host_uuid    uuid,
>>>     host_variable_name         text,
>>>     host_variable_value        text,
>>>     modified_date              timestamp with time zone  not null
>>> );
>>> ALTER TABLE history.host_variable OWNER TO admin;
>>>
>>> CREATE FUNCTION history_host_variable() RETURNS trigger
>>> AS $$
>>> DECLARE
>>>     history_host_variable RECORD;
>>> BEGIN
>>>     SELECT INTO history_host_variable * FROM host_variable
>>>     WHERE host_variable_uuid = new.host_variable_uuid;
>>>     INSERT INTO history.host_variable
>>>         (host_variable_uuid,
>>>          host_variable_host_uuid,
>>>          host_variable_name,
>>>          host_variable_value,
>>>          modified_date)
>>>     VALUES
>>>         (history_host_variable.host_variable_uuid,
>>>          history_host_variable.host_variable_host_uuid,
>>>          history_host_variable.host_variable_name,
>>>          history_host_variable.host_variable_value,
>>>          history_host_variable.modified_date);
>>>     RETURN NULL;
>>> END;
>>> $$
>>> LANGUAGE plpgsql;
>>> ALTER FUNCTION history_host_variable() OWNER TO admin;
>>>
>>> CREATE TRIGGER trigger_host_variable
>>>     AFTER INSERT OR UPDATE ON host_variable
>>>     FOR EACH ROW EXECUTE PROCEDURE history_host_variable();
>>> ====
>>>
>>> I know this might sound odd, but I didn't want to complicate things
>>> with how my system works. However, if it would help solve the
>>> problem, I'm happy to dig into more detail.
>>>
>>> Thanks!
>>
>> I think James has probably given you the input you need - basically,
>> don't allow the system to automatically set the modified time - make
>> that a parameter to your function, or set that value before the copy
>> to the history tables - the content would then be the same, so uuid v3
>> should work.
>>
>> However, I do think you have another big problem lurking in the
>> shadows. What happens if any of your connected databases are
>> unavailable or unreachable for a period of time? I suspect you're
>> going to run into update anomalies and, depending on your
>> setup/environment, possibly even partitioning problems (depending on
>> the number of clients, topology, etc.). These are well known problems
>> in distributed or replication systems.
>>
>> You appear to be implementing a 'poor man's' replication system. There
>> are lots of complex issues to deal with, and I wonder why you want to
>> take them on when PG already has well tested and robust solutions for
>> this that would simplify your architecture and avoid the need to
>> re-implement functionality which already exists?
>>
>> regards,
>>
>> Tim
>>
> Hi Tim,
>
> Last I checked, pgsql couldn't handle this;
>
> Two DBs up, getting data.
> DB1 goes down, DB2 continues to collect data.
> DB2 goes down.
> DB1 comes back up, starts collecting data.
> DB2 comes back up; now I need to move data in both directions (DB1 has
> data DB2 doesn't, and vice versa).
>
> I've created a way to resolve this in my application and it's worked
> for some time (obviously, in my application only; it's not a general
> purpose system, nor is it intended to be).
>
> For the record, I realized I was looking for a complex solution to a
> simple problem.
> I do create the 'modified_date' value in my app, and I just needed to
> refresh it between UPDATEs/INSERTs on the same column so that no two
> records in the history table have the same 'modified_date'. With that,
> my resync works again.
>
> Cheers,
>
> digimer

I should mention, I used two DBs in the example, but it could be 3 or
more, and I need to resync in as many directions at once as needed. So
it's a "poor person's" N-way replication cluster, as you guessed. I'd
love to use existing tools if they actually exist.

digimer
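A minimal sketch of one way to answer the original "can I tell a procedure to
use a specific UUID?" question: the application picks the UUID, hands it to
each database through a custom session setting, and the trigger reads it back.
This is not code from the thread; the setting name 'app.history_uuid', the
switch from 'history_id bigserial' to a 'history_uuid uuid' column, the use of
NEW instead of re-selecting the row, and the uuid-ossp fallback are all
assumptions made for illustration.

====
-- Sketch only: assumes history.host_variable now has a 'history_uuid uuid'
-- column and that CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; has been run
-- (for the uuid_generate_v4() fallback).
CREATE OR REPLACE FUNCTION history_host_variable() RETURNS trigger
AS $$
DECLARE
    -- current_setting(..., true) returns NULL when the setting is not set,
    -- so the trigger still works for callers that don't supply a UUID.
    explicit_uuid text := current_setting('app.history_uuid', true);
BEGIN
    INSERT INTO history.host_variable
        (history_uuid,
         host_variable_uuid,
         host_variable_host_uuid,
         host_variable_name,
         host_variable_value,
         modified_date)
    VALUES
        (COALESCE(NULLIF(explicit_uuid, ''), uuid_generate_v4()::text)::uuid,
         NEW.host_variable_uuid,
         NEW.host_variable_host_uuid,
         NEW.host_variable_name,
         NEW.host_variable_value,
         NEW.modified_date);
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

-- The application would run this against every connected database, with the
-- same value, just before the UPDATE/INSERT (the final 'true' limits the
-- setting to the current transaction):
-- SELECT set_config('app.history_uuid',
--                   '5f0db5a2-1f4e-4c65-9d3b-6a1f6f0a7d10', true);
====

One caveat: a statement that touches several rows would stamp them all with
the same UUID, which is where the "array of UUIDs" idea from the original
post, or the content-derived approach below, would come in.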
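Tim's "uuid v3 should work" suggestion can also be sketched directly in SQL:
derive the history UUID from the row's content (including the
application-supplied 'modified_date'), so every database computes the same
UUID for the same record without any coordination. The uuid-ossp extension,
the uuid_ns_url() namespace and the ':' separator below are illustrative
choices, not anything specified in the thread.

====
-- Sketch of a content-derived (v3) history UUID; assumes
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; has been run. Any fixed
-- namespace and any stable serialization of the row works: identical content
-- hashes to an identical UUID on every database.
SELECT uuid_generate_v3(
           uuid_ns_url(),
           host_variable_uuid::text || ':' ||
           host_variable_name       || ':' ||
           host_variable_value      || ':' ||
           -- epoch seconds rather than modified_date::text, so the result
           -- does not depend on each server's DateStyle/TimeZone settings
           extract(epoch FROM modified_date)::text
       ) AS history_uuid
  FROM host_variable;
====

Inside the trigger, the same expression would use NEW.host_variable_uuid and
friends; because 'modified_date' is supplied by the application (as Tim
suggests) rather than set automatically by the database, the serialized
content - and therefore the UUID - is identical on every database the
statement is replayed against.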
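Finally, the resync fix above leans on no two history rows for the same record
sharing a 'modified_date'. If that assumption is worth enforcing rather than
just relying on, a unique constraint turns a silent duplicate into an error.
This is a suggestion layered onto the example schema, not something from the
thread.

====
-- Optional guard: make the "no two history rows for a record share a
-- modified_date" assumption explicit, so a forgotten refresh of modified_date
-- fails loudly instead of quietly breaking the resync logic.
ALTER TABLE history.host_variable
    ADD CONSTRAINT history_host_variable_uniq
    UNIQUE (host_variable_uuid, modified_date);
====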