Re: Streaming replication and sharding - Mailing list pgsql-general

From René Romero Benavides
Subject Re: Streaming replication and sharding
Date
Msg-id 3838932.rfQQDZ6cEG@beagle
Whole thread Raw
In response to Re: Streaming replication and sharding  (Albe Laurenz <laurenz.albe@wien.gv.at>)
List pgsql-general
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">I'malso interested in leveraging something like that, these are my thoughts:<p
style="-qt-paragraph-type:empty;margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; "> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">- Have a master server with all the data<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">-Have the data partitioned vertically (inheritance, exclusion constrains, etc)<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">-One synchronous slave<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">- N asynchronous slaves that feed from the
synchronousslave (cascading replication)<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">- Use plproxy to enforce that one instance
servesqueries of only one portion of the data (pseudo-sharding). Also with plproxy is possible to parallelize some
queries.<pstyle="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; "> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">Advantages:<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> + Highly redundant<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">+ Lends itself to automatic failover to the syncrhonous slave<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> + Read
scalable<pstyle=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0;
text-indent:0px;-qt-user-state:0;"> + No synchronization conflicts among "shards"<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> + Some
queriescould be parallelized<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; "> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Disadvantages:<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> + Plproxy
worksonly with functions<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;"> + Writes can be performed only in the master ( not write
scalable)<pstyle=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0;
text-indent:0px;-qt-user-state:0;"> + Needs good communication infrastructure<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> + Table
structuresneed to be simple.<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;"> + At some point (failover), you'll need to implement plproxy
automaticre-configuration<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; "> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">On Monday, February 18, 2013 08:00:33 AM Albe
Laurenzwrote:<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0;
text-indent:0px;-qt-user-state:0;">> Tiemo Kieft wrote:<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> We are developing an application that uses various web analytics packages<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> (like Google Analytics) to<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> run analyses on. We are currently in closed<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> beta stadium where we don't have a lot of data in the database, but at<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> some point it will grow considerably.<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> We basically have two different sets of data, on the one hand we have raw<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> metrics from the datasource,<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> and on the other hand we have account and<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> meta information. The former can be re-downloaded at any time, and will<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> grow to quite large sizes. The latter set is the one that we really care<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> about, and don't want to risk losing.<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> Currently we plan on using streaming replication to replicate all data to<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> at least one slave, for the<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> near future this will do, since we can run<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> some of the large (read-only) aggregation queries on the slave database.<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> In the future the dataset might grow to the point where we need to start<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> thinking about sharding. The analytics data can be sharded on a<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> >
per-customerbasis, and doesn't have to be replicated.<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > <p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> >
SincePostgres doesn't support per-table streaming replication (as far as<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > I can tell), the
only<pstyle=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0;
text-indent:0px;-qt-user-state:0;"> solution would be to run two separate instances of<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> >
postgresper server. One instance is replicated to all servers, and will<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > contain account and
otherimportant information. The other instance is<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > used to store analytics data. Is
thisa viable way of solving this<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">> > problem, or are we overlooking something? <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> The problem is not really immediate, as the dataset is currently small<p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> enough to fit on one machine<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;"> (and replicated to a second), just want to<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>> be future proof, and get this solved before the problems start.<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">><p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">> The problems I see with distributing your data across<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>several PostgreSQL clusters is that they become disconnected.<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>It will become much more difficult to keep them consistent:<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>You cannot have referential integrity, and if a database restore<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> is
needed,you have to make extra provisions that you can restore<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> your system to a
consistentstate across all clusters.<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">> <p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> You also lose the
abilityto join between tables that are<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">> distributed across different databases. This can be a<p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>perdormance problem, particularly in an OLAP scenario.<p style=" margin-top:0px;
margin-bottom:0px;margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p
style="margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>That's all I can think of at the moment.<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">>Yours,<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; -qt-user-state:0;">> Laurenz Albe<p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"><br />-- <p style="
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;
-qt-user-state:0;">RenéRomero Benavides<span style=" font-family:'liberation sans';"> </span><a
href="https://twitter.com/iCodeiExist"><spanstyle=" font-family:'liberation sans'; text-decoration: underline;
color:#0057ae;">@iCodeiExist</span></a><a href="https://twitter.com/PgsqlMx"><span style=" text-decoration: underline;
color:#0057ae;">@PgsqlMx</span></a><pstyle="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; "> <p style=" margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Postgresql Tips en español
parala comunidad de México e Hispanoamérica.<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px;
margin-right:0px;-qt-block-indent:0; text-indent:0px; -qt-user-state:0;"><a href="http://postgresql.org.mx"><span
style="text-decoration: underline; color:#0057ae;">http://postgresql.org.mx</span></a><p
style="-qt-paragraph-type:empty;margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; "> <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; "> <p style="-qt-paragraph-type:empty;
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; "> <p
style="-qt-paragraph-type:empty;margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px;
-qt-block-indent:0;text-indent:0px; "> <p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px;
margin-left:0px;margin-right:0px; -qt-block-indent:0; text-indent:0px; "> <p style="-qt-paragraph-type:empty;
margin-top:0px;margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">  

pgsql-general by date:

Previous
From: Seref Arikan
Date:
Subject: Re: What happens if I create new threads from within a postgresql function?
Next
From: "René Romero Benavides"
Date:
Subject: Re: Streaming replication and sharding