Thread: Sync production DB with development?

Sync production DB with development?

From
Israel Brewster
Date:
I am currently doing periodic syncs of one of my production databases to my development database using the command pg_dump -ch <production host> <database name> | psql <database name>, run on my development server. This works well enough, but as the size of the production database grows, this command is, for obvious reasons, getting progressively slower (a full sync like this currently takes about 35 seconds). Is there a better way? Something that will only transfer records that are different on the production server, like rsync does for files perhaps?

My main concern here is the time it takes to sync, given that the database size will only continue growing as time passes (unless I start implementing an archive at some point). The current database has two years worth of records. I would assume that the time the sync takes would grow roughly linearly with the number of records, so I could easily be over a minute of sync time in another two years. I would really rather not have to wait several minutes every time I want to update my development data.
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------



Attachment

Re: Sync production DB with development?

From
Emanuel Calvo
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512



El 08/10/14 a las 14:01, Israel Brewster escibió:
> I am currently doing periodic syncs of one of my production
> databases to my development database using the command pg_dump -ch
> <production host> <database name> | psql <database name>, run on my
> development server. This works well enough, but as the size of the
> production database grows, this command is, for obvious reasons,
> getting progressively slower (a full sync like this currently takes
> about 35 seconds). Is there a better way? Something that will only
>  transfer records that are different on the production server, like
> rsync does for files perhaps?

You can setup a streaming server, however I wont' recommend to sync
from a production server.

Usually there is no need to have *all* the data from prod to
development. Both environments should be isolated for security reasons.

Other thing is to implement a QA server, streaming from the master or
taking a nightly snapshot with pg_basebackup. I think it could be more
than enough.

Actually, doing pg_dump | psql could take more time than pg_basebackup.

>
> My main concern here is the time it takes to sync, given that the
> database size will only continue growing as time passes (unless I
> start implementing an archive at some point). The current database
> has two years worth of records. I would assume that the time the
> sync takes would grow roughly linearly with the number of records,
> so I could easily be over a minute of sync time in another two
> years. I would really rather not have to wait several minutes every
> time I want to update my development data.

Which is the entire size of your production cluster?


- --
- --
Emanuel Calvo  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Bs. As., Argentina (GMT-3)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB
7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd
3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF
mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd
HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh
N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/
GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6
xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt
UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5
oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq
2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw
xDGMjU/lhV7A9MagRZa6
=g73R
-----END PGP SIGNATURE-----


Re: Sync production DB with development?

From
Israel Brewster
Date:
On Oct 8, 2014, at 9:30 AM, Emanuel Calvo <emanuel.calvo@2ndquadrant.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
>
>
> El 08/10/14 a las 14:01, Israel Brewster escibió:
>> I am currently doing periodic syncs of one of my production
>> databases to my development database using the command pg_dump -ch
>> <production host> <database name> | psql <database name>, run on my
>> development server. This works well enough, but as the size of the
>> production database grows, this command is, for obvious reasons,
>> getting progressively slower (a full sync like this currently takes
>> about 35 seconds). Is there a better way? Something that will only
>> transfer records that are different on the production server, like
>> rsync does for files perhaps?
>
> You can setup a streaming server, however I wont' recommend to sync
> from a production server.

No, that wouldn't work well, because I need full access to my development server, and I need to be able to NOT have it
insync while I am working on it. 

>
> Usually there is no need to have *all* the data from prod to
> development. Both environments should be isolated for security reasons.

Agreed. and no, I don't need all the data. But pg_dump doesn't give me an option to, say, only grab the last week of
data.

>
> Other thing is to implement a QA server, streaming from the master or
> taking a nightly snapshot with pg_basebackup. I think it could be more
> than enough.

A QA server is great, and nightly snapshots are probably fine for that, however it doesn't help with my development
server,I don't think. 

>
> Actually, doing pg_dump | psql could take more time than pg_basebackup.
>
>>
>> My main concern here is the time it takes to sync, given that the
>> database size will only continue growing as time passes (unless I
>> start implementing an archive at some point). The current database
>> has two years worth of records. I would assume that the time the
>> sync takes would grow roughly linearly with the number of records,
>> so I could easily be over a minute of sync time in another two
>> years. I would really rather not have to wait several minutes every
>> time I want to update my development data.
>
> Which is the entire size of your production cluster?

At the moment, only about 538MB. Which I realize isn't all that large in the grand scheme of databases.

>
>
> - --
> - --
> Emanuel Calvo  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> Bs. As., Argentina (GMT-3)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
> Comment: GPGTools - http://gpgtools.org
>
> iQIcBAEBCgAGBQJUNXSfAAoJEIBeI/HMagHmUHwQALpvwm44MJnDPOIGiPInZNRB
> 7DghWZD/JY4HbO1sKFMneXTJuNAbnNgE3MVyPnuqBnji1X0GyyDb0/NHW7yee7kd
> 3g+/FcryUAdY0WLuiuezcvc92eMrQTqE0eVEkM8WhlwFdmog3vbln1BeOYdMN/MF
> mrOjCBZRPYbIHwk6+hGN+C/OvE4e6n91+kN+J00Ga3+oD+LZxUZTzjY2ywh5d7yd
> HYsu2V2S508IQVISfoOR/skWWIWGr43ouzyPauJ/lSu/H0Jgor9c0ZjBeQUu6hsh
> N6i48yY80VRDC1F5EoEg5NVzzaPcye7r3IgEp2Yx3dsvVw+sF71P/1dqJNg9K1I/
> GplwEVn0Uq7X5ld8crMMqmzLdGCQXXl1CJlLa6g593SzhjWsqeC+Fj1bFJnspNB6
> xrEolLC1r+ZG7DTIRr4fmomk702/X8NDJPd4FcJ8G5lOm3keJz3yygmrP1X6GVYt
> UPj4p+DnwlikQqDIbLAzSAKoj/pGyya2GtadUwqQkzLYGELTNU2UR5yVcwIr/Wi5
> oMA5iNqxHwrZoPzMaFQzvK+e2IHQSe/IH1EVqrqtOy+FqQLvNgABtrBrEjcy3JLq
> 2jL0tA9yNuixncv6JBe8kFXJ7+gwpdrGG69YGSq74B1/IzoMsWo9L+eIv1x4YnBw
> xDGMjU/lhV7A9MagRZa6
> =g73R
> -----END PGP SIGNATURE-----
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: Sync production DB with development?

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> so I could easily be over a minute of sync time in another two years.
> I would really rather not have to wait several minutes every time I
> want to update my development data.

A minute is really not that long of a wait, especially given the tradeoff
in complexity. Still, if the majority of the time is spent moving old
data from one or more tables, you could exclude those from the pg_dump
with -T, then copy over some small subset of the table with a pair of
COPY commands from prod to dev.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201410081635
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlQ1oBIACgkQvJuQZxSWSsjSxgCgjhcAvjgoBgpYA2FEKiKovSos
l/QAn1tdZk69ku8Z1LArrFzESopr1/OB
=l59M
-----END PGP SIGNATURE-----



Re: Sync production DB with development?

From
Emanuel Calvo
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512



El 08/10/14 a las 17:24, Israel Brewster escibió:
> On Oct 8, 2014, at 9:30 AM, Emanuel Calvo
> <emanuel.calvo@2ndquadrant.com> wrote:
>
>
>
> El 08/10/14 a las 14:01, Israel Brewster escibió:
>>>> I am currently doing periodic syncs of one of my production
>>>> databases to my development database using the command
>>>> pg_dump -ch <production host> <database name> | psql
>>>> <database name>, run on my development server. This works
>>>> well enough, but as the size of the production database
>>>> grows, this command is, for obvious reasons, getting
>>>> progressively slower (a full sync like this currently takes
>>>> about 35 seconds). Is there a better way? Something that will
>>>> only transfer records that are different on the production
>>>> server, like rsync does for files perhaps?
>
> You can setup a streaming server, however I wont' recommend to
> sync from a production server.
>
>> No, that wouldn't work well, because I need full access to my
>> development server, and I need to be able to NOT have it in sync
>> while I am working on it.
>
>

That's the issue: streaming will left your development server as
read-only and I guess you don't want to do that.

> Usually there is no need to have *all* the data from prod to
> development. Both environments should be isolated for security
> reasons.
>
>> Agreed. and no, I don't need all the data. But pg_dump doesn't
>> give me an option to, say, only grab the last week of data.
>

Yeah, at this point you will need to build your own sample data using
production values.

>
> Other thing is to implement a QA server, streaming from the master
> or taking a nightly snapshot with pg_basebackup. I think it could
> be more than enough.
>
>> A QA server is great, and nightly snapshots are probably fine for
>> that, however it doesn't help with my development server, I don't
>> think.
>

Well, the nightly snapshots will allow you to sync every night your
development server and write over it during the day (you can sync
many times as you want).

>
> Actually, doing pg_dump | psql could take more time than
> pg_basebackup.
>
>>>>
>>>> My main concern here is the time it takes to sync, given that
>>>> the database size will only continue growing as time passes
>>>> (unless I start implementing an archive at some point). The
>>>> current database has two years worth of records. I would
>>>> assume that the time the sync takes would grow roughly
>>>> linearly with the number of records, so I could easily be
>>>> over a minute of sync time in another two years. I would
>>>> really rather not have to wait several minutes every time I
>>>> want to update my development data.
>
> Which is the entire size of your production cluster?
>
>> At the moment, only about 538MB. Which I realize isn't all that
>> large in the grand scheme of databases.
>

I think that at this point you shouldn't be worried at all.

Usually, you can build sample data DB and using it as a template.
So, any time you want to drp and recreate your database, you just do
a *create database* with the template option.


- --
- --
Emanuel Calvo  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Bs. As., Argentina (GMT-3)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCgAGBQJUNaRnAAoJEIBeI/HMagHm5sEQALsfKrf3UHoVAr3N9EQfvlza
Ds7ZXue48Mt63FHLlsscC/lsJlfublJXwnV/7H9aFgviVDgWRqpjfrtFWk/5WzXn
g0c6zbjB13uc5K50OQqeFjo4Sb1UMZqSInGLDa3wi2BCT8XQepUQk60Hy9YJo449
2VkibVrJPulEJN2pviL7nlHNQoW3A2KHne9uEc3sdVDtAPgXrbB45BW184Qgpc34
r2ReAqM8S533lTe/ZfXpn6ZUru8uJHXnwkRirt8HOelXeIYNxvZyjmzaFpXmt5ZG
grJfx0WZB+qTmT4lLw2OdwZ/3U/M1y4cMLnePmBWpdxph43gSH7AEDO2XMyhpsnQ
5To1zgfZexvZXZ1AIMAAShgGxPMLgCVy7lTdzfZVS92CiU2ou4gsh8s3wZpwjc68
VuplS28HIY2GNZQm4OackfAXwm9yR80YdW7rE6Il7eUx1pAv8OZJyQPDmoav4J7V
Ir7X9kIMK8JUtb+G79lpsQcBwJ6f+BGW+OMMqfYHfuSfPErLprYuoN9A7cZVJNtv
D9fWyHtcvyFMfyxfZmGdY1pK5M/9DWrI7e9ILp29oywZies0Zk9b9AuV3Hki8LZm
kQuJOswXKM5/g1U4QS9a5Pf0DF5qx4vAZq9OgtLnN8kyUykgZaHuJJzVSdE+wsOW
Q14aZJWWdJpBSu45NtrG
=Zb7i
-----END PGP SIGNATURE-----


Re: Sync production DB with development?

From
Jeff Ross
Date:
On 10/8/14, 2:24 PM, Israel Brewster wrote:
> Agreed. and no, I don't need all the data. But pg_dump doesn't give me an option to, say, only grab the last week of
data.

You might want to check out pg_sample.  It lets you get a referentially
consistent sample of a database.  You specify the number of rows you
want and it will get do the rest, retrieving enough rows from other
tables to satisfy any foreign key constraints.

https://github.com/mla/pg_sample

Jeff