Thread: [GENERAL] Question about loading up a table

[GENERAL] Question about loading up a table

From: Alex Samad
Hi

I am using pg_dump | psql to transfer data from my old 9.2 psql into a 9.6 psql.

The new DB server is set up as a master replicating to a hot standby server.

What I have noticed is that the rows don't get replicated over until the COPY FROM stdin is finished... hard to test when you have a million-plus rows.

Is there a way to tell the master to replicate earlier, or is there a way to get pg_dump to bundle into, say, 100K rows at a time?

Thanks
Alex

Re: [GENERAL] Question about loading up a table

From: vinny
On 2017-07-31 11:02, Alex Samad wrote:
> Hi
>
> I am using pg_dump | psql to transfer data from my old 9.2 psql into a
> 9.6 psql.
>
> The new DB server is setup as master replicating to a hot standby
> server.
>
> What I have noticed is that the rows don't get replicated over until
> the copy from stdin is finished...  hard to test when you have M+ lines
> of rows.

If you are "just testing" then you could use the COPY command
https://www.postgresql.org/docs/9.2/static/sql-copy.html
to generate a smaller dataset.
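
For instance, something along these lines (a rough sketch - the host, database and table names and the 100k limit are made up, adjust to your setup):

    # pull a 100k-row sample out of the old 9.2 server
    psql -h old-server -U youruser yourdb \
      -c "\copy (SELECT * FROM bigtable LIMIT 100000) to 'sample.csv' csv"

    # load the sample into the new 9.6 master
    psql -h new-master -U youruser yourdb \
      -c "\copy bigtable from 'sample.csv' csv"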


>
> Is there a way to tell the master to replicate earlier

I highly doubt it, because the master cannot know what to replicate
until your transaction is ended with a COMMIT. If you end with ROLLBACK,
or your last query is DELETE FROM <your_table>;
then there isn't even anything to replicate at all...


> or is there a
> way to get pg_dump to bundle into say 100K rows at a time ?

I'm not aware of such a feature; it would be quite tricky because
of dependencies between records. You cannot simply dump the first 100k
rows from table A and the first 100k from table B, because row #9 from
table A may have a relation to row #100,001 from table B.
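
That said, if it really is just one big table with a sequential key, you could fake the batching yourself outside pg_dump. An untested sketch, with made-up host/table/column names:

    # copy the table in 100k-row slices; each slice is its own
    # transaction, so it commits (and replicates) independently
    for start in $(seq 0 100000 900000); do
      psql -h old-server yourdb -c "\copy (SELECT * FROM bigtable WHERE id >= $start AND id < $start + 100000) to stdout" \
        | psql -h new-master yourdb -c "\copy bigtable from stdin"
    done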


Re: [GENERAL] Question about loading up a table

From: Scott Marlowe
On Mon, Jul 31, 2017 at 2:31 AM, vinny <vinny@xs4all.nl> wrote:
> On 2017-07-31 11:02, Alex Samad wrote:
>>
>> Hi
>>
>> I am using pg_dump | psql to transfer data from my old 9.2 psql into a
>> 9.6 psql.

Note that you should be doing the dump with 9.6's pg_dump, as it's
possible for 9.2's pg_dump not to know about a 9.6 feature.
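
If in doubt, check which binary you're actually running, e.g. (the install path depends on your packaging):

    pg_dump --version
    /usr/pgsql-9.6/bin/pg_dump -h old-9.2-server -t bigtable -a yourdb | psql yourdb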

>> The new DB server is setup as master replicating to a hot standby
>> server.
>>
>> What I have noticed is that the rows don't get replicated over until
>> the copy from stdin is finished...  hard to test when you have M+ lines
>> of rows.

SNIP

>> Is there a way to tell the master to replicate earlier
>
> I highly doubt it, because the master cannot know what to replicate until
> your transaction is ended with a COMMIT. If you end with ROLLBACK,
> or your last query is DELETE FROM <your_table>;
> then there isn't even anything to replicate at all...

This is actually a visibility issue. All the new changes are
replicated to the slave, but just like on the master, other
connections can't see them because the transaction hasn't committed
yet. The slave, except for some small delay (seconds, etc.), is an
exact replica of the master. So even a delete at the end gets
replicated; you just don't see anything but possible table bloat to
show for it.

To prove this to oneself, start the copy, and get into another session
to the master. You don't see any rows there either until the commit
after the copy.
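
For example (host, database and table names made up):

    # terminal 1: the bulk load
    pg_dump -h old-server -t bigtable -a yourdb | psql -h new-master yourdb

    # terminal 2: poll the new master while the load runs; the count
    # stays at 0 until the COPY's transaction commits
    watch -n 5 'psql -h new-master yourdb -c "SELECT count(*) FROM bigtable;"'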


Re: [GENERAL] Question about loading up a table

From: Alex Samad
Hi

I'm using pg_dump 9.6 to do the dumps.

I'm also pretty sure no data is being replicated until the end of the COPY FROM stdin: I was watching tcpdump output and I could see data going from the orig master to the new master but no traffic between the new master and the standby. I'm pretty sure my replication is working, as my other tables have replicated over.


As for allowing pg_dump to create COPY FROM stdin batches with a specific number of rows - I'm not sure why that would be so hard or bad if it were an option for somebody to use. For my situation it sounds like a really good idea.

I'm dumping a single table into a new single table, so constraints shouldn't be a problem.


Guess I just have to let it run to completion.

Thanks




Re: [GENERAL] Question about loading up a table

From: Alex Samad
Hi

I double-checked and there is data going over - thought I would correct that.

But it seems to be very slow. Having said that, how do I check throughput, and what tools do I use?

A



Re: [GENERAL] Question about loading up a table

From: Scott Marlowe
On Mon, Jul 31, 2017 at 11:16 PM, Alex Samad <alex@samad.com.au> wrote:
> Hi
>
> I double checked and there is data going over, thought I would correct that.
>
> But it seems to be very slow.   Having said that how do I / what tools do I
> use to check through put

Try the pg_current_xlog_location function on the slave?
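
Something along these lines on 9.6 (host and database names made up; on 10 and later these functions are named pg_current_wal_lsn etc.):

    # on the master: current WAL write position
    psql -h new-master yourdb -c "SELECT pg_current_xlog_location();"

    # on the standby: what has been received and replayed so far
    psql -h standby yourdb -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"

    # on the master: replication lag in bytes per standby
    psql -h new-master yourdb -c "SELECT client_addr, pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS lag_bytes FROM pg_stat_replication;"

Run it a couple of times a minute apart and the difference between readings gives a rough bytes-per-second figure.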


Re: [GENERAL] Question about loading up a table

From: Alex Samad
Hi

So just to go over what I have:


server A (this is the original pgsql server 9.2)

Server X and Server Y ... PGSQL 9.6 in a cluster - streaming replication with hot standby.


I have 2 tables using about 2.5T of disk space.

I want to get the data from A into X, and X will replicate it to Y.


I am currently on X using this command:

pg_dump -U <USER> -h <Server A> -t BIGTABLE -a <DB> | sudo -u postgres -i psql -q <DB>

This is taking a long time - it's been 2 days and I have transferred around 2T. This is just a test to see how long it takes and to populate my new UAT env, so I will have to do it again.

Problem is time. The pg_dump process is single-threaded.
I have 2 routers in between A and X, but it's 10G networking - and my network graphs don't show much traffic.

Server X is still in use; there are still records being inserted into the tables.

How can I make this faster?

I could shut down server A and present the disks to server X - could I load this up in PostgreSQL and do a table-to-table copy? I presume this would be faster... is this possible? How do I get around the same DB name?
What other solutions do I have?

Alex





Re: [GENERAL] Question about loading up a table

From: Scott Marlowe
On Tue, Aug 1, 2017 at 4:27 PM, Alex Samad <alex@samad.com.au> wrote:
> Hi
>
> So just to go over what i have
>
>
> server A (this is the original pgsql server 9.2)
>
> Server X and Server Y ... PGSQL 9.6 in a cluster - streaming replication
> with hot standby.
>
>
> I have 2 tables about 2.5T of diskspace.
>
> I want to get the date from A into X and X will replicate into Y.
>
>
> I am currently on X using this command
>
> pg_dump -U <USER> -h <Server A > -t BIGTABLE -a <DB> | sudo -u postgres -i
> psql -q <DB>;
>
> This is taking a long time, its been 2 days and I have xfered around 2T..
> This is just a test to see how long and to populate my new UAT env. so I
> will have to do it again.
>
> Problem is time.  the pg_dump process is single threaded.
> I have 2 routers in between A and X but its 10G networking - but my network
> graphs don't show much traffic.
>
> Server X is still in use, there are still records being inserted into the
> tables.
>
> How can I make this faster.
>
> I could shutdown server A and present the disks to server X, could I load
> this up in PGSQL and do a table to table copy - i presume this would be
> faster ... is this possible ?  how do I get around the same DB name ?
> What other solutions do I have ?

Yes, but if it's taking days to transfer 2TB then you need to
investigate where your performance is tanking.

Have you tried rsyncing / scping files across the network to see how
fast your network connection is?

Have you tried just pg_dumping / restoring locally to get an idea how
fast you can dump / restore without doing it over a network
connection?

Are you IO bound? Network bound? CPU bound?

Is the destination copying data, or building indexes? Do you insert
into a schema that already has indexes in place? If so have you tried
dropping the indexes first and rebuilding them?
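
Rough versions of those checks (hosts, paths and object names are all placeholders):

    # raw network speed between the two boxes
    scp /tmp/1GB-testfile user@new-master:/tmp/

    # how fast the source can dump on its own, with no network or restore
    time pg_dump -h old-server -t bigtable -a yourdb > /dev/null

    # while the real copy runs, watch both ends
    top            # is pg_dump or the COPY backend pegging one core?
    iostat -x 5    # is the destination disk saturated?

    # if the target table already has indexes, load without them
    psql -h new-master yourdb -c "DROP INDEX IF EXISTS bigtable_some_idx;"
    # ... run the pg_dump | psql load ...
    psql -h new-master yourdb -c "CREATE INDEX bigtable_some_idx ON bigtable (some_column);"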


Re: [GENERAL] Question about loading up a table

From: Alex Samad
Hi

I don't have an extra 4T of file space. I could potentially move the attached LUN from one server and attach it to the other.

Well, that was my question - how do I check whether it's pg_dump that is the bottleneck? I have checked network performance - 9.8Gb - and I can write more data to disk.

I do have 1 index

A

