Thread: psql \copy hanging

psql \copy hanging

From
"Arnaud L."
Date:
Hi list

On Windows, I have an unattended script that exports data overnight by 
issuing psql commands like :
psql -f mycommands.sql
whith mycommands.sql containing a bunch of \copy lines like :
\copy (SELECT * FROM someview) TO 
'\\windowsserver\windowsshare\somefile.csv'

When I run this script by hand, everything is fine but when it runs 
overnight it hangs and allways on the same line.
There is no lock in the database, and the connection is marked as active 
in the backend and running the statement "COPY (SELECT * FROM myview) TO 
STDOUT".
So where can things be hanging ?
My first guess (appart from db lock) was that the target file was locked 
by the filesystem. But wouldn't psql know about it and fail with an error ?

Thanks for your help !

--
Arnaud



Re: psql \copy hanging

From
Adrian Klaver
Date:
On 8/22/19 1:47 AM, Arnaud L. wrote:
> Hi list
> 
> On Windows, I have an unattended script that exports data overnight by 
> issuing psql commands like :
> psql -f mycommands.sql
> whith mycommands.sql containing a bunch of \copy lines like :
> \copy (SELECT * FROM someview) TO 
> '\\windowsserver\windowsshare\somefile.csv'
> 
> When I run this script by hand, everything is fine but when it runs 
> overnight it hangs and allways on the same line.
> There is no lock in the database, and the connection is marked as active 
> in the backend and running the statement "COPY (SELECT * FROM myview) TO 
> STDOUT".
> So where can things be hanging ?

The first thing I think of when seeing this sort of behavior on Windows 
is anti-virus software. Doing you have any running on the machine?

You say it always hangs on the same line. Is that line doing something 
different from the others? Say writing to a different location.




> My first guess (appart from db lock) was that the target file was locked 
> by the filesystem. But wouldn't psql know about it and fail with an error ?
> 
> Thanks for your help !
> 
> -- 
> Arnaud
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 22/08/2019 à 16:00, Adrian Klaver a écrit :
> The first thing I think of when seeing this sort of behavior on Windows
> is anti-virus software. Doing you have any running on the machine?
> 
> You say it always hangs on the same line. Is that line doing something
> different from the others? Say writing to a different location.

Thanks for your insights Adrian.

There is no AV software on the machine as it is a Server version of 
Windows. (I know some people run AV on Windows Server, but I don't)

All the lines write to the same location which is a network share, but 
the hanging one is clearly the longest query to execute and the output 
file is one of the biggest one (though not the biggest).

Note that the share points to a local folder. The fact the psql writes 
to a share is only for robustness so that this script could be executed 
from any machine.

Cheers

--
Arnaud



Re: psql \copy hanging

From
Adrian Klaver
Date:
On 8/22/19 7:07 AM, Arnaud L. wrote:
> Le 22/08/2019 à 16:00, Adrian Klaver a écrit :
>> The first thing I think of when seeing this sort of behavior on Windows
>> is anti-virus software. Doing you have any running on the machine?
>>
>> You say it always hangs on the same line. Is that line doing something
>> different from the others? Say writing to a different location.
> 
> Thanks for your insights Adrian.
> 
> There is no AV software on the machine as it is a Server version of 
> Windows. (I know some people run AV on Windows Server, but I don't)
> 
> All the lines write to the same location which is a network share, but 
> the hanging one is clearly the longest query to execute and the output 
> file is one of the biggest one (though not the biggest).

The only thing I can think of at the moment is to put the offending line 
in a separate script and see what happens.

> 
> Note that the share points to a local folder. The fact the psql writes 
> to a share is only for robustness so that this script could be executed 
> from any machine.
> 
> Cheers
> 
> -- 
> Arnaud
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 22/08/2019 à 16:21, Adrian Klaver a écrit :
> The only thing I can think of at the moment is to put the offending line
> in a separate script and see what happens.

Then I think I'll first try to switch the line orders. I'll execute this 
action last and see if it changes anything. If it doesn't, I'll break 
the script into smaller parts.

Thanks for you help !

--
Arnaud



Re: psql \copy hanging

From
Jerry Sievers
Date:
"Arnaud L." <arnaud.listes@codata.eu> writes:

> Hi list
>
> On Windows, I have an unattended script that exports data overnight by
> issuing psql commands like :
> psql -f mycommands.sql
> whith mycommands.sql containing a bunch of \copy lines like :
> \copy (SELECT * FROM someview) TO
> '\\windowsserver\windowsshare\somefile.csv'
>
> When I run this script by hand, everything is fine but when it runs
> overnight it hangs and allways on the same line.

Is the overnight run being done as the same DB user you are testing
with?

If not, then perhaps the automated user deviates in resource settings
and/or permissions, in turn resulting in sluggish performance caused by
thrashing and/or inferior query plan.

> There is no lock in the database, and the connection is marked as
> active in the backend and running the statement "COPY (SELECT * FROM
> myview) TO STDOUT".
> So where can things be hanging ?
> My first guess (appart from db lock) was that the target file was
> locked by the filesystem. But wouldn't psql know about it and fail
> with an error ?
>
> Thanks for your help !
>
> --
> Arnaud
>
>
>

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 22/08/2019 à 20:00, Jerry Sievers a écrit :
> Is the overnight run being done as the same DB user you are testing
> with?
> 
> If not, then perhaps the automated user deviates in resource settings
> and/or permissions, in turn resulting in sluggish performance caused by
> thrashing and/or inferior query plan.


Hi Jerry,
yes, they both run under the same user. I've checked my other scheduled 
tasks and I don't see what could interfere, but then this is quite 
complex so I could easily overlook something.
Unfortunately, tonight's run failed for a totally different reason...

Thanks for your help


Cheers
--
Arnaud



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 22/08/2019 à 16:23, Arnaud L. a écrit :
> Le 22/08/2019 à 16:21, Adrian Klaver a écrit :
>> The only thing I can think of at the moment is to put the offending line
>> in a separate script and see what happens.
> 
> Then I think I'll first try to switch the line orders. I'll execute this
> action last and see if it changes anything. If it doesn't, I'll break
> the script into smaller parts.

It ran fine over the week-end so I'll consider this as temporary glitch. 
It's always dissatisfying not to know what has happened, but I'm not 
going to try too hard to reproduce this.

Thanks for helping and sorry for the noise !

Cheers
--
Arnaud



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 22/08/2019 à 10:47, Arnaud L. a écrit :
> On Windows, I have an unattended script that exports data overnight by
> issuing psql commands like :
> psql -f mycommands.sql
> whith mycommands.sql containing a bunch of \copy lines like :
> \copy (SELECT * FROM someview) TO
> '\\windowsserver\windowsshare\somefile.csv'
> 
> When I run this script by hand, everything is fine but when it runs
> overnight it hangs and allways on the same line.
> There is no lock in the database, and the connection is marked as active
> in the backend and running the statement "COPY (SELECT * FROM myview) TO
> STDOUT".
> So where can things be hanging ?
> My first guess (appart from db lock) was that the target file was locked
> by the filesystem. But wouldn't psql know about it and fail with an error ?


OK, so it happended again... I'm really puzzled.
There is no lock on the FS that I can find (openfiles reports 0 lock on 
this file).
The client psql process is sitting idle, doing nothing.
In PostgreSQL, the connection is active with the query being simple copy 
(select * from view) to stdout.
In pg_stat_activity wait_event_type IS NULL, state is active, 
query_start is 2019-08-27 00:18:20.471957+02 and state_change is 
2019-08-27 00:18:20.47196+02...
Nothing in postgresql LOG around that time.
Later I have : 2019-08-27 01:00:04 CEST LOG:  could not receive data 
from client: An existing connection was forcibly closed by the remote host.
But I realy doubt this is related...

Any other idea ? I'll change the lines order for tonight's run, but that 
is not what I'd call a solution...

Thanks a lot !

--
Arnaud



Re: psql \copy hanging

From
Luca Ferrari
Date:
On Tue, Aug 27, 2019 at 9:54 AM Arnaud L. <arnaud.listes@codata.eu> wrote:
> Any other idea ? I'll change the lines order for tonight's run, but that
> is not what I'd call a solution...

Does it hangs against the same line content or the same line number?
Are you able to run the script automatically during working hours (to
avoid firewalling or upgrades running in parrallel to your nightly
script execution)?
Any chance something is querying the same data and a lock blocks the
transaction (pg_locks)?

Just throwing on the table some desperate ideas....

Luca



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 27/08/2019 à 10:00, Luca Ferrari a écrit :
> On Tue, Aug 27, 2019 at 9:54 AM Arnaud L. <arnaud.listes@codata.eu> wrote:
>> Any other idea ? I'll change the lines order for tonight's run, but that
>> is not what I'd call a solution...
> 
> Does it hangs against the same line content or the same line number?
> Are you able to run the script automatically during working hours (to
> avoid firewalling or upgrades running in parrallel to your nightly
> script execution)?
> Any chance something is querying the same data and a lock blocks the
> transaction (pg_locks)?

Hi Luca.
I can run the script just fine during working hours.
I checked pg_locks, and this pid is the only process requesting locks. 
It has around 100 of them since it is a view querying multiple tables, 
but I see nothing blocking. All locks are granted.
I have a csv output of pg_locks so I can post this if asked.

It hangs against the same line in the sql script, all lines being "\copy 
(select ....) to 'file on unc share'".
This line is simply the longest running query because the view inside 
the select outputs almost 1M rows and does some subqueries.

Also when the script is hung, output has not started (file size is 0).


> Just throwing on the table some desperate ideas....

Desperate ideas are very welcome !

Cheers
--
Arnaud



Re: psql \copy hanging

From
Luca Ferrari
Date:
On Tue, Aug 27, 2019 at 10:48 AM Arnaud L. <arnaud.listes@codata.eu> wrote:
> I can run the script just fine during working hours.

I meant thru your scheduler (cron or something).

> It hangs against the same line in the sql script, all lines being "\copy
> (select ....) to 'file on unc share'".

This is a new detail to me: what if you output to a local file and
move it after on the share?


> This line is simply the longest running query because the view inside
> the select outputs almost 1M rows and does some subqueries.
>

I still don't get: is the line content the longest or the highest
numbered in the output?

> Also when the script is hung, output has not started (file size is 0).

This makes me think there is some problem with the share, not the
script. Again: test against a local file that you move (rsync?) to the
share after.

Luca



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 27/08/2019 à 10:57, Luca Ferrari a écrit :
> On Tue, Aug 27, 2019 at 10:48 AM Arnaud L. <arnaud.listes@codata.eu> wrote:
>> I can run the script just fine during working hours.
> 
> I meant thru your scheduler (cron or something).

Yes, it runs fine too. There is indeed something happening on the server 
at this specific time but I don't know what.


>> It hangs against the same line in the sql script, all lines being "\copy
>> (select ....) to 'file on unc share'".
> 
> This is a new detail to me: what if you output to a local file and
> move it after on the share?

I could do this but it would make the script a lot more complicated.
There are a lot of views that I \copy directly to this share, and this 
is the only one that poses any problem.


> I still don't get: is the line content the longest or the highest
> numbered in the output?

Sorry, I don't undertand either. Your question was "Does it hangs 
against the same line content or the same line number?".
I run an sql script that contains a bunch of \copy commands. It is 
always the same \copy that is hanging.


>> Also when the script is hung, output has not started (file size is 0).
> 
> This makes me think there is some problem with the share, not the
> script. Again: test against a local file that you move (rsync?) to the
> share after.

Everything is windows based, so rsync is not really an option. There are 
alternatives of course.
I could write the file locally, but for robustness I'm trying not to 
rely on the local FS so that this script could be run from any machine 
on the network without modifications.

Cheers
--
Arnaud



Re: psql \copy hanging

From
Luca Ferrari
Date:
On Tue, Aug 27, 2019 at 12:34 PM Arnaud L. <arnaud.listes@codata.eu> wrote:
> I could do this but it would make the script a lot more complicated.
> There are a lot of views that I \copy directly to this share, and this
> is the only one that poses any problem.

I would however give it a try. I would schedule a nightly execution
with a local path to see if that completes.

> Sorry, I don't undertand either. Your question was "Does it hangs
> against the same line content or the same line number?".
> I run an sql script that contains a bunch of \copy commands. It is
> always the same \copy that is hanging.

Now I got it: your script is hanging against a specific \copy command,
while I was thinking it was hanging once it was dumping a specific
table line from your copy.

> I could write the file locally, but for robustness I'm trying not to
> rely on the local FS so that this script could be run from any machine
> on the network without modifications.

My personal experience with shares and network that are not so much
stable is that they can block, and that is why I'm suggesting to try
on the local filesystem to see if that is a share-related problem or a
resource problem. Also scheduling the script at a different time could
help, because it could be at that time the machine (and it could mean
the sharing machine) is busy at the point it does not respond.

Luca



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 27/08/2019 à 13:13, Luca Ferrari a écrit :
> My personal experience with shares and network that are not so much
> stable is that they can block, and that is why I'm suggesting to try
> on the local filesystem to see if that is a share-related problem or a
> resource problem. Also scheduling the script at a different time could
> help, because it could be at that time the machine (and it could mean
> the sharing machine) is busy at the point it does not respond.

I'm quite confident that changing the schedule would help (since the 
script runs fine during work hours), but the time-table is already quite 
busy.
I move the offending line at the end of the script, so it will run some 
minutes later, maybe this will be enough.
I'll also try to write it directly on a local folder to rule this out.

Cheers
--
Arnaud



Re: psql \copy hanging

From
Shaozhong SHI
Date:
Interesting topic.

Is there any easy way to debug, and get correct error messages?

Regards,

Shao

On Tue, 27 Aug 2019 at 12:17, Arnaud L. <arnaud.listes@codata.eu> wrote:
Le 27/08/2019 à 13:13, Luca Ferrari a écrit :
> My personal experience with shares and network that are not so much
> stable is that they can block, and that is why I'm suggesting to try
> on the local filesystem to see if that is a share-related problem or a
> resource problem. Also scheduling the script at a different time could
> help, because it could be at that time the machine (and it could mean
> the sharing machine) is busy at the point it does not respond.

I'm quite confident that changing the schedule would help (since the
script runs fine during work hours), but the time-table is already quite
busy.
I move the offending line at the end of the script, so it will run some
minutes later, maybe this will be enough.
I'll also try to write it directly on a local folder to rule this out.

Cheers
--
Arnaud


Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 27/08/2019 à 13:17, Arnaud L. a écrit :
> I move the offending line at the end of the script, so it will run some
> minutes later, maybe this will be enough.

OK, so this was enough for last night's schedule to run without problem.
I still don't get it so I'm not satisfied with this solution, but at 
least it works.
I'll keep the list informed if something new arises.

Thanks to everyone for helping !

Cheers
--
Arnaud



Re: psql \copy hanging

From
Luca Ferrari
Date:
On Wed, Aug 28, 2019 at 9:09 AM Arnaud L. <arnaud.listes@codata.eu> wrote:
> OK, so this was enough for last night's schedule to run without problem.
> I still don't get it so I'm not satisfied with this solution, but at
> least it works.
> I'll keep the list informed if something new arises.

I don't want to be pedantic, but I would have tried with a single
change at a time.
And my bet is: the local file would do the trick (i.e., it is a weird
share problem).

If you are going to invest some time, you could also try to write a
small file on the share just before the copy starts, so that you are
guaranteed the share is working. Something like:

echo $(date) >> $SHARE/log.txt
psql 'copy ...'
echo 'done' >> $SHARE/log.txt

Luca



Re: psql \copy hanging

From
"Arnaud L."
Date:
Le 28/08/2019 à 09:43, Luca Ferrari a écrit :
> I don't want to be pedantic, but I would have tried with a single
> change at a time.
> And my bet is: the local file would do the trick (i.e., it is a weird
> share problem).

You're not don't worry. This process is quite important in our workflow 
(not critical), so I can't really afford to make tests for weeks.

Moving the offending line to the end of the script is what I'd consider 
"single change". To me it makes something clear : the problem occurs 
only at the very specific time this command is running.


> If you are going to invest some time, you could also try to write a
> small file on the share just before the copy starts, so that you are
> guaranteed the share is working. Something like:
> 
> echo $(date) >> $SHARE/log.txt
> psql 'copy ...'
> echo 'done' >> $SHARE/log.txt

Well, I do know that the share is working just before this command runs 
because ALL the commands in the script write to this specific share.
The script is basically outputting many views results in a single share, 
something like :
\copy (select * from view1) TO '\\myserver\myshare\file1.csv'
\copy (select * from view2) TO '\\myserver\myshare\file2.csv'
...
\copy (select * from view99) TO '\\myserver\myshare\file99.csv'
So the \copy command right before (say view15 for instance) is writing 
to this same share just milliseconds before the the problematic command 
(view16) tries to do the same.
Since this particular view takes some time to execute, there is some 
time between the moment the file gets created on the share (which 
happens immediately when the \copy command runs if I get it right), and 
the moment psql receives content and starts writing to it (my tests 
suggest 1min to 1min30s). Either psql doesn't receive anything 
(possible, since the connection is marked as active but it does not look 
as if it's doing anything at all), or there has been some timeout.
It could have been tcp keepalive, but in Windows the default is 2h I 
believe and postgresql uses system default if nothing is specified in 
conf (which is my case).

So with all this in mind I'd rather think I have a problem with either 
psql's \copy or with my query on the server side. But I'm not rulling 
anything out of course.

One other thing I could try is using COPY TO STDOUT \g. From what I 
understand in the documentation this would not be 100% similar to what 
\copy is doing.
Anyway, if it works with the current setup I won't dig into it much deeper.

Cheers
--
Arnaud