Thread: psql \copy hanging
Hi list, Le 28/08/2019 à 09:43, Luca Ferrari a écrit : > I don't want to be pedantic, but I would have tried with a single change at a time. > And my bet is: the local file would do the trick (i.e., it is a weird share problem). Well, this problem is still bugging me, and this time I've tried with a local file. Unfortunately, it did not help. To further rule out filesystem problems, I first took care to delete the target files before copying to it, but it did not help either. So now I'm quite confident that the problem is either psql or even postgresql itself. Does anyone know of anything I could try to try to fix or debug this ? Thanks a lot for your help! Regards -- Arnaud
On 10/2/19 11:51 PM, Arnaud L. wrote: > Hi list, > > Le 28/08/2019 à 09:43, Luca Ferrari a écrit : >> I don't want to be pedantic, but I would have tried with a single >> change at a time. >> And my bet is: the local file would do the trick (i.e., it is a weird >> share problem). > > Well, this problem is still bugging me, and this time I've tried with a > local file. Unfortunately, it did not help. > To further rule out filesystem problems, I first took care to delete the > target files before copying to it, but it did not help either. > > So now I'm quite confident that the problem is either psql or even > postgresql itself. > > Does anyone know of anything I could try to try to fix or debug this ? Going back to the original thread I noticed it was not specified what program was being used to run the script in the overnight session. So what is being used to run the script overnight? > > Thanks a lot for your help! > > Regards > -- > Arnaud > > > > -- Adrian Klaver adrian.klaver@aklaver.com
Le 03/10/2019 à 15:54, Adrian Klaver a écrit : > On 10/2/19 11:51 PM, Arnaud L. wrote: >> Well, this problem is still bugging me, and this time I've tried with a >> local file. Unfortunately, it did not help. >> To further rule out filesystem problems, I first took care to delete the >> target files before copying to it, but it did not help either. >> >> So now I'm quite confident that the problem is either psql or even >> postgresql itself. >> >> Does anyone know of anything I could try to try to fix or debug this ? > > Going back to the original thread I noticed it was not specified what > program was being used to run the script in the overnight session. > > So what is being used to run the script overnight? Yes, sorry for having lost the original thread, my mailbox has a quite stupid automatic purge schedule... The script is run in a windows batch file. Basically, export.bat contains : SET PGUSER=myuser SET PGPASSWORD=mypwd SET PGCLIENTENCODING=UTF8 SET MYPGSERVER=myserverurl SET MYPGDB=mydatabase psql -h %MYPGSERVER% -a -f myscript.sql %MYPGDB% And myscript.sql contains : \copy (SELECT * FROM view1) TO '\\server\share\view1.txt' \copy (SELECT * FROM view2) TO '\\server\share\view2.txt' ... etc with ~60 views Today, I've update the problematic \copy line to be : COPY (SELECT * FROM view) TO STDOUT \g '\\server\share\view.txt' I'll keep you informed (even though a successfull run is not a guarantee of success, because the original script did sometimes work). Regards -- Arnaud
On 10/3/19 7:13 AM, Arnaud L. wrote: > Le 03/10/2019 à 15:54, Adrian Klaver a écrit : >> On 10/2/19 11:51 PM, Arnaud L. wrote: >>> Well, this problem is still bugging me, and this time I've tried with >>> a local file. Unfortunately, it did not help. >>> To further rule out filesystem problems, I first took care to delete >>> the target files before copying to it, but it did not help either. >>> >>> So now I'm quite confident that the problem is either psql or even >>> postgresql itself. >>> >>> Does anyone know of anything I could try to try to fix or debug this ? >> >> Going back to the original thread I noticed it was not specified what >> program was being used to run the script in the overnight session. >> >> So what is being used to run the script overnight? > > > Yes, sorry for having lost the original thread, my mailbox has a quite > stupid automatic purge schedule... > > The script is run in a windows batch file. > Basically, export.bat contains : > > SET PGUSER=myuser > SET PGPASSWORD=mypwd > SET PGCLIENTENCODING=UTF8 > SET MYPGSERVER=myserverurl > SET MYPGDB=mydatabase > psql -h %MYPGSERVER% -a -f myscript.sql %MYPGDB% I may have missed it before, but where is the Postgres server located? Given that this seems to be some sort of resource issue and that the below contains commands that are not dependent on each other, have you thought of splitting myscript.sql into two scripts? If nothing else the n(total line)-problem line in one script and the problem line in another. That would help nail down whether that line is truly the problem or if it is an interaction with running the other 50+ lines. > > And myscript.sql contains : > \copy (SELECT * FROM view1) TO '\\server\share\view1.txt' > \copy (SELECT * FROM view2) TO '\\server\share\view2.txt' > ... > etc with ~60 views > > Today, I've update the problematic \copy line to be : > COPY (SELECT * FROM view) TO STDOUT \g '\\server\share\view.txt' > > I'll keep you informed (even though a successfull run is not a guarantee > of success, because the original script did sometimes work). > > Regards > -- > Arnaud > -- Adrian Klaver adrian.klaver@aklaver.com
Le 03/10/2019 à 16:32, Adrian Klaver a écrit : > I may have missed it before, but where is the Postgres server located? On the same local area network. Not on the computer running the script (so direct COPY TO <file> is not an option). > Given that this seems to be some sort of resource issue and that the > below contains commands that are not dependent on each other, have you > thought of splitting myscript.sql into two scripts? If nothing else the > n(total line)-problem line in one script and the problem line in > another. That would help nail down whether that line is truly the > problem or if it is an interaction with running the other 50+ lines. OK I can do that. I thought I nailed it down to this line because it started failing when this line was ~5th in the script, and it kept failing on that very same line after I moved it at the very end of the script (that's where it is now). As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a script file (i.e. it does not work if the command is passed in a file via the -f argument). The command runs fine, no error is raised either by the client or the server, but no file is written. Regards -- Arnaud
On 10/4/19 12:19 AM, Arnaud L. wrote: > Le 03/10/2019 à 16:32, Adrian Klaver a écrit : >> I may have missed it before, but where is the Postgres server located? > > On the same local area network. Not on the computer running the script > (so direct COPY TO <file> is not an option). > > >> Given that this seems to be some sort of resource issue and that the >> below contains commands that are not dependent on each other, have you >> thought of splitting myscript.sql into two scripts? If nothing else the >> n(total line)-problem line in one script and the problem line in >> another. That would help nail down whether that line is truly the >> problem or if it is an interaction with running the other 50+ lines. > > OK I can do that. I thought I nailed it down to this line because it > started failing when this line was ~5th in the script, and it kept > failing on that very same line after I moved it at the very end of the > script (that's where it is now). Which tends to point to it as the problem. The question is whether it exhibits that behavior on its own or only when in combination with the other commands. > > As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a > script file (i.e. it does not work if the command is passed in a file > via the -f argument). > The command runs fine, no error is raised either by the client or the > server, but no file is written. Yeah not sure how that is supposed to work: production_(postgres)# select version(); version ---------------------------------------------------------------------------------------------------------------------------- PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], 64-bit (1 row) ^ production_(postgres)# \copy (select * from cell_per) TO 'cell.txt' COPY 68 production_(postgres)# \copy (select * from cell_per) TO STDOUT \g 'cell.txt' ERROR: syntax error at or near "\" LINE 1: COPY ( select * from cell_per ) TO STDOUT \g 'cell.txt' ^ production_(postgres)# \copy (select * from cell_per) TO STDOUT\g 'cell.txt' ERROR: syntax error at or near "'cell.txt'" LINE 1: COPY ( select * from cell_per ) TO STDOUT 'cell.txt' > > Regards > -- > Arnaud > > > -- Adrian Klaver adrian.klaver@aklaver.com
Le 04/10/2019 à 19:08, Adrian Klaver a écrit : > On 10/4/19 12:19 AM, Arnaud L. wrote: >> OK I can do that. I thought I nailed it down to this line because it >> started failing when this line was ~5th in the script, and it kept >> failing on that very same line after I moved it at the very end of the >> script (that's where it is now). > > Which tends to point to it as the problem. The question is whether it > exhibits that behavior on its own or only when in combination with the > other commands. Yes. It ran fine this last night. I had moved the line back to its original place, so now everything is exactly like it was before it started showing this behaviour. So, still apparently random... >> As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a >> script file (i.e. it does not work if the command is passed in a file >> via the -f argument). >> The command runs fine, no error is raised either by the client or the >> server, but no file is written. > > Yeah not sure how that is supposed to work: > [...] > production_(postgres)# \copy (select * from cell_per) TO STDOUT \g > 'cell.txt' > ERROR: syntax error at or near "\" > LINE 1: COPY ( select * from cell_per ) TO STDOUT \g 'cell.txt' This works with real SQL commands, so it should be "COPY" here, not "\copy". Regards -- Arnaud
On 10/7/19 12:41 AM, Arnaud L. wrote: > Le 04/10/2019 à 19:08, Adrian Klaver a écrit : >> On 10/4/19 12:19 AM, Arnaud L. wrote: >>> OK I can do that. I thought I nailed it down to this line because it >>> started failing when this line was ~5th in the script, and it kept >>> failing on that very same line after I moved it at the very end of >>> the script (that's where it is now). >> >> Which tends to point to it as the problem. The question is whether it >> exhibits that behavior on its own or only when in combination with the >> other commands. > > Yes. It ran fine this last night. I had moved the line back to its > original place, so now everything is exactly like it was before it > started showing this behaviour. So you are saying that you have not run the problematic line by itself? > > So, still apparently random... >> Yeah not sure how that is supposed to work: >> > [...] >> production_(postgres)# \copy (select * from cell_per) TO STDOUT \g >> 'cell.txt' >> ERROR: syntax error at or near "\" >> LINE 1: COPY ( select * from cell_per ) TO STDOUT \g 'cell.txt' > > This works with real SQL commands, so it should be "COPY" here, not > "\copy". I was not paying attention, thanks for the heads up. > > Regards > -- > Arnaud > -- Adrian Klaver adrian.klaver@aklaver.com
Le 07/10/2019 à 16:36, Adrian Klaver a écrit : > So you are saying that you have not run the problematic line by itself? It depends what you mean by that. I've run this line by itself many times. Everytime the script has failed in fact. But until today I had not splitted the batch script to call two separate SQL scripts with one containing only the problematic line, no. I've changed it this morning, so we'll see how it goes now. Regards -- Arnaud
Le 07/10/2019 à 16:36, Adrian Klaver a écrit : > So you are saying that you have not run the problematic line by itself? It hung during last night's run. I had modified my batch script to run the \copy commands separately, i.e. it now reads as : psql -h myserver -a mydb < originalscriptwithoutproblematicline.sql psql -h myserver -a mydb < problematicline.sql It hung at the problematic line, so during the second psql command. I'm really at loss... I *believe* that the problem lies either in psql or in PostgreSQL, but I really don't know what to try now. Regards -- Arnaud
út 8. 10. 2019 v 9:06 odesílatel Arnaud L. <arnaud.listes@codata.eu> napsal:
Le 07/10/2019 à 16:36, Adrian Klaver a écrit :
> So you are saying that you have not run the problematic line by itself?
It hung during last night's run.
I had modified my batch script to run the \copy commands separately,
i.e. it now reads as :
psql -h myserver -a mydb < originalscriptwithoutproblematicline.sql
psql -h myserver -a mydb < problematicline.sql
It hung at the problematic line, so during the second psql command.
I'm really at loss... I *believe* that the problem lies either in psql
or in PostgreSQL, but I really don't know what to try now.
you can write simple C application with COPY API https://www.postgresql.org/docs/12/libpq-copy.html
Then you can eliminate or ensure locality of problem.
more, you can use server side copy. Superuser can read data from server file system.
Regards
Pavel
Regards
--
Arnaud
Le 08/10/2019 à 09:28, Pavel Stehule a écrit : > you can write simple C application with COPY API > https://www.postgresql.org/docs/12/libpq-copy.html Unfortunately, I don't know C. > Then you can eliminate or ensure locality of problem. > > more, you can use server side copy. Superuser can read data from server > file system. Yes, but in this case the file has to be written to a network share, and the windows user under wich PostgreSQL runs (Network Service) cannot be given write permission on this share. That's the reason for the use of \copy. Now that I think about it, *maybe* this started happening after a server upgrade. Since this is intermittent, I'm not really sure about this, but some time ago we moved our server to a different hardware and upgraded from 9.3 to 11 at the same time. The dates don't perfectly match though, we upgraded around 8th of august and the problem arose ~2 weeks later for the first time. The client was upgraded around that same time period (not exactly the same time if I remember correctly). Regards -- Arnaud
Arnaud L. wrote: > As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a > script file (i.e. it does not work if the command is passed in a file > via the -f argument). > The command runs fine, no error is raised either by the client or the > server, but no file is written. Testing this with 11.5, it works for me. Make sure you're running the latest minor release (on the client side in this case), because a related fix was issued last February. For the 11 branch it was in version 11.2. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Le 08/10/2019 à 12:55, Daniel Verite a écrit : > Arnaud L. wrote: > >> As a side note, COPY (...) TO STDOUT \g 'somefile' does not work in a >> script file (i.e. it does not work if the command is passed in a file >> via the -f argument). >> The command runs fine, no error is raised either by the client or the >> server, but no file is written. > > Testing this with 11.5, it works for me. > Make sure you're running the latest minor release (on the client > side in this case), because a related fix was issued last February. > For the 11 branch it was in version 11.2. I'm on 11.5-1 on the client side, but I had added an output redirection for this batch file to try to understand what was happening during the night runs, and that might be the reason why \g fails (i.e. some interference between the console redirections). I'll give it another try without these redirections. Regards -- Arnaud
Le 08/10/2019 à 12:55, Daniel Verite a écrit : > Testing this with 11.5, it works for me. > Make sure you're running the latest minor release (on the client > side in this case), because a related fix was issued last February. > For the 11 branch it was in version 11.2. OK, my bad, backslashes in a windows-style share path have to be escaped. Anyway, it hung using this syntax during last night's run. I'll give it another try tonight just to be sure. Regards -- Arnaud
On 10/8/19 12:06 AM, Arnaud L. wrote: > Le 07/10/2019 à 16:36, Adrian Klaver a écrit : >> So you are saying that you have not run the problematic line by itself? > > It hung during last night's run. > > I had modified my batch script to run the \copy commands separately, > i.e. it now reads as : > psql -h myserver -a mydb < originalscriptwithoutproblematicline.sql > psql -h myserver -a mydb < problematicline.sql This is going to be hard to troubleshoot if you change your commands. Previously you had: psql -h %MYPGSERVER% -a -f myscript.sql %MYPGDB% Changing more then one thing at a time makes it that much more difficult to isolate the issue. I would create an entirely separate batch file that runs just problematicline.sql. Given that you suspect Postgres it would be helpful to see the query that underlies the view you are copying. You might want to look at autoexplain: https://www.postgresql.org/docs/11/auto-explain.html as a way of getting information at run time. > > It hung at the problematic line, so during the second psql command. > > I'm really at loss... I *believe* that the problem lies either in psql > or in PostgreSQL, but I really don't know what to try now. > > > Regards > -- > Arnaud > -- Adrian Klaver adrian.klaver@aklaver.com
Le 08/10/2019 à 16:03, Adrian Klaver a écrit : > This is going to be hard to troubleshoot if you change your commands. > > Previously you had: > psql -h %MYPGSERVER% -a -f myscript.sql %MYPGDB% > > Changing more then one thing at a time makes it that much more difficult > to isolate the issue. Yes, true. Actually I do one change at a time, I'm just no posting every single attempt. So I tried to feed the script using console redirection rather than -f as you can see. That was the only change. > I would create an entirely separate batch file that runs just > problematicline.sql. OK, that's easy. Actually the batch file is not doing much more than running this psql command, but that's really not a problem/ > Given that you suspect Postgres it would be helpful to see the query > that underlies the view you are copying. > You might want to look at autoexplain: > > https://www.postgresql.org/docs/11/auto-explain.html > > as a way of getting information at run time. OK that's nice. Since I don't want to mess with the whole server configuration, I added some auto_explain settings to my script. So for tonight, my script looks like this : LOAD 'auto_explain'; SET auto_explain.log_min_duration = 0; SET auto_explain.log_analyze = true; COPY (SELECT * FROM myview) TO STDOUT \g '\\\\myserver\\myshare\\myfile.txt' And it'll run in a separate batch. Thanks for your help Adrian ! Regards -- Arnaud
Arnaud L. wrote: > Anyway, it hung using this syntax during last night's run. > I'll give it another try tonight just to be sure. When psql.exe is hanging, maybe you could use a tool like Process Monitor [1] or Process Explorer [2] to get insights about what it's stuck on or what it's doing exactly. [1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon [2] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Le 08/10/2019 à 16:55, Daniel Verite a écrit : > Arnaud L. wrote: > >> Anyway, it hung using this syntax during last night's run. >> I'll give it another try tonight just to be sure. > > When psql.exe is hanging, maybe you could use a tool like > Process Monitor [1] or Process Explorer [2] to get insights about > what it's stuck on or what it's doing exactly. > > [1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon > [2] https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer Yes, I already did that, and unfortunately there's no activy. There is absolutely no activity in procmon using psql.exe as a filter. process-xp is not giving more information, processor usage is 0%. Regards -- Arnaud
Le 08/10/2019 à 16:59, Arnaud L. a écrit : > Yes, I already did that, and unfortunately there's no activy. > There is absolutely no activity in procmon using psql.exe as a filter. > process-xp is not giving more information, processor usage is 0%. My apologies, I obviously did something wrong last time I checked this process with process-xp and procmon. Now I see that there IS activity on the problematic process ! 100% CPU and some (but not much) disk activity. I think I'll try to let it run for some time to get the auto_explain do its work. If I kill the backend now, I won't see anything I believe. I dont now it it'll ever complete this query though, it usually takes ~100 seconds, and here it has already been running for 9 hours. Regards -- Arnaud
On 10/10/19 12:00 AM, Arnaud L. wrote: > Le 08/10/2019 à 16:59, Arnaud L. a écrit : >> Yes, I already did that, and unfortunately there's no activy. >> There is absolutely no activity in procmon using psql.exe as a filter. >> process-xp is not giving more information, processor usage is 0%. > > My apologies, I obviously did something wrong last time I checked this > process with process-xp and procmon. > Now I see that there IS activity on the problematic process ! > 100% CPU and some (but not much) disk activity. > > I think I'll try to let it run for some time to get the auto_explain do > its work. If I kill the backend now, I won't see anything I believe. You can look at: pg_locks https://www.postgresql.org/docs/11/view-pg-locks.html and pg_stat_activity https://www.postgresql.org/docs/11/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW in real time to see if there is any useful information. > > I dont now it it'll ever complete this query though, it usually takes > ~100 seconds, and here it has already been running for 9 hours. > > Regards > -- > Arnaud > > > -- Adrian Klaver adrian.klaver@aklaver.com