Thread: BUG #15455: Endless lseek

BUG #15455: Endless lseek

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15455
Logged by:          Andrea Ferranti
Email address:      andrea.ferranti@wolterskluwer.com
PostgreSQL version: 9.6.10
Operating system:   Ubuntu 16
Description:

Dear all, 
We have experienced a strange behaviour on PostgreSQL 9.6.10. 
(for privacy reason I cannot copy the exact query and data on which query
are performed). 

We have a series of query that move data from a table to another by using
the following logic.

create table A;
insert data into A from B;
<insert remaining data into A from B>;
drop B;
rename A to B;

the <insert remaining data into A from B> step has the following logic:

INSERT INTO A SELECT X 
FROM B DEST_TABLE JOIN (
     SELECT --- 
     FROM B SOURCE_TABLE) 
     SOURCE_TABLE
     ON SOURCE_TABLE.S0 = DEST_TABLE.
     AND SOURCE_TABLE.S1 = DEST_TABLE.

the last query doesn't terminate. 
In particular, we have found that the inner select  (...FROM A JOIN B...)
generate a series of lseek as following.

(we have execute a strace of posgtreSQL process)

gettimeofday({1540302906, 733829}, NULL) = 0
epoll_wait(40<anon_inode:[eventpoll]>, {}, 1, 1802) = 0
close(40<anon_inode:[eventpoll]>)       = 0
gettimeofday({1540302908, 538020}, NULL) = 0
gettimeofday({1540302908, 538120}, NULL) = 0
write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:55:08
UTC]:127.0."..., 321) = 321
gettimeofday({1540302908, 538310}, NULL) = 0
sendto(10,
"\2\0\0\0x\1\0\0\356\330\254\0\3\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
376, 0, NULL, 0) = 376
gettimeofday({1540302908, 538477}, NULL) = 0
sendto(11,
"\27\3\3\0Y\326\231\n\324\212\351W\26\311\363\t\316&\214\223\237\301o\30\361\0347\242\365\264-'"...,
94, 0, NULL, 0) = 94
recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(3<anon_inode:[eventpoll]>, {{EPOLLIN, {u32=1671705328,
u64=140614606009072}}}, 1, -1) = 1
recvfrom(11, "\27\3\3\0\210", 5, 0, NULL, NULL) = 5
recvfrom(11,
"\326\264\345\323\r\334\213\265\362_`\vC\347H\10P\375\33\255\37\317\317m\211\211T\17\221\361gg"...,
136, 0, NULL, NULL) = 136
gettimeofday({1540302908, 539005}, NULL) = 0
gettimeofday({1540302908, 539076}, NULL) = 0
gettimeofday({1540302908, 539146}, NULL) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={21600, 0}}, NULL) =
0
lseek(23</data2/postgresql/9.6/main/base/11327726/11357540>, 0, SEEK_END) =
193478656
lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
197910528
lseek(23</data2/postgresql/9.6/main/base/11327726/11357540>, 0, SEEK_END) =
193478656
lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
197910528
lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
197910528
lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
197910528
......
......
......

where 197910528 is the table A!.

by adding a "sleep" of 30 seconds before the <insert remaining data into A
from B>;  everything works and following is the strace of the correct
process:

gettimeofday({1540302790, 333762}, NULL) = 0
epoll_wait(39<anon_inode:[eventpoll]>, {}, 1, 23861) = 0
close(39<anon_inode:[eventpoll]>)       = 0
gettimeofday({1540302814, 219009}, NULL) = 0
gettimeofday({1540302814, 219126}, NULL) = 0
write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:53:34
UTC]:127.0."..., 323) = 323
gettimeofday({1540302814, 219339}, NULL) = 0
sendto(10,
"\2\0\0\0\250\3\0\0\356\330\254\0\10\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
936, 0, NULL, 0) = 936
sendto(10,
"\2\0\0\0\230\0\0\0\356\330\254\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
152, 0, NULL, 0) = 152
gettimeofday({1540302814, 219572}, NULL) = 0
sendto(11,
"\27\3\3\0Y\326\231\n\324\212\351W\3T\26F\366\3344\"[\375\0262w\2120\33\256QY\375"...,
94, 0, NULL, 0) = 94
recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(3<anon_inode:[eventpoll]>, {{EPOLLIN, {u32=1671705328,
u64=140614606009072}}}, 1, -1) = 1
recvfrom(11, "\27\3\3\0\210", 5, 0, NULL, NULL) = 5
recvfrom(11,
"\326\264\345\323\r\334\213\243\207\216\321\30\225\33\"&\20\313\330\252\32g\r\217j\22i\244\373\321\0210"...,
136, 0, NULL, NULL) = 136
gettimeofday({1540302814, 221199}, NULL) = 0
gettimeofday({1540302814, 221267}, NULL) = 0
close(23</data2/postgresql/9.6/main/base/11327726/11357519>) = 0
gettimeofday({1540302814, 221434}, NULL) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={21600, 0}}, NULL) =
0
open("base/11327726/11357519", O_RDWR)  = 23
lseek(23</data2/postgresql/9.6/main/base/11327726/11357519>, 0, SEEK_END) =
193478656
lseek(34</data2/postgresql/9.6/main/base/11327726/11357525>, 0, SEEK_END) =
197910528
lseek(23</data2/postgresql/9.6/main/base/11327726/11357519>, 0, SEEK_END) =
193478656
lseek(34</data2/postgresql/9.6/main/base/11327726/11357525>, 0, SEEK_END) =
197910528
gettimeofday({1540302814, 313434}, NULL) = 0
write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:53:34
UTC]:127.0."..., 406) = 406
gettimeofday({1540302814, 313683}, NULL) = 0
gettimeofday({1540302814, 313757}, NULL) = 0
sendto(11,
"\27\3\3\0N\326\231\n\324\212\351W\4\253\271:\215\321\253\307S\224\344tS>\16\225\222UYc"...,
83, 0, NULL, 0) = 83
recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(3<anon_inode:[eventpoll]>,  <detached ...>


What's the cause of this strange behaviour? What we can do?


Re: BUG #15455: Endless lseek

From
Dilip Kumar
Date:
On Tue, Oct 23, 2018 at 9:35 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      15455
> Logged by:          Andrea Ferranti
> Email address:      andrea.ferranti@wolterskluwer.com
> PostgreSQL version: 9.6.10
> Operating system:   Ubuntu 16
> Description:
>
I think you have already raised the same bug "BUG #15454: Endless
lseek",  Seems this is duplicate.


> Dear all,
> We have experienced a strange behaviour on PostgreSQL 9.6.10.
> (for privacy reason I cannot copy the exact query and data on which query
> are performed).
>
> We have a series of query that move data from a table to another by using
> the following logic.
>
> create table A;
> insert data into A from B;
> <insert remaining data into A from B>;
> drop B;
> rename A to B;
>
> the <insert remaining data into A from B> step has the following logic:
>
> INSERT INTO A SELECT X
> FROM B DEST_TABLE JOIN (
>      SELECT ---
>      FROM B SOURCE_TABLE)
>      SOURCE_TABLE
>      ON SOURCE_TABLE.S0 = DEST_TABLE.
>      AND SOURCE_TABLE.S1 = DEST_TABLE.
>
> the last query doesn't terminate.
> In particular, we have found that the inner select  (...FROM A JOIN B...)
> generate a series of lseek as following.
>
> (we have execute a strace of posgtreSQL process)
>
> gettimeofday({1540302906, 733829}, NULL) = 0
> epoll_wait(40<anon_inode:[eventpoll]>, {}, 1, 1802) = 0
> close(40<anon_inode:[eventpoll]>)       = 0
> gettimeofday({1540302908, 538020}, NULL) = 0
> gettimeofday({1540302908, 538120}, NULL) = 0
> write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:55:08
> UTC]:127.0."..., 321) = 321
> gettimeofday({1540302908, 538310}, NULL) = 0
> sendto(10,
> "\2\0\0\0x\1\0\0\356\330\254\0\3\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 376, 0, NULL, 0) = 376
> gettimeofday({1540302908, 538477}, NULL) = 0
> sendto(11,
> "\27\3\3\0Y\326\231\n\324\212\351W\26\311\363\t\316&\214\223\237\301o\30\361\0347\242\365\264-'"...,
> 94, 0, NULL, 0) = 94
> recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> epoll_wait(3<anon_inode:[eventpoll]>, {{EPOLLIN, {u32=1671705328,
> u64=140614606009072}}}, 1, -1) = 1
> recvfrom(11, "\27\3\3\0\210", 5, 0, NULL, NULL) = 5
> recvfrom(11,
> "\326\264\345\323\r\334\213\265\362_`\vC\347H\10P\375\33\255\37\317\317m\211\211T\17\221\361gg"...,
> 136, 0, NULL, NULL) = 136
> gettimeofday({1540302908, 539005}, NULL) = 0
> gettimeofday({1540302908, 539076}, NULL) = 0
> gettimeofday({1540302908, 539146}, NULL) = 0
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={21600, 0}}, NULL) =
> 0
> lseek(23</data2/postgresql/9.6/main/base/11327726/11357540>, 0, SEEK_END) =
> 193478656
> lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
> 197910528
> lseek(23</data2/postgresql/9.6/main/base/11327726/11357540>, 0, SEEK_END) =
> 193478656
> lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
> 197910528
> lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
> 197910528
> lseek(35</data2/postgresql/9.6/main/base/11327726/11357546>, 0, SEEK_END) =
> 197910528
> ......
> ......
> ......
>
> where 197910528 is the table A!.
>
> by adding a "sleep" of 30 seconds before the <insert remaining data into A
> from B>;  everything works and following is the strace of the correct
> process:
>
> gettimeofday({1540302790, 333762}, NULL) = 0
> epoll_wait(39<anon_inode:[eventpoll]>, {}, 1, 23861) = 0
> close(39<anon_inode:[eventpoll]>)       = 0
> gettimeofday({1540302814, 219009}, NULL) = 0
> gettimeofday({1540302814, 219126}, NULL) = 0
> write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:53:34
> UTC]:127.0."..., 323) = 323
> gettimeofday({1540302814, 219339}, NULL) = 0
> sendto(10,
> "\2\0\0\0\250\3\0\0\356\330\254\0\10\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 936, 0, NULL, 0) = 936
> sendto(10,
> "\2\0\0\0\230\0\0\0\356\330\254\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 152, 0, NULL, 0) = 152
> gettimeofday({1540302814, 219572}, NULL) = 0
> sendto(11,
> "\27\3\3\0Y\326\231\n\324\212\351W\3T\26F\366\3344\"[\375\0262w\2120\33\256QY\375"...,
> 94, 0, NULL, 0) = 94
> recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> epoll_wait(3<anon_inode:[eventpoll]>, {{EPOLLIN, {u32=1671705328,
> u64=140614606009072}}}, 1, -1) = 1
> recvfrom(11, "\27\3\3\0\210", 5, 0, NULL, NULL) = 5
> recvfrom(11,
> "\326\264\345\323\r\334\213\243\207\216\321\30\225\33\"&\20\313\330\252\32g\r\217j\22i\244\373\321\0210"...,
> 136, 0, NULL, NULL) = 136
> gettimeofday({1540302814, 221199}, NULL) = 0
> gettimeofday({1540302814, 221267}, NULL) = 0
> close(23</data2/postgresql/9.6/main/base/11327726/11357519>) = 0
> gettimeofday({1540302814, 221434}, NULL) = 0
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={21600, 0}}, NULL) =
> 0
> open("base/11327726/11357519", O_RDWR)  = 23
> lseek(23</data2/postgresql/9.6/main/base/11327726/11357519>, 0, SEEK_END) =
> 193478656
> lseek(34</data2/postgresql/9.6/main/base/11327726/11357525>, 0, SEEK_END) =
> 197910528
> lseek(23</data2/postgresql/9.6/main/base/11327726/11357519>, 0, SEEK_END) =
> 193478656
> lseek(34</data2/postgresql/9.6/main/base/11327726/11357525>, 0, SEEK_END) =
> 197910528
> gettimeofday({1540302814, 313434}, NULL) = 0
> write(2</var/log/postgresql/postgresql-9.6-main.log>, "[2018-10-23 13:53:34
> UTC]:127.0."..., 406) = 406
> gettimeofday({1540302814, 313683}, NULL) = 0
> gettimeofday({1540302814, 313757}, NULL) = 0
> sendto(11,
> "\27\3\3\0N\326\231\n\324\212\351W\4\253\271:\215\321\253\307S\224\344tS>\16\225\222UYc"...,
> 83, 0, NULL, 0) = 83
> recvfrom(11, 0x7fe363a94913, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> epoll_wait(3<anon_inode:[eventpoll]>,  <detached ...>
>
>
> What's the cause of this strange behaviour? What we can do?
>


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: BUG #15455: Endless lseek

From
Andres Freund
Date:
On 2018-10-24 11:14:29 +0530, Dilip Kumar wrote:
> On Tue, Oct 23, 2018 at 9:35 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      15455
> > Logged by:          Andrea Ferranti
> > Email address:      andrea.ferranti@wolterskluwer.com
> > PostgreSQL version: 9.6.10
> > Operating system:   Ubuntu 16
> > Description:
> >
> I think you have already raised the same bug "BUG #15454: Endless
> lseek",  Seems this is duplicate.

I think the other bug is incomplete / truncated...

> > We have experienced a strange behaviour on PostgreSQL 9.6.10.
> > (for privacy reason I cannot copy the exact query and data on which query
> > are performed).
> >
> > We have a series of query that move data from a table to another by using
> > the following logic.
> >
> > create table A;
> > insert data into A from B;
> > <insert remaining data into A from B>;
> > drop B;
> > rename A to B;
> >
> > the <insert remaining data into A from B> step has the following logic:
> >
> > INSERT INTO A SELECT X
> > FROM B DEST_TABLE JOIN (
> >      SELECT ---
> >      FROM B SOURCE_TABLE)
> >      SOURCE_TABLE
> >      ON SOURCE_TABLE.S0 = DEST_TABLE.
> >      AND SOURCE_TABLE.S1 = DEST_TABLE.
> >
> > the last query doesn't terminate.
> > In particular, we have found that the inner select  (...FROM A JOIN B...)
> > generate a series of lseek as following.
> >
> > (we have execute a strace of posgtreSQL process)

> > by adding a "sleep" of 30 seconds before the <insert remaining data into A
> > from B>;  everything works and following is the strace of the correct
> > process:

I assume this might "just" be a chance for autovacuum to analyze the
table. If you do an EXPLAIN of the query both with the 30s wait and
without, does the plan change?  Does adding an explicit ANALYZE of both
a and b before inserting fix the issue?

Greetings,

Andres Freund


Re: BUG #15455: Endless lseek

From
"Ferranti, Andrea"
Date:

AET_AW_001_000014_000007 is table A

JOIN WITHOUT INDEX su AET_AW_001_000014_000007

3000 rows : Execution time 1,5s

--------------------------------------------------------------------------------------------------------------------------------------Nested Loop Anti Join  (cost=0.00..204.02 rows=1 width=4) (actual time=1535.056..1535.056 rows=0 loops=1)  Join Filter: ((d.oid)::text = (s.oid)::text)  Rows Removed by Join Filter: 4498500  ->  Seq Scan on aw_001_000014_000007 s  (cost=0.00..100.00 rows=1 width=90) (actual time=0.003..0.774 rows=3000 loops=1)  ->  Materialize  (cost=0.00..104.00 rows=1 width=90) (actual time=0.000..0.208 rows=1500 loops=3000)        ->  Seq Scan on aet_aw_001_000014_000007 d  (cost=0.00..104.00 rows=1 width=90) (actual time=0.003..0.786 rows=3000 loops=1)Planning time: 0.166 msExecution time: 1535.096 ms

30.000 rows : Execution time214s

 ----------------------------------------------------------------------------------------------------------------------------------------Nested Loop Anti Join  (cost=0.00..2064.02 rows=1 width=4) (actual time=214598.223..214598.223 rows=0 loops=1)  Join Filter: ((d.oid)::text = (s.oid)::text)  Rows Removed by Join Filter: 449985000  ->  Seq Scan on aw_001_000014_000007 s  (cost=0.00..1020.00 rows=1 width=90) (actual time=0.004..12.034 rows=30000 loops=1)  ->  Materialize  (cost=0.00..1044.01 rows=1 width=90) (actual time=0.000..3.114 rows=15000 loops=30000)        ->  Seq Scan on aet_aw_001_000014_000007 d  (cost=0.00..1044.00 rows=1 width=90) (actual time=0.004..9.235 rows=30000 loops=1)Planning time: 0.246 msExecution time: 214598.430 ms

600.000 rows Execution time: Undefined (exponential)
----------------
not recorded

JOIN WITH INDEX on AET_AW_001_000014_000007

600.000 rows : Execution time 1,8s

-------------------------------------------------------------------------------------------------------------------------------------------------------------Nested Loop Anti Join  (cost=0.42..23620.53 rows=1 width=4) (actual time=1850.355..1850.355 rows=0 loops=1)  ->  Seq Scan on aw_001_000014_000007 s  (cost=0.00..23618.00 rows=1 width=90) (actual time=0.005..164.216 rows=695521 loops=1)  ->  Index Only Scan using aet_index on aet_aw_001_000014_000007 d  (cost=0.42..3659.39 rows=3478 width=90) (actual time=0.002..0.002 rows=1 loops=695521)        Index Cond: (oid = (s.oid)::text)        Heap Fetches: 695521Planning time: 0.195 msExecution time: 1850.392 ms

JOIN WITHOUT INDEX but WITH SLEEP(30) SINCE THE CREATION of AET_AW_001_000014_000007

600.000 rows : Execution time 0,8s

-------------------------------------------------------------------------------------------------------------------------------------------------Hash Anti Join  (cost=39808.22..72989.64 rows=1 width=4) (actual time=863.544..863.544 rows=0 loops=1)  Hash Cond: ((s.oid)::text = (d.oid)::text)  ->  Seq Scan on aw_001_000014_000007 s  (cost=0.00..30573.21 rows=695521 width=37) (actual time=0.004..151.082 rows=695521 loops=1)  ->  Hash  (cost=31114.21..31114.21 rows=695521 width=37) (actual time=419.699..419.700 rows=695521 loops=1)        Buckets: 1048576  Batches: 1  Memory Usage: 55059kB        ->  Seq Scan on aet_aw_001_000014_000007 d  (cost=0.00..31114.21 rows=695521 width=37) (actual time=0.006..180.611 rows=695521 loops=1)Planning time: 0.576 msExecution time: 863.772 ms


Da: Andres Freund <andres@anarazel.de>
Inviato: mercoledì 24 ottobre 2018 10:31:36
A: Dilip Kumar
Cc: Ferranti, Andrea; PostgreSQL mailing lists
Oggetto: Re: BUG #15455: Endless lseek
 
On 2018-10-24 11:14:29 +0530, Dilip Kumar wrote:
> On Tue, Oct 23, 2018 at 9:35 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      15455
> > Logged by:          Andrea Ferranti
> > Email address:      andrea.ferranti@wolterskluwer.com
> > PostgreSQL version: 9.6.10
> > Operating system:   Ubuntu 16
> > Description:
> >
> I think you have already raised the same bug "BUG #15454: Endless
> lseek",  Seems this is duplicate.

I think the other bug is incomplete / truncated...

> > We have experienced a strange behaviour on PostgreSQL 9.6.10.
> > (for privacy reason I cannot copy the exact query and data on which query
> > are performed).
> >
> > We have a series of query that move data from a table to another by using
> > the following logic.
> >
> > create table A;
> > insert data into A from B;
> > <insert remaining data into A from B>;
> > drop B;
> > rename A to B;
> >
> > the <insert remaining data into A from B> step has the following logic:
> >
> > INSERT INTO A SELECT X
> > FROM B DEST_TABLE JOIN (
> >      SELECT ---
> >      FROM B SOURCE_TABLE)
> >      SOURCE_TABLE
> >      ON SOURCE_TABLE.S0 = DEST_TABLE.
> >      AND SOURCE_TABLE.S1 = DEST_TABLE.
> >
> > the last query doesn't terminate.
> > In particular, we have found that the inner select  (...FROM A JOIN B...)
> > generate a series of lseek as following.
> >
> > (we have execute a strace of posgtreSQL process)

> > by adding a "sleep" of 30 seconds before the <insert remaining data into A
> > from B>;  everything works and following is the strace of the correct
> > process:

I assume this might "just" be a chance for autovacuum to analyze the
table. If you do an EXPLAIN of the query both with the 30s wait and
without, does the plan change?  Does adding an explicit ANALYZE of both
a and b before inserting fix the issue?

Greetings,

Andres Freund