Thread: [BUGS] Segfault 11 on PG10 with max_parallel_workers_per_gather>3
Hi,

I can reproduce a segfault by executing a query. I run PostgreSQL 10.0-1.pgdg16.04+1 on Ubuntu 16.04.3. The machine has hyperthreading enabled and 48 virtual cores (2x E5-2690v3).

I have a materialized view:

refresh materialized view concurrently results.as_20171025_20170930_ut78777;  -- works

set max_parallel_workers_per_gather to 0;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE (((oadr_gkz IN (2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000)) AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));  -- works: 129587

set max_parallel_workers_per_gather to 3;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE (((oadr_gkz IN (2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000)) AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));  -- works: 129587

set max_parallel_workers_per_gather to 4;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE (((oadr_gkz IN (2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000)) AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));  -- SEGFAULT!

set max_parallel_workers_per_gather to 4;
explain SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE ((((oart_zwangsversteigerung_janein IS NULL)) AND (oadr_gkz IN (2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000)) AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01')) OR (((oart_zwangsversteigerung_janein IS NULL)) AND (oadr_gkz IN (2000000))))

"Finalize Aggregate  (cost=186411.37..186411.38 rows=1 width=8)"
"  ->  Gather  (cost=186410.95..186411.36 rows=4 width=8)"
"        Workers Planned: 4"
"        ->  Partial Aggregate  (cost=185410.95..185410.96 rows=1 width=8)"
"              ->  Parallel Bitmap Heap Scan on as_20171025_20170930_ut78777 rt  (cost=12058.69..185353.14 rows=23121 width=0)"
"                    Recheck Cond: (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1)) OR (oadr_gkz = 2000000))"
"                    Filter: ((oart_zwangsversteigerung_janein IS NULL) AND (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1 (...)"
"                    ->  BitmapOr  (cost=12058.69..12058.69 rows=94046 width=0)"
"                          ->  BitmapAnd  (cost=11726.20..11726.20 rows=76321 width=0)"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..3129.41 rows=185997 width=0)"
"                                      Index Cond: (oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[]))"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_objekttyp_grob_idx  (cost=0.00..8550.30 rows=491449 width=0)"
"                                      Index Cond: (objekttyp_grob = 1)"
"                          ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..309.37 rows=17726 width=0)"
"                                Index Cond: (oadr_gkz = 2000000)"

And postgresql-10.log says:

> 2017-10-25 13:45:35.149 CEST [6345] LOG:  server process (PID 25637) was terminated by signal 11: Segmentation fault
> 2017-10-25 13:45:35.149 CEST [6345] DETAIL:  Failed process was running: ...
> 2017-10-25 13:42:14.332 CEST [25629] LOG:  redo starts at 108/449A9D98
> 2017-10-25 13:42:14.396 CEST [25629] LOG:  unexpected pageaddr 107/6F8CC000 in log segment 000000010000010800000045, offset 9224192
> 2017-10-25 13:42:14.396 CEST [25629] LOG:  redo done at 108/458CA968

I upgraded PostgreSQL using pg_upgrade with hard links a few days ago. This view has not been upgraded from PG 9.6 to 10, but was created freshly on PG10 this morning.

Other related settings in postgresql.conf are:

> max_worker_processes = 12
> max_parallel_workers_per_gather = 4
> max_parallel_workers = 12

What I figured out is that it only crashes when I increase max_parallel_workers_per_gather to more than 3. Maybe I misunderstood some of the max_parallel_* settings and am doing something bogus, but the database should probably not segfault...

How can I help you with more information?

Steve
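Before going further, it helps to confirm what the server is actually using for these settings. A minimal sketch in psql, not specific to the reported schema:

    -- Parallel-query settings currently in effect for this session:
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('max_worker_processes',
                   'max_parallel_workers',
                   'max_parallel_workers_per_gather');

    -- The per-gather limit can also be changed per session, which is what
    -- the report above does to narrow down the crash:
    SET max_parallel_workers_per_gather = 4;
    SHOW max_parallel_workers_per_gather;

max_parallel_workers (new in PG10) caps the total number of parallel workers across the whole server, while max_parallel_workers_per_gather only limits a single Gather node, so the values shown above leave plenty of room for a 4-worker gather.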
Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> I can reproduce a segfault by executing a query.

That sounds like a bug, all right, but you've not provided enough detail
for anyone else to reproduce it.  A self-contained test case would be the
best thing.  If you can't provide that, it's possible that a stack trace
from the core dump would be enough info to diagnose the problem, but no
promises ...

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

			regards, tom lane
Hi,

To be precise, I can only reproduce the bug in about 20% of the attempts. I have to run the query four or five times until it crashes. I have reproduced that many times.

I have a feeling that it has to do with the number of parallel workers that the planner starts. I found no way to force it to any number. I have seen this segfault on at least two machines (running the same application with the same data). I have not seen it since I lowered max_parallel_workers_per_gather to 2.

I tried to generate a table + matview + indexes etc. to reproduce the crash from scratch, but I had no success so far.

I also tried to get a sensible stack trace. I attached 9 gdb sessions to all postgres PIDs, and when I triggered the crash, two of them had some output and produced something on 'bt'. Attached...

If I were able to dump the relevant data from my DB and reproduce the crash with it on a fresh PG10 install - would anyone have time to look at it? I guess it would be no more than 50 MB...

I am happy to help as well as I can,
Steve


Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S
Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12334039b3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S

#0  0x00007f12334039b3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00005564bcaccd01 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffce2d47e90, cur_timeout=200, set=0x5564beab53a8) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x5564beab53a8, timeout=timeout@entry=200, occurred_events=occurred_events@entry=0x7ffce2d47e90, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=83886093) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:1000
#3  0x00005564bcacd174 in WaitLatchOrSocket (latch=0x7f1227241be4, wakeEvents=wakeEvents@entry=25, sock=sock@entry=-1, timeout=200, wait_event_info=wait_event_info@entry=83886093) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:385
#4  0x00005564bcacd225 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=25, timeout=<optimized out>, wait_event_info=wait_event_info@entry=83886093) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/storage/ipc/latch.c:339
#5  0x00005564bca8193f in WalWriterMain () at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/walwriter.c:293
#6  0x00005564bc8c0401 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffce2d48070) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/bootstrap/bootstrap.c:442
#7  0x00005564bca7cd83 in StartChildProcess (type=WalWriterProcess) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:5313
#8  0x00005564bca7e11a in reaper (postgres_signal_arg=<optimized out>) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:2871
#9  <signal handler called>
#10 0x00007f12333f9573 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#11 0x00005564bc82a489 in ServerLoop () at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#12 0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#13 0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228

########## second one:

Continuing.

Program received signal SIGUSR1, User defined signal 1.
0x00007f12333f9573 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S

#0  0x00007f12333f9573 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00005564bc82a489 in ServerLoop () at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1717
#2  0x00005564bca7fa6b in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/postmaster/postmaster.c:1361
#3  0x00005564bc82c2d5 in main (argc=5, argv=0x5564bea7a850) at /build/postgresql-10-YNofMT/postgresql-10-10.0/build/../src/backend/main/main.c:228
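Regarding "I found no way to force it to any number": a sketch of the knobs that usually influence how many workers the PG10 planner chooses (the relation name below is a placeholder, not the reporter's matview; the same storage parameter should also be settable on a materialized view via ALTER MATERIALIZED VIEW):

    -- Make parallel plans look cheap so the planner chooses them more readily:
    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET min_parallel_table_scan_size = 0;
    SET min_parallel_index_scan_size = 0;

    -- Pin the planned worker count for scans of one relation (still capped
    -- by max_parallel_workers_per_gather); 'some_table' is a placeholder:
    ALTER TABLE some_table SET (parallel_workers = 4);

Even then, the number of workers actually launched can be lower than the number planned if no background-worker slots are free, which could explain why the crash only appears on some runs.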
Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> I also tried to get a sensible stack trace. I attached 9 gdb sessions
> to all postgres PIDs, and when I triggered the crash, two of them had
> some output and produced something on 'bt'. Attached...

Those look like normal operation --- SIGUSR1 isn't a crash condition,
it's what PG normally uses to wake up a sleeping process.  If you
want to attach gdb before provoking the crash, you need to tell it
to ignore SIGUSR1 (I think "handle SIGUSR1 pass nostop noprint"
is the right incantation).

It might be easier to enable core files ("ulimit -c unlimited" before
starting the postmaster) and then gdb the core files.

> If I were able to dump the relevant data from my DB and reproduce the
> crash with it on a fresh PG10 install - would anyone have time to look
> at it? I guess it would be no more than 50 MB...

Sure, I or somebody else would look at it.

			regards, tom lane
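Spelled out, the two routes look roughly like this; the binary path assumes the Ubuntu/PGDG packages, and the PID and core-file path are placeholders:

    # 1. Attach gdb to the backend before provoking the crash, and ignore
    #    the harmless SIGUSR1 wakeups:
    gdb -p <backend_pid>
    (gdb) handle SIGUSR1 pass nostop noprint
    (gdb) continue
    #    ...run the crashing query in that backend's session; after SIGSEGV:
    (gdb) bt

    # 2. Or allow core dumps before the postmaster starts, then inspect the
    #    core file after the crash (on a systemd-managed cluster you may
    #    need LimitCORE=infinity in the service unit instead of ulimit):
    ulimit -c unlimited
    gdb /usr/lib/postgresql/10/bin/postgres /path/to/core
    (gdb) bt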
Just to keep you updated: I am on holidays this week. I will put more time into this next week.

Steve

On 25.10.2017 22:41, Tom Lane wrote:
>> If I were able to dump the relevant data from my DB and reproduce the
>> crash with it on a fresh PG10 install - would anyone have time to look
>> at it? I guess it would be no more than 50 MB...
>
> Sure, I or somebody else would look at it.
>
> 			regards, tom lane
Hi,

The segfaults thrown when I run my application on PG10 got worse. I have found more segfaults even when max_parallel_workers_per_gather is left at its default.

I have been able to create a 6 MB PG10 dump that can be imported into a virgin PG10 install on Ubuntu 16.04, and I have a query that segfaults PG10 100% of the time (on my machines at least).

The dataset has been wiped of sensitive data, but it still should not go public. I will send a password by email to all PG hackers interested.

https://oc.empirica-systeme.de/index.php/s/0XLKObTrUjRlCV7

The dump contains:
* a table with lots of columns called "basedata", 56k rows
* a mat view created as select * from basedata, called "mv", 56k rows
* lots of btree indexes on most of the mv columns

I do the following on my laptop running latest Ubuntu 16.04 with the PG APT repository:

# !PURGING ALL PG STUFF HERE TO GET A CLEAN START!!
sudo apt-get purge postgresql-9.6 postgresql-10 postgresql-10-postgis-2.4-scripts postgresql-10-postgis-2.4
sudo rm /etc/postgresql -rf

# Installing 10.0-1.pgdg16.04+1
sudo apt install postgresql-10
sudo su postgres
dropdb analyst
createdb analyst

SELECT count(1) FROM mv WHERE
((nachfrageart IN (1)) AND (oadr_gkz IN (6611000) OR oadr_kkz IN (3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6535,6631,6632,6633,6634,6635,6636,7315,8125,8215,8221,8222,9663)) AND (objekttyp_grob IN (1,2)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01'))
OR
((nachfrageart IN (0)) AND (nutzungsart IN (0)) AND (oadr_gkz IN (6611000,8121000,8212000) OR oadr_kkz IN (3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6631,6633,7315,8125,8221,8222,9663)) AND (objekttyp_grob IN (1,2,3)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01'));

This segfaults 100% of the time when running the above query. Tested on three Ubuntu 16.04 servers and one Ubuntu 16.04 desktop.

Where the data comes from: the data is created on a PG9.6 machine daily, dumped, and imported into PG10. The whole dataflow is stable with PG9.6. I have seen the problem with every fresh dataset.

I hope this finally makes the bug reproducible for you. If it does not segfault on your machine, please try to increase max_parallel_workers_per_gather to 5.

I am very sorry that I didn't test PG10 earlier when it was beta. I guess the current bughunt makes it more likely that I will test the PG11 beta with my application. Promised!

Greetings,
Steve

On 25.10.2017 22:41, Tom Lane wrote:
> Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
>> I also tried to get a sensible stack trace. I attached 9 gdb sessions
>> to all postgres PIDs, and when I triggered the crash, two of them had
>> some output and produced something on 'bt'. Attached...
>
> Those look like normal operation --- SIGUSR1 isn't a crash condition,
> it's what PG normally uses to wake up a sleeping process.  If you
> want to attach gdb before provoking the crash, you need to tell it
> to ignore SIGUSR1 (I think "handle SIGUSR1 pass nostop noprint"
> is the right incantation).
>
> It might be easier to enable core files ("ulimit -c unlimited" before
> starting the postmaster) and then gdb the core files.
>
>> If I were able to dump the relevant data from my DB and reproduce the
>> crash with it on a fresh PG10 install - would anyone have time to look
>> at it? I guess it would be no more than 50 MB...
>
> Sure, I or somebody else would look at it.
>
> 			regards, tom lane

-- 
Stefan Tzeggai
On Wed, Nov 8, 2017 at 5:20 PM, Stefan Tzeggai <tzeggai@empirica-systeme.de> wrote:
> Hi,
>
> The segfaults thrown when I run my application on PG10 got worse. I have
> found more segfaults even when max_parallel_workers_per_gather is left
> at its default.
>
> I have been able to create a 6 MB PG10 dump that can be imported into a
> virgin PG10 install on Ubuntu 16.04, and I have a query that segfaults
> PG10 100% of the time (on my machines at least).
>
> The dataset has been wiped of sensitive data, but it still should not go
> public. I will send a password by email to all PG hackers interested.

Can you share the password with me?  My guess is that this is related to
Parallel Bitmap Heap Scan as that is a newly introduced feature in PG10
and the previous email shows that in the plan.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
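If that guess is right, a few session- or role-level settings can serve as stopgaps until a fixed release is installed (the role name below is hypothetical):

    -- Disable parallel query for the current session:
    SET max_parallel_workers_per_gather = 0;

    -- Or steer the planner away from (parallel) bitmap heap scans:
    SET enable_bitmapscan = off;

    -- Make the safer setting stick for the application's login role
    -- ('app_user' is a placeholder):
    ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;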
On Wed, Oct 25, 2017 at 5:46 PM, Stefan Tzeggai <tzeggai@empirica-systeme.de> wrote:

> "Finalize Aggregate  (cost=186411.37..186411.38 rows=1 width=8)"
> "  ->  Gather  (cost=186410.95..186411.36 rows=4 width=8)"
> "        Workers Planned: 4"
> "        ->  Partial Aggregate  (cost=185410.95..185410.96 rows=1 width=8)"
> "              ->  Parallel Bitmap Heap Scan on as_20171025_20170930_ut78777 rt  (cost=12058.69..185353.14 rows=23121 width=0)"
> "                    Recheck Cond: (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1)) OR (oadr_gkz = 2000000))"
> "                    Filter: ((oart_zwangsversteigerung_janein IS NULL) AND (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1 (...)"
> "                    ->  BitmapOr  (cost=12058.69..12058.69 rows=94046 width=0)"
> "                          ->  BitmapAnd  (cost=11726.20..11726.20 rows=76321 width=0)"
> "                                ->  Bitmap Index Scan on

By looking at the plan it seems like the issue that got fixed in the commit below.

Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref() so it won't lie to aggregate final functions.)
Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's ability to check for improperly-nested aggregates.)
Branch: remotes/origin/REL_10_STABLE
Follows: REL_10_0
Precedes: REL_10_1

    Fix possible crash with Parallel Bitmap Heap Scan.

    If a Parallel Bitmap Heap scan's chain of leftmost descendents
    includes a BitmapOr whose first child is a BitmapAnd, the prior coding
    would mistakenly create a non-shared TIDBitmap and then try to perform
    shared iteration.

    Report by Tomas Vondra.  Patch by Dilip Kumar.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
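For anyone wanting to check whether a particular query is exposed, the dangerous shape is visible in plain EXPLAIN output. A minimal sketch against the shared dump, using a cut-down version of the reported predicate (whether it produces exactly this plan shape depends on local statistics and settings):

    SET max_parallel_workers_per_gather = 5;
    EXPLAIN (COSTS OFF)
    SELECT count(1)
    FROM mv
    WHERE (oadr_gkz IN (6611000) AND objekttyp_grob IN (1, 2))
       OR (oadr_gkz IN (6611000, 8121000, 8212000));

    -- The crash-prone nesting described in the commit message looks like:
    --   Parallel Bitmap Heap Scan on mv
    --     ->  BitmapOr
    --           ->  BitmapAnd              <- leftmost child is a BitmapAnd
    --           ->  Bitmap Index Scan ...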
Sorry, I missed a line:

> # Installing 10.0-1.pgdg16.04+1
> sudo apt install postgresql-10
> sudo su postgres
> dropdb analyst
> createdb analyst

AND THEN:

pg_restore -U postgres -v -d analyst -Fc dumpSegfaultSmaller.backup

Sorry

On 08.11.2017 12:50, Stefan Tzeggai wrote:
> Hi,
>
> The segfaults thrown when I run my application on PG10 got worse. I have
> found more segfaults even when max_parallel_workers_per_gather is left
> at its default.
>
> I have been able to create a 6 MB PG10 dump that can be imported into a
> virgin PG10 install on Ubuntu 16.04, and I have a query that segfaults
> PG10 100% of the time (on my machines at least).
>
> The dataset has been wiped of sensitive data, but it still should not go
> public. I will send a password by email to all PG hackers interested.
>
> https://oc.empirica-systeme.de/index.php/s/0XLKObTrUjRlCV7
>
> The dump contains:
> * a table with lots of columns called "basedata", 56k rows
> * a mat view created as select * from basedata, called "mv", 56k rows
> * lots of btree indexes on most of the mv columns
>
> I do the following on my laptop running latest Ubuntu 16.04 with the
> PG APT repository:
>
> # !PURGING ALL PG STUFF HERE TO GET A CLEAN START!!
> sudo apt-get purge postgresql-9.6 postgresql-10 postgresql-10-postgis-2.4-scripts postgresql-10-postgis-2.4
> sudo rm /etc/postgresql -rf
>
> # Installing 10.0-1.pgdg16.04+1
> sudo apt install postgresql-10
> sudo su postgres
> dropdb analyst
> createdb analyst
>
> SELECT count(1) FROM mv WHERE
> ((nachfrageart IN (1)) AND (oadr_gkz IN (6611000) OR oadr_kkz IN (3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6535,6631,6632,6633,6634,6635,6636,7315,8125,8215,8221,8222,9663)) AND (objekttyp_grob IN (1,2)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01'))
> OR
> ((nachfrageart IN (0)) AND (nutzungsart IN (0)) AND (oadr_gkz IN (6611000,8121000,8212000) OR oadr_kkz IN (3152,5111,5113,5158,5162,5314,5315,5362,5378,5382,5515,5711,6411,6412,6413,6414,6431,6432,6433,6434,6435,6436,6438,6439,6440,6531,6532,6534,6631,6633,7315,8125,8221,8222,9663)) AND (objekttyp_grob IN (1,2,3)) AND (startdate>='2012-01-01' OR enddate IS NULL OR enddate>='2012-01-01'));
>
> This segfaults 100% of the time when running the above query. Tested on
> three Ubuntu 16.04 servers and one Ubuntu 16.04 desktop.
>
> Where the data comes from: the data is created on a PG9.6 machine daily,
> dumped, and imported into PG10. The whole dataflow is stable with PG9.6.
> I have seen the problem with every fresh dataset.
>
> I hope this finally makes the bug reproducible for you. If it does not
> segfault on your machine, please try to increase
> max_parallel_workers_per_gather to 5.
>
> I am very sorry that I didn't test PG10 earlier when it was beta. I
> guess the current bughunt makes it more likely that I will test the
> PG11 beta with my application. Promised!
>
> Greetings,
> Steve
>
> On 25.10.2017 22:41, Tom Lane wrote:
>> Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
>>> I also tried to get a sensible stack trace. I attached 9 gdb sessions
>>> to all postgres PIDs, and when I triggered the crash, two of them had
>>> some output and produced something on 'bt'. Attached...
>>
>> Those look like normal operation --- SIGUSR1 isn't a crash condition,
>> it's what PG normally uses to wake up a sleeping process.  If you
>> want to attach gdb before provoking the crash, you need to tell it
>> to ignore SIGUSR1 (I think "handle SIGUSR1 pass nostop noprint"
>> is the right incantation).
>>
>> It might be easier to enable core files ("ulimit -c unlimited" before
>> starting the postmaster) and then gdb the core files.
>>
>>> If I would be able to dump the relevant data from my db and I would be
>>> able to reproduce the crash with it on a fresh PG10 install - Would
>>> anyone have time to look at it? I guess it would be no more than 50 MB...
>>
>> Sure, I or somebody else would look at it.
>>
>> 			regards, tom lane

-- 
empirica-systeme GmbH
Stefan Tzeggai
Brunsstr. 31
72074 Tübingen

email  tzeggai@empirica-systeme.de
phone  +49 7071 6392922
mobile +49 176 40 38 9559

"Wer nichts zu verbergen hat, braucht auch keine Hose!"
Hi,

On 08.11.2017 14:24, Dilip Kumar wrote:
> By looking at the plan it seems like the issue that got fixed in the commit below.
>
> Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
> Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
> Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref()
> so it won't lie to aggregate final functions.)
> Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's
> ability to check for improperly-nested aggregates.)
> Branch: remotes/origin/REL_10_STABLE
> Follows: REL_10_0
> Precedes: REL_10_1
>
> Fix possible crash with Parallel Bitmap Heap Scan.
>
> If a Parallel Bitmap Heap scan's chain of leftmost descendents
> includes a BitmapOr whose first child is a BitmapAnd, the prior coding
> would mistakenly create a non-shared TIDBitmap and then try to perform
> shared iteration.
>
> Report by Tomas Vondra.  Patch by Dilip Kumar.

Do I understand correctly that this fix will be released with 10.1 tomorrow (according to https://www.postgresql.org/developer/roadmap/) and that I could then test it?

Steve

-- 
empirica-systeme GmbH
Stefan Tzeggai
Brunsstr. 31
72074 Tübingen

email  tzeggai@empirica-systeme.de
phone  +49 7071 6392922
mobile +49 176 40 38 9559

"Wer nichts zu verbergen hat, braucht auch keine Hose!"
On Wed, Nov 8, 2017 at 7:00 PM, Stefan Tzeggai <tzeggai@empirica-systeme.de> wrote:
> Sorry, I missed a line:
>
>> # Installing 10.0-1.pgdg16.04+1
>> sudo apt install postgresql-10
>> sudo su postgres
>> dropdb analyst
>> createdb analyst
>
> AND THEN:
>
> pg_restore -U postgres -v -d analyst -Fc dumpSegfaultSmaller.backup

Thanks for the information. I could reproduce the issue at the v10 stamp commit [1], and it's fixed at [2]. I have also verified from the core dump that the issue is the same one that got fixed at [2].

[1]
commit 5df0e99bea1c3e5fbffa7fbd0982da88ea149bb6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Oct 2 17:09:15 2017 -0400

    Stamp 10.0.

[2]
commit a3b1c221893f739950e9232b4b789750f247cee5
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Oct 13 14:53:28 2017 -0400

    Fix possible crash with Parallel Bitmap Heap Scan.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
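For anyone who builds from source, a quick way to check whether a given checkout already contains the fix, using the commit hash quoted above (run inside a postgres.git clone):

    # Branches that contain the fix; origin/REL_10_STABLE and origin/master
    # should be listed:
    git branch -r --contains a3b1c221893f739950e9232b4b789750f247cee5

    # Or test a specific branch directly:
    git merge-base --is-ancestor a3b1c221893f739950e9232b4b789750f247cee5 origin/REL_10_STABLE && echo "fix present"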
Dear Dilip,

On 08.11.2017 16:15, Dilip Kumar wrote:
> Thanks for the information. I could reproduce the issue at the v10 stamp
> commit [1], and it's fixed at [2]. I have also verified from the core
> dump that the issue is the same one that got fixed at [2].
>
> [1]
> commit 5df0e99bea1c3e5fbffa7fbd0982da88ea149bb6
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Mon Oct 2 17:09:15 2017 -0400
>
>     Stamp 10.0.
>
> [2]
> commit a3b1c221893f739950e9232b4b789750f247cee5
> Author: Robert Haas <rhaas@postgresql.org>
> Date:   Fri Oct 13 14:53:28 2017 -0400
>
>     Fix possible crash with Parallel Bitmap Heap Scan.

That's great news! It's just a pity that I spent all of yesterday afternoon fiddling together that tiny dataset to reproduce the bug. But that's how life goes ;-)

It was still a great experience to see how professionally and quickly you guys look at these bugs! +1 And I will test the PG11 RC next year!

Last question before I downgrade my PG10 installations... The fix will be released this week with 10.1?
https://www.postgresql.org/developer/roadmap/

Thanks again,
Steve

-- 
Stefan Tzeggai

"Wer nichts zu verbergen hat, braucht auch keine Hose!"
Stefan Tzeggai <tzeggai@empirica-systeme.de> writes:
> Last question before I downgrade my PG10 installations... The fix will
> be released this week with 10.1?

Yes, it's in 10.1.

			regards, tom lane
On Wed, Nov 8, 2017 at 11:02 PM, Stefan Tzeggai <tzeggai@empirica-systeme.de> wrote:
> On 08.11.2017 14:24, Dilip Kumar wrote:
>> By looking at the plan it seems like the issue that got fixed in the commit below.
>>
>> Author: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:23:28
>> Committer: Robert Haas <rhaas@postgresql.org>  2017-10-14 00:35:14
>> Parent: d48bf6a94d295c3779c6af4df118d95a6606192f (Fix AggGetAggref()
>> so it won't lie to aggregate final functions.)
>> Child:  cb591fcbfbba1df6fda1839ece53665e85e491e3 (Restore nodeAgg.c's
>> ability to check for improperly-nested aggregates.)
>> Branch: remotes/origin/REL_10_STABLE
>> Follows: REL_10_0
>> Precedes: REL_10_1
>>
>> Fix possible crash with Parallel Bitmap Heap Scan.
>>
>> If a Parallel Bitmap Heap scan's chain of leftmost descendents
>> includes a BitmapOr whose first child is a BitmapAnd, the prior coding
>> would mistakenly create a non-shared TIDBitmap and then try to perform
>> shared iteration.
>>
>> Report by Tomas Vondra.  Patch by Dilip Kumar.
>
> Do I understand correctly that this fix will be released with 10.1
> tomorrow (according to https://www.postgresql.org/developer/roadmap/)
> and that I could then test it?

Yes, this commit is included in 10.1.
-- 
Michael
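For completeness, a sketch of how the reporter's Ubuntu/PGDG setup could pick up the fix once the 10.1 packages are published; the cluster name is assumed to be the default 'main':

    sudo apt-get update
    sudo apt-get install postgresql-10       # 10.0 -> 10.1 is a minor, in-place upgrade
    sudo pg_ctlcluster 10 main restart       # the package upgrade normally restarts it anyway

    # Confirm the running server version afterwards:
    sudo -u postgres psql -c 'SELECT version();'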