pg11.1: dsa_area could not attach to segment - Mailing list pgsql-hackers

From Justin Pryzby
Subject pg11.1: dsa_area could not attach to segment
Date
Msg-id 20181231221734.GB25379@telsasoft.com
Whole thread Raw
Responses Re: pg11.1: dsa_area could not attach to segment  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
In our query logs I saw:

postgres=# SELECT log_time, session_id, session_line, left(message,99), left(query,99) FROM postgres_log WHERE
error_severity='ERROR'AND message NOT LIKE 'cancel%';
 
-[ RECORD 1 ]+----------------------------------------------------------------------------------------------------
log_time     | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa4
session_line | 1
left         | dsa_area could not attach to segment
left         | SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array
-[ RECORD 2 ]+----------------------------------------------------------------------------------------------------
log_time     | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa3
session_line | 4
left         | dsa_area could not attach to segment
left         | SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array

The full query + plan is:

|ts=# explain SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols,
array_agg(format_type(colpar.atttypid,colpar.atttypmod) ORDER BY colpar.attnum) AS types
 
|FROM queued_alters qa
|JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped
|JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid
ANDcolcld.attnum>0 AND NOT colcld.attisdropped
 
|WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
|GROUP BY 1,2
|ORDER BY
|parent LIKE 'unused%', -- Do them last
|regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC,
|        regexp_replace(colcld.child, '.*_', '') DESC
|LIMIT 1;

|QUERY PLAN
|Limit  (cost=67128.06..67128.06 rows=1 width=307)
|  ->  Sort  (cost=67128.06..67137.84 rows=3912 width=307)
|        Sort Key: (((qa_1.parent)::text ~~ 'unused%'::text)),
(regexp_replace((((pg_attribute.attrelid)::regclass)::text),
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$'::text,'\3\5'::text)) DESC,
(regexp_replace((((pg_attribute.attrelid)::regclass)::text),'.*_'::text, ''::text)) DESC
 
|        ->  GroupAggregate  (cost=66893.34..67108.50 rows=3912 width=307)
|              Group Key: (((pg_attribute.attrelid)::regclass)::text), qa_1.parent
|              ->  Sort  (cost=66893.34..66903.12 rows=3912 width=256)
|                    Sort Key: (((pg_attribute.attrelid)::regclass)::text), qa_1.parent
|                    ->  Gather  (cost=40582.28..66659.91 rows=3912 width=256)
|                          Workers Planned: 2
|                          ->  Parallel Hash Join  (cost=39582.28..65268.71 rows=1630 width=256)
|                                Hash Cond: (((to_regclass((qa_1.child)::text))::oid = pg_attribute.attrelid) AND
(colpar.attname= pg_attribute.attname))
 
|                                Join Filter: (colpar.atttypid <> pg_attribute.atttypid)
|                                ->  Nested Loop  (cost=0.43..25614.89 rows=11873 width=366)
|                                      ->  Parallel Append  (cost=0.00..12.00 rows=105 width=292)
|                                            ->  Parallel Seq Scan on queued_alters_child qa_1  (cost=0.00..11.47
rows=147width=292)
 
|                                            ->  Parallel Seq Scan on queued_alters qa  (cost=0.00..0.00 rows=1
width=292)
|                                      ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute colpar
(cost=0.43..242.70rows=114 width=78)
 
|                                            Index Cond: ((attrelid = (to_regclass((qa_1.parent)::text))::oid) AND
(attnum> 0))
 
|                                            Filter: (NOT attisdropped)
|                                ->  Parallel Hash  (cost=33651.38..33651.38 rows=395365 width=72)
|                                      ->  Parallel Seq Scan on pg_attribute  (cost=0.00..33651.38 rows=395365
width=72)
|                                            Filter: ((NOT attisdropped) AND (attnum > 0))
|

queued_alters is usually empty, and looks like it would've last been nonempty on 2018-12-10.

ts=# \d queued_alters
                  Table "public.queued_alters"
 Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
 child  | character varying(64) |           |          |
 parent | character varying(64) |           |          |
Indexes:
    "queued_alters_child_parent_key" UNIQUE CONSTRAINT, btree (child, parent)
Number of child tables: 1 (Use \d+ to list them.)

I found this other log at that time:
 2018-12-31 15:39:11.918-05 | 30831 | 5bf38e71.786f |            5 | background worker "parallel worker" (PID 8100)
exitedwith exit code 1
 

Which is the postmaster, or its PID in any case..

[pryzbyj@telsasoft-db ~]$ ps -wwwf 30831
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres 30831     1  0 Nov19 ?        S     62:44 /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data

postgres=# SELECT log_time, pid, session_line, left(message,99) FROM postgres_log WHERE session_id='5bf38e71.786f' ;
          log_time          |  pid  | session_line |                                  left

----------------------------+-------+--------------+-------------------------------------------------------------------------
 2018-12-31 15:39:11.918-05 | 30831 |            5 | background worker "parallel worker" (PID 8100) exited with exit
code1
 
 2018-12-31 15:39:11.935-05 | 30831 |            6 | background worker "parallel worker" (PID 8101) exited with exit
code1
 
 2018-12-31 16:40:47.42-05  | 30831 |            7 | background worker "parallel worker" (PID 7239) exited with exit
code1
 
 2018-12-31 16:40:47.42-05  | 30831 |            8 | background worker "parallel worker" (PID 7240) exited with exit
code1
 
 2018-12-31 16:41:00.151-05 | 30831 |            9 | background worker "parallel worker" (PID 7371) exited with exit
code1
 
 2018-12-31 16:41:00.151-05 | 30831 |           10 | background worker "parallel worker" (PID 7372) exited with exit
code1
 
 2018-12-31 16:41:04.024-05 | 30831 |           11 | background worker "parallel worker" (PID 7451) exited with exit
code1
 
 2018-12-31 16:41:04.024-05 | 30831 |           12 | background worker "parallel worker" (PID 7450) exited with exit
code1
 
 2018-12-31 16:41:23.845-05 | 30831 |           13 | background worker "parallel worker" (PID 7658) exited with exit
code1
 
 2018-12-31 16:41:23.845-05 | 30831 |           14 | background worker "parallel worker" (PID 7659) exited with exit
code1
 
 2018-12-31 16:43:58.854-05 | 30831 |           15 | background worker "parallel worker" (PID 10825) exited with exit
code1
 
 2018-12-31 16:43:58.854-05 | 30831 |           16 | background worker "parallel worker" (PID 10824) exited with exit
code1
 

I seem to be missing logs for session_lines 1-4 , which would've been rotated
to oblivion if older than 24h.

I found these:

https://www.postgresql.org/message-id/flat/CAPa4P2YzgRTBHHRx2KAPUO1_xkmqQgT2xP0tS_e%3DphWvNzWdkA%40mail.gmail.com#4a1a94bc71d4c99c01babc759fe5b6ec
https://www.postgresql.org/message-id/CAEepm=0Mv9BigJPpribGQhnHqVGYo2+kmzekGUVJJc9Y_ZVaYA@mail.gmail.com

Which appears to have been commited so I think this error is not unexpected.

|commit fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
|Author: Robert Haas <rhaas@postgresql.org>
|Date:   Mon Dec 18 12:17:37 2017 -0500
|
|Fix crashes on plans with multiple Gather (Merge) nodes.

I will try to reproduce and provide bt...but all I can think to do is run a
tight loop around that query and hope something else comes by and tickles it a
few more times.

Happy Grecian newyear.

Justin


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: stats_temp_directory conflicts
Next
From: Adam Brightwell
Date:
Subject: Re: Changing SQL Inlining Behaviour (or...?)