Thread: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Greg Williamson

Date:

09 August 2012, 21:05:06

We have a small database (few hundred megs of data, lass than half that in indexes) that suffers from index bloat. Currently we handle this with an hourly REINDEX command. This works but causes some small issues, so I have been expploring other methods.

When I try to to drop constraints (typically UNIQUE) I sometimes get a crash of the psql client at the ALTER TABLE X DROP CONSTRAINT, and sometimes after I recreate the constraint at a commit. Transcript below:

usher=# BEGIN;

BEGIN

gusher=# ALTER TABLE ourcodes DROP CONSTRAINT IF EXISTS ourcodes_pkey CASCADE;

ALTER TABLE

gusher=# ALTER TABLE ourcodes ADD CONSTRAINT ourcodes_pkey PRIMARY KEY (id);

NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "ourcodes_pkey" for table "ourcodes"

ALTER TABLE

gusher=# COMMIT;

FATAL: terminating connection due to administrator command

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Succeeded.

gusher=# \d ourcodes

Unlogged table "public.ourcodes"

Column | Type | Modifiers

-------------+--------+---------------------------------------------------------

id | bigint | not null default nextval('ourcodes_id_seq'::regclass)

source | bigint | not null

destination | bigint |

started_on | bigint | default (date_part('epoch'::text, now()))::bigint

type | text |

status | text |

updated_on | bigint | default date_part('epoch'::text, now())

worker | text |

version | text |

Indexes:

"ourcodes_pkey" PRIMARY KEY, btree (id)

"ourcodes_source" btree (source, type)

gusher=# select count(*) from ourcodes;LTER TABLE

gusher=# ALTER TABLE ourcodes ADD CONSTRAINT ourcodes_pkey PRIMARY KEY (id);

NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "ourcodes_pkey" for table "ourcodes"

ALTER TABLE

gusher=# COMMIT;

FATAL: terminating connection due to administrator command

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Succeeded.

gusher=# \d ourcodes

Unlogged table "public.ourcodes"

Column | Type | Modifiers

-------------+--------+---------------------------------------------------------

id | bigint | not null default nextval('ourcodes_id_seq'::regclass)

source | bigint | not null

destination | bigint |

started_on | bigint | default (date_part('epoch'::text, now()))::bigint

type | text |

status | text |

updated_on | bigint | default date_part('epoch'::text, now())

worker | text |

version | text |

Indexes:

"ourcodes_pkey" PRIMARY KEY, btree (id)

"ourcodes_source" btree (source, type)

gusher=# select count(*) from ourcodes;

count

-------

1626

(1 row)

count

-------

1626

(1 row)

gusher=# select version();

version

----------------------------------------------------------------------------------------------------------------

PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit

(1 row)

========

This table is only updated a few times a minute (mostly inserts and deletes, some updates), others get hit a lot more (many times a second). pgsql is the same version and I am on the same server.

Is this known / documented ? Any suggestions (other than not to do this ; -- )

Thanks,

Greg Williamson

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Tom Lane

Date:

09 August 2012, 21:19:58

Greg Williamson <gwilliamson39@yahoo.com> writes:
> We have a small database (few hundred megs of data, lass than half that in indexes) that suffers from index bloat.
Currentlywe handle this with an hourly REINDEX command. This works but causes some small issues, so I have been
expploringother methods. 
> When I try to to drop constraints (typically UNIQUE) I sometimes get a crash of the psql client at the ALTER TABLE X
DROPCONSTRAINT, and sometimes after I recreate the constraint at a commit. Transcript below: 

> usher=# BEGIN;
> BEGIN
> gusher=# ALTER TABLE ourcodes DROP CONSTRAINT IF EXISTS ourcodes_pkey CASCADE;
> ALTER TABLE
> gusher=# ALTER TABLE ourcodes ADD CONSTRAINT ourcodes_pkey PRIMARY KEY (id);
> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "ourcodes_pkey" for table "ourcodes"
> ALTER TABLE
> gusher=# COMMIT;
> FATAL:  terminating connection due to administrator command

WTF?  That should certainly not happen.  Have you perhaps got a script
that runs around sending SIGTERM to backends that it thinks are blocking
something?  Does anything show up in the postmaster log when this
happens?

            regards, tom lane

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Greg Williamson

Date:

09 August 2012, 21:37:53

Tom --

<...>
>
> WTF? That should certainly not happen. Have you perhaps got a script
> that runs around sending SIGTERM to backends that it thinks are blocking
> something? Does anything show up in the postmaster log when this
> happens?

No, we have processes that kill IDLE IN TRANSACTIONs ruthlessly, but nothing that kills any other processes. There is an outside chance that this ran at just the same time that a cronjob ran the REINDEX command for the table (unlikely but I am going to shut off the cronjob and try this again).

It does not log an ERROR to the log file; I can wade intio the log and see what else it says but it will take a little while (about 800 megs per hour) to answer that.

Thanks, and I'll report back soon.

Greg
-

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Greg Williamson

Date:

09 August 2012, 22:31:20

Tom, et. al.,

From: Greg Williamson <gwilliamson39@yahoo.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
Sent: Thursday, August 9, 2012 5:35 PM
Subject: Re: [ADMIN] Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

Tom --

><...>
>>
>> WTF? That should certainly not happen. Have you perhaps got a script
>> that runs around sending SIGTERM to backends that it thinks are blocking
>> something? Does anything show up in the postmaster log when this
>> happens?
>
> No, we have processes that kill IDLE IN TRANSACTIONs ruthlessly, but nothing that kills any other processes. There is an outside chance that this ran at just the
> same time that a cronjob ran the REINDEX command for the table (unlikely but I am going to shut off the cronjob and try this again).
>
> It does not log an ERROR to the log file; I can wade intio the log and see what else it says but it will take a little while (about 800 megs per hour) to answer that.
>
> Thanks, and I'll report back soon.

This was as all run as user postgres, unlike everything else so here is everyhting in the logsw:
2012-08-09 16:50:25.398 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,2,"authentication",2012-08-09 16:50:25 PDT,19/61906190,0,LOG,00000,"connection authorized: user=postgres database=gusher",,,,,,,,,""
2012-08-09 16:50:41.206 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,3,"SELECT",2012-08-09 16:50:25 PDT,19/0,0,LOG,00000,"duration: 3.437 ms statement: SELECT c.oid,
2012-08-09 16:50:41.212 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,4,"SELECT",2012-08-09 16:50:25 PDT,19/0,0,LOG,00000,"duration: 4.461 ms statement: SELECT a.attname,
2012-08-09 16:50:59.238 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,5,"ALTER TABLE",2012-08-09 16:50:25 PDT,19/61906197,1518480799,LOG,00000,"duration: 4.612 ms statement: ALTER TABLE ourcodes ADD CONSTRAINT ourcodes_pkey PRIMARY KEY (id);",,,,,,,,,"psql"
2012-08-09 16:51:01.808 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,6,"idle in transaction",2012-08-09 16:50:25 PDT,19/61906197,1518480799,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"psql"
2012-08-09 16:51:01.859 PDT,"postgres","gusher",13873,"[local]",50244cc1.3631,7,"idle in transaction",2012-08-09 16:50:25 PDT,,0,LOG,00000,"disconnection: session time: 0:00:36.458 user=postgres database=gusher host=[local]",,,,,,,,,"psql"
2012-08-09 16:51:02.309 PDT,"postgres","gusher",14000,"[local]",50244ce6.36b0,2,"authentication",2012-08-09 16:51:02 PDT,19/61906206,0,LOG,00000,"connection authorized: user=postgres database=gusher",,,,,,,,,""
2012-08-09 16:51:18.551 PDT,"postgres","gusher",14000,"[local]",50244ce6.36b0,3,"SELECT",2012-08-09 16:51:02 PDT,19/0,0,LOG,00000,"duration: 3.979 ms statement: SELECT a.attname,
2012-08-09 16:51:18.556 PDT,"postgres","gusher",14000,"[local]",50244ce6.36b0,4,"SELECT",2012-08-09 16:51:02 PDT,19/0,0,LOG,00000,"duration: 5.255 ms statement: SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered, i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
2012-08-09 16:59:34.443 PDT,"postgres","gusher",14000,"[local]",50244ce6.36b0,5,"idle",2012-08-09 16:51:02 PDT,,0,LOG,00000,"disconnection: session time: 0:08:32.135 user=postgres database=gusher host=[local]",,,,,,,,,"psql"

Every minute on the minute there is a cronjob that kills IDLE IN CONNECTION procs older than 3 seconds ... possibly that process killed the process I was running, but to me the user it seemed as if the COMMIT came before the disconnect.

We have had some issues on this box so I probably won't test this much more today; I'll post more if I can get a reproducible case.

Thanks,

Greg

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Tom Lane

Date:

09 August 2012, 23:07:19

Greg Williamson <gwilliamson39@yahoo.com> writes:
> Every minute on the minute there is a cronjob that kills IDLE IN CONNECTION procs older than 3 seconds ... possibly
thatprocess killed the process I was running, but to me the user it seemed as if the COMMIT came before the disconnect. 

That sure sounds like the source of your problem.  It might be sane if
it killed only processes that *had been idle* for at least three
seconds, but I'm not sure there is any easy way to determine that ...

            regards, tom lane

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Craig Ringer

Date:

09 August 2012, 23:25:16

On 08/10/2012 10:06 AM, Tom Lane wrote:
> Greg Williamson <gwilliamson39@yahoo.com> writes:
>> Every minute on the minute there is a cronjob that kills IDLE IN CONNECTION procs older than 3 seconds ... possibly
thatprocess killed the process I was running, but to me the user it seemed as if the COMMIT came before the disconnect. 
> That sure sounds like the source of your problem.  It might be sane if
> it killed only processes that *had been idle* for at least three
> seconds, but I'm not sure there is any easy way to determine that ...
wouldn't:

select * from pg_stat_activity
where current_query = '<IDLE> in transaction'
AND query_start < current_timestamp - INTERVAL '3 seconds';

do it?

--
Craig Ringer

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Tom Lane

Date:

10 August 2012, 01:21:14

Craig Ringer <ringerc@ringerc.id.au> writes:
> On 08/10/2012 10:06 AM, Tom Lane wrote:
>> That sure sounds like the source of your problem.  It might be sane if
>> it killed only processes that *had been idle* for at least three
>> seconds, but I'm not sure there is any easy way to determine that ...

> wouldn't:

> select * from pg_stat_activity
> where current_query = '<IDLE> in transaction'
> AND query_start < current_timestamp - INTERVAL '3 seconds';

> do it?

No, that would find sessions that were idle and whose last command
started at least 3 seconds ago.  But it might have completed only
microseconds ago.  The symptoms Greg is describing are consistent
with this kill script running during a short interval between his
index-build command and his COMMIT.

As of 9.2 there's a "state_change" timestamp column in pg_stat_activity
that would provide a safer check, but it's not in existing releases ...

            regards, tom lane

Re: Question about ALTER TABLE DROP CONSTRAINT on 9.1 -- psql crashes

From

Craig Ringer

Date:

10 August 2012, 02:09:05

On 08/10/2012 12:20 PM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> On 08/10/2012 10:06 AM, Tom Lane wrote:
>>> That sure sounds like the source of your problem.  It might be sane if
>>> it killed only processes that *had been idle* for at least three
>>> seconds, but I'm not sure there is any easy way to determine that ...
>
>> wouldn't:
>
>> select * from pg_stat_activity
>> where current_query = '<IDLE> in transaction'
>> AND query_start < current_timestamp - INTERVAL '3 seconds';
>
>> do it?
>
> No, that would find sessions that were idle and whose last command
> started at least 3 seconds ago.  But it might have completed only
> microseconds ago.  The symptoms Greg is describing are consistent
> with this kill script running during a short interval between his
> index-build command and his COMMIT.

Eh. For some reason I thought query_start was updated whenever
current_query was set, so it'd be updated when the session went idle.
Nonesensical in retrospect.

Does this approach look sane? Note idlers and then reap them if they
haven't started new queries next pass:

CREATE UNLOGGED TABLE reap (
   procpid integer,
   query_start timestamptz,
   added timestamptz
);

-- Run every 20s or whatever to note idlers and when we noticed them
insert into reap (procpid,query_start,added)
select procpid, query_start, current_timestamp
from pg_stat_activity where current_query = '<IDLE> in transaction';

-- Clean out old entries and kill idlers
WITH r AS (
   DELETE FROM reap r2
   WHERE r2.added + INTERVAL '10 seconds' < current_timestamp
   RETURNING *
)
SELECT r.procpid, pg_terminate_backend(r.procpid)
FROM r INNER JOIN pg_stat_activity s
   ON (r.procpid = s.procpid AND r.query_start = s.query_start )
GROUP BY r.procpid;

ie "where we've noted more than 1 minute ago that a pid was idle, and
query_start hasn't changed so no new query has been run, kill the backend".

Of course, there's race where the backend could be starting a new query
even as you kill it for idleness, but that's true in any
idle-in-transaction control scheme (other than having a true transaction
timeout feature *within* the backend).

--
Craig Ringer