Segfault in BackendIdGetTransactionIds with psycopg2 - Mailing list psycopg
From | Vladimir Borodin |
---|---|
Subject | Segfault in BackendIdGetTransactionIds with psycopg2 |
Date | |
Msg-id | 11D569E2-D848-4430-8783-EEB2F323761F@simply.name Whole thread Raw |
Responses |
Re: Segfault in BackendIdGetTransactionIds with psycopg2
|
List | psycopg |
Hi all.
After upgrading from 9.3.6 to 9.4.1 (both installed from packages on yum.postgresql.org) we have started getting segfaults of different backends. Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3 0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080
#4 0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023
#5 0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304
#6 ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519
#7 0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556
#8 0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223
#9 ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)
Unfortunatelly, I can't give a clear sequence of steps to reproduce the problem, segfaults are happening in quiet random time and under random workloads :( So I'm trying to reproduce it on testing stand where PostgreSQL is built with --enable-debug flag to give you more information.
But the common conditions are:
1. it happens only on master hosts (never on any of the streaming replicas),
2. it happens on simple queries to pg_catalog or system views as shown in the backtrace above,
3. it happens only with direct connecting to PostgreSQL (production-queries go through pgbouncer and no coredumps contain production queries). And it happened only with python-psycopg2 (we have tried versions 2.5.3-1.rhel6 with postgresql93-libs, 2.5.4-1.rhel6 and 2.6-1.rhel6 with postgresql94-libs).
The code that lead to backtrace above is the following:
#!/usr/bin/env python
import socket
import time
import sys
import psycopg2
import os
me = sys.argv[0].split('/')[-1].split('.')[0]
hostname = socket.gethostname()
hostname_s = hostname.replace('.', '_')
current_ts = int(time.time())
gr_prefix = "mail.pg"
metrics = ['numbackends', 'xact_commit', 'xact_rollback', 'blks_read',
'blks_hit', 'tup_returned', 'tup_fetched', 'tup_inserted',
'tup_updated', 'tup_deleted']
user = 'monitor'
password = ''
with open(os.path.expanduser("~/.pgpass")) as pgpass:
for line in pgpass:
tokens = line.rstrip().split(':')
if tokens[3] == user:
password = tokens[4]
break
conn = psycopg2.connect('host=localhost port=5432 dbname=postgres ' +
'user=%s password=%s ' % (user, password) +
'connect_timeout=1')
cur = conn.cursor()
for metric in metrics:
cur.execute("SELECT sum(%s) FROM pg_stat_database;" % metric)
result = cur.fetchone()[0]
print("%s.%s.%s.%s %d %d" % (gr_prefix, hostname_s, me, metric,
result, current_ts))
May this be related to psycopg2 or I should write to pgsql-bugs@?
Thanks.