Thread: Segfault in BackendIdGetTransactionIds with psycopg2
Hi all.
After upgrading from 9.3.6 to 9.4.1 (both installed from packages on yum.postgresql.org) we started getting segfaults in various backends. The backtraces of all the coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3 0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080
#4 0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023
#5 0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304
#6 ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519
#7 0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556
#8 0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223
#9 ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)
Unfortunately, I can't give a clear sequence of steps to reproduce the problem; the segfaults happen at quite random times and under random workloads :( So I'm trying to reproduce it on a test stand where PostgreSQL is built with the --enable-debug flag, to give you more information.
But the common conditions are:
1. it happens only on master hosts (never on any of the streaming replicas),
2. it happens on simple queries to pg_catalog or system views as shown in the backtrace above,
3. it happens only with direct connections to PostgreSQL (production queries go through pgbouncer, and no coredumps contain production queries). And it has happened only with python-psycopg2 (we have tried versions 2.5.3-1.rhel6 with postgresql93-libs, 2.5.4-1.rhel6 and 2.6-1.rhel6 with postgresql94-libs).
The code that led to the backtrace above is the following:
#!/usr/bin/env python
import socket
import time
import sys
import psycopg2
import os
me = sys.argv[0].split('/')[-1].split('.')[0]
hostname = socket.gethostname()
hostname_s = hostname.replace('.', '_')
current_ts = int(time.time())
gr_prefix = "mail.pg"
metrics = ['numbackends', 'xact_commit', 'xact_rollback', 'blks_read',
           'blks_hit', 'tup_returned', 'tup_fetched', 'tup_inserted',
           'tup_updated', 'tup_deleted']
user = 'monitor'
password = ''
with open(os.path.expanduser("~/.pgpass")) as pgpass:
    for line in pgpass:
        tokens = line.rstrip().split(':')
        if tokens[3] == user:
            password = tokens[4]
            break
conn = psycopg2.connect('host=localhost port=5432 dbname=postgres ' +
                        'user=%s password=%s ' % (user, password) +
                        'connect_timeout=1')
cur = conn.cursor()
for metric in metrics:
    cur.execute("SELECT sum(%s) FROM pg_stat_database;" % metric)
    result = cur.fetchone()[0]
    print("%s.%s.%s.%s %d %d" % (gr_prefix, hostname_s, me, metric,
                                 result, current_ts))
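A side note on the script itself, unrelated to the segfault: the ~/.pgpass lookup indexes tokens[3] on every line, so a blank line or a comment line in that file would probably raise an IndexError before any query runs. Below is a minimal sketch of a more tolerant lookup. The helper name find_pgpass_password is hypothetical (it is not part of psycopg2 or libpq), and the sketch assumes the documented hostname:port:database:username:password layout without handling backslash-escaped colons.

```python
def find_pgpass_password(lines, user):
    """Return the password of the first entry whose username field matches.

    Hypothetical helper for illustration only; it does not handle
    backslash-escaped ':' characters inside fields.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments, which would break a bare split(':').
        if not line or line.startswith('#'):
            continue
        # Expected layout: hostname:port:database:username:password
        tokens = line.split(':', 4)
        if len(tokens) != 5:
            continue
        # '*' in the username field is treated as a wildcard, as libpq does.
        if tokens[3] in (user, '*'):
            return tokens[4]
    return None
```

In the script above this could be used as password = find_pgpass_password(pgpass, user) or '' inside the with block, which keeps the original fallback to an empty password.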
Could this be related to psycopg2, or should I write to pgsql-bugs@?
Thanks.
On Mon, Mar 30, 2015 at 3:11 PM, Vladimir Borodin <root@simply.name> wrote:
> May this be related to psycopg2 or I should write to pgsql-bugs@?

This is definitely a server-side problem: -bugs is the right place to report it.

-- Daniele