Segfault in BackendIdGetTransactionIds with psycopg2 - Mailing list psycopg

From Vladimir Borodin
Subject Segfault in BackendIdGetTransactionIds with psycopg2
Date
Msg-id 11D569E2-D848-4430-8783-EEB2F323761F@simply.name
Whole thread Raw
Responses Re: Segfault in BackendIdGetTransactionIds with psycopg2
List psycopg
Hi all.

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on yum.postgresql.org) we have started getting segfaults of different backends. Backtraces of all coredumps look similar:
(gdb) bt
#0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3  0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080
#4  0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023
#5  0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304
#6  ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519
#7  0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556
#8  0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223
#9  ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)

Unfortunatelly, I can't give a clear sequence of steps to reproduce the problem, segfaults are happening in quiet random time and under random workloads :( So I'm trying to reproduce it on testing stand where PostgreSQL is built with --enable-debug flag to give you more information.

But the common conditions are:
  1. it happens only on master hosts (never on any of the streaming replicas),
  2. it happens on simple queries to pg_catalog or system views as shown in the backtrace above,
  3. it happens only with direct connecting to PostgreSQL (production-queries go through pgbouncer and no coredumps contain production queries). And it happened only with python-psycopg2 (we have tried versions 2.5.3-1.rhel6 with postgresql93-libs, 2.5.4-1.rhel6 and 2.6-1.rhel6 with postgresql94-libs). 

The code that lead to backtrace above is the following:

#!/usr/bin/env python

import socket
import time
import sys
import psycopg2
import os

me = sys.argv[0].split('/')[-1].split('.')[0]
hostname = socket.gethostname()
hostname_s = hostname.replace('.', '_')
current_ts = int(time.time())
gr_prefix = "mail.pg"

metrics = ['numbackends', 'xact_commit', 'xact_rollback', 'blks_read',
           'blks_hit', 'tup_returned', 'tup_fetched', 'tup_inserted',
           'tup_updated', 'tup_deleted']

user = 'monitor'
password = ''

with open(os.path.expanduser("~/.pgpass")) as pgpass:
    for line in pgpass:
        tokens = line.rstrip().split(':')
        if tokens[3] == user:
            password = tokens[4]
            break

conn = psycopg2.connect('host=localhost port=5432 dbname=postgres ' +
                        'user=%s password=%s ' % (user, password) +
                        'connect_timeout=1')
cur = conn.cursor()

for metric in metrics:
    cur.execute("SELECT sum(%s) FROM pg_stat_database;" % metric)
    result = cur.fetchone()[0]
    print("%s.%s.%s.%s %d %d" % (gr_prefix, hostname_s, me, metric,
                                 result, current_ts))


May this be related to psycopg2 or I should write to pgsql-bugs@?

Thanks.

--
May the force be with you…

psycopg by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Fwd: error in cur.mogrify line
Next
From: Vladimir Borodin
Date:
Subject: Segfault in BackendIdGetTransactionIds with psycopg2