Thread: Segfault in BackendIdGetTransactionIds with psycopg2

Segfault in BackendIdGetTransactionIds with psycopg2

From
Vladimir Borodin
Date:
Hi all.

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on yum.postgresql.org) we have started getting segfaults of different backends. Backtraces of all coredumps look similar:
(gdb) bt
#0  0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1  0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2  0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3  0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080
#4  0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023
#5  0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304
#6  ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519
#7  0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556
#8  0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223
#9  ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)

Unfortunatelly, I can't give a clear sequence of steps to reproduce the problem, segfaults are happening in quiet random time and under random workloads :( So I'm trying to reproduce it on testing stand where PostgreSQL is built with --enable-debug flag to give you more information.

But the common conditions are:
  1. it happens only on master hosts (never on any of the streaming replicas),
  2. it happens on simple queries to pg_catalog or system views as shown in the backtrace above,
  3. it happens only with direct connecting to PostgreSQL (production-queries go through pgbouncer and no coredumps contain production queries). And it happened only with python-psycopg2 (we have tried versions 2.5.3-1.rhel6 with postgresql93-libs, 2.5.4-1.rhel6 and 2.6-1.rhel6 with postgresql94-libs). 

The code that lead to backtrace above is the following:

#!/usr/bin/env python

import socket
import time
import sys
import psycopg2
import os

me = sys.argv[0].split('/')[-1].split('.')[0]
hostname = socket.gethostname()
hostname_s = hostname.replace('.', '_')
current_ts = int(time.time())
gr_prefix = "mail.pg"

metrics = ['numbackends', 'xact_commit', 'xact_rollback', 'blks_read',
           'blks_hit', 'tup_returned', 'tup_fetched', 'tup_inserted',
           'tup_updated', 'tup_deleted']

user = 'monitor'
password = ''

with open(os.path.expanduser("~/.pgpass")) as pgpass:
    for line in pgpass:
        tokens = line.rstrip().split(':')
        if tokens[3] == user:
            password = tokens[4]
            break

conn = psycopg2.connect('host=localhost port=5432 dbname=postgres ' +
                        'user=%s password=%s ' % (user, password) +
                        'connect_timeout=1')
cur = conn.cursor()

for metric in metrics:
    cur.execute("SELECT sum(%s) FROM pg_stat_database;" % metric)
    result = cur.fetchone()[0]
    print("%s.%s.%s.%s %d %d" % (gr_prefix, hostname_s, me, metric,
                                 result, current_ts))


May this be related to psycopg2 or I should write to pgsql-bugs@?

Thanks.

--
May the force be with you…

Re: Segfault in BackendIdGetTransactionIds with psycopg2

From
Daniele Varrazzo
Date:
On Mon, Mar 30, 2015 at 3:11 PM, Vladimir Borodin <root@simply.name> wrote:

> May this be related to psycopg2 or I should write to pgsql-bugs@?

This is definitely a server-side problem: -bugs is the right place to report it.

-- Daniele