Thread: Segfault in BackendIdGetTransactionIds with psycopg2

Segfault in BackendIdGetTransactionIds with psycopg2

From: Vladimir Borodin

Date: 30 March 2015, 14:11:17

Hi all.

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on yum.postgresql.org), we started getting segfaults in different backends. The backtraces of all the coredumps look similar:

(gdb) bt

#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426

#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871

#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

#3 0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value optimized out>) at pgstatfuncs.c:1080

#4 0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270, econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at execQual.c:2023

#5 0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5304

#6 ExecProject (projInfo=<value optimized out>, isDone=0x0) at execQual.c:5519

#7 0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0, pergroup=0x1f5e380) at nodeAgg.c:556

#8 0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>) at nodeAgg.c:1223

#9 ExecAgg (node=<value optimized out>) at nodeAgg.c:1115

#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at execProcnode.c:476

#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:1486

#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized out>, count=0) at execMain.c:319

#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946

#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528, altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790

#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT sum(numbackends) FROM pg_stat_database;") at postgres.c:1072

#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value optimized out>) at postgres.c:4074

#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4155

#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829

#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597

#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244

#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228

(gdb)

Unfortunately, I can't give a clear sequence of steps to reproduce the problem: the segfaults happen at quite random times and under random workloads :( So I'm trying to reproduce it on a test stand where PostgreSQL is built with the --enable-debug flag, to give you more information.

But the common conditions are:

1. it happens only on master hosts (never on any of the streaming replicas),

2. it happens on simple queries to pg_catalog or system views as shown in the backtrace above,

3. it happens only with direct connections to PostgreSQL (production queries go through pgbouncer, and no coredumps contain production queries). It has also happened only with python-psycopg2 (we have tried versions 2.5.3-1.rhel6 with postgresql93-libs, and 2.5.4-1.rhel6 and 2.6-1.rhel6 with postgresql94-libs).

The code that led to the backtrace above is the following:

#!/usr/bin/env python

import socket
import time
import sys
import psycopg2
import os

me = sys.argv[0].split('/')[-1].split('.')[0]
hostname = socket.gethostname()
hostname_s = hostname.replace('.', '_')
current_ts = int(time.time())
gr_prefix = "mail.pg"

metrics = ['numbackends', 'xact_commit', 'xact_rollback', 'blks_read',
           'blks_hit', 'tup_returned', 'tup_fetched', 'tup_inserted',
           'tup_updated', 'tup_deleted']

user = 'monitor'
password = ''
with open(os.path.expanduser("~/.pgpass")) as pgpass:
    for line in pgpass:
        tokens = line.rstrip().split(':')
        if tokens[3] == user:
            password = tokens[4]
            break

conn = psycopg2.connect('host=localhost port=5432 dbname=postgres ' +
                        'user=%s password=%s ' % (user, password) +
                        'connect_timeout=1')
cur = conn.cursor()

for metric in metrics:
    cur.execute("SELECT sum(%s) FROM pg_stat_database;" % metric)
    result = cur.fetchone()[0]
    print("%s.%s.%s.%s %d %d" % (gr_prefix, hostname_s, me, metric,
                                 result, current_ts))
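As an aside, the .pgpass lookup in the script above assumes every line has at least five colon-separated fields, so a comment or blank line in the file would raise an IndexError before the script ever connects. A minimal sketch of a more tolerant lookup follows. The function name find_password is hypothetical and not part of the original script, and the sketch is deliberately simplified: it does not handle backslash-escaped colons or the "*" wildcard that the real .pgpass format allows.

```python
def find_password(pgpass_lines, user):
    """Return the password for `user` from .pgpass-style lines, or '' if absent.

    Hypothetical helper; simplified parsing (no escapes, no wildcards).
    """
    for line in pgpass_lines:
        line = line.strip()
        # Skip blank lines and comments instead of failing on them.
        if not line or line.startswith('#'):
            continue
        # Expected layout: hostname:port:database:username:password
        # Split at most four times so the password field stays in one piece.
        tokens = line.split(':', 4)
        if len(tokens) == 5 and tokens[3] == user:
            return tokens[4]
    return ''
```

It accepts any iterable of lines, so an open file object for ~/.pgpass can be passed directly in place of the loop in the script.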

Could this be related to psycopg2, or should I write to pgsql-bugs@?

Thanks.

May the force be with you…

https://simply.name

Re: Segfault in BackendIdGetTransactionIds with psycopg2

From: Daniele Varrazzo

Date: 30 March 2015, 14:43:33

On Mon, Mar 30, 2015 at 3:11 PM, Vladimir Borodin <root@simply.name> wrote:

> Could this be related to psycopg2, or should I write to pgsql-bugs@?

This is definitely a server-side problem: -bugs is the right place to report it.

-- Daniele