[BUGS] plpython bug - Mailing list pgsql-bugs

From joel.traf@magwerks.com
Subject [BUGS] plpython bug
Date
Msg-id 20170205215158.AD118E06FA@smtp.hushmail.com
Responses Re: [BUGS] plpython bug  (Euler Taveira <euler@timbira.com.br>)
List pgsql-bugs
Hello Developers

I found an interesting bug that causes a complete crash and a reset of all connections on PG 9.5.0, compiled with Visual C++ 1800, 64-bit.

The log states: server process (PID 2720) was terminated by exception 0xC0000005
failed process was running: select upsert_items2(false)
Hint: see C include file ntstatus.h for a description of the hexadecimal value (this is a memory access violation)
log: terminating any other active server processes

upsert_items2() is a function that copies/updates data from an ERP database to a website database. It just iterates over all the items, figures out how to group items together based on class, and decides whether each item should be visible on the website based on rules. Nothing complicated.

As these are two different PG databases, the PL/Python function connects to the ERP database via psycopg2, copying the data into a local result set. I then just executed plpy.execute("insert ... on conflict do update") for each item. Processing time was around 5 minutes to go through 12,000 items, updating 5 tables in the website database.
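A minimal sketch of that per-item approach (the table name, column list, and the `sync_items` helper are hypothetical; inside the real PL/Python function `execute` would be plpy.execute):

```python
def sync_items(rows, execute):
    """Upsert each ERP row into the website database, one statement at a time.

    `rows` is the result set fetched via psycopg2; `execute` stands in for
    plpy.execute (passed in here so the sketch is self-contained).
    """
    for item_no, descr, price in rows:
        # One plpy.execute call per item: this per-statement context
        # switching and type mapping is what made the run take ~5 minutes.
        execute(
            "insert into items (item_no, descr, price) "
            "values ('%s', '%s', %s) "
            "on conflict (item_no) do update "
            "set descr = excluded.descr, price = excluded.price"
            % (item_no, descr.replace("'", "''"), price)
        )
```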

The users want this to be faster. The only solution I could think of was to cut down on all the context switching and data type mapping that happens when calling plpy.execute() for each item, so I decided to build one really big SQL update command and call plpy.execute() once at the end of the PL/Python function.
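The batched version can be sketched roughly like this (same hypothetical table and columns as above; the real function builds the string from the ERP result set and runs a single plpy.execute at the end):

```python
def build_batch_sql(rows):
    """Concatenate one upsert statement per item into a single SQL string."""
    stmts = []
    for item_no, descr, price in rows:
        stmts.append(
            "insert into items (item_no, descr, price) "
            "values ('%s', '%s', %s) "
            "on conflict (item_no) do update "
            "set descr = excluded.descr, price = excluded.price;"
            % (item_no, descr.replace("'", "''"), price)
        )
    return "\n".join(stmts)

# Inside the PL/Python function, one call replaces 12,000 round trips:
# plpy.execute(build_batch_sql(rows))
```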

That worked: processing time was reduced from 5 minutes to 8 seconds. As this is a work in progress, I started adding more rules that change what gets updated, and I had a couple of typos in the SQL statements.

That's when I crashed the PG server hard. It's taken me a few hours to figure out where my mistake was, as this crash only occurs if a SQL error is very deep in the command sent via plpy.execute(). If there is an error in the first few lines it returns a plpython SPI error. How deep into the SQL command an error has to appear to cause this access violation, I do not know.

The command sent via plpy.execute() is 1,454,561 characters long.

Any suggestions on how to narrow down at what point a bad command causes this error?
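One approach I could try (a hypothetical sketch, assuming the big command can be split back into its individual statements and that `execute` raises on a SQL error, e.g. plpy.execute wrapped in a subtransaction): bisect the statement list to isolate the first statement that fails.

```python
def find_first_bad(statements, execute):
    """Bisect a list of SQL statements to locate the first one that fails.

    Assumes at least one statement in `statements` fails, and that running
    a clean prefix batch is acceptable (e.g. inside a subtransaction that
    is rolled back).  Returns the index of the first failing statement.
    """
    lo, hi = 0, len(statements)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            execute("\n".join(statements[lo:mid]))
        except Exception:
            hi = mid      # failure is somewhere in the first half
            continue
        lo = mid          # first half is clean; look in the second half
    return lo
```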


FYI only: the first version of this function was written in PL/pgSQL using dblink, which took 15 minutes to process the same data set; the second version used an FDW, and processing time was still 15 minutes. In both cases it spent all the time executing fetches to get data from the remote PG database.


Thanks
