Hi community,
I am reading a big dataset using code similar to this:
import psycopg

query = '''
SELECT timestamp, data_source, tag, agg_value
FROM my_table
'''
batch_size = 10_000_000
# cs is the connection string
with psycopg.connect(cs, cursor_factory=psycopg.ClientCursor) as conn:
    with conn.cursor('my_table') as cur:
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            # ...
The code is executed on a Databricks node, if that matters, and the library is at its latest version.
I found that despite fetching in batches, memory consumption grows continuously across loop iterations until the node eventually goes OOM. My code does not keep any references to the rows, so the growth must be something internal to the library.
If I change the factory to ServerCursor, the issue goes away: memory stops growing after the first iteration.
I looked through the documentation but did not find anything specific about the performance differences between server-side and client-side cursors.
I am fine with using ServerCursor, but I have to ask: is it by design that with ClientCursor the whole result set is copied into memory despite the fetchmany() limit? ClientCursor is the default class, so it may be worth documenting the difference (sorry if I missed it).
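To check that I understand the difference, here is my mental model as a plain-Python sketch (no psycopg involved, just illustrative data; please correct me if this is wrong):

```python
import itertools

def all_rows():
    # Stand-in for the full database result set (hypothetical data).
    return range(10)

# Client-side cursor, as I understand it: the whole result set is
# transferred and materialized at execute() time, and fetchmany()
# merely slices the already-loaded buffer.
client_buffer = list(all_rows())   # everything is in memory up front
batch = client_buffer[:3]

# Server-side cursor: rows stay on the server and are pulled over in
# chunks, so only one batch needs to be in client memory at a time.
server_iter = iter(all_rows())
batch2 = list(itertools.islice(server_iter, 3))

assert batch == batch2  # same rows either way; only memory use differs
```

If that model is right, the OOM with ClientCursor makes sense for a result set this large, and fetchmany() only changes how the already-materialized rows are handed to my loop.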
Thank you.