Re: performance issue with psycopg - Mailing list psycopg

From: Daniele Varrazzo
Subject: Re: performance issue with psycopg
Msg-id: CA+mi_8ZVqs3=WkKRkSoEUs-7RtE0ejL+L3SrwRqv-iqxSPeWWA@mail.gmail.com
In response to: Re: performance issue with psycopg (Sushant Sinha <sushant354@gmail.com>)
Responses: Re: performance issue with psycopg (Sushant Sinha <sushant354@gmail.com>)
List: psycopg
On Mon, Dec 12, 2011 at 5:46 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> On Mon, 2011-12-12 at 17:20 +0000, Daniele Varrazzo wrote:
>> On Mon, Dec 12, 2011 at 4:41 PM, Sushant Sinha <sushant354@gmail.com> wrote:
>> > I am using psycopg 2.4.2 and facing a performance issue. An explain
>> > analyze of the query on the database shows 0.75 secs. But when used
>> > through the psycopg layer I am seeing 1.5 secs. Any idea why the extra
>> > cost and how to optimize it?
>> >
>> > from psycopg2.extras import DictCursor
>> > from psycopg2.extensions import UNICODE, UNICODEARRAY
>> > psycopg2.extensions.register_type(UNICODE)
>> > psycopg2.extensions.register_type(UNICODEARRAY)
>> >
>> > cursor = dbCnx.cursor(cursor_factory=DictCursor)
>> > cursor.execute(query, params)
>> > results = cursor.fetchall()
>>
>> What is the performance using only the basic cursor? And how big is
>> the returned set?
>>
>> -- Daniele
>
> Number of rows: 25111, number of columns: 12
> With the basic cursor it is 1.2 secs:
>
>     1  0.000  0.000  1.230  1.230  a.py:1(<module>)
>     1  0.876  0.876  0.877  0.877  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
>     1  0.142  0.142  0.202  0.202  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
>     1  0.000  0.000  0.148  0.148  db_utility.py:1(<module>)
>     1  0.002  0.002  0.132  0.132  extras.py:5(<module>)
>     1  0.000  0.000  0.125  0.125  uuid.py:45(<module>)
>     2  0.000  0.000  0.120  0.060  util.py:235(find_library)
>     2  0.000  0.000  0.120  0.060  util.py:207(_findSoname_ldconfig)

This timing includes the module import, which as of psycopg 2.4.3 is lazy, so the module is not loaded unless it is actually used. In any case that overhead is paid only once, at import time.

For a client-side cursor the execution time is all in execute(): it contains the query time plus the fetch time, so it has an overhead of about 0.12 secs over the explain analyze. Note that the 0.75 secs of the explain doesn't include the transfer time from server to client: that can probably be estimated by running the query in psql and dumping the output to /dev/null.
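The execute()-vs-fetchall() split above can be measured directly instead of read out of a cProfile dump. Below is a minimal, hypothetical sketch: `profile_cursor` is a made-up helper that works with any DB-API cursor, and the psycopg2 usage at the bottom assumes a connection string and query of your own.

```python
import time

def profile_cursor(cur, query, params=None):
    """Time execute() and fetchall() separately for any DB-API cursor."""
    t0 = time.time()
    cur.execute(query, params or ())
    t1 = time.time()
    rows = cur.fetchall()
    t2 = time.time()
    return rows, t1 - t0, t2 - t1

# Hypothetical usage against the poster's database (dsn and query are
# placeholders, not values from the thread):
#
#   import psycopg2
#   from psycopg2.extras import DictCursor
#   conn = psycopg2.connect("dbname=test")
#   plain = profile_cursor(conn.cursor(), query)
#   dicts = profile_cursor(conn.cursor(cursor_factory=DictCursor), query)
```

Running both cursor flavours through the same helper isolates the DictCursor overhead from the query and transfer time, which execute() already contains.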
The method doesn't do much more than calling PQexec, so there is probably little to save there. Conversion from postgres strings to python objects happens in fetchall(): the standard cursor only builds a tuple out of the converted values, while DictCursor does more, and its overhead is 0.778 - 0.202. So you are paying about 0.5 secs for using the more complex DictCursor over the basic cursor.

Every postgres data type is converted into a python object by a specific cast function, and some conversion functions may be slower than others. You could drop some of the columns from your query to check whether one data type is particularly slower than the others.

That said, I don't see a specific hotspot in the timings you have reported: maybe DictCursor can be optimized, maybe some of the typecasters can. I don't know whether there is any gross bug, e.g. one causing quadratic behaviour, or whether some time can be shaved off by micro-optimizing DictCursor or one of the adapters. I'll be happy to fix the library if anything that can be improved is pointed out, but I'm not going to investigate possible optimizations myself: I'll leave that to you or to some other volunteer.

-- Daniele
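One way to act on the column-exclusion suggestion without rewriting the query is to time plausible Python-side casts in isolation. The cast functions and sample strings below are hypothetical stand-ins chosen for illustration (psycopg's real typecasters are C functions keyed by type OID), but the approach shows how a single slow conversion, repeated once per row over a 25111-row result set, can dominate fetch time.

```python
import time
from datetime import datetime

# Hypothetical stand-ins for typecasters, each paired with a sample string
# in the text format libpq would deliver for that type.
casts = {
    "int4": (int, "12345"),
    "float8": (float, "3.14159"),
    "timestamp": (lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
                  "2011-12-12 17:20:00"),
}

def time_cast(fn, value, n=25111):
    """Apply a cast function n times (once per row) and return elapsed seconds."""
    start = time.time()
    for _ in range(n):
        fn(value)
    return time.time() - start

timings = {name: time_cast(fn, sample) for name, (fn, sample) in casts.items()}
```

On a typical machine the datetime parse is markedly slower than the int or float casts; that is the kind of per-type asymmetry worth hunting for before blaming DictCursor itself.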