Thread: performance issue with psycopg

performance issue with psycopg

From: Sushant Sinha
Date: Mon, 12 Dec 2011
I am using psycopg 2.4.2 and facing a performance issue. An explain
analyze of the query on the database shows 0.75 secs, but when run
through the psycopg layer I am seeing 1.5 secs. Any idea why there is
this extra cost and how to optimize it?

import psycopg2
import psycopg2.extensions
from psycopg2.extras import DictCursor
from psycopg2.extensions import UNICODE, UNICODEARRAY

psycopg2.extensions.register_type(UNICODE)
psycopg2.extensions.register_type(UNICODEARRAY)

# dbCnx is an open connection; query and params are defined elsewhere
cursor = dbCnx.cursor(cursor_factory=DictCursor)
cursor.execute(query, params)
results = cursor.fetchall()

Here is the profiling for extras.py:

       35    0.896    0.026    0.896    0.026 extras.py:118(execute)
       35    0.296    0.008    0.778    0.022 extras.py:80(fetchall)
   276433    0.270    0.000    0.314    0.000 extras.py:148(__setitem__)
        1    0.001    0.001    0.129    0.129 extras.py:5(<module>)
    25153    0.112    0.000    0.116    0.000 extras.py:139(__init__)
    17176    0.018    0.000    0.028    0.000 extras.py:143(__getitem__)
        1    0.000    0.000    0.002    0.002 extras.py:566(HstoreAdapter)
        1    0.000    0.000    0.002    0.002 extras.py:762(CompositeCaster)
     4001    0.001    0.000    0.001    0.000 extras.py:162(has_key)
       35    0.000    0.000    0.000    0.000 extras.py:128(_build_index)
       35    0.000    0.000    0.000    0.000 extras.py:113(__init__)
       35    0.000    0.000    0.000    0.000 extras.py:50(__init__)
       46    0.000    0.000    0.000    0.000 extras.py:914(<lambda>)
        1    0.000    0.000    0.000    0.000 extras.py:254(NamedTupleCursor)
       10    0.000    0.000    0.000    0.000 extras.py:156(keys)
        1    0.000    0.000    0.000    0.000 extras.py:134(DictRow)
        1    0.000    0.000    0.000    0.000 extras.py:373(LoggingCursor)
        1    0.000    0.000    0.000    0.000 extras.py:47(DictCursorBase)
        1    0.000    0.000    0.000    0.000 extras.py:433(UUID_adapter)
        1    0.000    0.000    0.000    0.000 extras.py:389(MinTimeLoggingConnection)
        1    0.000    0.000    0.000    0.000 extras.py:110(DictCursor)
        1    0.000    0.000    0.000    0.000 extras.py:493(Inet)
        1    0.000    0.000    0.000    0.000 extras.py:233(RealDictRow)
        1    0.000    0.000    0.000    0.000 extras.py:203(RealDictCursor)
        1    0.000    0.000    0.000    0.000 extras.py:326(LoggingConnection)
        1    0.000    0.000    0.000    0.000 extras.py:248(NamedTupleConnection)
        1    0.000    0.000    0.000    0.000 extras.py:195(RealDictConnection)
        1    0.000    0.000    0.000    0.000 extras.py:416(MinTimeLoggingCursor)
        1    0.000    0.000    0.000    0.000 extras.py:102(DictConnection)
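
A profile like the one above can be collected with cProfile; here is a
minimal sketch, with a placeholder DSN, table and query standing in for
the real ones:

import cProfile
import pstats

import psycopg2
from psycopg2.extras import DictCursor

def run():
    conn = psycopg2.connect("dbname=test")  # placeholder DSN
    cur = conn.cursor(cursor_factory=DictCursor)
    cur.execute("SELECT * FROM some_table")  # placeholder query
    cur.fetchall()
    conn.close()

cProfile.run("run()", "query.prof")
pstats.Stats("query.prof").sort_stats("tottime").print_stats(20)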



Re: performance issue with psycopg

From: Daniele Varrazzo
Date: Mon, 12 Dec 2011
On Mon, Dec 12, 2011 at 4:41 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> I am using psycopg 2.4.2 and facing a performance issue. An explain
> analyze of the query on the database shows 0.75 secs, but when run
> through the psycopg layer I am seeing 1.5 secs. Any idea why there is
> this extra cost and how to optimize it?
>
> [...]
> cursor = dbCnx.cursor(cursor_factory=DictCursor)
> cursor.execute(query, params)
> results = cursor.fetchall()

What is the performance using only the basic cursor? And how big is
the returned set?

-- Daniele

Re: performance issue with psycopg

From: Sushant Sinha
Date: Mon, 12 Dec 2011
On Mon, 2011-12-12 at 17:20 +0000, Daniele Varrazzo wrote:
> What is the performance using only the basic cursor? And how big is
> the returned set?


Number of rows: 25111, number of columns: 12
With the basic cursor it is 1.2 secs.

        1    0.000    0.000    1.230    1.230 a.py:1(<module>)
        1    0.876    0.876    0.877    0.877 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
        1    0.142    0.142    0.202    0.202 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
        1    0.000    0.000    0.148    0.148 db_utility.py:1(<module>)
        1    0.002    0.002    0.132    0.132 extras.py:5(<module>)
        1    0.000    0.000    0.125    0.125 uuid.py:45(<module>)
        2    0.000    0.000    0.120    0.060 util.py:235(find_library)
        2    0.000    0.000    0.120    0.060 util.py:207(_findSoname_ldconfig)


Re: performance issue with psycopg

From: Daniele Varrazzo
Date: Mon, 12 Dec 2011
On Mon, Dec 12, 2011 at 5:46 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> Number of rows: 25111, number of columns: 12
> With the basic cursor it is 1.2 secs.
>
>        1    0.000    0.000    1.230    1.230 a.py:1(<module>)
>        1    0.876    0.876    0.877    0.877 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
>        1    0.142    0.142    0.202    0.202 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
>        1    0.000    0.000    0.148    0.148 db_utility.py:1(<module>)
>        1    0.002    0.002    0.132    0.132 extras.py:5(<module>)
>        1    0.000    0.000    0.125    0.125 uuid.py:45(<module>)
>        2    0.000    0.000    0.120    0.060 util.py:235(find_library)
>        2    0.000    0.000    0.120    0.060 util.py:207(_findSoname_ldconfig)

This timing includes the module import: extras.py pulls in uuid, which
accounts for the find_library calls. As of psycopg 2.4.3 that import is
lazy, so the module is not loaded unless it is used. In any case the
overhead is only paid once, at import time.

The execution time for a client-side cursor is in execute(): it
contains the query time plus the fetch time, so the overhead over the
explain analyze is about 0.13 sec (0.877 - 0.75). The 0.75 sec of the
explain doesn't include the transfer time from server to client: that
can probably be estimated by running the query in psql and dumping the
output to /dev/null. The method doesn't do much more than call PQexec,
so there is probably little to save there.
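
A rough way to see the split is to time execute() and fetchall()
separately: the first call covers query plus transfer, the second the
conversion to Python objects. A sketch, with placeholder names:

import time
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder DSN
cur = conn.cursor()

t0 = time.time()
cur.execute("SELECT * FROM some_table")  # placeholder query: runs it and transfers the result
t1 = time.time()
rows = cur.fetchall()                    # converts the result to Python objects
t2 = time.time()

print("execute: %.3f sec  fetchall: %.3f sec" % (t1 - t0, t2 - t1))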

Conversion from Postgres strings to Python objects happens in
fetchall(). The standard cursor only builds a tuple out of the
converted values; the DictCursor does more, and its overhead is
0.778 - 0.202, so you are paying about 0.58 sec for using the more
complex DictCursor over the basic cursor.
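
The difference can be measured directly by running the same query
through both cursor types; a sketch along these lines (placeholder DSN
and query again):

import time
import psycopg2
from psycopg2.extras import DictCursor

conn = psycopg2.connect("dbname=test")  # placeholder DSN

for label, cur in (("basic", conn.cursor()),
                   ("dict", conn.cursor(cursor_factory=DictCursor))):
    t0 = time.time()
    cur.execute("SELECT * FROM some_table")  # placeholder query
    cur.fetchall()
    print("%s cursor: %.3f sec" % (label, time.time() - t0))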

Every Postgres data type is converted into a Python object by a
specific cast function, and some cast functions may be slower than
others. You could exclude some of the columns from your query to find
out whether one data type is particularly slow.
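
For instance, selecting one column at a time should make a slow
typecaster stand out; a sketch (the column names are placeholders):

import time
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder DSN
cur = conn.cursor()

for col in ("col_a", "col_b", "col_c"):  # placeholder column names
    # identifiers can't be passed as bound parameters, hence the string
    # interpolation (safe here only because the names are hard-coded)
    cur.execute("SELECT %s FROM some_table" % col)
    t0 = time.time()
    cur.fetchall()  # the conversion to Python objects happens here
    print("%s: %.3f sec" % (col, time.time() - t0))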

That said, I don't see a specific hotspot in the timings you have
reported: maybe DictCursor can be optimized, maybe some of the
typecasters can too. I don't know whether there is a gross bug, e.g.
something causing quadratic behaviour, or whether some time can be
shaved off by micro-optimizing DictCursor or some adapter. I'll be
happy to fix the library if anything that can be improved is pointed
out, but I'm not going to investigate possible optimizations myself:
I'll leave that to you or to some other volunteer.

-- Daniele

Re: performance issue with psycopg

From: Sushant Sinha
Date:
Thanks Daniele! You are absolutely correct. I think limiting the number
of rows pulled out may solve the problem for me. I will look into the
overhead of DictCursor later on.

-Sushant.
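
A server-side (named) cursor is one way to limit how many rows are
pulled to the client at a time: rows are then fetched in batches while
iterating, instead of being transferred all at once. A sketch,
placeholder names as before:

import psycopg2

conn = psycopg2.connect("dbname=test")   # placeholder DSN
cur = conn.cursor(name="big_query")      # a name makes it a server-side cursor
cur.itersize = 2000                      # rows per network round trip
cur.execute("SELECT * FROM some_table")  # placeholder query

for row in cur:  # rows are streamed in batches, not all held in memory
    pass         # process each row here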
