Re: performance issue with psycopg - Mailing list psycopg

From: Daniele Varrazzo
Subject: Re: performance issue with psycopg
Msg-id: CA+mi_8ZVqs3=WkKRkSoEUs-7RtE0ejL+L3SrwRqv-iqxSPeWWA@mail.gmail.com
In response to: Re: performance issue with psycopg (Sushant Sinha <sushant354@gmail.com>)
Responses: Re: performance issue with psycopg (Sushant Sinha <sushant354@gmail.com>)
List: psycopg
On Mon, Dec 12, 2011 at 5:46 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> On Mon, 2011-12-12 at 17:20 +0000, Daniele Varrazzo wrote:
>> On Mon, Dec 12, 2011 at 4:41 PM, Sushant Sinha <sushant354@gmail.com> wrote:
>> > I am using psycopg 2.4.2 and facing a performance issue. An explain
>> > analyze of the query on the database shows 0.75 secs. But when used
>> > through the psycopg layer I am seeing 1.5 secs. Any idea why the extra
>> > cost and how to optimize it?
>> >
>> > from psycopg2.extras import DictCursor
>> > from psycopg2.extensions import UNICODE, UNICODEARRAY
>> > psycopg2.extensions.register_type(UNICODE)
>> > psycopg2.extensions.register_type(UNICODEARRAY)
>> >
>> > cursor = dbCnx.cursor(cursor_factory=DictCursor)
>> > cursor.execute(query, params)
>> > results = cursor.fetchall()
>>
>> What is the performance using only the basic cursor? And how big is
>> the returned set?
>>
>> -- Daniele
>
> Number of rows: 25111, number of columns: 12
> With the basic cursor it is 1.2 secs:
>
>     1  0.000  0.000  1.230  1.230  a.py:1(<module>)
>     1  0.876  0.876  0.877  0.877  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
>     1  0.142  0.142  0.202  0.202  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
>     1  0.000  0.000  0.148  0.148  db_utility.py:1(<module>)
>     1  0.002  0.002  0.132  0.132  extras.py:5(<module>)
>     1  0.000  0.000  0.125  0.125  uuid.py:45(<module>)
>     2  0.000  0.000  0.120  0.060  util.py:235(find_library)
>     2  0.000  0.000  0.120  0.060  util.py:207(_findSoname_ldconfig)

This timing includes the module import, which as of psycopg 2.4.3 is lazy, so the module is not loaded unless it is actually used. In any case that overhead is paid only once, at import time.

For a client-side cursor the execution time is all in execute(): it contains the query time plus the fetch time, so it has an overhead of about 0.12 secs over the explain analyze. Note that the 0.75 secs of the explain doesn't include the transfer time from server to client: that can probably be estimated by running the query in psql and dumping the output to /dev/null.
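The execute()-vs-fetchall() split above can be measured directly instead of read out of a cProfile dump. Below is a minimal, hypothetical sketch: `profile_cursor` is a made-up helper that works with any DB-API cursor, and the psycopg2 usage at the bottom assumes a connection string and query of your own.

```python
import time

def profile_cursor(cur, query, params=None):
    """Time execute() and fetchall() separately for any DB-API cursor."""
    t0 = time.time()
    cur.execute(query, params or ())
    t1 = time.time()
    rows = cur.fetchall()
    t2 = time.time()
    return rows, t1 - t0, t2 - t1

# Hypothetical usage against the poster's database (dsn and query are
# placeholders, not values from the thread):
#
#   import psycopg2
#   from psycopg2.extras import DictCursor
#   conn = psycopg2.connect("dbname=test")
#   plain = profile_cursor(conn.cursor(), query)
#   dicts = profile_cursor(conn.cursor(cursor_factory=DictCursor), query)
```

Running both cursor flavours through the same helper isolates the DictCursor overhead from the query and transfer time, which execute() already contains.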
The method doesn't do much more than calling PQexec, so there is probably little to save there. Conversion from postgres strings to python objects happens in fetchall(): the standard cursor only builds a tuple out of the converted values, while DictCursor does more, and its overhead is 0.778 - 0.202. So you are paying about 0.5 secs for using the more complex DictCursor over the basic cursor.

Every postgres data type is converted into a python object by a specific cast function, and some conversion functions may be slower than others. You could drop some of the columns from your query to check whether one data type is particularly slower than the others.

That said, I don't see a specific hotspot in the timings you have reported: maybe DictCursor can be optimized, maybe some of the typecasters can. I don't know whether there is any gross bug, e.g. one causing quadratic behaviour, or whether some time can be shaved off by micro-optimizing DictCursor or one of the adapters. I'll be happy to fix the library if anything that can be improved is pointed out, but I'm not going to investigate possible optimizations myself: I'll leave that to you or to some other volunteer.

-- Daniele
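One way to act on the column-exclusion suggestion without rewriting the query is to time plausible Python-side casts in isolation. The cast functions and sample strings below are hypothetical stand-ins chosen for illustration (psycopg's real typecasters are C functions keyed by type OID), but the approach shows how a single slow conversion, repeated once per row over a 25111-row result set, can dominate fetch time.

```python
import time
from datetime import datetime

# Hypothetical stand-ins for typecasters, each paired with a sample string
# in the text format libpq would deliver for that type.
casts = {
    "int4": (int, "12345"),
    "float8": (float, "3.14159"),
    "timestamp": (lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
                  "2011-12-12 17:20:00"),
}

def time_cast(fn, value, n=25111):
    """Apply a cast function n times (once per row) and return elapsed seconds."""
    start = time.time()
    for _ in range(n):
        fn(value)
    return time.time() - start

timings = {name: time_cast(fn, sample) for name, (fn, sample) in casts.items()}
```

On a typical machine the datetime parse is markedly slower than the int or float casts; that is the kind of per-type asymmetry worth hunting for before blaming DictCursor itself.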