Re: performance issue with psycopg - Mailing list psycopg
From: Sushant Sinha
Subject: Re: performance issue with psycopg
Msg-id: 1323783907.1829.1.camel@dragflick
In response to: Re: performance issue with psycopg (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
List: psycopg
Thanks Daniele! You are absolutely correct. I think limiting the number of
rows pulled out may solve the problem for me. I will look into the overhead
of DictCursor later on.

-Sushant.

On Mon, 2011-12-12 at 19:08 +0000, Daniele Varrazzo wrote:
> On Mon, Dec 12, 2011 at 5:46 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> > On Mon, 2011-12-12 at 17:20 +0000, Daniele Varrazzo wrote:
> >> On Mon, Dec 12, 2011 at 4:41 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> >> > I am using psycopg 2.4.2 and facing a performance issue. An explain
> >> > analyze of the query on the database shows 0.75 secs, but when run
> >> > through the psycopg layer I am seeing 1.5 secs. Any idea why the
> >> > extra cost, and how to optimize it?
> >> >
> >> > import psycopg2
> >> > from psycopg2.extras import DictCursor
> >> > from psycopg2.extensions import UNICODE, UNICODEARRAY
> >> > psycopg2.extensions.register_type(UNICODE)
> >> > psycopg2.extensions.register_type(UNICODEARRAY)
> >> >
> >> > cursor = dbCnx.cursor(cursor_factory=DictCursor)
> >> > cursor.execute(query, params)
> >> > results = cursor.fetchall()
> >>
> >> What is the performance using only the basic cursor? And how big is
> >> the returned set?
> >>
> >> -- Daniele
> >
> > Number of rows: 25111, number of columns: 12.
> > With the basic cursor it is 1.2 secs:
> >
> >     1    0.000    0.000    1.230    1.230  a.py:1(<module>)
> >     1    0.876    0.876    0.877    0.877  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
> >     1    0.142    0.142    0.202    0.202  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
> >     1    0.000    0.000    0.148    0.148  db_utility.py:1(<module>)
> >     1    0.002    0.002    0.132    0.132  extras.py:5(<module>)
> >     1    0.000    0.000    0.125    0.125  uuid.py:45(<module>)
> >     2    0.000    0.000    0.120    0.060  util.py:235(find_library)
> >     2    0.000    0.000    0.120    0.060  util.py:207(_findSoname_ldconfig)
>
> This timing includes the module imports, which are lazy as of psycopg
> 2.4.3 and not loaded if unused. In any case that overhead is paid only
> once, at import time.
>
> The execution time for a client-side cursor is in execute(): it
> contains the query time plus the fetch time, so it has an overhead of
> 0.12 over the explain analyze. The 0.75 of the explain doesn't include
> the transfer time from server to client: that can probably be
> estimated by running the query in psql and dumping the output to
> /dev/null. The method doesn't do much more than call PQexec, so
> there's probably little to save there.
>
> Conversion from Postgres strings to Python objects happens in
> fetchall(). The standard cursor only builds a tuple out of the
> converted values; DictCursor does more, and its overhead is 0.778 -
> 0.202: you are paying about 0.5 secs for the more complex DictCursor
> over the basic cursor.
>
> Every Postgres data type is converted into a Python object by a
> specific cast function, and some cast functions may be slower than
> others. You could exclude some of the columns from your query to see
> whether one data type is particularly slower than the rest.
>
> That said, I don't see a specific hotspot in the timings you reported:
> maybe DictCursor can be optimized, maybe some of the typecasters can.
> I don't know if there is a gross bug, e.g. one causing quadratic
> behaviour, or if some time can be shaved off by micro-optimizing
> DictCursor or an adapter. I'll be happy to fix the library if anything
> that can be improved is pointed out, but I'm not going to investigate
> possible optimizations myself: I'll leave that to you or some other
> volunteer.
> -- Daniele
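
A minimal sketch of the row-limiting idea Sushant mentions, using a
psycopg2 server-side (named) cursor so rows are streamed in batches
instead of all 25111 being materialized on the client at once. The DSN,
table name, and query here are placeholders, not from the thread:

    import psycopg2

    conn = psycopg2.connect("dbname=test")

    # Giving the cursor a name makes it server-side: iteration fetches
    # rows in batches of itersize rather than pulling the whole result.
    cur = conn.cursor(name="fetch_results")
    cur.itersize = 2000
    cur.execute("SELECT * FROM documents WHERE doctype = %s", ("judgment",))

    for row in cur:
        pass  # per-row application logic goes here

    cur.close()
    conn.close()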
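To put a number on the DictCursor overhead Daniele computes above, the
two cursor factories can be timed side by side on the same query. Again
the connection string and query are stand-ins:

    import time
    import psycopg2
    from psycopg2.extras import DictCursor

    conn = psycopg2.connect("dbname=test")
    query = "SELECT * FROM documents"

    for factory in (None, DictCursor):
        # factory=None exercises the default (basic) cursor.
        cur = conn.cursor() if factory is None else conn.cursor(cursor_factory=factory)
        t0 = time.time()
        cur.execute(query)
        rows = cur.fetchall()
        label = "basic cursor" if factory is None else factory.__name__
        print("%-12s %d rows in %.3f s" % (label, len(rows), time.time() - t0))
        cur.close()

    conn.close()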
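And a rough way to follow Daniele's suggestion of excluding columns to
spot a slow typecaster: fetch each column separately and time it. The
table and column names below are hypothetical stand-ins for the real
12-column result:

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    columns = ["id", "title", "body", "published"]  # extend to all 12 columns

    for col in columns:
        cur = conn.cursor()
        t0 = time.time()
        # Column names cannot be passed as query parameters; these are
        # trusted literals from the list above, not user input.
        cur.execute("SELECT %s FROM documents" % col)
        cur.fetchall()  # conversion to Python objects happens here
        print("%-12s %.3f s" % (col, time.time() - t0))
        cur.close()

    conn.close()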