Thread: performance issue with psycopg
I am using psycopg 2.4.2 and facing a performance issue. An explain
analyze of the query on the database shows 0.75 secs, but when run
through the psycopg layer I am seeing 1.5 secs. Any idea why the extra
cost and how to optimize it?

    import psycopg2
    from psycopg2.extras import DictCursor
    from psycopg2.extensions import UNICODE, UNICODEARRAY

    psycopg2.extensions.register_type(UNICODE)
    psycopg2.extensions.register_type(UNICODEARRAY)

    cursor = dbCnx.cursor(cursor_factory=DictCursor)
    cursor.execute(query, params)
    results = cursor.fetchall()

Here is the profiling for extras.py:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
        35    0.896    0.026    0.896    0.026  extras.py:118(execute)
        35    0.296    0.008    0.778    0.022  extras.py:80(fetchall)
    276433    0.270    0.000    0.314    0.000  extras.py:148(__setitem__)
         1    0.001    0.001    0.129    0.129  extras.py:5(<module>)
     25153    0.112    0.000    0.116    0.000  extras.py:139(__init__)
     17176    0.018    0.000    0.028    0.000  extras.py:143(__getitem__)
         1    0.000    0.000    0.002    0.002  extras.py:566(HstoreAdapter)
         1    0.000    0.000    0.002    0.002  extras.py:762(CompositeCaster)
      4001    0.001    0.000    0.001    0.000  extras.py:162(has_key)
        35    0.000    0.000    0.000    0.000  extras.py:128(_build_index)
        35    0.000    0.000    0.000    0.000  extras.py:113(__init__)
        35    0.000    0.000    0.000    0.000  extras.py:50(__init__)
        46    0.000    0.000    0.000    0.000  extras.py:914(<lambda>)
         1    0.000    0.000    0.000    0.000  extras.py:254(NamedTupleCursor)
        10    0.000    0.000    0.000    0.000  extras.py:156(keys)
         1    0.000    0.000    0.000    0.000  extras.py:134(DictRow)
         1    0.000    0.000    0.000    0.000  extras.py:373(LoggingCursor)
         1    0.000    0.000    0.000    0.000  extras.py:47(DictCursorBase)
         1    0.000    0.000    0.000    0.000  extras.py:433(UUID_adapter)
         1    0.000    0.000    0.000    0.000  extras.py:389(MinTimeLoggingConnection)
         1    0.000    0.000    0.000    0.000  extras.py:110(DictCursor)
         1    0.000    0.000    0.000    0.000  extras.py:493(Inet)
         1    0.000    0.000    0.000    0.000  extras.py:233(RealDictRow)
         1    0.000    0.000    0.000    0.000  extras.py:203(RealDictCursor)
         1    0.000    0.000    0.000    0.000  extras.py:326(LoggingConnection)
         1    0.000    0.000    0.000    0.000  extras.py:248(NamedTupleConnection)
         1    0.000    0.000    0.000    0.000  extras.py:195(RealDictConnection)
         1    0.000    0.000    0.000    0.000  extras.py:416(MinTimeLoggingCursor)
         1    0.000    0.000    0.000    0.000  extras.py:102(DictConnection)
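For reference, a profile like the one above can be collected with
cProfile; this is a minimal sketch, with the DSN, query, and params as
hypothetical stand-ins for the real ones:

    import cProfile
    import pstats

    import psycopg2
    from psycopg2.extras import DictCursor

    # Hypothetical DSN, query, and params; substitute your own.
    dbCnx = psycopg2.connect("dbname=test")
    query = "SELECT * FROM items WHERE added > %s"
    params = ("2011-01-01",)

    def run():
        cursor = dbCnx.cursor(cursor_factory=DictCursor)
        cursor.execute(query, params)
        return cursor.fetchall()

    # Profile one execute/fetchall round trip and print the entries
    # that touch psycopg2's extras.py, sorted by cumulative time.
    cProfile.run("run()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats("extras")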
On Mon, Dec 12, 2011 at 4:41 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> I am using psycopg 2.4.2 and facing a performance issue. An explain
> analyze of the query on the database shows 0.75 secs, but when run
> through the psycopg layer I am seeing 1.5 secs. Any idea why the extra
> cost and how to optimize it?
>
>     import psycopg2
>     from psycopg2.extras import DictCursor
>     from psycopg2.extensions import UNICODE, UNICODEARRAY
>
>     psycopg2.extensions.register_type(UNICODE)
>     psycopg2.extensions.register_type(UNICODEARRAY)
>
>     cursor = dbCnx.cursor(cursor_factory=DictCursor)
>     cursor.execute(query, params)
>     results = cursor.fetchall()

What is the performance using only the basic cursor? And how big is
the returned set?

-- Daniele
On Mon, 2011-12-12 at 17:20 +0000, Daniele Varrazzo wrote:
> What is the performance using only the basic cursor? And how big is
> the returned set?

Number of rows: 25111, number of columns: 12.
With the basic cursor it is 1.2 secs:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         1    0.000    0.000    1.230    1.230  a.py:1(<module>)
         1    0.876    0.876    0.877    0.877  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
         1    0.142    0.142    0.202    0.202  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
         1    0.000    0.000    0.148    0.148  db_utility.py:1(<module>)
         1    0.002    0.002    0.132    0.132  extras.py:5(<module>)
         1    0.000    0.000    0.125    0.125  uuid.py:45(<module>)
         2    0.000    0.000    0.120    0.060  util.py:235(find_library)
         2    0.000    0.000    0.120    0.060  util.py:207(_findSoname_ldconfig)
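A minimal sketch for producing this kind of side-by-side comparison,
timing the basic cursor against DictCursor (the DSN and query are
hypothetical stand-ins):

    import time

    import psycopg2
    from psycopg2.extras import DictCursor

    # Hypothetical DSN and query; substitute your own.
    conn = psycopg2.connect("dbname=test")
    query = "SELECT * FROM items"

    for factory in (None, DictCursor):
        cur = conn.cursor(cursor_factory=factory) if factory else conn.cursor()
        t0 = time.time()
        cur.execute(query)
        t1 = time.time()
        cur.fetchall()
        t2 = time.time()
        cur.close()
        name = factory.__name__ if factory else "basic cursor"
        # Separate the query/transfer cost (execute) from the
        # Python-side conversion cost (fetchall).
        print("%-14s execute: %.3fs  fetchall: %.3fs" % (name, t1 - t0, t2 - t1))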
On Mon, Dec 12, 2011 at 5:46 PM, Sushant Sinha <sushant354@gmail.com> wrote:
> Number of rows: 25111, number of columns: 12.
> With the basic cursor it is 1.2 secs:
>
>     1  0.876  0.876  0.877  0.877  {method 'execute' of 'psycopg2._psycopg.cursor' objects}
>     1  0.142  0.142  0.202  0.202  {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}

This timing includes the module import, which is now (psycopg 2.4.3)
lazy and is not loaded if not used. That overhead is only paid once, at
import time, though.

The execution time for a client-side cursor is in execute(): it
contains the query time plus the time to transfer the result set, so it
has an overhead of about 0.12 secs over the explain analyze. The 0.75
secs of the explain doesn't include the transfer time from server to
client: that can probably be estimated by running the query in psql and
dumping the output to /dev/null. The method doesn't do much more than
calling PQexec, so there's probably little to save there.

Conversion from Postgres strings to Python objects happens in
fetchall(). The standard cursor only builds a tuple out of the
converted values; the DictCursor does more, and the overhead is
0.778 - 0.202: you are paying about 0.5 secs for using the more complex
DictCursor over the basic cursor.

Every Postgres data type is converted into a Python object using a
specific cast function, and some conversion functions may be slower
than others. You could exclude some of the columns from your query to
see whether one data type is particularly slower than the rest.

That said, I don't see a specific hotspot in the timings you have
reported: maybe DictCursor can be optimized, maybe some of the
typecasters can. I don't know if there is any gross bug, e.g. one
causing quadratic behaviour, or if some time can be shaved off by
micro-optimizing DictCursor or some adapter. I'll be happy to fix the
library if anything that can be improved is pointed out, but I'm not
going to investigate possible optimizations myself: I'll leave that to
you or some other volunteer.

-- Daniele
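A minimal sketch of the column-by-column timing suggested above,
assuming a hypothetical table and made-up column names (substitute your
own connection and schema):

    import time

    import psycopg2

    # Hypothetical DSN, table, and columns; substitute your own.
    conn = psycopg2.connect("dbname=test")
    columns = ["id", "title", "added"]

    for col in columns:
        cur = conn.cursor()
        t0 = time.time()
        # The column name is an identifier, not a value, so it cannot
        # go through a %s query placeholder; plain interpolation is
        # acceptable here because the names are hard-coded.
        cur.execute("SELECT %s FROM items" % col)
        cur.fetchall()
        cur.close()
        # A column whose typecast is unusually slow should stand out.
        print("%-20s %.3fs" % (col, time.time() - t0))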
Thanks Daniele! You are absolutely correct. I think limiting the number
of rows getting pulled may solve the problem for me. I will look into
the overhead of DictCursor later on.

-Sushant.

On Mon, 2011-12-12 at 19:08 +0000, Daniele Varrazzo wrote:
> Conversion from Postgres strings to Python objects happens in
> fetchall(). The standard cursor only builds a tuple out of the
> converted values; the DictCursor does more, and the overhead is
> 0.778 - 0.202: you are paying about 0.5 secs for using the more
> complex DictCursor over the basic cursor.
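One way to limit the rows pulled (and converted) at once, as Sushant
suggests, is a server-side (named) cursor, which psycopg supports; a
minimal sketch, with the DSN and query again as hypothetical stand-ins:

    import psycopg2
    from psycopg2.extras import DictCursor

    # Hypothetical DSN and query; substitute your own.
    conn = psycopg2.connect("dbname=test")

    # A named cursor keeps the result set on the server and pulls rows
    # in batches, so the client never converts more rows than it needs.
    cursor = conn.cursor(name="fetch_items", cursor_factory=DictCursor)
    cursor.execute("SELECT * FROM items")

    while True:
        rows = cursor.fetchmany(1000)  # convert at most 1000 rows at a time
        if not rows:
            break
        for row in rows:
            pass  # process row

    cursor.close()   # named cursors must be closed inside the transaction
    conn.commit()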