Thread: [Patch] Data type cache to speed up query tool
Hi pgadmin hackers, I have been having trouble with the speed of the pgAdmin query tool on remote links, and I'm sure I'm not the only one. The delay is due to loading the data type information for every column: > QUERY : Scalar query (localhost:50119): SELECT format_type(oid,-1) as typname FROM pg_type WHERE oid = 1114 > QUERY : Scalar query (localhost:50119): SELECT CASE WHEN typbasetype=0 THEN oid else typbasetype END AS basetype FROM pg_type WHERE oid=1114 I have been playing around with a fix for quite a while, but I have finally sat myself down and put it on github (https://github.com/aiht/pgadmin3/commit/80a896e674bf68cc67aa3dc0dd425db9aa7372aa) (patch attached, too). I'm still trying to make sure it does exactly what I expect, so the debug logging is quite heavy, but I wanted to ask a few questions to see if I am on the right track. If this code is useful to the project, I'll trim the logging down so it isn't in the way. Code conventions: * Does the pg prefix for a class name have a specific meaning and if so, am I using it wrongly? * I have copied the i prefix on some methods, but I don't really understand what it's for; I was thinking perhaps 'internal'? Further suggestions: * This code supports pre-caching all types in one go, but currently only loads them one by one. Maybe this should be an option. * When a pgConn is Duplicate()d, its cache is copied for the new connection to use, but as far as I can see this is only called when doing File->New Window in the query tool. If more pgConns are constructed by Duplicate() then caching may be more efficient. * Quite a few other parts of the code do their own type-info loading; at least a few of them could share in this cache. However, the other code does not load types one by one so they do not have the same latency issues as the query tool. Possible problems: * Threading? As far as I have seen, the code doesn't tend to use locking. The query threads just avoid accessing each others objects while they are active. I have not added any locking, but each cache object belongs to a single connection, and is only used from code that was already using that connection, so I think it is okay. * The cache is currently not flushed, because I was not sure when to trigger it. The main object tree should probably flush its cache when the view is refreshed, but I am not sure about what kind of changes the server may make to existing types that require a flush. * I have not tested this on Postgres variants (EDB etc.) or on old versions. I have not tested the current version on Windows (although I did test an old version of the same code long ago). I have not yet patched the MSVS project file, but the only change required should be to add the pgTypeCache.{h,cpp} files. I hope it helps! Harun
Attachment
Hi On Sat, Nov 24, 2012 at 8:55 AM, aiht <aihtdikh@gmail.com> wrote: > Hi pgadmin hackers, > > I have been having trouble with the speed of the pgAdmin query tool on > remote links, and I'm sure I'm not the only one. > The delay is due to loading the data type information for every column: >> QUERY : Scalar query (localhost:50119): SELECT format_type(oid,-1) as >> typname FROM pg_type WHERE oid = 1114 >> QUERY : Scalar query (localhost:50119): SELECT CASE WHEN typbasetype=0 >> THEN oid else typbasetype END AS basetype FROM pg_type WHERE oid=1114 > > I have been playing around with a fix for quite a while, but I have finally > sat myself down and put it on github > (https://github.com/aiht/pgadmin3/commit/80a896e674bf68cc67aa3dc0dd425db9aa7372aa) > (patch attached, too). This is a nice idea, but can't we use/extend the pgDatatype class? That's already intended to handle type caching, albeit not between connections, if memory serves (I'm not sure that's a good idea anyway, as one session may see a different set of types to another at any given point in time). > Code conventions: > * Does the pg prefix for a class name have a specific meaning and if so, am > I using it wrongly? It's used to denote classes related to Postgres (any fork). We have some edb* and some gp* classes as well, for things specific to EnterpriseDB's Postgres Plus Advanced Server and Greenplum database. The original thinking was that eventually we might also support some other databases (e.g. odbc*, my* etc) to some extent. > * I have copied the i prefix on some methods, but I don't really understand > what it's for; I was thinking perhaps 'internal'? Yes - it's used for setters that are supposed to be private. I wouldn't be surprised if some of them are public now though (they could be fixed if you're feeling bored). > Further suggestions: > * This code supports pre-caching all types in one go, but currently only > loads them one by one. Maybe this should be an option. It should probably pre-cache them all. That'll be a lot cheaper. > * When a pgConn is Duplicate()d, its cache is copied for the new connection > to use, but as far as I can see this is only called when doing File->New > Window in the query tool. If more pgConns are constructed by Duplicate() > then caching may be more efficient. Not so sure about that, as mentioned above. > * Quite a few other parts of the code do their own type-info loading; at > least a few of them could share in this cache. However, the other code does > not load types one by one so they do not have the same latency issues as the > query tool. Yeah, but it would give us a common interface, and overall would improve efficiency. > Possible problems: > * Threading? As far as I have seen, the code doesn't tend to use locking. > The query threads just avoid accessing each others objects while they are > active. I have not added any locking, but each cache object belongs to a > single connection, and is only used from code that was already using that > connection, so I think it is okay. Yes, it should be. > * The cache is currently not flushed, because I was not sure when to trigger > it. The main object tree should probably flush its cache when the view is > refreshed, but I am not sure about what kind of changes the server may make > to existing types that require a flush. Agreed on the tree. Not sure about the general case either though. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Dave, On 24/11/12 23:21, Dave Page wrote: > This is a nice idea, but can't we use/extend the pgDatatype class? > That's already intended to handle type caching, albeit not between > connections, if memory serves (I'm not sure that's a good idea anyway, > as one session may see a different set of types to another at any > given point in time). Good idea; I was being overly conservative, replacing only like for like. I'm converting to use this class so the cache is more generally applicable. It's more efficient, too: no need to load every typmod variant from the server. I only had a few hours to work on it so far, so no patch release yet, but the WIP is on github aiht/pgadmin if you're interested. >> * I have copied the i prefix on some methods, but I don't really understand >> what it's for; I was thinking perhaps 'internal'? > Yes - it's used for setters that are supposed to be private. I > wouldn't be surprised if some of them are public now though (they > could be fixed if you're feeling bored). Thanks. I'm not quite that bored right now, sorry. Maybe later :) > It should probably pre-cache them all. That'll be a lot cheaper. Agreed. I have also added support for single-record loading in the DatatypeReader so that new types (created after the cache was filled) can still be loaded later. >> * When a pgConn is Duplicate()d, its cache is copied for the new connection >> to use, but as far as I can see this is only called when doing File->New >> Window in the query tool. If more pgConns are constructed by Duplicate() >> then caching may be more efficient. > Not so sure about that, as mentioned above. Yeah, now that I've looked closer it doesn't seem so easy to Duplicate() other connections. It seems safe in the query tool because the new window uses the same host/db/etc. But perhaps I am missing some points about e.g. schema search paths? > >> * Quite a few other parts of the code do their own type-info loading; at >> least a few of them could share in this cache. However, the other code does >> not load types one by one so they do not have the same latency issues as the >> query tool. > Yeah, but it would give us a common interface, and overall would > improve efficiency. Agreed; still worthwhile. Thanks, Harun
Hi On Mon, Nov 26, 2012 at 1:12 AM, aiht <aihtdikh@gmail.com> wrote: > > >>> * When a pgConn is Duplicate()d, its cache is copied for the new >>> connection >>> to use, but as far as I can see this is only called when doing File->New >>> Window in the query tool. If more pgConns are constructed by Duplicate() >>> then caching may be more efficient. >> >> Not so sure about that, as mentioned above. > > Yeah, now that I've looked closer it doesn't seem so easy to Duplicate() > other connections. It seems safe in the query tool because the new window > uses the same host/db/etc. But perhaps I am missing some points about e.g. > schema search paths? It should be safe - we're only duplicating the connection parameters and then opening a new connection with them. We're not copying session state or anything like that. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company