Thread: Redundant database objects.
Hi all,
Our project has been running for 10 years now.
We have a large number of orphaned or redundant tables, views, and functions, due to many years of inadequate source management.
We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "
Is there an effective way to identify these objects using the stats tables? Something like a last accessed/used or some such column?
Any suggestions welcomed.
Thanks
Andrew Bartley
Aimstats Pty Ltd
On 07/12/2010 02:40 PM, Andrew Bartley wrote: > We have a large number of orphaned or redundant tables, views, and > functions, due to many years of inadequate source management. > > We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC > gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 " > > Is there an effective way to identify these objects using the stats > tables? Something like a last accessed/used or some such column? Maybe pg_statio* views? http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & Support
Attachment
Thanks Joe,
Unfortunately these views only give me what appears to be a certain time frame. This does not help all that much. It will give a list of tables, indexes and sequences that have been used in the time frame, so that is at least a start.
It would be good if there was a timestamp (last accessed) that would give me a clearer indication.
Thanks
Andrew
On 13 July 2010 08:46, Joe Conway <mail@joeconway.com> wrote:
On 07/12/2010 02:40 PM, Andrew Bartley wrote:Maybe pg_statio* views?
> We have a large number of orphaned or redundant tables, views, and
> functions, due to many years of inadequate source management.
>
> We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "
>
> Is there an effective way to identify these objects using the stats
> tables? Something like a last accessed/used or some such column?
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support
Andrew Bartley wrote: > Unfortunately these views only give me what appears to be a certain > time frame. This does not help all that much. It will give a list of > tables, indexes and sequences that have been used in the time frame, > so that is at least a start. You can use pg_stat_reset() to set those back to 0 again and then see what actually gets used moving forward from the point you do that. That's a reasonable idea to do anyway to make all those statistics better reflect recent activity rather than historical. Just be warned that it will screw up many monitoring systems if you have them pointed toward those statistics tables and grabbing snapshots, some will view the reset as the values going negative which doesn't make any real-world sense. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Thanks Greg,
It seems that the underlying stats tables are reset on a periodic basis, can i stop this process? Is it a .conf setting? I have had a good look around, nothing sticks out. If I can stop it, then i could use pg_stat_reset() then monitor the stat views over an extended period without them being reset by some periodic job. That, at the moment, is my main concern.
Thanks again.
Also i need to find similar information regarding functions and views.... Any suggestions?
Thanks
Andrew Bartley
On 13 July 2010 09:45, Greg Smith <greg@2ndquadrant.com> wrote:
Andrew Bartley wrote:You can use pg_stat_reset() to set those back to 0 again and then see what actually gets used moving forward from the point you do that. That's a reasonable idea to do anyway to make all those statistics better reflect recent activity rather than historical. Just be warned that it will screw up many monitoring systems if you have them pointed toward those statistics tables and grabbing snapshots, some will view the reset as the values going negative which doesn't make any real-world sense.Unfortunately these views only give me what appears to be a certain time frame. This does not help all that much. It will give a list of tables, indexes and sequences that have been used in the time frame, so that is at least a start.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Hello. When we moved old projects from postgresql 7.x to 8.4, I just looked at modification time for files in base/<dboid>/<toid> . So, I could determine, that some databases were inactive (precisely,not updated) for about a year and move them to archive... Andrew Bartley wrote: > Is there an effective way to identify these objects using the stats > tables? Something like a last accessed/used or some such column? > > Any suggestions welcomed. > > Thanks > > Andrew Bartley > Aimstats Pty Ltd > -- С уважением, Александр Пыхалов, системный администратор ЮГИНФО ЮФУ.
Thanks Alexander,
Wish i had thought of that.
I still need some way of finding redundant functions
Thanks again
Andrew
On 13 July 2010 15:38, Alexander Pyhalov <alp@rsu.ru> wrote:
Hello.
When we moved old projects from postgresql 7.x to 8.4, I just looked at modification time for files in base/<dboid>/<toid> . So, I could determine, that some databases were inactive (precisely,not updated) for about a year and move them to archive...--
Andrew Bartley wrote:Is there an effective way to identify these objects using the stats tables? Something like a last accessed/used or some such column?
Any suggestions welcomed.
Thanks
Andrew Bartley
Aimstats Pty Ltd
С уважением,
Александр Пыхалов,
системный администратор ЮГИНФО ЮФУ.
On 07/12/2010 11:07 PM, Andrew Bartley wrote: > > I still need some way of finding redundant functions A bit of a blunt instrument, but you could log all statements for a while, and then grep through the logs using a list of all functions of interest to see which ones never show up. Be wary of the performance hit and rapidly growing log though. Another idea would be to modify a copy of 8.3.x source code (I think that's what you said you were on in an earlier post) to emit a NOTICE with a name whenever a function is called if it meets some criteria. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & Support
Attachment
Hello 2010/7/13 Andrew Bartley <ambartley@gmail.com>: > Thanks Alexander, > Wish i had thought of that. > I still need some way of finding redundant functions > Thanks again > Andrew > I used a function source code injection for this task see http://www.postgres.cz/index.php/Injekt%C3%A1%C5%BE_zdrojov%C3%A9ho_k%C3%B3du_PL/pgSQL_funkc%C3%AD_za_%C3%BA%C4%8Delem_z%C3%ADsk%C3%A1n%C3%AD_seznamu_pou%C5%BE%C3%ADvan%C3%BDch_funkc%C3%AD article is czech, but comments in code are in english (I used it in 8.3) you can use a PL profiler too http://www.depesz.com/index.php/2008/05/15/waiting-for-84-function-stats/ Regards Pavel Stehule
Andrew Bartley wrote: > It seems that the underlying stats tables are reset on a periodic > basis, can i stop this process? Is it a .conf setting? Up until PostgreSQL 8.2 there's a setting named stats_reset_on_server_start that clears everything when the server stops: http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html If you're on that version or earlier and it's turned on, there's your problem. This went away in 8.3. > Also i need to find similar information regarding functions and > views.... Any suggestions? Some suggestions already popped up here for functions. Views are tougher because they essentially work like a macro substitution: the content of the view gets substituted into the query where it appears, and off the query planner goes. That's why there's no statistics about them, they don't actually exist as objects that things are executed against. I don't know of any way to track their use other than to log all your queries and look for them popping up. A grep against the application source code for them can be useful too. The flip side to that is that eliminating views doesn't really improve performance, so it's rarely a top priority to get rid of them--unlike unused indexes for example. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Thanks to all that replied,
I used Joe Conway's suggestion, using grep and an extracted list of tables, functions and views form the DB. It worked very well.
I will attach the code I used to this thread once complete.
I will attach the code I used to this thread once complete.
Again Thanks
Andrew Bartley
On 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote:
Andrew Bartley wrote:Up until PostgreSQL 8.2 there's a setting named stats_reset_on_server_start that clears everything when the server stops: http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.htmlIt seems that the underlying stats tables are reset on a periodic basis, can i stop this process? Is it a .conf setting?
If you're on that version or earlier and it's turned on, there's your problem. This went away in 8.3.Some suggestions already popped up here for functions. Views are tougher because they essentially work like a macro substitution: the content of the view gets substituted into the query where it appears, and off the query planner goes. That's why there's no statistics about them, they don't actually exist as objects that things are executed against. I don't know of any way to track their use other than to log all your queries and look for them popping up. A grep against the application source code for them can be useful too.Also i need to find similar information regarding functions and views.... Any suggestions?
The flip side to that is that eliminating views doesn't really improve performance, so it's rarely a top priority to get rid of them--unlike unused indexes for example.--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, Jul 15, 2010 at 1:04 AM, Andrew Bartley <ambartley@gmail.com> wrote: > Thanks to all that replied, > I used Joe Conway's suggestion, using grep and an extracted list of tables, > functions and views form the DB. It worked very well. > > I will attach the code I used to this thread once complete. > Again Thanks > Andrew Bartley > > On 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote: >> >> Andrew Bartley wrote: >>> >>> It seems that the underlying stats tables are reset on a periodic basis, >>> can i stop this process? Is it a .conf setting? >> >> Up until PostgreSQL 8.2 there's a setting named >> stats_reset_on_server_start that clears everything when the server stops: >> http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html >> >> If you're on that version or earlier and it's turned on, there's your >> problem. This went away in 8.3. >> >>> Also i need to find similar information regarding functions and views.... >>> Any suggestions? >> >> Some suggestions already popped up here for functions. Views are tougher >> because they essentially work like a macro substitution: the content of the >> view gets substituted into the query where it appears, and off the query >> planner goes. That's why there's no statistics about them, they don't >> actually exist as objects that things are executed against. I don't know of >> any way to track their use other than to log all your queries and look for >> them popping up. A grep against the application source code for them can be >> useful too. >> >> The flip side to that is that eliminating views doesn't really improve >> performance, so it's rarely a top priority to get rid of them--unlike unused >> indexes for example. >> -- >> Greg Smith 2ndQuadrant US Baltimore, MD >> PostgreSQL Training, Services and Support >> greg@2ndQuadrant.com www.2ndQuadrant.us >> > > After some time of looking for a solution to a similar problem I came up with the following probable solution. 1)Enable logging to CSV format. 2)Then log all queries, insert, update, deletion statements for the given cluster. 3)Run your application for a period sufficient to have captured the execution of all possible functions and the deletes, inserts and updates to the tables and sequences. 4)Create a table in other database that conforms to the the field structure of CSV logging. 5)Populate this table with the contents of your CSV file. 6)Now (I think) all the database objects invoked or used in anyway (not sure about nested function calls though) will be listed in this table. So now you may query the appropriate field(s) to find out the "active" database objects. Since you are interested in weeding out the "inactive" objects and while obviously not loosing data or useful database objects I was thinking of the additional steps 7)After working hours stop your application(s) and somehow perform a database dump and restore (to other brand new database) on only these "active" database objects (maybe using --table=<activetable1> --table=<activetable..n>). Other database objects appearing in your "active database objects list" may require manual creation on your new database. 8)Rename the original database to other suitable label and rename the new database to the original name of the original database. 9)Start your application, perform some investigations to see if all is well. Allan.