Thread: Redundant database objects.

Redundant database objects.

From
Andrew Bartley
Date:
Hi all,

Our project has been running for 10 years now.

We have a large number of orphaned or redundant tables, views, and functions, due to many years of inadequate source management. 

We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "

Is there an effective way to identify these objects using the stats tables?  Something like a last accessed/used or some such column? 

Any suggestions welcomed.  

Thanks

Andrew Bartley
Aimstats Pty Ltd

Re: Redundant database objects.

From
Joe Conway
Date:
On 07/12/2010 02:40 PM, Andrew Bartley wrote:
> We have a large number of orphaned or redundant tables, views, and
> functions, due to many years of inadequate source management.
>
> We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "
>
> Is there an effective way to identify these objects using the stats
> tables?  Something like a last accessed/used or some such column?

Maybe pg_statio* views?
  http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


Attachment

Re: Redundant database objects.

From
Andrew Bartley
Date:
Thanks Joe,

Unfortunately these views only give me what appears to be a certain time frame.  This does not help all that much.  It will give a list of tables, indexes and sequences that have been used in the time frame, so that is at least a start.

It would be good if there was a timestamp (last accessed) that would give me a clearer indication.

Thanks

Andrew        

On 13 July 2010 08:46, Joe Conway <mail@joeconway.com> wrote:
On 07/12/2010 02:40 PM, Andrew Bartley wrote:
> We have a large number of orphaned or redundant tables, views, and
> functions, due to many years of inadequate source management.
>
> We are running " PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2 "
>
> Is there an effective way to identify these objects using the stats
> tables?  Something like a last accessed/used or some such column?

Maybe pg_statio* views?
 http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


Re: Redundant database objects.

From
Greg Smith
Date:
Andrew Bartley wrote:
> Unfortunately these views only give me what appears to be a certain
> time frame.  This does not help all that much.  It will give a list of
> tables, indexes and sequences that have been used in the time frame,
> so that is at least a start.

You can use pg_stat_reset() to set those back to 0 again and then see
what actually gets used moving forward from the point you do that.
That's a reasonable idea to do anyway to make all those statistics
better reflect recent activity rather than historical.  Just be warned
that it will screw up many monitoring systems if you have them pointed
toward those statistics tables and grabbing snapshots, some will view
the reset as the values going negative which doesn't make any real-world
sense.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Redundant database objects.

From
Andrew Bartley
Date:
Thanks Greg,

It seems that the underlying stats tables are reset on a periodic basis, can i stop this process? Is it a .conf setting?  I have had a good look around, nothing sticks out. If I can stop it, then i could use pg_stat_reset() then monitor the stat views over an extended period without them being reset by some periodic job.  That, at the moment, is my main concern. 

Thanks again.

Also i need to find similar information regarding functions and views....  Any suggestions?

Thanks 

Andrew Bartley


On 13 July 2010 09:45, Greg Smith <greg@2ndquadrant.com> wrote:
Andrew Bartley wrote:
Unfortunately these views only give me what appears to be a certain time frame.  This does not help all that much.  It will give a list of tables, indexes and sequences that have been used in the time frame, so that is at least a start.

You can use pg_stat_reset() to set those back to 0 again and then see what actually gets used moving forward from the point you do that.  That's a reasonable idea to do anyway to make all those statistics better reflect recent activity rather than historical.  Just be warned that it will screw up many monitoring systems if you have them pointed toward those statistics tables and grabbing snapshots, some will view the reset as the values going negative which doesn't make any real-world sense.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Redundant database objects.

From
Alexander Pyhalov
Date:
Hello.
When we moved old projects from postgresql 7.x to 8.4, I just looked at
modification time for files in base/<dboid>/<toid> . So, I could
determine, that some databases were inactive (precisely,not updated) for
about a year and move them to archive...
Andrew Bartley wrote:

> Is there an effective way to identify these objects using the stats
> tables?  Something like a last accessed/used or some such column?
>
> Any suggestions welcomed.
>
> Thanks
>
> Andrew Bartley
> Aimstats Pty Ltd
>


--
С уважением,
Александр Пыхалов,
системный администратор ЮГИНФО ЮФУ.


Re: Redundant database objects.

From
Andrew Bartley
Date:
Thanks Alexander,

Wish i had thought of that.

I still need some way of finding redundant functions

Thanks again

Andrew


On 13 July 2010 15:38, Alexander Pyhalov <alp@rsu.ru> wrote:
Hello.
When we moved old projects from postgresql 7.x to 8.4, I just looked at modification time for files in base/<dboid>/<toid> . So, I could determine, that some databases were inactive (precisely,not updated) for about a year and move them to archive...

Andrew Bartley wrote:

Is there an effective way to identify these objects using the stats tables?  Something like a last accessed/used or some such column?
Any suggestions welcomed.  
Thanks

Andrew Bartley
Aimstats Pty Ltd



--
С уважением,
Александр Пыхалов,
системный администратор ЮГИНФО ЮФУ.


Re: Redundant database objects.

From
Joe Conway
Date:
On 07/12/2010 11:07 PM, Andrew Bartley wrote:
>
> I still need some way of finding redundant functions

A bit of a blunt instrument, but you could log all statements for a
while, and then grep through the logs using a list of all functions of
interest to see which ones never show up. Be wary of the performance hit
and rapidly growing log though. Another idea would be to modify a copy
of 8.3.x source code (I think that's what you said you were on in an
earlier post) to emit a NOTICE with a name whenever a function is called
if it meets some criteria.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


Attachment

Re: Redundant database objects.

From
Pavel Stehule
Date:
Hello

2010/7/13 Andrew Bartley <ambartley@gmail.com>:
> Thanks Alexander,
> Wish i had thought of that.
> I still need some way of finding redundant functions
> Thanks again
> Andrew
>

I used a function source code injection for this task

see
http://www.postgres.cz/index.php/Injekt%C3%A1%C5%BE_zdrojov%C3%A9ho_k%C3%B3du_PL/pgSQL_funkc%C3%AD_za_%C3%BA%C4%8Delem_z%C3%ADsk%C3%A1n%C3%AD_seznamu_pou%C5%BE%C3%ADvan%C3%BDch_funkc%C3%AD

article is czech, but comments in code are in english (I used it in 8.3)

you can use a PL profiler too

http://www.depesz.com/index.php/2008/05/15/waiting-for-84-function-stats/

Regards

Pavel Stehule

Re: Redundant database objects.

From
Greg Smith
Date:
Andrew Bartley wrote:
> It seems that the underlying stats tables are reset on a periodic
> basis, can i stop this process? Is it a .conf setting?

Up until PostgreSQL 8.2 there's a setting named
stats_reset_on_server_start that clears everything when the server
stops:
http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html

If you're on that version or earlier and it's turned on, there's your
problem.  This went away in 8.3.

> Also i need to find similar information regarding functions and
> views....  Any suggestions?

Some suggestions already popped up here for functions.  Views are
tougher because they essentially work like a macro substitution:  the
content of the view gets substituted into the query where it appears,
and off the query planner goes.  That's why there's no statistics about
them, they don't actually exist as objects that things are executed
against.  I don't know of any way to track their use other than to log
all your queries and look for them popping up.  A grep against the
application source code for them can be useful too.

The flip side to that is that eliminating views doesn't really improve
performance, so it's rarely a top priority to get rid of them--unlike
unused indexes for example.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Redundant database objects.

From
Andrew Bartley
Date:
Thanks to all that replied,

I used Joe Conway's suggestion, using grep and an extracted list of tables, functions and views form the DB.  It worked very well.

I will attach the code I used to this thread once complete.

Again Thanks

Andrew Bartley 

On 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote:
Andrew Bartley wrote:
It seems that the underlying stats tables are reset on a periodic basis, can i stop this process? Is it a .conf setting?

Up until PostgreSQL 8.2 there's a setting named stats_reset_on_server_start that clears everything when the server stops:  http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html

If you're on that version or earlier and it's turned on, there's your problem.  This went away in 8.3.


Also i need to find similar information regarding functions and views....  Any suggestions?

Some suggestions already popped up here for functions.  Views are tougher because they essentially work like a macro substitution:  the content of the view gets substituted into the query where it appears, and off the query planner goes.  That's why there's no statistics about them, they don't actually exist as objects that things are executed against.  I don't know of any way to track their use other than to log all your queries and look for them popping up.  A grep against the application source code for them can be useful too.

The flip side to that is that eliminating views doesn't really improve performance, so it's rarely a top priority to get rid of them--unlike unused indexes for example.
--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: Redundant database objects.

From
Allan Kamau
Date:
On Thu, Jul 15, 2010 at 1:04 AM, Andrew Bartley <ambartley@gmail.com> wrote:
> Thanks to all that replied,
> I used Joe Conway's suggestion, using grep and an extracted list of tables,
> functions and views form the DB.  It worked very well.
>
> I will attach the code I used to this thread once complete.
> Again Thanks
> Andrew Bartley
>
> On 14 July 2010 00:43, Greg Smith <greg@2ndquadrant.com> wrote:
>>
>> Andrew Bartley wrote:
>>>
>>> It seems that the underlying stats tables are reset on a periodic basis,
>>> can i stop this process? Is it a .conf setting?
>>
>> Up until PostgreSQL 8.2 there's a setting named
>> stats_reset_on_server_start that clears everything when the server stops:
>>  http://www.postgresql.org/docs/8.2/static/runtime-config-statistics.html
>>
>> If you're on that version or earlier and it's turned on, there's your
>> problem.  This went away in 8.3.
>>
>>> Also i need to find similar information regarding functions and views....
>>>  Any suggestions?
>>
>> Some suggestions already popped up here for functions.  Views are tougher
>> because they essentially work like a macro substitution:  the content of the
>> view gets substituted into the query where it appears, and off the query
>> planner goes.  That's why there's no statistics about them, they don't
>> actually exist as objects that things are executed against.  I don't know of
>> any way to track their use other than to log all your queries and look for
>> them popping up.  A grep against the application source code for them can be
>> useful too.
>>
>> The flip side to that is that eliminating views doesn't really improve
>> performance, so it's rarely a top priority to get rid of them--unlike unused
>> indexes for example.
>> --
>> Greg Smith  2ndQuadrant US  Baltimore, MD
>> PostgreSQL Training, Services and Support
>> greg@2ndQuadrant.com   www.2ndQuadrant.us
>>
>
>



After some time of looking for a solution to a similar problem I came
up with the following probable solution.
1)Enable logging to CSV format.
2)Then log all queries, insert, update, deletion statements for the
given cluster.
3)Run your application for a period sufficient to have captured the
execution of all possible functions and the deletes, inserts and
updates to the tables and sequences.
4)Create a table in other database that conforms to the the field
structure of CSV logging.
5)Populate this table with the contents of your CSV file.
6)Now (I think) all the database objects invoked or used in anyway
(not sure about nested function calls though) will be listed in this
table. So now you may query the appropriate field(s) to find out the
"active" database objects.

Since you are interested in weeding out the "inactive" objects and
while obviously not loosing data or useful database objects I was
thinking of the additional steps
7)After working hours stop your application(s) and somehow perform a
database dump and restore (to other brand new database) on only these
"active" database objects (maybe using --table=<activetable1>
--table=<activetable..n>). Other database objects appearing in your
"active database objects list" may require manual creation on your new
database.
8)Rename the original database to other suitable label and rename the
new database to the original name of the original database.
9)Start your application, perform some investigations to see if all is well.

Allan.