Re: bloated postgres data folder, clean up - Mailing list pgsql-general

From Julien Rouhaud
Subject Re: bloated postgres data folder, clean up
Date
Msg-id 56D88C20.6010008@dalibo.com
Whole thread Raw
In response to Re: bloated postgres data folder, clean up  (Rémi Cura <remi.cura@gmail.com>)
Responses Re: bloated postgres data folder, clean up  (Rémi Cura <remi.cura@gmail.com>)
List pgsql-general
On 03/03/2016 18:15, Rémi Cura wrote:
> Hey,

Hello Rémi,

> first I forgot something in the querry to remove the annoying .XXX :
> -----------
> SELECT distinct substring(file_name from '\d+' )
> FROM find_useless_postgres_file('your_database_name') ;
> -----------
>

Thanks for working on this :)

I added a comment on the gist URL you provided. It's a simplified
version of the main query that should work fine and detect more orphan
files. Double checking it would be a good idea though.

Also, as you can't check other databases than the one you're connected
to, I used current_database() instead of user defined database name.

It's also important to warn that all of this only work for finding
orphan files on the default tablespace (and to never blindly remove
files in the PGDATA of course).

> Now it seems you do everything all right,
> with a slight confusion between bloating and useless files.
>
> From what I understood, bloating is when postgres creates lots of file
>  as a cache for a table that once was big, or saw a lot of actions.
> Still, the created files are referenced by postgres and postgres might
> use them.
> They are _not_ to be deleted, and are _not_ useless.
> You can reduce the number by `VACUUM FULL`
>
> On the other hand, you may end up with useless files in your file system
> that are _not_ referenced by postgres in any way,
> and are basically trash, uselss for anything.
> (broken files)
> On way to produce those useless files is for example to launch a vacuum full
> (create a lot of files, then use those to clean table, then delete those
> files),
> and kill this vacuum full before it ends.
> There might by other causes but I'm not an expert.
>
> So to conclude, the fact that my function doesn't find useless file in
> you postgres database folder is rather good news,
> and by no mean imply that your tables are bloated (or not)
>
> Cheers
> Rémi-C
>
>
>
> 2016-03-03 8:31 GMT+01:00 Johnny Morano <johnny.morano@payon.com
> <mailto:johnny.morano@payon.com>>:
>
>     Hi,____
>
>     __ __
>
>     So, I still think the SQL function isnt really working well.____
>
>     Here’s what I did:____
>
>     __ __
>
>     /data/postgres # psql ____
>
>     __ __
>
>     postgres=# \copy ( select * from find_useless_postgres_file('live')
>     ) to /tmp/useless_files.csv delimiter ';' csv header;____
>
>     postgres=# \q____
>
>     __ __
>
>     /data/postgres # wc -l /tmp/useless_files.csv____
>
>     7422 /tmp/useless_files.csv____
>
>     __ __
>
>     # filter out the .## files, e.g.:____
>
>     #
>     48175847.37;/base/16398/48175847.37;/data/postgres/base/16398/48175847.37;1047420928____
>
>     #
>     48175847.36;/base/16398/48175847.36;/data/postgres/base/16398/48175847.36;1073741824____
>
>     #
>     48175847.35;/base/16398/48175847.35;/data/postgres/base/16398/48175847.35;1073741824____
>
>     #
>     48175847.34;/base/16398/48175847.34;/data/postgres/base/16398/48175847.34;1073741824____
>
>     #
>     48175847.33;/base/16398/48175847.33;/data/postgres/base/16398/48175847.33;1073741824____
>
>     #
>     48175847.32;/base/16398/48175847.32;/data/postgres/base/16398/48175847.32;1073741824____
>
>     #
>     48175847.31;/base/16398/48175847.31;/data/postgres/base/16398/48175847.31;1073741824____
>
>     #
>     48175847.30;/base/16398/48175847.30;/data/postgres/base/16398/48175847.30;1073741824____
>
>     #____
>
>     # because oid2name doesn't like them, gives error:____
>
>     # /data/postgres # oid2name -f 48175847.30 -i -S -q -d live____
>
>     # oid2name: query failed: ERROR:  invalid input syntax for type oid:
>     "48175847.30"____
>
>     # LINE 11:   (pg_catalog.pg_relation_filenode(c.oid) IN
>     ('48175847.30')...____
>
>     #                                                        ^____
>
>     # ____
>
>     # oid2name: query was: SELECT pg_catalog.pg_relation_filenode(c.oid)
>     as "Filenode", relname as "Table Name" ____
>
>     # FROM pg_catalog.pg_class c ____
>
>     #      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
>     ____
>
>     #      LEFT JOIN pg_catalog.pg_database d ON d.datname =
>     pg_catalog.current_database(),____
>
>     #      pg_catalog.pg_tablespace t ____
>
>     # WHERE relkind IN ('r', 'm', 'i', 'S', 't') AND ____
>
>     #            t.oid = CASE____
>
>     #                   WHEN reltablespace <> 0 THEN reltablespace____
>
>     #                   ELSE dattablespace____
>
>     #            END AND ____
>
>     #   (pg_catalog.pg_relation_filenode(c.oid) IN ('48175847.30')) ____
>
>     # ORDER BY relname____
>
>     __ __
>
>     /data/postgres # cut -d ';' -f1 /tmp/useless_files.csv | cut -d'.'
>     -f1 | sort -n | uniq >/tmp/potential_useless_oids.csv____
>
>     /data/postgres # wc -l /tmp/potential_useless_oids.csv____
>
>     1017 /tmp/potential_useless_oids.csv____
>
>     __ __
>
>     # get a list of all used oids, instead of examing one by one____
>
>     /data/postgres # oid2name -i -S -q -d live | awk '{print $1}' | sort
>     -n | uniq >/tmp/used_oids.csv____
>
>     /data/postgres # wc -l /tmp/used_oids.csv ____
>
>     940 /tmp/used_oids.csv____
>
>     __ __
>
>     /data/postgres # while read i; do grep $i /tmp/used_oids.csv
>     >/dev/null || (echo "$i" >>/tmp/not_in_use_oids.csv); done <
>     /tmp/potential_useless_oids.csv____
>
>     /data/postgres # wc -l /tmp/not_in_use_oids.csv____
>
>     168 /tmp/not_in_use_oids.csv____
>
>     __ __
>
>     /data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv |wc -l____
>
>     1____
>
>     /data/postgres # egrep -v 'fsm|vm' /tmp/not_in_use_oids.csv       ____
>
>     file_name____
>
>     # The CSV header only____
>
>     __ __
>
>     So, no bloated files.____
>
>     But, using the function from
>     https://wiki.postgresql.org/wiki/Show_database_bloat I get:____
>
>     __ __
>
>     live=# select tbloat,wasted_space from table_bloat order by
>     wasted_space desc limit 25;____
>
>     tbloat │ wasted_space ____
>
>     ────────┼──────────────____
>
>         1.0 │ 9976 kB____
>
>         1.2 │ 98 GB____
>
>         1.0 │ 97 MB____
>
>         1.4 │ 96 kB____
>
>         1.2 │ 920 kB____
>
>         1.2 │ 88 kB____
>
>         1.1 │ 88 kB____
>
>         2.0 │ 8192 bytes____
>
>         0.0 │ 8192 bytes____
>
>         1.3 │ 8192 bytes____
>
>         2.0 │ 8192 bytes____
>
>         1.3 │ 8192 bytes____
>
>         1.5 │ 8192 bytes____
>
>         1.5 │ 8192 bytes____
>
>         2.0 │ 8192 bytes____
>
>         1.1 │ 8192 bytes____
>
>         1.0 │ 8192 bytes____
>
>         1.1 │ 8192 bytes____
>
>         1.3 │ 8192 bytes____
>
>         1.5 │ 8192 bytes____
>
>         1.1 │ 80 kB____
>
>         1.0 │ 7584 kB____
>
>         1.6 │ 71 MB____
>
>         1.0 │ 704 kB____
>
>         1.1 │ 6968 kB____
>
>     (25 rows)____
>
>     __ __
>
>     So actually, quite a lot of bloated data J____
>
>     What am I doing wrong?____
>
>     __ __
>
>     Mit freundlichen Grüßen / With kind regards,____
>
>     Johnny Morano____
>
>     ________________________________________________________
>
>     __ __
>
>     *Johnny Morano  |  Principal Systems Engineer____*
>
>     __ __
>
>     PAY.ON GmbH  |  AN ACI WORLDWIDE COMPANY  |  WWW.PAYON.COM
>     <http://www.payon.com/>____
>
>     Jakob-Haringer-Str. 1  |  5020 Salzburg  |  Austria____
>
>     __ __
>
>     This email message and any attachments may contain confidential,
>     proprietary or non-public information. This information is intended
>     solely for the designated recipient(s). If an addressing or
>     transmission error has misdirected this email, please notify the
>     sender immediately and destroy this email. Any review,
>     dissemination, use or reliance upon this information by unintended
>     recipients is prohibited. Any opinions expressed in this email are
>     those of the author personally.____
>
>     __ __
>
>     *From:*Rémi Cura [mailto:remi.cura@gmail.com
>     <mailto:remi.cura@gmail.com>]
>     *Sent:* Mittwoch, 2. März 2016 17:49
>     *To:* Johnny Morano
>     *Cc:* Alvaro Herrera; PostgreSQL General
>
>
>     *Subject:* Re: [GENERAL] bloated postgres data folder, clean up____
>
>     __ __
>
>     Hey,____
>
>     this is quite the *opposite*.____
>
>     The function find files in the postgres database folder that are not
>     used by the database.____
>
>     To use it : ____
>
>      * connect to the database you want to analyse ( **mandatory** ).____
>
>      * create the function (execute function definition)____
>
>      * Execute `SELECT * FROM
>     find_useless_postgres_file('your_database_name')`____
>
>
>     This will output a list of files that are on the disk but not used
>     by postgres,____
>
>     and so can be removed.____
>
>     To be extra sure, you should use oid2name programme to check that
>     the useless files are really useless.
>
>
>     For this :
>      * output the list of potential useless files with copy for instance
>       ex :
>       COPY ( SELECT file_name____
>
>              FROM find_useless_postgres_file('your_database_name')____
>
>        ) TO 'path_to_you_database_folder/potential_useless.txt'____
>
>        now you've got a file with a list of potential erroneous files.____
>
>      * Then use oid2name____
>
>       `$su postgres
>        $cd path_to_you_database_folder____
>
>        $while read i; do oid2name -f "$i" -i -S -q -d
>     your_database_name; done < potential_useless.txt
>       `____
>
>       Nothing should show, meaning that every potential erroneous file
>        has not been recognized by oid2name !____
>
>       If you feel unconvinced, you can manually try oid2name on some
>        of the potential erroneous files, to be extra sure.
>       It should not find anything.____
>
>     __ __
>
>      * Now delete all the files in `potential_useless.txt`.____
>
>       It could be wiser to not delete the files but rename those____
>
>       (for instance, adding `.potentially_useless` as a postfix)____
>
>       so if it breaks something, you have an easy way to revert
>     everything.____
>
>     __ __
>
>     Anyway, use *-*extra extra*-* caution if you delete.
>     Except a backup, there would be no easy way to correct a mistake.____
>
>     Cheers,____
>
>     Rémi-C____
>
>     __ __
>
>     2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.morano@payon.com
>     <mailto:johnny.morano@payon.com>>:____
>
>     Hi Remi!____
>
>      ____
>
>     This SQL function you have provided, seems to return all valid
>     files, is that correct? In my case, it returned all my ‘base/’
>     files. Is that normal?____
>
>     If yes, maybe you rename the function to
>     ‘find_useful_postgres_files’ ;-)____
>
>      ____
>
>     Could you explain in steps how to use this function to make a
>     cleanup of bloated data? (like in an example with commands and
>     example output, if possible of course)____
>
>      ____
>
>     Thanks!____
>
>      ____
>
>      ____
>
>     Mit besten Grüßen / With best regards,____
>
>     Johnny Morano____
>
>     ________________________________________________________
>
>      ____
>
>     *Johnny Morano  | Principal Systems Engineer*____
>
>      ____
>
>     PAY.ON GmbH  |  AN ACI WORLDWIDE COMPANY  |  WWW.PAYON.COM
>     <http://www.payon.com/>____
>
>     Jakob-Haringer-Str. 1  |  5020 Salzburg  |  Austria____
>
>     Registered at: LG Salzburg  |  Company number: FN 315081 f  |
>     VAT-ID: ATU64439405____
>
>     Managing Director: Christian Bamberger____
>
>      ____
>
>      ____
>
>     Follow us on: ____
>
>      ____
>
>     cid:image001.jpg@01D126D0.E1AB0670
>     <http://blog.payon.com/>  cid:image002.jpg@01D126D0.E1AB0670
>     <http://www.linkedin.com/company/146260?trk=tyah>  cid:image003.jpg@01D126D0.E1AB0670
>     <https://twitter.com/PAYON_com>  ____
>
>      ____
>
>     This email message and any attachments may contain confidential,
>     proprietary or non-public information. This information is intended
>     solely for the designated recipient(s). If an addressing or
>     transmission error has misdirected this email, please notify the
>     sender immediately and destroy this email. Any review,
>     dissemination, use or reliance upon this information by unintended
>     recipients is prohibited. Any opinions expressed in this email are
>     those of the author personally.____
>
>      ____
>
>     This message and any attachments have been scanned for viruses prior
>     leaving PAY.ON; however, PAY.ON does not guarantee the security of
>     this message and will not be responsible for any damages arising as
>     a result of any virus being passed on or arising from any alteration
>     of this message by a third party. PAY.ON may monitor e-mails sent to
>     and from PAY.ON.____
>
>      ____
>
>      ____
>
>      ____
>
>      ____
>
>     *From:*pgsql-general-owner@postgresql.org
>     <mailto:pgsql-general-owner@postgresql.org>
>     [mailto:pgsql-general-owner@postgresql.org
>     <mailto:pgsql-general-owner@postgresql.org>] *On Behalf Of *Rémi Cura
>     *Sent:* Mittwoch, 2. März 2016 14:58
>     *To:* Alvaro Herrera
>     *Cc:* PostgreSQL General
>     *Subject:* Re: [GENERAL] bloated postgres data folder, clean up____
>
>      ____
>
>     Would gladly do it,____
>
>     but still this "wiki cooloff" stuff,____
>
>     can't create a page____
>
>     Cheers,____
>
>     Rémi-C____
>
>      ____
>
>     2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com
>     <mailto:alvherre@2ndquadrant.com>>:____
>
>     Rémi Cura wrote:
>     > Hey dear list,
>     > after a fex years of experiments and crash,
>     > I ended up with a grossly bloated postgres folder.
>     > I had about 8 Go of useless files.
>
>     Would you add a new page to the wiki with this?
>
>     https://wiki.postgresql.org/wiki/Category:Administrative_Snippets
>
>     --
>     Álvaro Herrera                http://www.2ndQuadrant.com/
>     PostgreSQL Development, 24x7 Support, Remote DBA, Training &
>     Services____
>
>      ____
>
>     __ __
>
>


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: Re: could not migrate 8.0.13 database with large object data to 9.5.1
Next
From: Adrian Klaver
Date:
Subject: Re: PLPythonu for production server