Re: Recommendation to run vacuum FULL in parallel - Mailing list pgsql-general

From Perumal Raj
Subject Re: Recommendation to run vacuum FULL in parallel
Date
Msg-id CALvqh4rATReNKQLmrU0xHe5tpr_EFO5h2-n-w-_QFwYmdGVbbg@mail.gmail.com
Whole thread Raw
In response to RE: Recommendation to run vacuum FULL in parallel  (Kevin Brannen <KBrannen@efji.com>)
Responses Re: Recommendation to run vacuum FULL in parallel  (Ron <ronljohnsonjr@gmail.com>)
List pgsql-general
Thanks Kevin for the inputs,

In my Case there are 500+ Tables and biggest chunk 30GB ( Table only) + its indexes.
So i have created 6 batches and executed in parallel . All my scripts completed in 2 Hours and my DB size came down from 500GB to 300GB.

Yes i do see CPU spike, But i did whole activity with full apps down time.

Going forward i am going to run vacuum daily basis to maintain the DB size.

Also Table/DB Age came down drastically.

Thanks
Raj

On Thu, Apr 4, 2019 at 12:53 PM Kevin Brannen <KBrannen@efji.com> wrote:

From: Perumal Raj <perucinci@gmail.com>

So conclude the requirement here , The only way to parallelism is multiple script. And no need to do REINDEX exclusively.

Question : Do we need to consider  Table dependencies while preparing script in order to avoid table locks during vacuum full ?

 

We have a small bash script (see below) that get the list of tables and their sizes, sorted smallest to largest, and do “vacuum full” one at a time because (as someone else pointed out) this is very I/O intensive. That order also helps to ensure we finish because some of our installs are at the edge of running out of space (an issue we’re dealing with). I probably wouldn’t have a problem doing 2 at a time, but we do this in the middle of the night when activity is lowest and it only takes 1-2 hours, so we’re good with it. It sounds like you have a lot more data though.

 

You might also consider putting the data into different tablespaces which are spread over multiple disks to help I/O. If you can, use SSD drives, they help with speed quite a bit. 😊

 

Don’t worry about table dependencies. This is a physical operation, not a data operation.

 

HTH,

Kevin

 

    $PGPATH/psql -t -c "

        WITH s AS (SELECT nspname || '.' || relname AS TABLE_NAME, pg_total_relation_size(c.oid) AS total_bytes

                  FROM pg_class c

                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace

                  WHERE relkind = 'r' AND nspname NOT IN ( 'pg_catalog', 'information_schema' )

                  ORDER BY 2 )

        SELECT table_name FROM s

        " |

    while read t ; do echo "" ; echo $t; $PGPATH/vacuumdb -w -z -f -t $t ; done

###

This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify us by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them to disk. Thank you.

pgsql-general by date:

Previous
From: Prakash Ramakrishnan
Date:
Subject: Re: os upgrade 7.3 to 7.5 (postgres version 10.5)
Next
From: Ron
Date:
Subject: Re: Recommendation to run vacuum FULL in parallel