Home > mailing lists

Re: Recommendation to run vacuum FULL in parallel - Mailing list pgsql-general

From	Perumal Raj
Subject	Re: Recommendation to run vacuum FULL in parallel
Date	April 11, 2019 04:49:42
Msg-id	CALvqh4rATReNKQLmrU0xHe5tpr_EFO5h2-n-w-_QFwYmdGVbbg@mail.gmail.com Whole thread Raw
In response to	RE: Recommendation to run vacuum FULL in parallel (Kevin Brannen <KBrannen@efji.com>)
Responses	Re: Recommendation to run vacuum FULL in parallel
List	pgsql-general

Tree view

Thanks Kevin for the inputs,

In my Case there are 500+ Tables and biggest chunk 30GB ( Table only) + its indexes.

So i have created 6 batches and executed in parallel . All my scripts completed in 2 Hours and my DB size came down from 500GB to 300GB.

Yes i do see CPU spike, But i did whole activity with full apps down time.

Going forward i am going to run vacuum daily basis to maintain the DB size.

Also Table/DB Age came down drastically.

Thanks

Raj

On Thu, Apr 4, 2019 at 12:53 PM Kevin Brannen <KBrannen@efji.com> wrote:

From: Perumal Raj <perucinci@gmail.com>

So conclude the requirement here , The only way to parallelism is multiple script. And no need to do REINDEX exclusively.
Question : Do we need to consider Table dependencies while preparing script in order to avoid table locks during vacuum full ?

We have a small bash script (see below) that get the list of tables and their sizes, sorted smallest to largest, and do “vacuum full” one at a time because (as someone else pointed out) this is very I/O intensive. That order also helps to ensure we finish because some of our installs are at the edge of running out of space (an issue we’re dealing with). I probably wouldn’t have a problem doing 2 at a time, but we do this in the middle of the night when activity is lowest and it only takes 1-2 hours, so we’re good with it. It sounds like you have a lot more data though.

You might also consider putting the data into different tablespaces which are spread over multiple disks to help I/O. If you can, use SSD drives, they help with speed quite a bit. 😊

Don’t worry about table dependencies. This is a physical operation, not a data operation.

HTH,
Kevin

    $PGPATH/psql -t -c "
        WITH s AS (SELECT nspname || '.' || relname AS TABLE_NAME, pg_total_relation_size(c.oid) AS total_bytes
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE relkind = 'r' AND nspname NOT IN ( 'pg_catalog', 'information_schema' )
                  ORDER BY 2 )
        SELECT table_name FROM s
        " |
    while read t ; do echo "" ; echo $t; $PGPATH/vacuumdb -w -z -f -t $t ; done
###
This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify us by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them to disk. Thank you.

pgsql-general by date:

From: Prakash Ramakrishnan
Date: 11 April 2019, 04:31:02
Subject: Re: os upgrade 7.3 to 7.5 (postgres version 10.5)

From: Ron
Date: 11 April 2019, 07:01:50
Subject: Re: Recommendation to run vacuum FULL in parallel

Re: Recommendation to run vacuum FULL in parallel - Mailing list pgsql-general

Previous

Next