Thread: Answering my own question

Answering my own question

From
gerry.smit@lombard.ca
Date:
I found my answer on http://www.argudo.org/postgresql/soft-tuning-10.html
(excellent site, btw). What I need to do is prefix my SELECT statements for
this table with the statement:

SET enable_seqscan='false';

as in :

SET enable_seqscan='false';SELECT * FROM pol_xref_d WHERE policy_no=1200079
ORDER BY sequence;


This change makes the response sub-second, a great improvement over the
11+ seconds we're currently experiencing.

I'd love any explanation of why the query plan was going sequential versus
indexed.
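For anyone wanting to reproduce this, the two plans can be compared with
EXPLAIN (a sketch using the same table and literal as above):

```sql
-- Plan the optimizer picks on its own (here: the slow sequential scan).
EXPLAIN SELECT * FROM pol_xref_d WHERE policy_no=1200079 ORDER BY sequence;

-- Plan with sequential scans discouraged (here: the sub-second index scan).
SET enable_seqscan='false';
EXPLAIN SELECT * FROM pol_xref_d WHERE policy_no=1200079 ORDER BY sequence;
```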

Gerry




Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
months ago, and only got this problem after that. I leave "ANALYZE" out of
VACUUM due to the large volume of (I thought needless) output that gets
piped to my cron_log.

Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
??

Gerry




            
From: ghaverla@freenet.edmonton.ab.ca
To: gerry.smit@lombard.ca
cc: pgsql-novice@postgresql.org
Date: 15/05/2002 01:50 PM
Subject: Re: [NOVICE] Answering my own question

On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> I found, on http://www.argudo.org/postgresql/soft-tuning-10.html  my
> answer. (Excellent site btw)  What I need to do is prefix my SELECT
> statements for this table with the statement :
>
> SET enable_seqscan='false';
>
> as in :
>
> SET enable_seqscan='false';SELECT * FROM pol_xref_d WHERE policy_no=1200079
> ORDER BY sequence;

Someone else mentioned ANALYZE (or is it ANALYSE?).  Anyway,
I seem to remember reading someplace that with newer versions
of PostgreSQL, if you used to VACUUM on a regular basis,
you were supposed to change over to VACUUM ANALYSE (ANALYZE?)
some (all?) of the time.  My memory is fuzzy about this,
so perhaps the docs will have something on this.

Gord

Matter Realisations     http://www.materialisations.com/
Gordon Haverland, B.Sc. M.Eng. President
101  9504 182 St. NW    Edmonton, AB, CA  T5T 3A7
780/481-8019            ghaverla @ freenet.edmonton.ab.ca
780/993-1274 (cell)









Re: Answering my own question

From
Francisco Reyes
Date: 15/05/2002 03:29 PM
On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
> months ago, and only got this problem after that. I leave "ANALYZE" out of
> VACUUM due to the large volume of (I thought needless) output that gets
> piped to my cron_log.

Display of activities is controllable through settings.
It is also somewhat silly not to run analyze just because it displays too
much data in a log.

> Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
> Gerry

Like many other things in life the answer is: it depends.
Vacuum is about being able to reuse space which was used by rows that have
been deleted.

Vacuum Full is about recovering the disk space used by deleted rows.

Analyze is about re-computing statistics that the analyzer uses to
determine best strategies.

The answer to your question will be directly related to how much and how
often new data is inserted in your database.

If you NEVER change your data or make inserts then you NEVER have to run
either vacuum or analyze.

If you have a very heavily loaded environment with hundreds or thousands of
transactions per minute, then an hourly vacuum analyze would probably
help.

Why don't you tell us more about your particular scenario?

For instance I do ONE big load every morning from a production Foxpro
system to my reporting PostgreSQL system. I do one vacuum analyze just
after the load, then I compute some stats and populate some files, then
another vacuum analyze. At 10pm I do a vacuum full. This works great for
me.
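Sketched as crontab entries (the exact times and the database name are
illustrative, not from my actual setup):

```
# Illustrative crontab for the schedule described above.
# "reports" stands in for the real database name.
0 7  * * *  psql -d reports -c 'VACUUM ANALYZE'   # after the morning load
0 10 * * *  psql -d reports -c 'VACUUM ANALYZE'   # after stats/file generation
0 22 * * *  psql -d reports -c 'VACUUM FULL'      # nightly space recovery
```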

Before, I was doing one vacuum analyze after the stats computations and it
was horrible. Something which takes less than an hour to compute if I have
done a vacuum analyze took about 4 hours without it, and that was just ONE
of the things that needed to be done. The first vacuum analyze takes about
50 minutes and the second takes about 8 minutes, a lot less than the 4+
hours saved by having run vacuum analyze rather than doing the stats
computations without it.


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, having copies of stats from the server.log obscure timing info in the
VACUUM log doesn't help. And the stats are fairly unreadable, at least to
this novice.   :-)

VACUUM FULL is new for me, and I've been wondering about how to recycle
disk space. THANKS.

VACUUM ANALYZE, producing stats for the query planner, is also something I
wasn't aware of.

Here's the scenario. We have about 8 tables or so, in 4 regional databases.

- two of the tables are updated via on-line transactions from our users.
Probably a couple of hundred records a day, tops.
      - records eligible for archiving are "SELECT INTO"d and "DELETE"d
about once a quarter, typically 1000 or so.

- the rest of the tables are updated from the Mainframe batch cycle, at
about 3:00 am local time.
      - these records NEVER get updated, just more added each day.

- the problem tables have about 800,000 records in them.
      - the daily table getting 3000-5000 new records a day.
      - once a month, records more than 12 months old are purged, no
archive required.
      - the non-daily table is DROPd and rebuilt monthly.
             Newly generated records are loaded into it from batch
processes that take about two days time , at the start of each month.
      - thus the "high water" mark is highest just before monthend.
      - daily volumes are slowly increasing over time, so the "high water
mark" is also going higher.


VACUUM is run nightly, at about 7:00 pm.  This is to make the system ready
for UNIX back-ups, at about 9:30 pm. Also, in the past, the VACUUM has
taken hours, and seems to lock the database (not just the table). Thus
we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

The primary reason for the hours of VACUUM was the non-daily table, above,
back when we DELETEd most records, and then reloaded over two days.
Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
VACUUM. Hence our DROP and rebuild.


I suspect that what I should do is:

1) continue nightly VACUUMs at about 7:00 pm
2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
it's ready for Sunday FULL disk back-ups.
3) run a VACUUM FULL ANALYZE tonight as a one-shot.
4) consider adding the statement "SET enable_seqscan='false';" to the TWO
locations in our on-line system where the problem SELECTs are coded.
However, during my testing, SET returns the output "SET VARIABLE", which is
being interpreted as the first line of output from the accompanying SELECT
statement.  Any ideas on how to make the SET statement "quiet"?

Step 4 may not be necessary if Steps 1-3 do the job, btw.


Oh, we're a 16/5 operation, with some people dialing in or coming in on
weekends. 24/7 is NOT a requirement.


Gerry Smit,
Toronto, Ontario





        








Re: Answering my own question

From
Manuel Sugawara
Date:
gerry.smit@lombard.ca writes:

> Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about
> 4-5 months ago, and only got this problem after that. I leave
> "ANALYZE" out of VACUUM due to the large volume of (I thought
> needless) output that gets piped to my cron_log.
>
> Does anyone have a feel for how "necessary" VACUUM ANALYZE is over
> "VACUUM" ??

The analyze part records statistical information about your
data. Without it the planner doesn't have the information to decide on a
better plan.

BTW if you are still in 7.1 you should upgrade. 7.2 is smarter about
the statistical information it records and is able to make better
decisions.

Regards,
Manuel.

ps. yes: mx -> México

Re: Answering my own question

From
"Balazs Wellisch"
Date: 15/05/2002 06:57 PM
Hi all,

I'm embarrassed to even ask this, but this being a novice list, here goes.

What exactly is Vacuum? Is there an equivalent of this in MS SQL Server?

Thanks,
Balazs Wellisch


Re: Answering my own question

From
Chris
Date: 15/05/2002 07:28 PM
>Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
>months ago, and only got this problem after that. I leave "ANALYZE" out of
>VACUUM due to the large volume of (I thought needless) output that gets
>piped to my cron_log.
>
>Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
>??

As you noticed, pretty necessary :)

A vacuum analyze updates the statistics used when the planner decides
what action to take (whether it's a sequential scan, using an index, etc.).

A plain vacuum removes dead tuples from the system, not much else.
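A concrete way to see the difference (a sketch; pg_class is the system
catalog holding the planner's gross size estimates, and the per-column
distribution stats that ANALYZE gathers live in pg_statistic):

```sql
VACUUM pol_xref_d;          -- reclaims dead tuples
VACUUM ANALYZE pol_xref_d;  -- also recomputes per-column statistics

-- The planner's row and page estimates for the table:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'pol_xref_d';
```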


-----------------
      Chris Smith
http://www.squiz.net/


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
As in another post, VACUUM ANALYZE has fixed my index / optimizer / query
plan problem.

A follow-up question I have is about VACUUM FULL.

There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
doc/html on-line reference manual.  Was FULL added in a later release?


Gerry




        








Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, the VACUUM ANALYZE took 4 hours last night, but it seems to have
fixed the problem. EXPLAIN SELECT ... and SELECT from both tables are now
using the indices and get sub-second response, even with 800,000 rows of
data.

Now to see whether I make VACUUM ANALYZE a nightly thing, or weekly.
4 hours is long, but it's been a LONG time since the last one (like never).

btw I was confusing VERBOSE with ANALYZE. (Well, this IS the NOVICE list,
right?)  VERBOSE definitely makes the cron_log output too long to read.
ANALYZE produces nothing in the cron log, unless VERBOSE is given as well.

Thanks for all the help so far folks, it's been bang on.

Gerry




        








Re: Answering my own question

From
"Adam Erickson"
Date: 16/05/2002 11:56 AM
> As in another post, VACUUM ANALYZE has fixed my index / optimizer / query
> plan problem.
>
> A follow-up question I have is about VACUUM FULL.
>
> There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
> doc/html on-line reference manual.  Was FULL added in a later release?

They changed vacuum in 7.[something].  Among other changes I'm not aware
of, vacuum no longer frees physical disk space when cleaning up tables.
Instead, it frees space within the file for pg to reuse.  VACUUM FULL was
then added for those who wanted vacuum to behave as it did before.

One nice thing about the new behavior is that it no longer needs to lock
the tables during vacuum.  Doing a vacuum full will lock the table, though.

-Adam


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, this is only my second day on the list, but I don't think "NOVICE"
could ever have a question to be embarrassed about.

VACUUM does 3 things (at least in 7.2):

-  VACUUM (alone) takes rows marked for deletion, and frees up the space
to be re-used by later INSERTs or UPDATEs.
      - I'm guessing, but it sure looks like DELETE doesn't physically
delete, it logically deletes.
      - similarly, UPDATE doesn't seem to physically update in place, but
logically updates by marking the old record as "deletable" and INSERTing
(in effect) a new record with your changes.
      - all of which leaves a lot of logically deleted, but physically
still present, records in your file.

- VACUUM FULL apparently goes one step further than VACUUM, and actually
frees up the disk space for the operating system.
      - FULL is NOT an option to VACUUM in 7.1.3, so I'm guessing, based on
comments from other emails.
      - In our shop we're running Postgres 7.1.3 on a Solaris 2.(6?7?8)
E3500.  After a VACUUM, the unix files in pgsql/data/base are no smaller.
      - eventually these get so big that it's worthwhile to pg_dump, DROP,
and reload a volatile table. At which point the unix files are smaller, and
response time is improved.
      - presumably in Postgres 7.2+, FULL was added as an option to VACUUM
to do this inherently. In IBM mainframe speak, it's somewhat analogous to
doing a "freespace" and "de-frag".

- VACUUM ANALYZE, as I've learned so well these past two days, provides
stats on the remaining rows in your tables, allowing the query plan
optimizer to best determine whether or not to use any indices, and if so,
which ones.

- VACUUM VERBOSE just dumps the output of the VACUUM to the calling
script or command line. A copy of said output is in your server.log file in
any case.

Interestingly enough, VACUUM VERBOSE ANALYZE doesn't get you any stats from
the analyzer, just the line "Analyzing...."  I was hoping for cryptic clues,
things like "commonly recurring key, index useless" or "too few rows, index
useless" type messages.
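The variants above, side by side (per the other replies, FULL only exists
from 7.2 onward; table name as used earlier in the thread):

```sql
VACUUM pol_xref_d;                  -- free dead rows for reuse within the file
VACUUM FULL pol_xref_d;             -- also return the space to the OS (7.2+)
VACUUM ANALYZE pol_xref_d;          -- also gather stats for the query planner
VACUUM VERBOSE ANALYZE pol_xref_d;  -- same, with per-table progress output
```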


Gerry Smit,
Toronto, Canada.





              







Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
VACUUM FULL is NOT available in 7.1.3, so it must be later.

Gerry




              







Re: Answering my own question

From
Francisco Reyes
Date:
On Thu, 16 May 2002 gerry.smit@lombard.ca wrote:

> There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
> doc/html on-line reference manual.  Was FULL added in a later release?

Yes.
I also recommend you upgrade to 7.2 when you can... actually it is now
7.2.1. :-)


Re: Answering my own question

From
Francisco Reyes
Date: 17/05/2002 11:30 AM
On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional databases.

Why do you have these "regional" databases?
Couldn't they all use one database?
Do you have a WAN?

> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then back up that file
only.

> Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify a substantial
number of records, do yet another vacuum analyze.


> The primary reason for the hours of VACUUM was the non-daily table , above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous, and it doesn't leave any
records behind.
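A sketch of the suggested month-end rebuild, using the table name from
earlier in the thread:

```sql
-- Instead of DROP TABLE + re-CREATE:
TRUNCATE TABLE pol_xref_b;  -- empties the table near-instantly;
                            -- leaves nothing for VACUUM to reclaim
-- ...then reload via COPY / INSERT as before. The table definition,
-- indexes, and any dependent views all survive.
```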

> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Especially if space
is an issue and/or you have many queries which do sequential scans.


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:


Replies embedded.

btw VACUUM ANALYZE has fixed this problem. For those of you looking for
solutions, we might be getting into "interesting background" vs "pgsql
problem".  As a newcomer to this list, should I be taking this reply
off-line?

Gerry




        
From: "Francisco Reyes" <lists@natserv.com>
To: gerry.smit@lombard.ca
cc: ghaverla@freenet.edmonton.ab.ca, "Pgsql Novice" <pgsql-novice@postgresql.org>
Date: 17/05/2002 11:30 AM
Subject: Re: [NOVICE] Answering my own question

        

        




On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional
> databases.

Why do you have these "regional" databases?
Couldn't they all use one database?
Do you have a WAN?

Gerry : "Regional" is a holdover from when these were on Sun workstations
in offices. Now a single server in Toronto, with 4 DBs, keeps the tables
small, controls access, and we're REALLY new to SQL. Plus the data is
"regional" in scope. Political requirements at the time of the server merge
were also involved. The WAN carries traffic to the regional users. We're
toying with ONE national DB, but the index speed problem in Central's table
(the origin of this thread) kept us thinking response time gets worse
exponentially. Thus inflicting 800,000 useless records on Atlantic slows
their response for no good reason. We will DEFINITELY be re-thinking that
now that we've got "ANALYZE" and this bug figured out. Speed is WAY UP
these past two days.


> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.

Gerry : builds up over time, and a novice's understanding that "fewer
records means a faster table". Plus, occasionally, we delete 20-30 thousand
records as an entire brokerage cancels off.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.

Gerry : The month-end process that updates the second large table
(pol_xref_b) gets updates of 20-90 thousand records at a time. In the 1st
implementation (running on postgreSQL 6.5) we issued a DELETE for the older
versions of these records, then loaded the new ones. By weekend's
completion, you're looking at 780,000 deleted records out of 800,000, and
then loading in another 780,000 (plus growth over time) records. The VACUUM
Sunday night would run 24 hours, locking the database Monday morning. So
the 2nd implementation (ready the next month) was to DROP the table,
rebuild it, and then just let it load. The 20,000 lost records were for
"old" clients, and not considered a primary worry, certainly less so than
locking the database during office hours.

Gerry : What is "truncate" ??
Gerry : Not what are "views", but how to use them, and install them?
That's a DB 101 question, and rhetorical. PostgreSQL for us was a freeware
start at SQL. As we learn things, and need things, we'll either find out
how Postgres does them, or jump to a commercial DB where we can hire
contractors for turn-key programming (including some knowledge transfer of
WHY they built whatever they might design).  Room for a plug here: anyone
know of commercial Postgres support?


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

Gerry : not available in 7.1.3. If all it does is free up disk space, it's
not critical at this time. And the team has more things to do. Fixing
something that isn't "broken" goes down in priority. (Sigh).

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then backup that file
only.

Gerry : The current back-up / restore plan is, should something happen that
the RAID-5 can't handle, we'd restore the entire $PGDIR. Yes, it might have
changed during back-up, but we'll take that chance. Interesting that 7.1
changed the unix file names from table names to meaningless numbers, thus
hiding and preventing the ability to individually restore damaged tables.
From what I've seen, there's enough other data in other files that this is
an ABSOLUTE NO-NO.

Gerry : Unix back-ups are generic, full on Sunday, incremental other
nights. Postgres resides in a UFS. Implementing a pg_dump cron-script makes
sense, given the above comments (yours and mine).
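A minimal sketch of such a cron-script (the database names and backup path
are made up for illustration; substitute the real regional DB names):

```shell
#!/bin/sh
# Nightly logical backup: dump each database to a file, then let the
# generic filesystem backup pick up the dumps instead of $PGDIR itself.
BACKUP_DIR=/backup/pgsql                          # illustrative path
for db in central atlantic ontario western; do    # illustrative DB names
    pg_dump "$db" > "$BACKUP_DIR/${db}_$(date +%Y%m%d).sql"
done
```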



> Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify a substantial
number of records, do yet another vacuum analyze.


Gerry : That puts 7.2.1 on the "mid-burner" for us at this point. Better
than "back-burner", like too many other things. Also, our first VACUUM
ANALYZE, run Wednesday night, took about 5 hours for all DBs. Thursday
night's VACUUM ANALYZE took the usual half hour or so. So your point of
running VACUUM ANALYZE following the 5:00 am batch updates, prior to
on-line usage, makes sense, and would help end-users. Thanks.


> The primary reason for the hours of VACUUM was the non-daily table, above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

Gerry  : Noted, and now seen, even in 7.1.3  !  :-)

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous and doesn't leave any records
behind.

Gerry : I just read about TRUNCATE in the 7.1.3 on-line manual. I'll
definitely try that in lieu of DROP next month-end. The worry is that the
VACUUM of the emptied table may take a long time. The DROP-and-rebuild
PREVENTS this, as there is NOTHING to VACUUM, just a new, empty table.
Something has to "free up" those rows, and if it's VACUUM, then I'm looking
at 24 hours again. Any comments? Or is that my Postgres 6.5 experience and
not relevant to postgreSQL 7.x?

Gerry : btw, what do people LIKE to refer to Postgres as in writing?
Postgres, postgres, PostgreSQL (one s or two?), pgsql?


> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

Gerry : Agree, but not available until we upgrade to 7.2.1


> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

Gerry : Thanks. Such a cron-script has been on my mind for a while. RAID-5
is our primary protection; restore is our secondary. Clearly, I can't count
on tape restore of $PGDIR/data/base, so I'll work on said script when I
have time (ouch, sigh).

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Specially if space
is an issue and/or you have many queries which do sequential scans.


Gerry : We CERTAINLY have MANY MANY queries; do they create space problems?
Or are we referring to the optimizer's poor choice of sequential vs index
scans (now solved since I implemented VACUUM ANALYZE)? Disk space, whilst
cheap, is NOT unlimited. So if the QUERIES are wasting space, then a VACUUM
FULL would be useful.  Poor man's version: pg_dump, pg_ctl stop, deletedb
"x", createdb "x", pg_ctl start, reload, VACUUM ANALYZE. There's a weekend
I'd rather be camping.


Gerry Smit,
Toronto, Canada
(Go Leafs Go!)
(For gawd's sake, go! Begone!)