Thread: slow queries after ANALYZE

slow queries after ANALYZE

From
DW
Date:
Hello,

I'm perplexed. I'm trying to find out why some queries are taking a long
time, and have found that after running analyze,  one particular query
becomes slow.

This query is based on a view that is based on multiple left outer joins
to merge data from lots of tables.

If I drop the database and reload it from a dump, the query result is
instaneous (less than one second).

But after I run analyze, it then takes much longer to run -- about 10
seconds, give or take a few depending on the hardware I'm testing it on.
Earlier today, it was taking almost 30 seconds on the actual production
server -- I restarted pgsql server and the time got knocked down to
about 10 seconds -- another thing I don't understand.

I've run the query a number of times before and after running analyze,
and the problem reproduces everytime. I also ran with "explain", and saw
that the costs go up dramatically after I run analyze.

I'm fairly new to postgresql and not very experienced as a db admin to
begin with, but it looks like I'm going to have to get smarter about
this stuff fast, unless it's something the programmers need to deal with
when constructing their code and queries or designing the databases.

I've already learned that I've commited the cardinal sin of configuring
my new database server with RAID 5 instead of something more sensible
for databases like 0+1, but I've been testing out and replicating this
problem on different hardware, so I know that this issue is not the
direct cause of this.

Thanks for any info. I can supply more info (like config files, schemas,
etc.) if you think it might help. But I though I would just describe the
problem for starters.

-DW


Re: slow queries after ANALYZE

From
Tom Lane
Date:
DW <dwinner-lists@att.net> writes:
> I'm perplexed. I'm trying to find out why some queries are taking a long
> time, and have found that after running analyze,  one particular query
> becomes slow.

This implies that the planner's default choice of plan (without any
statistics) is better than its choice when informed by statistics.
This is undesirable but not unheard of :-(

It would be interesting to see EXPLAIN ANALYZE results in both cases,
plus the contents of the relevant pg_stats rows.  (BTW, you need not
dump and reload to get back to the virgin state --- just delete the
relevant rows from pg_statistic.)  Also we'd want to know exactly what
PG version this is, and on what sort of platform.

You might be able to fix things by increasing the statistics targets or
tweaking planner cost parameters, but it'd be best to investigate before
trying to fix.

            regards, tom lane

Re: slow queries after ANALYZE

From
DW
Date:
Tom Lane wrote:

> It would be interesting to see EXPLAIN ANALYZE results in both cases,
> plus the contents of the relevant pg_stats rows.  (BTW, you need not
> dump and reload to get back to the virgin state --- just delete the
> relevant rows from pg_statistic.)  Also we'd want to know exactly what
> PG version this is, and on what sort of platform.
>

Thanks for replying. I've got a message into to my team asking if I need
to de-identify some of the table names before I go submitting output to
a public mailing list.

In the meantime, again I'm new to this -- I got pg_stats; which rows are
  the relevent ones?

Also, I am running postgresql-server-7.4.9 from FreeBSD port (with
optimized CFLAGS turned on during compiling)

OS: FreeBSD 5.4 p8

Thanks,
DW


Re: slow queries after ANALYZE

From
Tom Lane
Date:
DW <dwinner-lists@att.net> writes:
> In the meantime, again I'm new to this -- I got pg_stats; which rows are
>   the relevent ones?

The ones for columns that are mentioned in the problem query.
I don't think you need to worry about columns used only in the SELECT
output list, but anything used in WHERE, GROUP BY, etc is interesting.

> Also, I am running postgresql-server-7.4.9 from FreeBSD port (with
> optimized CFLAGS turned on during compiling)
> OS: FreeBSD 5.4 p8

The hardware environment (particularly disks/filesystems) is relevant
too.

            regards, tom lane

Re: slow queries after ANALYZE

From
hubert depesz lubaczewski
Date:
On 11/11/05, DW <dwinner-lists@att.net> wrote:
I'm perplexed. I'm trying to find out why some queries are taking a long
time, and have found that after running analyze,  one particular query
becomes slow.

i have had exactly the same problem very recently.
what helped? increasing statistics on come column.
which ones?
make:
explain analyze <your select>;
and check in which situations you gget the biggest change of "estiamted rows" and "actual rows".
then check what this particular part of your statement is touching, and increase appropriate statistics.

depesz

Re: slow queries after ANALYZE

From
DW
Date:
DW wrote:
> Hello,
>
> I'm perplexed. I'm trying to find out why some queries are taking a long
> time, and have found that after running analyze,  one particular query
> becomes slow.
>
> This query is based on a view that is based on multiple left outer joins
> to merge data from lots of tables.
>
> If I drop the database and reload it from a dump, the query result is
> instaneous (less than one second).
>
> But after I run analyze, it then takes much longer to run -- about 10
> seconds, give or take a few depending on the hardware I'm testing it on.
> Earlier today, it was taking almost 30 seconds on the actual production
> server -- I restarted pgsql server and the time got knocked down to
> about 10 seconds -- another thing I don't understand.
>
> I've run the query a number of times before and after running analyze,
> and the problem reproduces everytime. I also ran with "explain", and saw
> that the costs go up dramatically after I run analyze.
>
> I'm fairly new to postgresql and not very experienced as a db admin to
> begin with, but it looks like I'm going to have to get smarter about
> this stuff fast, unless it's something the programmers need to deal with
> when constructing their code and queries or designing the databases.
>
> I've already learned that I've commited the cardinal sin of configuring
> my new database server with RAID 5 instead of something more sensible
> for databases like 0+1, but I've been testing out and replicating this
> problem on different hardware, so I know that this issue is not the
> direct cause of this.
>
> Thanks for any info. I can supply more info (like config files, schemas,
> etc.) if you think it might help. But I though I would just describe the
> problem for starters.
>
> -DW
>
Well, for whatever it's worth, on my test box, I upgraded from postgreql
7.4.9 to 8.1, and that seems to make all the difference in the world.

These complex queries are instantaneous, and the query planner when I
run EXPLAIN ANALYZE both before and after running ANALYZE displays
results more in line with what is expected (< 60ms).

Whatever changes were introduced  in 8.x seems to make a huge improvment
in query performance.




>