From: mljv@planwerk6.de
Subject: Re: Problem after VACUUM ANALYZE
Msg-id: 200804111640.27044.mljv@planwerk6.de
In response to: Re: Problem after VACUUM ANALYZE (Richard Huxton <dev@archonet.com>)
List: pgsql-general

Hi all, dear Richard,

Your mail about my configuration parameters was the right hint, but I am still
struggling with the problem. I would appreciate it if you or somebody else
could help me further.

After some investigation I have some new results on my problem. The following
query, which is my most important one, does not perform as it should:

# select * from SpielTipp natural join Tippspiel
# where tippspieltag_id=1254056;

- Spieltipp has about 80,000,000 records
- Tippspiel has about 10,000,000 records
- random_page_cost = 3
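
For reference, the indexes involved should look roughly like the sketch below.
The index names are taken from the plans further down; the exact column lists
and any uniqueness are my guess:

# -- my reconstruction from the index names, details may differ
# CREATE INDEX tippspiel_tippspieltag_id_key ON tippspiel (tippspieltag_id);
# CREATE INDEX ix_spieltipp_tippspiel_id ON spieltipp (tippspiel_id);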

Both tables have indexes on the join and filter columns, of course (my guess at
the definitions is sketched above), so there should be no seq scan in use. But
the planner chooses a sequential scan:

# explain analyze
# select * from SpielTipp natural join Tippspiel
# where tippspieltag_id = 817372;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=333.93..1647646.04 rows=1383 width=28) (actual
time=104193.150..104193.150 rows=0 loops=1)
   Hash Cond: ("outer".tippspiel_id = "inner".tippspiel_id)
   ->  Seq Scan on spieltipp  (cost=0.00..1253846.52 rows=78690352 width=16)
(actual time=10.355..69195.235 rows=78690348 loops=1)
   ->  Hash  (cost=333.44..333.44 rows=198 width=16) (actual
time=44.821..44.821 rows=9 loops=1)
         ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel
(cost=0.00..333.44 rows=198 width=16) (actual time=44.798..44.809 rows=9
loops=1)
               Index Cond: (tippspieltag_id = 817372)
 Total runtime: 104193.209 ms
(7 rows)

------------------------------------------------------------------------------------------------------------------------------------------------------

Just to see how wrong the plan is, I disabled seq scans:

# set enable_seqscan to off;
# explain analyze
# select * from SpielTipp natural join Tippspiel
# where tippspieltag_id = 817372;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..1682807.57 rows=1383 width=28) (actual
time=0.186..0.186 rows=0 loops=1)
   ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel
(cost=0.00..333.44 rows=198 width=16) (actual time=0.072..0.082 rows=9
loops=1)
         Index Cond: (tippspieltag_id = 817372)
   ->  Index Scan using ix_spieltipp_tippspiel_id on spieltipp
(cost=0.00..8458.83 rows=3081 width=16) (actual time=0.010..0.010 rows=0
loops=9)
         Index Cond: (spieltipp.tippspiel_id = "outer".tippspiel_id)
 Total runtime: 0.232 ms
(6 rows)

----------------------------------------------------------------------------------------------------------------------------------------------------

Now I tried to lower random_page_cost:

# set enable_seqscan to on;
# set random_page_cost to 1.5;
# explain analyze
# select * from SpielTipp natural join Tippspiel
# where tippspieltag_id = 817372;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..731643.62 rows=1383 width=28) (actual
time=0.089..0.089 rows=0 loops=1)
   ->  Index Scan using tippspiel_tippspieltag_id_key on tippspiel
(cost=0.00..146.17 rows=198 width=16) (actual time=0.017..0.024 rows=9
loops=1)
         Index Cond: (tippspieltag_id = 817372)
   ->  Index Scan using ix_spieltipp_tippspiel_id on spieltipp
(cost=0.00..3655.92 rows=3081 width=16) (actual time=0.005..0.005 rows=0
loops=9)
         Index Cond: (spieltipp.tippspiel_id = "outer".tippspiel_id)
 Total runtime: 0.135 ms
(6 rows)

----------------------------------------------------------------------------------------------------------------------------------------------------

This looks fine at first glance, but the total estimated cost of 731643 is
still far too high, right? As far as I can tell it is dominated by the
estimated 198 outer rows times roughly 3656 per inner index scan, although in
reality only 9 tippspiel rows match and the inner scans return nothing.

So this is what happened on my production server with random_page_cost = 3:
the estimated costs of the seq scan plan and the index scan plan are not far
apart. After a fresh ANALYZE the planner therefore sometimes chooses the
sequential scan and sometimes not, because the statistics vary between ANALYZE
runs. That is how the nightly ANALYZE kept changing the plan on the database.
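
To watch how the estimates drift between ANALYZE runs, my idea is to log the
column statistics after each run, roughly like this (just a sketch, I have not
set it up yet):

# SELECT n_distinct, most_common_vals, most_common_freqs
#   FROM pg_stats
#  WHERE tablename = 'tippspiel' AND attname = 'tippspieltag_id';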

As I have 200-240 connections at peak times, my conclusion after reading your
advice and the annotated postgresql.conf is:

max_connections = 250
shared_buffers = 200000 # 1.6 GB = 20% of avail. RAM
work_mem = 20000
maintenance_work_mem = 160000
effective_cache_size = 600000 # 4.8 GB = 60% of avail. RAM
random_page_cost = 2
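
For the record, this is the arithmetic behind those numbers, assuming
shared_buffers and effective_cache_size are counted in 8 kB pages and work_mem
in kB on 8.1; the last column is the worst case if all 250 connections each
use one full work_mem at the same time:

# SELECT 200000 * 8 / 1024.0 / 1024 AS shared_buffers_gb,
#        600000 * 8 / 1024.0 / 1024 AS effective_cache_size_gb,
#        250 * 20000 / 1024.0 / 1024 AS worst_case_work_mem_gb;

which gives roughly 1.5 GB, 4.6 GB and 4.8 GB.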

Are those settings reasonable for my box? The box is:
- dedicated
- AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
- 3ware RAID 1 Controller with two rather cheap SATA disks
- 8 GB RAM

Kind regards,
Janning

On Tuesday, 8 April 2008 17:40, Richard Huxton wrote:
> > Here are some of our configuration parameters. We never really tweaked it
> > as it ran fine. We just raised some parameters. The following list should
> > show all parameters changed from the default:
> >
> > max_connections = 300
> > shared_buffers = 30000
> > work_mem = 10240
>
> OK, so that's 30,000 * 8KB = 240MB of shared_buffers
> You have 10MB of work_mem and if all 300 connections were using that
> much you'd have committed 3GB of your RAM for that. Of course they'll
> want more than just that.
>
> Do you really have 300 concurrent connections?
>
> > maintenance_work_mem = 163840
>
> 160MB for vacuums - should be OK given how much memory you have and the
> fact that it's quiet when you vacuum.
>
> > max_fsm_pages = 500000
>
> You can track at most 500,000 pages with free space on them. In 8.2+
> versions VACUUM VERBOSE will show you how many are currently being used.
> Not sure about 8.1
>
> > bgwriter_lru_percent = 10.0
> > bgwriter_lru_maxpages = 100
> > bgwriter_all_percent = 5
> > bgwriter_all_maxpages = 200
> > wal_buffers = 16
> >
> > checkpoint_segments = 10
>
> If you have bursts of write activity you might want to increase this.
>
> > checkpoint_warning = 3600
> >
> > effective_cache_size = 180000
>
> That's 180,000 * 8KB = 180 * 8MB = 1.4GB
> If that's really all you're using as cache, I'd reduce the number of
> concurrent connections. Check free/top and see how much RAM is really
> being used as disk cache.
>
> > random_page_cost = 3
>
> Might be too high - you don't mention what disks you have.
>
> > stats_command_string = off
>
> If you turn this one on, you'll be able to see the queries each backend
> is executing as they happen. Might be useful, but does have some cost.
>
>
> The crucial thing is to find out exactly what is happening when things
> get very slow. Check vmstat and top, look in the pg_locks system-table
> and if needs be we can see what strace says a particular backend is doing.
>
> --
>    Richard Huxton
>    Archonet Ltd


