Re: extremly low memory usage - Mailing list pgsql-performance
From | Jeremiah Jahn |
---|---|
Subject | Re: extremly low memory usage |
Date | |
Msg-id | 1124470109.27881.152.camel@bluejay.goodinassociates.com Whole thread Raw |
In response to | Re: extremly low memory usage (John Arbash Meinel <john@arbash-meinel.com>) |
Responses |
Re: extremly low memory usage
|
List | pgsql-performance |
Sorry about the formatting. On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote: > Jeremiah Jahn wrote: > > >here's an example standard query. Ireally have to make the first hit go > >faster. The table is clustered as well on full_name as well. 'Smith%' > >took 87 seconds on the first hit. I wonder if I set up may array wrong. > >I remeber see something about DMA access versus something else, and > >choose DMA access. LVM maybe? > > > > > It would be nice if you would format your queries to be a little bit > easier to read before posting them. > However, I believe I am reading it correctly, to say that the index scan > on identity is not your slow point. In fact, as near as I can tell, it > only takes 52ms to complete. > > The expensive parts are the 4915 lookups into the litigant_details (each > one takes approx 4ms for a total of ~20s). > And then you do it again on case_data (average 3ms each * 4906 loops = > ~15s). Is there some way to avoid this? > > So there is no need for preloading your indexes on the identity table. > It is definitely not the bottleneck. > > So a few design bits, which may help your database. > Why is "actor_id" a text field instead of a number? This is simply due to the nature of the data. > You could try creating an index on "litigant_details (actor_id, > count_ori)" so that it can do just an index lookup, rather than an index > + filter. I have one, but it doesn't seem to like to use it. Don't really need it though, I can just drop the court_id out of the query. It's redundant, since each actor_id is also unique in litigant details. I had run vac full and analyze but I ran them again anyway and the planning improved. However, my 14 disk raid 10 array is still slower than my 3 disk raid 5 on my production box. 46sec vs 30sec (with live traffic on the production) One of the strange things is that when I run the cat command on my index and tables that are "HOT" it has no effect on memory usage. Right now I'm running ext3 on LVM. I'm still in a position to redo the file system and everything. Is this a good way to do it or should I switch to something else? What about stripe and extent sizes...? kernel parameters to change? ---------------devel box:----------------------- copa=# EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data copa-# from identity copa-# join litigant_details on identity.actor_id = litigant_details.actor_id copa-# join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori copa-# join court on identity.court_ori = court.id copa-# where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=3.29..29482.22 rows=3930 width=86) (actual time=114.060..46001.480 rows=5052 loops=1) -> Nested Loop (cost=3.29..16193.27 rows=3820 width=112) (actual time=93.038..24584.275 rows=5052 loops=1) -> Nested Loop (cost=0.00..16113.58 rows=3820 width=113) (actual time=85.778..24536.489 rows=5052 loops=1) -> Index Scan using name_speed on identity (cost=0.00..824.72 rows=3849 width=82) (actual time=50.284..150.133rows=5057 loops=1) Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::charactervarying)) Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text)) -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812rows=1 loops=5057) Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text) -> Materialize (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052) -> Seq Scan on court (cost=0.00..3.29 rows=1 width=12) (actual time=7.248..7.257 rows=1 loops=1) Filter: ('IL081025J'::text = (id)::text) -> Index Scan using case_speed on case_data (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052) Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text)) Total runtime: 46005.994 ms > > More importantly, though, the planner seems to think the join of > identity to litigant_details will only return 1 row, not 5000. > Do you regularly vacuum analyze your tables? > Just as a test, try running: > set enable_nested_loop to off; not quite acceptable Total runtime: 221486.149 ms > And then run EXPLAIN ANALYZE again, just to see if it is faster. > > You probably need to increase some statistics targets, so that the > planner can design better plans. ---------------------this is the output from the production box------------------ LOG: duration: 27213.068 ms statement: EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data from identity join litigant_details on identity.actor_id = litigant_details.actor_id join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori join court on identity.court_ori = court.id where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=3.29..43498.76 rows=2648 width=86) (actual time=17.106..27192.000 rows=5052 loops=1) -> Nested Loop (cost=0.00..43442.53 rows=2647 width=87) (actual time=16.947..27120.619 rows=5052 loops=1) -> Nested Loop (cost=0.00..23061.79 rows=3827 width=113) (actual time=16.801..17390.682 rows=5052 loops=1) -> Index Scan using name_speed on identity (cost=0.00..1277.39 rows=3858 width=82) (actual time=9.842..213.424rows=5057 loops=1) Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::charactervarying)) Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text)) -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..5.63 rows=1 width=81) (actual time=3.355..3.364rows=1 loops=5057) Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text) -> Index Scan using case_data_pkey on case_data (cost=0.00..5.31 rows=1 width=26) (actual time=1.897..1.904 rows=1loops=5052) Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text)) -> Materialize (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052) -> Seq Scan on court (cost=0.00..3.29 rows=1 width=12) (actual time=0.142..0.165 rows=1 loops=1) Filter: ('IL081025J'::text = (id)::text) Total runtime: 27205.060 ms > > > John > =:-> > -- "I didn't know it was impossible when I did it."
pgsql-performance by date: