Re: extremly low memory usage - Mailing list pgsql-performance
From: Ron
Subject: Re: extremly low memory usage
Date:
Msg-id: 6.2.3.4.0.20050818141059.05c2fac0@pop.earthlink.net
In response to: Re: extremly low memory usage (John Arbash Meinel <john@arbash-meinel.com>)
List: pgsql-performance
At 01:55 PM 8/18/2005, John Arbash Meinel wrote:
>Jeremiah Jahn wrote:
>
> >here's an example standard query. I really have to make the first hit
> >go faster. The table is clustered on full_name as well. 'Smith%' took
> >87 seconds on the first hit. I wonder if I set up my array wrong. I
> >remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
>
>It would be nice if you would format your queries to be a little bit
>easier to read before posting them.
>However, I believe I am reading it correctly, to say that the index scan
>on identity is not your slow point. In fact, as near as I can tell, it
>only takes 52ms to complete.
>
>The expensive parts are the 4915 lookups into litigant_details (each
>one takes approx 4ms for a total of ~20s).
>And then you do it again on case_data (average 3ms each * 4906 loops =
>~15s).

How big are litigant_details and case_data? If they can fit in RAM,
preload them using methods like the "cat to /dev/null" trick (sketch
below) and those table lookups will be ~100-1000x faster.

If they won't fit into RAM but the machine can be expanded to hold
enough RAM to fit the tables, it's well worth the ~$75-$150/GB to
upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a
few choices:

A= Put the data tables and indexes on separate dedicated spindles, and
put litigant_details and case_data each on their own dedicated
spindles. This will lower seek conflicts. Again, if this requires
buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables
(based on something sane like the first n characters of the primary
key) such that the smaller tables easily fit into RAM (sketch below).
Given that you've said only 10GB/60GB is "hot", this could work very
well. Combine it with "A" above (put all the litigant_details
sub-tables on one dedicated spindle set and all the case_data
sub-tables on another spindle set) for added oomph.

C= Buy a SSD big enough to hold litigant_details and case_data and put
them there. Again, this can be combined with "A" and "B" above to
lessen the size of the SSD needed.

>So there is no need for preloading your indexes on the identity
>table. It is definitely not the bottleneck.
>
>So a few design bits, which may help your database. Why is
>"actor_id" a text field instead of a number?
>You could try creating an index on "litigant_details (actor_id,
>court_ori)" so that it can do just an index lookup, rather than an
>index + filter.

Yes, that certainly sounds like it would be more efficient.

>More importantly, though, the planner seems to think the join of
>identity to litigant_details will only return 1 row, not 5000.
>Do you regularly vacuum analyze your tables?
>Just as a test, try running:
>set enable_nestloop to off;
>And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
>You probably need to increase some statistics targets, so that the
>planner can design better plans.
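To make those suggestions concrete, here are some untested sketches
(all names and numbers are illustrative, not taken from your schema).
First, the preloading idea in plain SQL rather than shell: a full
sequential scan drags every heap page of a table through the OS cache,
which approximates the "cat to /dev/null" trick, assuming the tables
actually fit in RAM:

    -- Warm the caches by forcing full sequential scans; count(*)
    -- touches every heap page. Repeat after each server restart.
    SELECT count(*) FROM litigant_details;
    SELECT count(*) FROM case_data;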
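A rough shape for "B", assuming actor_id is the key you would split
on; the sub-table names and ranges are made up, and a UNION ALL view
keeps existing queries working against the smaller pieces:

    -- Hypothetical manual split by leading character of the key, so
    -- each sub-table is small enough to stay resident in RAM.
    CREATE TABLE litigant_details_a AS
        SELECT * FROM litigant_details WHERE actor_id < 'B';
    CREATE TABLE litigant_details_b AS
        SELECT * FROM litigant_details
        WHERE actor_id >= 'B' AND actor_id < 'C';
    -- ...and so on for the remaining ranges.

    -- A view over the pieces lets existing queries keep working:
    CREATE VIEW litigant_details_all AS
        SELECT * FROM litigant_details_a
        UNION ALL
        SELECT * FROM litigant_details_b;
        -- UNION ALL the rest of the sub-tables here.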
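The composite index John suggests would be something like this (the
index name is made up, and I'm assuming the filter column is
court_ori, as in your plan):

    -- Lets the lookup satisfy both the join condition and the
    -- court_ori filter in one index probe, instead of an index scan
    -- on actor_id followed by a per-row filter.
    CREATE INDEX lit_actor_court_speed
        ON litigant_details (actor_id, court_ori);
    ANALYZE litigant_details;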
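John's nested-loop test, spelled out; note the actual GUC name is
enable_nestloop, and it only affects the current session, so this is a
diagnostic rather than a fix:

    -- Diagnostic only: see whether the planner's nested-loop choice
    -- is the problem by forcing it to pick another join method.
    SET enable_nestloop TO off;
    EXPLAIN ANALYZE
        SELECT ...;            -- your original query, unchanged
    SET enable_nestloop TO on; -- restore the default afterwards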
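And raising statistics targets could look like the following; the
columns and the value 250 are examples (the default target is only 10,
which is part of why the planner estimates 1 row where it actually
gets 4906):

    -- Larger samples on the join/filter columns help the planner get
    -- row estimates closer to the ~4900 rows it is actually seeing.
    ALTER TABLE identity
        ALTER COLUMN full_name SET STATISTICS 250;
    ALTER TABLE litigant_details
        ALTER COLUMN actor_id SET STATISTICS 250;
    ANALYZE identity;
    ANALYZE litigant_details;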
> > -> Nested Loop  (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
> >    -> Nested Loop  (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
> >       -> Nested Loop  (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
> >          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
> >          -> Index Scan using name_speed on identity  (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
> >             Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
> >             Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
> >          -> Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
> >             Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
> >             Filter: ('IL081025J'::text = (court_ori)::text)
> >       -> Seq Scan on court  (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
> >          Filter: ('IL081025J'::text = (id)::text)
> >    -> Index Scan using case_speed on case_data  (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
> >       Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))