Re: extremly low memory usage - Mailing list pgsql-performance
From:           Jeremiah Jahn
Subject:        Re: extremly low memory usage
Date:
Msg-id:         1124564003.27881.229.camel@bluejay.goodinassociates.com
In response to: Re: extremly low memory usage (John A Meinel <john@arbash-meinel.com>)
Responses:      Re: extremly low memory usage
List:           pgsql-performance
On Fri, 2005-08-19 at 16:03 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > On Fri, 2005-08-19 at 12:18 -0500, John A Meinel wrote:
> >
> >>Jeremiah Jahn wrote:
> >>
>
> ...
>
> >>
> >>Well, in general, 3ms for a single lookup seems really long. Maybe your
> >>index is bloated by not vacuuming often enough. Do you tend to get a lot
> >>of updates to litigant_details?
> >
> > I have vacuumed this already. I get lots of updates, but this data is
> > mostly unchanging.
> >
> >>There are a couple possibilities at this point. First, you can REINDEX
> >>the appropriate index, and see if that helps. However, if this is a test
> >>box, it sounds like you just did a dump and reload, which wouldn't have
> >>bloat in an index.
> >
> > I loaded it using slony
>
> I don't know that slony versus pg_dump/pg_restore really matters. The
> big thing is that updates wouldn't be trashing your index.
> But if you are saying that you cluster once/wk, your index can't be that
> messed up anyway. (Unless CLUSTER messes up the non-clustered indexes,
> but that would make CLUSTER much less useful, so I would have guessed
> this was not the case.)
>
> >>Another possibility. Is this the column that you usually use when
> >>pulling information out of litigant_details? If so, you can CLUSTER
> >>litigant_details on the appropriate index. This will help keep things
> >>that belong close together close together, which decreases the index
> >>lookup costs.
> >
> > clustering on this right now. Most of the other things are already
> > clustered. name and case_data
>
> Just as a reality check, they are clustered on the columns in question,
> right? (I don't know if this column is a primary key or not, but any
> index can be used for clustering.)
>
> >>However, if this is not the common column, then you probably will slow
> >>down whatever other accesses you may have on this table.
> >>
> >>After CLUSTER, the current data will stay clustered, but new data will
> >>not, so you have to continually CLUSTER, the same way that you might
> >>VACUUM. *However*, IIRC CLUSTER grabs an exclusive lock, so it is as
> >>expensive as a VACUUM FULL. Be aware of this, but it might vastly
> >>improve your performance, so it would be worth it.
> >
> > I generally re-cluster once a week.
>
> >>>>So there is no need for preloading your indexes on the identity table.
> >>>>It is definitely not the bottleneck.
> >>>>
> >>>>So a few design bits, which may help your database.
> >>>>Why is "actor_id" a text field instead of a number?
> >>>
> >>>This is simply due to the nature of the data.
> >>>
> >>
> >>I'm just wondering if changing it into a number, and using a number->name
> >>lookup, would be faster for you. It may not be. In general, I prefer to
> >>use numbers for references. I may be over-paranoid, but I know that some
> >>locales are bad with string -> string comparisons. And since the data in
> >>your database is stored as UNICODE, I'm not sure if it has to do any
> >>translating or not. Again, something to consider; it may not make any
> >>difference.
> >
> > I don't believe so. I initialize the DB as 'lang=C'. I used to have the
> > problem where things were being initialized as en_US. This would prevent
> > any text-based index from working. This doesn't seem to be the case
> > here, so I'm not worried about it.
>
> Sorry, I think I was confusing you with someone else who posted SHOW ALL.
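Coming back to the REINDEX/CLUSTER suggestions quoted above, a minimal
sketch of those maintenance commands, assuming lit_actor_speed (the
litigant_details index that shows up in the plan below) is the index
being clustered on -- an assumption, not something stated in the thread:

    -- rebuild the index in case it has bloated
    REINDEX INDEX lit_actor_speed;

    -- physically reorder the table on that index (7.4/8.0 syntax);
    -- CLUSTER takes an exclusive lock, roughly as expensive as
    -- VACUUM FULL, so schedule it like the weekly job discussed above
    CLUSTER lit_actor_speed ON litigant_details;

    -- refresh planner statistics after the reorder
    ANALYZE litigant_details;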
>
> >>
>
> ...
>
> > it's cached alright. I'm getting a read rate of about 150MB/sec. I would
> > have thought it would be faster with my raid setup. I think I'm going to
> > scrap the whole thing and get rid of LVM. I'll just do a straight ext3
> > system. Maybe that will help. Still trying to get suggestions for a
> > stripe size.
>
> I don't think 150MB/s is out of the realm for a 14 drive array.
> How fast is
> time dd if=/dev/zero of=testfile bs=8192 count=1000000

time dd if=/dev/zero of=testfile bs=8192 count=1000000
1000000+0 records in
1000000+0 records out

real    1m24.248s
user    0m0.381s
sys     0m33.028s

> (That should create an 8GB file, which is too big to cache everything)
> And then how fast is:
> time dd if=testfile of=/dev/null bs=8192 count=1000000

time dd if=testfile of=/dev/null bs=8192 count=1000000
1000000+0 records in
1000000+0 records out

real    0m54.139s
user    0m0.326s
sys     0m8.916s

and on a second run:

real    0m55.667s
user    0m0.341s
sys     0m9.013s

> That should give you a semi-decent way of measuring how fast the RAID
> system is, since it should be too big to cache in ram.

about 150MB/sec. Is there no better way to make this go faster...?

> >>I can point you to REINDEX and CLUSTER, but if it is caching in ram, I
> >>honestly can't say why the per loop would be that much slower.
> >>Are both systems running the same postgres version? It sounds like it is
> >>different (since you say something about switching to 8.0).
> >
> > These had little or no effect.
> > The production machine is running 7.4 while the devel machine is
> > running 8.0.
>
> Well, my concern is that maybe some portion of the 8.0 code actually
> slowed things down for you. You could try reverting to 7.4 on the devel
> box, though I think playing with upgrading to 8.1 might be more
> worthwhile.

And the level of stability for 8.1? I started with 7.4 and it didn't
really feel as fast as it should either.

> ...
>
> > this is a cached version.
>
> I assume that you mean this is the second run of the query. I can't
> compare it too much, since this is "smith" rather than "jones". But this
> one is 17s rather than the other one being 46s.
>
> And that includes having 8k rows instead of having 5k rows.
>
> Have you tried other values with disabled nested loops? Because this
> query (at least in cached form) seems to be *way* faster than with
> nested loops.
> I know that you somehow managed to get 200s in your testing, but it
> might just be that whatever needed to be loaded is now loaded, and you
> would get better performance.
> If this is true, it means you might need to tweak some settings, and
> make sure your statistics are decent, so that postgres can actually
> pick the optimal plan.
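A minimal sketch of that experiment; the statistics target of 100 and
the choice of actor_id as the column are illustrative assumptions, not
values from this thread (7.4/8.0 default the per-column target to 10):

    -- session-local: steer the planner away from nested loops for a test
    SET enable_nestloop = off;
    -- ... re-run the EXPLAIN ANALYZE query shown below ...
    SET enable_nestloop = on;

    -- collect more detailed statistics on the join column, then re-analyze
    ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 100;
    ANALYZE litigant_details;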
> >>copa=> explain analyze select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
> >>copa-> from identity
> >>copa-> join litigant_details on identity.actor_id = litigant_details.actor_id
> >>copa-> join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori
> >>copa-> join court on identity.court_ori = court.id
> >>copa-> where identity.court_ori = 'IL081025J' and full_name like 'SMITH%' order by full_name;
> >>                                                                 QUERY PLAN
> >>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >> Sort  (cost=100502560.72..100502583.47 rows=9099 width=86) (actual time=17843.876..17849.401 rows=8094 loops=1)
> >>   Sort Key: identity.full_name
> >>   ->  Merge Join  (cost=100311378.72..100501962.40 rows=9099 width=86) (actual time=15195.816..17817.847 rows=8094 loops=1)
> >>         Merge Cond: ((("outer".court_ori)::text = "inner"."?column10?") AND (("outer".case_id)::text = "inner"."?column11?"))
> >>         ->  Index Scan using case_speed on case_data  (cost=0.00..170424.73 rows=3999943 width=26) (actual time=0.015..4540.525 rows=3018284 loops=1)
> >>         ->  Sort  (cost=100311378.72..100311400.82 rows=8839 width=112) (actual time=9594.985..9601.174 rows=8094 loops=1)
> >>               Sort Key: (litigant_details.court_ori)::text, (litigant_details.case_id)::text
> >>               ->  Nested Loop  (cost=100002491.43..100310799.34 rows=8839 width=112) (actual time=6892.755..9555.828 rows=8094 loops=1)
> >>                     ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=12) (actual time=0.085..0.096 rows=1 loops=1)
> >>                           Filter: ('IL081025J'::text = (id)::text)
>
> What I don't really understand is the next part. It seems to be doing an
> index scan on 3.7M rows, and getting very decent performance (5s), and
> then merging against a table which returns only 8k rows.
> Why is it having to look through all of those rows?
> I may be missing something, but this says it is able to do 600 index
> lookups / millisecond. Which seems superfast. (Compared to your earlier
> 4ms / lookup)

Makes me a little confused myself... Something fishy is going on here.

> >>                     ->  Merge Join  (cost=2491.43..310707.66 rows=8839 width=113) (actual time=6892.656..9519.680 rows=8094 loops=1)
> >>                           Merge Cond: (("outer".actor_id)::text = "inner"."?column7?")
> >>                           ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..295722.00 rows=4956820 width=81) (actual time=0.027..5613.814 rows=3736703 loops=1)
> >>                           ->  Sort  (cost=2491.43..2513.71 rows=8913 width=82) (actual time=116.071..122.272 rows=8100 loops=1)
> >>                                 Sort Key: (identity.actor_id)::text
> >>                                 ->  Index Scan using name_speed on identity  (cost=0.00..1906.66 rows=8913 width=82) (actual time=0.133..81.104 rows=8100 loops=1)
> >>                                       Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
> >>                                       Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'SMITH%'::text))
> >> Total runtime: 17859.917 ms
>
> >>But really, you have worse index speed, and that needs to be figured out.
> >>
> >>John
> >>=:->
>
> I'm assuming your data is private (since it looks like legal stuff).
> Unless maybe that makes it part of the public record.
> Anyway, I'm not able to, but sometimes someone like Tom can profile
> stuff to see what is going on.
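For what it's worth, the 600-lookups/ms figure can be read off the plan
above: the lit_actor_speed scan returns 3,736,703 rows in about 5,614 ms,
i.e. roughly 665 rows per millisecond. That appears to be the throughput
of one continuous in-order index scan feeding the merge join, not 3.7M
separate probes, so it isn't directly comparable to a per-lookup cost. A
single probe can be timed in isolation with something like this sketch
(the literal 'SOME_ACTOR_ID' is a placeholder, not a value from the data):

    -- time one indexed probe; the 'actual time' here is what the
    -- earlier 3-4ms per-lookup figure should be compared against
    EXPLAIN ANALYZE
    SELECT * FROM litigant_details WHERE actor_id = 'SOME_ACTOR_ID';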
I've had Tom on here before..:) not my devel box, but my production box
a couple of years ago.

> I might just be messing up my ability to read the explain output. But
> somehow things don't seem to be lining up with the cost of a single
> index lookup.
> On my crappy Celeron 450 box, an index lookup is 0.06ms once things are
> cached in ram.
>
> John
> =:->

--
Speak softly and carry a +6 two-handed sword.