Re: extremly low memory usage - Mailing list pgsql-performance

From Jeremiah Jahn
Subject Re: extremly low memory usage
Date
Msg-id 1124470109.27881.152.camel@bluejay.goodinassociates.com
Whole thread Raw
In response to Re: extremly low memory usage  (John Arbash Meinel <john@arbash-meinel.com>)
Responses Re: extremly low memory usage
List pgsql-performance
Sorry about the formatting.

On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> Jeremiah Jahn wrote:
>
> >here's an example standard query. Ireally have to make the first hit go
> >faster. The table is clustered as well on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up may array wrong.
> >I remeber see something about DMA access versus something else, and
> >choose DMA access. LVM maybe?
> >
> >
> It would be nice if you would format your queries to be a little bit
> easier to read before posting them.
> However, I believe I am reading it correctly, to say that the index scan
> on identity is not your slow point. In fact, as near as I can tell, it
> only takes 52ms to complete.
>
> The expensive parts are the 4915 lookups into the litigant_details (each
> one takes approx 4ms for a total of ~20s).
> And then you do it again on case_data (average 3ms each * 4906 loops =
> ~15s).
Is there some way to avoid this?


>
> So there is no need for preloading your indexes on the identity table.
> It is definitely not the bottleneck.
>
> So a few design bits, which may help your database.
> Why is "actor_id" a text field instead of a number?
This is simply due to the nature of the data.

> You could try creating an index on "litigant_details (actor_id,
> count_ori)" so that it can do just an index lookup, rather than an index
> + filter.
I have one, but it doesn't seem to like to use it. Don't really need it
though, I can just drop the court_id out of the query. It's redundant,
since each actor_id is also unique in litigant details. I had run vac
full and analyze but I ran them again anyway and the planning improved.
However, my 14 disk raid 10 array is still slower than my 3 disk raid 5
on my production box. 46sec vs 30sec (with live traffic on the
production) One of the strange things is that when I run the cat command
on my index and tables that are "HOT" it has no effect on memory usage.
Right now I'm running ext3 on LVM. I'm still in a position to redo the
file system and everything. Is this a good way to do it or should I
switch to something else? What about stripe and extent sizes...? kernel
parameters to change?



---------------devel box:-----------------------

copa=# EXPLAIN ANALYZE select
full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
copa-# from identity
copa-# join litigant_details on identity.actor_id = litigant_details.actor_id
copa-# join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori =
case_data.court_ori
copa-# join court on identity.court_ori = court.id
copa-# where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by  full_name;
                                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.29..29482.22 rows=3930 width=86) (actual time=114.060..46001.480 rows=5052 loops=1)
   ->  Nested Loop  (cost=3.29..16193.27 rows=3820 width=112) (actual time=93.038..24584.275 rows=5052 loops=1)
         ->  Nested Loop  (cost=0.00..16113.58 rows=3820 width=113) (actual time=85.778..24536.489 rows=5052 loops=1)
               ->  Index Scan using name_speed on identity  (cost=0.00..824.72 rows=3849 width=82) (actual
time=50.284..150.133rows=5057 loops=1) 
                     Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text <
'JONET'::charactervarying)) 
                     Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
               ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual
time=4.788..4.812rows=1 loops=5057) 
                     Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
         ->  Materialize  (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
               ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=12) (actual time=7.248..7.257 rows=1 loops=1)
                     Filter: ('IL081025J'::text = (id)::text)
   ->  Index Scan using case_speed on case_data  (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1
loops=5052)
         Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text =
(case_data.case_id)::text))
 Total runtime: 46005.994 ms



>
> More importantly, though, the planner seems to think the join of
> identity to litigant_details will only return 1 row, not 5000.
> Do you regularly vacuum analyze your tables?
> Just as a test, try running:
> set enable_nested_loop to off;
not quite acceptable
Total runtime: 221486.149 ms

> And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
> You probably need to increase some statistics targets, so that the
> planner can design better plans.

---------------------this is the output from the production box------------------
LOG:  duration: 27213.068 ms  statement: EXPLAIN ANALYZE select
full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
        from identity
        join litigant_details on identity.actor_id = litigant_details.actor_id
        join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori =
case_data.court_ori
        join court on identity.court_ori = court.id
        where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by  full_name;
                                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.29..43498.76 rows=2648 width=86) (actual time=17.106..27192.000 rows=5052 loops=1)
   ->  Nested Loop  (cost=0.00..43442.53 rows=2647 width=87) (actual time=16.947..27120.619 rows=5052 loops=1)
         ->  Nested Loop  (cost=0.00..23061.79 rows=3827 width=113) (actual time=16.801..17390.682 rows=5052 loops=1)
               ->  Index Scan using name_speed on identity  (cost=0.00..1277.39 rows=3858 width=82) (actual
time=9.842..213.424rows=5057 loops=1) 
                     Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text <
'JONET'::charactervarying)) 
                     Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
               ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..5.63 rows=1 width=81) (actual
time=3.355..3.364rows=1 loops=5057) 
                     Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
         ->  Index Scan using case_data_pkey on case_data  (cost=0.00..5.31 rows=1 width=26) (actual time=1.897..1.904
rows=1loops=5052) 
               Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text =
(case_data.case_id)::text))
   ->  Materialize  (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
         ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=12) (actual time=0.142..0.165 rows=1 loops=1)
               Filter: ('IL081025J'::text = (id)::text)
 Total runtime: 27205.060 ms

>
>
> John
> =:->
>
--
"I didn't know it was impossible when I did it."


pgsql-performance by date:

Previous
From: "Jeffrey W. Baker"
Date:
Subject: Re: sustained update load of 1-2k/sec
Next
From: "J. Andrew Rogers"
Date:
Subject: Re: sustained update load of 1-2k/sec