Re: extremly low memory usage - Mailing list pgsql-performance

From Ron
Subject Re: extremly low memory usage
Date
Msg-id 6.2.3.4.0.20050818141059.05c2fac0@pop.earthlink.net
In response to Re: extremly low memory usage  (John Arbash Meinel <john@arbash-meinel.com>)
List pgsql-performance
At 01:55 PM 8/18/2005, John Arbash Meinel wrote:
>Jeremiah Jahn wrote:
>
> >here's an example standard query. I really have to make the first hit go
> >faster. The table is clustered on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up my array wrong.
> >I remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
> >
> >
>It would be nice if you would format your queries to be a little bit
>easier to read before posting them.
>However, if I am reading it correctly, the index scan on identity is
>not your slow point. In fact, as near as I can tell, it only takes
>52ms to complete.
>
>The expensive parts are the 4915 lookups into the litigant_details (each
>one takes approx 4ms for a total of ~20s).
>And then you do it again on case_data (average 3ms each * 4906 loops =
>~15s).

How big are litigant_details and case_data?  If they can fit in RAM,
preload them using methods like the "cat to /dev/null" trick and
those table lookups will be ~100-1000x faster.  If they won't fit,
but the machine can be expanded to hold enough RAM, the ~$75-$150/GB
to upgrade the server so that the tables do fit is well worth it.
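
As a concrete illustration of the warming idea (just a sketch: on a
PostgreSQL of this vintage a count(*) does a full sequential scan, which
drags the whole heap through the OS cache much like cat'ing the table's
files under $PGDATA to /dev/null does):

   -- crude cache warm-up: force a full read of each hot table
   SELECT count(*) FROM litigant_details;
   SELECT count(*) FROM case_data;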

If they can't be made to fit into RAM as atomic entities, you have a
few choices:
A= Put the data tables and indexes on separate dedicated spindles, and
give litigant_details and case_data each their own spindles as well.
This will lower seek conflicts.  If this requires buying some more
HDs, it's well worth it.  (A sketch of how to do this follows the list.)

B= Break litigant_details and case_data into a set of smaller tables
(based on something sane like the first n characters of the primary key)
such that the smaller tables easily fit into RAM.  Given that you've
said only 10GB/60GB is "hot", this could work very well.  Combine it
with "A" above (put all the litigant_details sub tables on one
dedicated spindle set and all the case_data sub tables on another
spindle set) for added oomph.  (Again, see the sketch after the list.)

C= Buy a SSD big enough to hold litigant_details and case_data and
put them there.  Again, this can be combined with "A" and "B" above
to lessen the size of the SSD needed.
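
To sketch "A" and "B" (all names and paths below are made up;
tablespaces -- new in PostgreSQL 8.0 -- handle the spindle split, and
table inheritance is one way to do the split in "B", though the
application then has to aim its queries at the right sub-table):

   -- "A": move a hot table onto its own spindle set via a tablespace
   CREATE TABLESPACE lit_space LOCATION '/mnt/spindle1/pgdata';
   ALTER TABLE litigant_details SET TABLESPACE lit_space;

   -- "B": carve off a sub-table covering one slice of the key space
   CREATE TABLE litigant_details_a (
       CHECK (actor_id >= 'a' AND actor_id < 'b')
   ) INHERITS (litigant_details);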


>So there is no need for preloading your indexes on the identity
>table.  It is definitely not the bottleneck.
>
>So a few design bits, which may help your database.  Why is
>"actor_id" a text field instead of a number?
>You could try creating an index on "litigant_details (actor_id,
>court_ori)" so that it can do just an index lookup, rather than an
>index scan plus a filter.

Yes, that certainly sounds like it would be more efficient.
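
Something along these lines (the index name is made up; putting
actor_id first matches the join probe, and court_ori then moves from
the per-row Filter into the Index Cond):

   CREATE INDEX lit_actor_court ON litigant_details (actor_id, court_ori);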


>More importantly, though, the planner seems to think the join of
>identity to litigant_details will only return 1 row, not 5000.
>Do you regularly vacuum analyze your tables?
>Just as a test, try running:
>set enable_nestloop to off;
>And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
>You probably need to increase some statistics targets, so that the
>planner can design better plans.
>
> > ->  Nested Loop  (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
> >       ->  Nested Loop  (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
> >             ->  Nested Loop  (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
> >                   Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
> >                   ->  Index Scan using name_speed on identity  (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
> >                         Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
> >                         Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
> >                   ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
> >                         Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
> >                         Filter: ('IL081025J'::text = (court_ori)::text)
> >             ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
> >                   Filter: ('IL081025J'::text = (id)::text)
> >       ->  Index Scan using case_speed on case_data  (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
> >             Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))



