Re: Join slow on "large" tables - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Join slow on "large" tables
Date
Msg-id 1086648432.27200.37.camel@localhost.localdomain
In response to Join slow on "large" tables  (Josué Maldonado <josue@lamundial.hn>)
Responses Re: Join slow on "large" tables  (Josué Maldonado <josue@lamundial.hn>)
List pgsql-performance
On Mon, 2004-06-07 at 16:19, Josué Maldonado wrote:
> Hello list,
>
> Server is dual Xeon 2.4, 2GB RAM, Postgresql is running on partition:
> /dev/sda9              29G  8.9G   20G  31% /home2
> /dev/sda9 on /home2 type jfs (rw)
>
> Version()
> PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2
> 20020903 (Red Hat Linux 8.0 3.2-7)
>
> I have a view joining two tables, inventory details (pkardex) and
> inventory documents header (pmdoc).  This view usually runs pretty slow,
> as indicated in the explain analyze below.  pkardex is 1943465 rows and
> approx 659MB; pmdoc is 1183520 rows and approx 314MB.  The view
> definition is:
>
> SELECT pkd_pk AS kpk,
>        (pkd_stamp)::date AS kfecha,
>        pkd_docto AS kdocto,
>        ((((pdc_custid)::text || ' '::text) || (pdc_custname)::text))::character varying(50) AS kclpv,
>        pkd_saldo AS ksaldo,
>        pkd_es AS kes,
>        CASE WHEN (pkd_es = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada,
>        CASE WHEN (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS ksalida,
>        pkd_pcode AS kprocode,
>        pkd_price AS kvalor,
>        pdc_tipdoc AS ktipdoc
> FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk)));
>
>
> Shared memory is:
> /root: cat /proc/sys/kernel/shmmax
> 1073741824
>
> and postgresql.conf have this settings:
> tcpip_socket = true
> sort_mem = 8190         # min 64, size in KB
> vacuum_mem = 262144             # min 1024, size in KB
> checkpoint_segments = 10
> max_connections = 256
> shared_buffers = 32000
> effective_cache_size = 160000   # typically 8KB each
> random_page_cost = 2            # units are one sequ
>
> The explain analyze is:
> dbmund=# explain analyze select * from vkardex where kprocode='1017';
>   Nested Loop  (cost=0.00..32155.66 rows=5831 width=114) (actual
> time=18.223..47983.157 rows=4553 loops=1)
>     ->  Index Scan using pkd_pcode_idx on pkardex  (cost=0.00..11292.52
> rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1)
>           Index Cond: ((pkd_pcode)::text = '1017'::text)
>     ->  Index Scan using pdc_pk_idx on pmdoc  (cost=0.00..3.55 rows=1
> width=50) (actual time=1.659..1.661 rows=1 loops=5049)
>           Index Cond: (pmdoc.pdc_pk = "outer".doctofk)
>   Total runtime: 47988.067 ms
> (6 rows)

OK, you have to ask yourself a question here.  Do I have enough memory
to let both postgresql and the kernel cache this data, or only enough
memory for one?  Then you pick one and try it out.  But there are some
issues here.  PostgreSQL's shared buffers are not, and should not
generally be, thought of as "cache".  A cache's job is to hold the whole
working set, or as much of it as possible, ready for access.  A buffer's
job is to hold all the data we're tossing around right this second.  Once
we're done with the data, the buffers can and do just drop whatever was
in them.  PostgreSQL does not have caching, in the classical sense.
That may or may not change.

The kernel, on the other hand, has both cache and buffer.  Ever notice
that a Linux top shows the cache usually being much bigger than the
buffers?  My 512 Meg home box right now has 252968k for cache, and
43276k for buffers.

Now, you're tossing around enough data that you might actually have a use
for a huge set of buffers, but this means you'll need to starve your cache
to get enough buffers.  Which means that if one process does this kind of
join, drops its connection, and two seconds later another process connects
and does nearly the same thing, it's likely to have to read it all from
the hard drives again, as it's not in the postgresql buffers, and not in
the kernel cache.

Starting a separate connection, doing a simple select * from table1;
select * from table2, dropping the result sets returned, and staying
connected seems to be enough to get 7.4 to hold onto the data.
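
Roughly, that trick looks like this from a second psql session (just a
sketch using the table names from your post; the \o redirect throws the
result sets away):

    \o /dev/null
    SELECT * FROM pkardex;   -- reads the whole table, rows are discarded
    SELECT * FROM pmdoc;
    \o

Then just leave that session connected.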

PostgreSQL's current buffer management algo is dirt simple.  The one the
kernel uses for its cache is quite good.  So you can quickly reach a point
where PostgreSQL is chasing its tail where the kernel would have done OK.

Your numbers show that you are tossing 659M and 314M against each other,
but I don't know if you're harvesting the whole set at once, or just a
couple of rows from each.  Does indexing help, or is this always gonna be
a big seq scan of 90% of both tables?

If you are getting the whole thing all the time, and want postgresql to
buffer the whole thing (I recommend against it, although a very few
circumstances seem to support it) you need to have 973M of buffers.  That
would be 124544 8K buffers, or we'll just call it 130000.  A number this
high means you will be giving more than 50% of the RAM to PostgreSQL.  At
that point, it seems you might as well go for broke and grab most of it,
~200000 or so.
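
If you want to double-check that figure, it's just the two table sizes
divided into 8K pages:

    SELECT ((659 + 314) * 1024) / 8 AS buffers_needed;   -- gives 124544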

If you're not always mushing the two things against each other, and
you've got other datasets to interact with, index it.
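
For example, if the join column on pkardex weren't already indexed, the
usual first step would be something along these lines (the index name here
is made up, and you may well have this covered already):

    CREATE INDEX pkd_doctofk_idx ON pkardex (doctofk);
    ANALYZE pkardex;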

Oh, in your reply you might want to include an explain analyze of the
query, and maybe an output of top while the query is running.

