Re: Very big insert/join performance problem (bacula) - Mailing list pgsql-performance

From Marc Cousin
Subject Re: Very big insert/join performance problem (bacula)
Date
Msg-id 200907160941.05619.mcousin@sigma.fr
Whole thread Raw
In response to Re: Very big insert/join performance problem (bacula)  (Marc Cousin <cousinmarc@gmail.com>)
List pgsql-performance
On Thursday 16 July 2009 07:20:18 Marc Cousin wrote:
> Le Thursday 16 July 2009 01:56:37, Devin Ben-Hur a écrit :
> > Marc Cousin wrote:
> > > This mail contains the asked plans :
> > > Plan 1
> > > around 1 million records to insert, seq_page_cost 1, random_page_cost 4
> > >
> > >          ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92)
> > > (actual time=23184.196..23184.196 rows=16732049 loops=1) ->  Seq Scan
> > > on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual
> > > time=0.004..7318.850 rows=16732049 loops=1)
> > >
> > >    ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35)
> > > (actual time=210831.840..210831.840 rows=79094418 loops=1) ->  Seq Scan
> > > on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual
> > > time=46.324..148887.662 rows=79094418 loops=1)
> >
> > This doesn't address the cost driving plan question, but I think it's a
> > bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
> > while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
> > ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
> > some terrible bloat on filename that's not present on path?  If that seq
> > scan time on filename were proportionate to path this plan would
> > complete about two minutes faster (making it only 6 times slower instead
> > of 9 :).
>
> Much simpler than that I think : there is a bigger percentage of path that
> is used all the time than of filename. The database used is the production
> database, so there were other insert queries running a few minutes before I
> got this plan.
>
> But I'll give it a look today and come back with bloat and cache
> information on these 2 tables.

Here are the stats for filename :

SELECT * from pgstattuple('public.filename');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent 

------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5308719104 |    79338344 | 4717466438 |         88.86 |                0 |              0 |                  0 |
11883396|         0.22 

So I guess it's not bloated.

I checked in the cache, the times displayed before were with path in the cache. filename couldn't stay in the cache, as
it'stoo big. 

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Concurrency issue under very heay loads
Next
From: Andres Freund
Date:
Subject: Re: Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1