Home > mailing lists
Re: Very big insert/join performance problem (bacula) - Mailing list pgsql-performance

From	SystemManagement
Subject	Re: Very big insert/join performance problem (bacula)
Date	July 13, 2009 14:45:15
Msg-id	4A5B4692.5070909@fs.wettzell.de Whole thread Raw
In response to	Very big insert/join performance problem (bacula) (Marc Cousin <mcousin@sigma.fr>)
Responses	Re: Very big insert/join performance problem (bacula)
List	pgsql-performance
Tree view
Hi,
just a remark, as the number of entries seems to be very high:
Did you ever activate bacula's program dbcheck Option 16?

Regards

Reiner


Marc Cousin schrieb:
> Hi,
>
> I'm trying to solve big performance issues with PostgreSQL + bacula while
> inserting very big sets of records.
>
> I'm sorry, this email will be a long one, as I've already spent quite a lot of
> time on the issue, I don't want to waste your time speculating on things I
> may already have done, and the problem is (or seems to me) a bit complex. The
> other problem is that I don't have the explain plans to provide with the
> email right now. I'll try to use this as a way to push 8.4 in this setup, to
> dump all these plans with autoexplain (queries are on temporary tables, so a
> bit tricky to get).
>
> Let me first explain or remind how this works. Bacula is a backup solution and
> is trying to insert its metadatas at the end of backups (file name, directory
> name, size, etc ...)
> For what we are interested in, there are 3 tables :
> - file
> - filename
> - path
>
> file is the one containing most records. It's the real metadata. filename and
> path just contain an id and the real file or directory name (to save some
> space with redundant names).
>
> Before explaining the issue, just some information about sizing here :
>
> file is 1.1 billion records for 280GB (with indexes).
>
>    Column   |  Type   |                       Modifiers
> ------------+---------+-------------------------------------------------------
>  fileid     | bigint  | not null default nextval('file_fileid_seq'::regclass)
>  fileindex  | integer | not null default 0
>  jobid      | integer | not null
>  pathid     | integer | not null
>  filenameid | integer | not null
>  markid     | integer | not null default 0
>  lstat      | text    | not null
>  md5        | text    | not null
> Indexes:
>     "file_pkey" UNIQUE, btree (fileid)
>     "file_fp_idx" btree (filenameid, pathid)
>     "file_jpfid_idx" btree (jobid, pathid, filenameid)
>
>
> path is 17 million for 6 GB
>
>  Column |  Type   |                       Modifiers
> --------+---------+-------------------------------------------------------
>  pathid | integer | not null default nextval('path_pathid_seq'::regclass)
>  path   | text    | not null
> Indexes:
>     "path_pkey" PRIMARY KEY, btree (pathid)
>     "path_name_idx" UNIQUE, btree (path)
>
> filename is 80 million for 13GB
>
>    Column   |  Type   |                           Modifiers
> ------------+---------+---------------------------------------------------------------
>  filenameid | integer | not null default
> nextval('filename_filenameid_seq'::regclass)
>  name       | text    | not null
> Indexes:
>     "filename_pkey" PRIMARY KEY, btree (filenameid)
>     "filename_name_idx" UNIQUE, btree (name)
>
>
> There are several queries for each job despooling :
>
> First we fill a temp table with the raw data (filename, pathname, metadata),
> using COPY (no problem here)
>
> Then we insert missing filenames in file, and missing pathnames in path,
> with this query (and the same for file) :
>
> INSERT INTO Path (Path)
>   SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a
>      WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)
>
> These do nested loops and work very well (after a sort on batch to get rid
> from duplicates). They work reasonably fast (what one would expect when
> looping on millions of records... they do their job in a few minutes).
>
> The problem occurs with the final query, which inserts data in file, joining
> the temp table to both file and filename
>
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)
>   SELECT batch.FileIndex,
>               batch.JobId,
>               Path.PathId,
>               Filename.FilenameId,
>               batch.LStat,
>               batch.MD5
>   FROM batch
>   JOIN Path ON (batch.Path = Path.Path)
>   JOIN Filename ON (batch.Name = Filename.Name)
>
> This one has two split personnalities, depending on how many records are in
> batch.
> For small batch tables, it does nested loops.
> For big batch tables (more than around one million initially) it decides to
> hash join path (still ok, it's reasonably small) and then filename to batch
> before starting. And that's when the problems begin The behaviour seems
> logicial to me, it should go to hash join when batch gets bigger, but it
> seems to be much too early here, considering the size of filename.
>
> First of all, performance remains much better on nested loops, except for
> extremely big batches (i'd say over 30 million, extrapolating from the times
> I'm seeing with 10 millions records), so if I disable hash/merge joins, I get
> my performance back on these queries (they execute in around the same time as
> the searches in path and filename above). So I found a way to make most of my
> queries do nested loops (I'll come back to this later)
>
> Second, If there is more than one of these big sorts, performance degrades
> drastically (we had 2 of them this weekend, they both took 24 hours to
> complete). This is probably due to our quite bad disk setup (we didn't have a
> big budget for this). There was no swapping of linux
>
>
> So all of this makes me think there is a cost evaluation problem in this
> setup : with the default values, postgresql seems to underestimate the cost
> of sorting here (the row estimates were good, no problem with that).
> PostgreSQL seems to think that at around 1 million records in file it should
> go with a hash join on filename and path, so we go on hashing the 17 million
> records of path, the 80 millions of filename, then joining and inserting into
> file (we're talking about sorting around 15 GB for each of these despools in
> parallel).
>
> Temporarily I moved the problem at a bit higher sizes of batch by changing
> random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like an
> apprentice sorcerer with this, as I told postgreSQL that fetching rows from
> disk are much cheaper than they are. These values are, I think, completely
> abnormal. Doing this, I got the change of plan at around 8 million. And had 2
> of them at 9 millions at the same time this weekend, and both of the took 24
> hours, while the nested loops before the join (for inserts in path and
> filename) did their work in minutes...
>
> So, finally, to my questions :
> - Is it normal that PostgreSQL is this off base on these queries (sorry I
> don't have the plans, if they are required I'll do my best to get some, but
> they really are the two obvious plans for this kind of query). What could
> make it choose the hash join for too small batch tables ?
> - Is changing the 2 costs the way to go ?
> - Is there a way to tell postgreSQL that it's more costly to sort than it
> thinks ? (instead of telling it that fetching data from disk doesn't cost
> anything).
>
>
>
>
> Here are the other non-default values from my configuration :
>
> shared_buffers = 2GB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> effective_cache_size = 800MB
> default_statistics_target = 1000
>
> PostgreSQL is 8.3.5 on Debian Lenny
>
>
>
> I'm sorry for this very long email, I tried to be as precise as I could, but
> don't hesitate to ask me more.
>
> Thanks for helping.
>
> Marc Cousin
>
>
pgsql-performance by date:
From: Marc Cousin
Date: 13 July 2009, 14:01:14
Subject: Very big insert/join performance problem (bacula)
From: Marc Cousin
Date: 13 July 2009, 14:53:19
Subject: Re: Very big insert/join performance problem (bacula)
Re: Very big insert/join performance problem (bacula) - Mailing list pgsql-performance

Previous

Next