Re: Very big insert/join performance problem (bacula) - Mailing list pgsql-performance
From | SystemManagement |
---|---|
Subject | Re: Very big insert/join performance problem (bacula) |
Date | |
Msg-id | 4A5B4692.5070909@fs.wettzell.de Whole thread Raw |
In response to | Very big insert/join performance problem (bacula) (Marc Cousin <mcousin@sigma.fr>) |
Responses |
Re: Very big insert/join performance problem (bacula)
|
List | pgsql-performance |
Hi, just a remark, as the number of entries seems to be very high: Did you ever activate bacula's program dbcheck Option 16? Regards Reiner Marc Cousin schrieb: > Hi, > > I'm trying to solve big performance issues with PostgreSQL + bacula while > inserting very big sets of records. > > I'm sorry, this email will be a long one, as I've already spent quite a lot of > time on the issue, I don't want to waste your time speculating on things I > may already have done, and the problem is (or seems to me) a bit complex. The > other problem is that I don't have the explain plans to provide with the > email right now. I'll try to use this as a way to push 8.4 in this setup, to > dump all these plans with autoexplain (queries are on temporary tables, so a > bit tricky to get). > > Let me first explain or remind how this works. Bacula is a backup solution and > is trying to insert its metadatas at the end of backups (file name, directory > name, size, etc ...) > For what we are interested in, there are 3 tables : > - file > - filename > - path > > file is the one containing most records. It's the real metadata. filename and > path just contain an id and the real file or directory name (to save some > space with redundant names). > > Before explaining the issue, just some information about sizing here : > > file is 1.1 billion records for 280GB (with indexes). > > Column | Type | Modifiers > ------------+---------+------------------------------------------------------- > fileid | bigint | not null default nextval('file_fileid_seq'::regclass) > fileindex | integer | not null default 0 > jobid | integer | not null > pathid | integer | not null > filenameid | integer | not null > markid | integer | not null default 0 > lstat | text | not null > md5 | text | not null > Indexes: > "file_pkey" UNIQUE, btree (fileid) > "file_fp_idx" btree (filenameid, pathid) > "file_jpfid_idx" btree (jobid, pathid, filenameid) > > > path is 17 million for 6 GB > > Column | Type | Modifiers > --------+---------+------------------------------------------------------- > pathid | integer | not null default nextval('path_pathid_seq'::regclass) > path | text | not null > Indexes: > "path_pkey" PRIMARY KEY, btree (pathid) > "path_name_idx" UNIQUE, btree (path) > > filename is 80 million for 13GB > > Column | Type | Modifiers > ------------+---------+--------------------------------------------------------------- > filenameid | integer | not null default > nextval('filename_filenameid_seq'::regclass) > name | text | not null > Indexes: > "filename_pkey" PRIMARY KEY, btree (filenameid) > "filename_name_idx" UNIQUE, btree (name) > > > There are several queries for each job despooling : > > First we fill a temp table with the raw data (filename, pathname, metadata), > using COPY (no problem here) > > Then we insert missing filenames in file, and missing pathnames in path, > with this query (and the same for file) : > > INSERT INTO Path (Path) > SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a > WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path) > > These do nested loops and work very well (after a sort on batch to get rid > from duplicates). They work reasonably fast (what one would expect when > looping on millions of records... they do their job in a few minutes). > > The problem occurs with the final query, which inserts data in file, joining > the temp table to both file and filename > > INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5) > SELECT batch.FileIndex, > batch.JobId, > Path.PathId, > Filename.FilenameId, > batch.LStat, > batch.MD5 > FROM batch > JOIN Path ON (batch.Path = Path.Path) > JOIN Filename ON (batch.Name = Filename.Name) > > This one has two split personnalities, depending on how many records are in > batch. > For small batch tables, it does nested loops. > For big batch tables (more than around one million initially) it decides to > hash join path (still ok, it's reasonably small) and then filename to batch > before starting. And that's when the problems begin The behaviour seems > logicial to me, it should go to hash join when batch gets bigger, but it > seems to be much too early here, considering the size of filename. > > First of all, performance remains much better on nested loops, except for > extremely big batches (i'd say over 30 million, extrapolating from the times > I'm seeing with 10 millions records), so if I disable hash/merge joins, I get > my performance back on these queries (they execute in around the same time as > the searches in path and filename above). So I found a way to make most of my > queries do nested loops (I'll come back to this later) > > Second, If there is more than one of these big sorts, performance degrades > drastically (we had 2 of them this weekend, they both took 24 hours to > complete). This is probably due to our quite bad disk setup (we didn't have a > big budget for this). There was no swapping of linux > > > So all of this makes me think there is a cost evaluation problem in this > setup : with the default values, postgresql seems to underestimate the cost > of sorting here (the row estimates were good, no problem with that). > PostgreSQL seems to think that at around 1 million records in file it should > go with a hash join on filename and path, so we go on hashing the 17 million > records of path, the 80 millions of filename, then joining and inserting into > file (we're talking about sorting around 15 GB for each of these despools in > parallel). > > Temporarily I moved the problem at a bit higher sizes of batch by changing > random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like an > apprentice sorcerer with this, as I told postgreSQL that fetching rows from > disk are much cheaper than they are. These values are, I think, completely > abnormal. Doing this, I got the change of plan at around 8 million. And had 2 > of them at 9 millions at the same time this weekend, and both of the took 24 > hours, while the nested loops before the join (for inserts in path and > filename) did their work in minutes... > > So, finally, to my questions : > - Is it normal that PostgreSQL is this off base on these queries (sorry I > don't have the plans, if they are required I'll do my best to get some, but > they really are the two obvious plans for this kind of query). What could > make it choose the hash join for too small batch tables ? > - Is changing the 2 costs the way to go ? > - Is there a way to tell postgreSQL that it's more costly to sort than it > thinks ? (instead of telling it that fetching data from disk doesn't cost > anything). > > > > > Here are the other non-default values from my configuration : > > shared_buffers = 2GB > work_mem = 64MB > maintenance_work_mem = 256MB > max_fsm_pages = 15000000 # There are quite big deletes with bacula ... > effective_cache_size = 800MB > default_statistics_target = 1000 > > PostgreSQL is 8.3.5 on Debian Lenny > > > > I'm sorry for this very long email, I tried to be as precise as I could, but > don't hesitate to ask me more. > > Thanks for helping. > > Marc Cousin > >
pgsql-performance by date: