Thread: Very big insert/join performance problem (bacula)

Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
Hi,

I'm trying to solve big performance issues with PostgreSQL + bacula while
inserting very big sets of records.

I'm sorry, this email will be a long one: I've already spent quite a lot of
time on the issue, I don't want to waste your time by speculating on things I
may already have done, and the problem is (or seems to me) a bit complex. The
other problem is that I don't have the explain plans to provide with this
email right now. I'll try to use this as a way to push 8.4 into this setup, so
I can dump all these plans with auto_explain (the queries are on temporary
tables, so they are a bit tricky to get).
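
For reference, here is roughly what I have in mind for capturing the plans
once we are on 8.4 (just a sketch, assuming the auto_explain contrib module is
installed and we can load it in the despooling session):

  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = 0;  -- log a plan for every statement
  SET auto_explain.log_analyze = true;    -- include actual row counts and times
  -- then run the despooling queries as usual; the plans end up in the server log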

Let me first explain (or remind you) how this works. Bacula is a backup
solution, and it inserts its metadata at the end of each backup (file name,
directory name, size, etc.).
For what we are interested in, there are 3 tables:
- file
- filename
- path

file is the one containing most records. It holds the real metadata. filename
and path just contain an id and the actual file or directory name (to save
some space, since names are highly redundant).

Before explaining the issue, just some information about sizing here :

file is 1.1 billion records for 280GB (with indexes).

   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | bigint  | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" UNIQUE, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jpfid_idx" btree (jobid, pathid, filenameid)


path is 17 million records for 6 GB

 Column |  Type   |                       Modifiers
--------+---------+-------------------------------------------------------
 pathid | integer | not null default nextval('path_pathid_seq'::regclass)
 path   | text    | not null
Indexes:
    "path_pkey" PRIMARY KEY, btree (pathid)
    "path_name_idx" UNIQUE, btree (path)

filename is 80 million records for 13GB

   Column   |  Type   |                           Modifiers
------------+---------+---------------------------------------------------------------
 filenameid | integer | not null default
nextval('filename_filenameid_seq'::regclass)
 name       | text    | not null
Indexes:
    "filename_pkey" PRIMARY KEY, btree (filenameid)
    "filename_name_idx" UNIQUE, btree (name)


There are several queries for each job's despooling:

First we fill a temp table with the raw data (file name, path name, metadata),
using COPY (no problem here).

Then we insert the missing file names into filename and the missing path names
into path, with this query (and the equivalent one for filename):

INSERT INTO Path (Path)
  SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a
     WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)
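
The equivalent query for filename follows the same pattern (sketched here from
memory, with the column names we use in the batch table):

INSERT INTO Filename (Name)
  SELECT a.Name FROM (SELECT DISTINCT Name FROM batch) AS a
     WHERE NOT EXISTS (SELECT Name FROM Filename WHERE Name = a.Name)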

These do nested loops and work very well (after a sort on batch to get rid of
duplicates). They are reasonably fast (what one would expect when looping on
millions of records): they do their job in a few minutes.

The problem occurs with the final query, which inserts the data into file,
joining the temp table to both path and filename:

INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)
  SELECT batch.FileIndex,
              batch.JobId,
              Path.PathId,
              Filename.FilenameId,
              batch.LStat,
              batch.MD5
  FROM batch
  JOIN Path ON (batch.Path = Path.Path)
  JOIN Filename ON (batch.Name = Filename.Name)

This one has a split personality, depending on how many records are in batch.
For small batch tables, it does nested loops.
For big batch tables (more than around one million rows initially), it decides
to hash join path (still OK, it's reasonably small) and then filename to batch
before starting. And that's when the problems begin. The behaviour seems
logical to me: it should switch to a hash join when batch gets bigger, but it
seems to do so much too early here, considering the size of filename.

First of all, performance remains much better with nested loops, except for
extremely big batches (I'd say over 30 million rows, extrapolating from the
times I'm seeing with 10 million records), so if I disable hash/merge joins I
get my performance back on these queries (they execute in around the same time
as the searches in path and filename above). So I found a way to make most of
my queries do nested loops (I'll come back to this later).
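
To be concrete, by "disabling hash/merge joins" I mean something like this in
the session running the despool (just a sketch of what I do for testing):

  SET enable_hashjoin = off;
  SET enable_mergejoin = off;
  -- run the big INSERT ... SELECT here
  RESET enable_hashjoin;
  RESET enable_mergejoin;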

Second, if there is more than one of these big sorts running, performance
degrades drastically (we had 2 of them this weekend, and they both took 24
hours to complete). This is probably due to our quite modest disk setup (we
didn't have a big budget for it). There was no swapping on the Linux side.


So all of this makes me think there is a cost evaluation problem in this
setup: with the default values, PostgreSQL seems to underestimate the cost of
sorting here (the row estimates were good, no problem with that). PostgreSQL
seems to think that at around 1 million records in the batch table it should
go with a hash join on filename and path, so we go on hashing the 17 million
records of path and the 80 million of filename, then joining and inserting
into file (we're talking about sorting around 15 GB for each of these
despools, in parallel).

Temporarily I pushed the problem back to somewhat larger batch sizes by
changing random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like a
sorcerer's apprentice with this, as I'm telling PostgreSQL that fetching rows
from disk is much cheaper than it really is. These values are, I think,
completely abnormal. Doing this, I got the change of plan at around 8 million
rows. We had 2 batches of 9 million at the same time this weekend, and both of
them took 24 hours, while the nested loops before the join (for the inserts
into path and filename) did their work in minutes...

So, finally, to my questions:
- Is it normal for PostgreSQL to be this far off base on these queries? (Sorry,
I don't have the plans; if they are required I'll do my best to get some, but
they really are the two obvious plans for this kind of query.) What could make
it choose the hash join for batch tables that are still too small for it?
- Is changing the 2 costs the way to go?
- Is there a way to tell PostgreSQL that sorting is more costly than it thinks,
instead of telling it that fetching data from disk costs nothing?




Here are the other non-default values from my configuration :

shared_buffers = 2GB
work_mem = 64MB
maintenance_work_mem = 256MB
max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
effective_cache_size = 800MB
default_statistics_target = 1000

PostgreSQL is 8.3.5 on Debian Lenny



I'm sorry for this very long email; I tried to be as precise as I could, but
don't hesitate to ask me for more.

Thanks for helping.

Marc Cousin

Re: Very big insert/join performance problem (bacula)

From
SystemManagement
Date:
Hi,
just a remark, as the number of entries seems to be very high:
Did you ever activate bacula's program dbcheck Option 16?

Regards

Reiner



Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
We regularly run all of dbcheck. This is our real configuration; there really
are lots of servers and lots of files (500 million files backed up every
month).

But thanks for mentioning it.

The thing is, we're trying to improve bacula with PostgreSQL in order to make
it able to cope with this kind of volume. So we are looking for ways to
improve bacula and the PostgreSQL tuning to make them cope with the queries
mentioned (or to rewrite the queries or the way the inserts are done; that may
not be a problem either).

On Monday 13 July 2009 16:37:06 SystemManagement wrote:
> Hi,
> just a remark, as the number of entries seems to be very high:
> Did you ever activate bacula's program dbcheck Option 16?
>
> Regards
>
> Reiner

Re: Very big insert/join performance problem (bacula)

From
Andres Freund
Date:
Hi Marc,

I don't have really extensive comments, but I found two small things...

On Monday 13 July 2009 15:40:18 Marc Cousin wrote:
> I'm trying to solve big performance issues with PostgreSQL + bacula while
> inserting very big sets of records.
>
> I'm sorry, this email will be a long one, as I've already spent quite a lot
> of time on the issue, I don't want to waste your time speculating on things
> I may already have done, and the problem is (or seems to me) a bit complex.
> The other problem is that I don't have the explain plans to provide with
> the email right now. I'll try to use this as a way to push 8.4 in this
> setup, to dump all these plans with autoexplain (queries are on temporary
> tables, so a bit tricky to get).
>
> Let me first explain or remind how this works. Bacula is a backup solution
> and is trying to insert its metadatas at the end of backups (file name,
> directory name, size, etc ...)
> For what we are interested in, there are 3 tables :
> - file
> - filename
> - path
>
> file is the one containing most records. It's the real metadata. filename
> and path just contain an id and the real file or directory name (to save
> some space with redundant names).
>
> Before explaining the issue, just some information about sizing here :
>
> file is 1.1 billion records for 280GB (with indexes).
>
>    Column   |  Type   |                       Modifiers
> ------------+---------+-------------------------------------------------------
>  fileid     | bigint  | not null default nextval('file_fileid_seq'::regclass)
>  fileindex  | integer | not null default 0
>  jobid      | integer | not null
>  pathid     | integer | not null
>  filenameid | integer | not null
>  markid     | integer | not null default 0
>  lstat      | text    | not null
>  md5        | text    | not null
> Indexes:
>     "file_pkey" UNIQUE, btree (fileid)
>     "file_fp_idx" btree (filenameid, pathid)
>     "file_jpfid_idx" btree (jobid, pathid, filenameid)
>
>
> path is 17 million for 6 GB
>
>  Column |  Type   |                       Modifiers
> --------+---------+-------------------------------------------------------
>  pathid | integer | not null default nextval('path_pathid_seq'::regclass)
>  path   | text    | not null
> Indexes:
>     "path_pkey" PRIMARY KEY, btree (pathid)
>     "path_name_idx" UNIQUE, btree (path)
>
> filename is 80 million for 13GB
>
>    Column   |  Type   |                           Modifiers
> ------------+---------+---------------------------------------------------------------
>  filenameid | integer | not null default nextval('filename_filenameid_seq'::regclass)
>  name       | text    | not null
> Indexes:
>     "filename_pkey" PRIMARY KEY, btree (filenameid)
>     "filename_name_idx" UNIQUE, btree (name)
>
>
> There are several queries for each job despooling :
>
> First we fill a temp table with the raw data (filename, pathname,
> metadata), using COPY (no problem here)
>
> Then we insert missing filenames in file, and missing pathnames in path,
> with this query (and the same for file) :
>
> INSERT INTO Path (Path)
>   SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a
>      WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)
>
> These do nested loops and work very well (after a sort on batch to get rid
> from duplicates). They work reasonably fast (what one would expect when
> looping on millions of records... they do their job in a few minutes).
While this is not one of your questions, I noticed you seem to be on 8.3; it
might be a bit faster to use GROUP BY instead of DISTINCT.
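
Something along these lines, based on the query you posted (untested sketch):

INSERT INTO Path (Path)
  SELECT a.Path FROM (SELECT Path FROM batch GROUP BY Path) AS a
     WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path)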

> The problem occurs with the final query, which inserts data in file,
> joining the temp table to both file and filename
>
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5)
>   SELECT batch.FileIndex,
>               batch.JobId,
>               Path.PathId,
>               Filename.FilenameId,
>               batch.LStat,
>               batch.MD5
>   FROM batch
>   JOIN Path ON (batch.Path = Path.Path)
>   JOIN Filename ON (batch.Name = Filename.Name)
>
> This one has two split personnalities, depending on how many records are in
> batch.
> For small batch tables, it does nested loops.
> For big batch tables (more than around one million initially) it decides to
> hash join path (still ok, it's reasonably small) and then filename to batch
> before starting. And that's when the problems begin The behaviour seems
> logicial to me, it should go to hash join when batch gets bigger, but it
> seems to be much too early here, considering the size of filename.
>
> First of all, performance remains much better on nested loops, except for
> extremely big batches (i'd say over 30 million, extrapolating from the
> times I'm seeing with 10 millions records), so if I disable hash/merge
> joins, I get my performance back on these queries (they execute in around
> the same time as the searches in path and filename above). So I found a way
> to make most of my queries do nested loops (I'll come back to this later)
>
> Second, If there is more than one of these big sorts, performance degrades
> drastically (we had 2 of them this weekend, they both took 24 hours to
> complete). This is probably due to our quite bad disk setup (we didn't have
> a big budget for this). There was no swapping of linux
>
>
> So all of this makes me think there is a cost evaluation problem in this
> setup : with the default values, postgresql seems to underestimate the cost
> of sorting here (the row estimates were good, no problem with that).
> PostgreSQL seems to think that at around 1 million records in file it
> should go with a hash join on filename and path, so we go on hashing the 17
> million records of path, the 80 millions of filename, then joining and
> inserting into file (we're talking about sorting around 15 GB for each of
> these despools in parallel).
>
> Temporarily I moved the problem at a bit higher sizes of batch by changing
> random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like an
> apprentice sorcerer with this, as I told postgreSQL that fetching rows from
> disk are much cheaper than they are. These values are, I think, completely
> abnormal. Doing this, I got the change of plan at around 8 million. And had
> 2 of them at 9 millions at the same time this weekend, and both of the took
> 24 hours, while the nested loops before the join (for inserts in path and
> filename) did their work in minutes...
>
> So, finally, to my questions :
> - Is it normal that PostgreSQL is this off base on these queries (sorry I
> don't have the plans, if they are required I'll do my best to get some, but
> they really are the two obvious plans for this kind of query). What could
> make it choose the hash join for too small batch tables ?
> - Is changing the 2 costs the way to go ?
> - Is there a way to tell postgreSQL that it's more costly to sort than it
> thinks ? (instead of telling it that fetching data from disk doesn't cost
> anything).

> Here are the other non-default values from my configuration :
>
> shared_buffers = 2GB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> effective_cache_size = 800MB
> default_statistics_target = 1000
Your effective_cache_size is really small for the system you seem to have: it's
the size of the I/O caching your OS is doing, and it uses no resources itself.
And 800MB of that on a system with that amount of data seems a bit unlikely ;-)

Using `free` you can see the amount of I/O caching your OS is currently doing,
in the 'cached' column.

That possibly might tip some plans in a direction you prefer.

What kind of machine are you running this on?

Andres

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
>
> While this is not your questions, I still noticed you seem to be on 8.3 -
> it might be a bit faster to use GROUP BY instead of DISTINCT.
It didn't make a big difference; I had already tried that before for this query.
Anyway, as you said, it's not this query that has the problems :)


> Your effective_cache_size is really small for the system you seem to have -
> its the size of IO caching your os is doing and uses no resources itself.
> And 800MB of that on a system with that amount of data seems a bit unlikely
> ;-)
>
> Using `free` you can see the amount of io caching your OS is doing atm. in
> the 'cached' column.
>
> That possibly might tip some plans in a direction you prefer.
>
> What kind of machine are you running this on?

I played with this parameter too, and it didn't influence the plan. Anyway, the
doc says it's the OS cache available for one query, and there may be a lot of
insert queries at the same time, so I chose to be conservative with this
value. I tried it with 8GB too, the plans were the same.

The OS cache is around 8-10GB by the way.

The machine is a Dell PE2900, with 6 disks dedicated to this database (RAID 10
configuration).

Re: Very big insert/join performance problem (bacula)

From
Richard Huxton
Date:
Marc Cousin wrote:
>
>> Your effective_cache_size is really small for the system you seem to have -
>> its the size of IO caching your os is doing and uses no resources itself.
>> And 800MB of that on a system with that amount of data seems a bit unlikely
>> ;-)
>>
>> Using `free` you can see the amount of io caching your OS is doing atm. in
>> the 'cached' column.
>>
>> That possibly might tip some plans in a direction you prefer.
>>
>> What kind of machine are you running this on?
>
> I played with this parameter too, and it didn't influence the plan. Anyway, the
> doc says it's the OS cache available for one query,

No they don't. I'm guessing you're getting mixed up with work_mem.

 > and there may be a lot of
> insert queries at the same time, so I chose to be conservative with this
> value. I tried it with 8GB too, the plans were the same.
>
> The OS cache is around 8-10GB by the way.

That's what you need to set effective_cache_size to then.
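
For example, per session, or as the equivalent line in postgresql.conf
(assuming the 8GB figure you quoted):

SET effective_cache_size = '8GB';  -- roughly the OS cache you observe with `free`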

--
   Richard Huxton
   Archonet Ltd

Re: Very big insert/join performance problem (bacula)

From
Richard Huxton
Date:
Marc Cousin wrote:
>
> Temporarily I moved the problem at a bit higher sizes of batch by changing
> random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel like an
> apprentice sorcerer with this, as I told postgreSQL that fetching rows from
> disk are much cheaper than they are. These values are, I think, completely
> abnormal.

They certainly don't have anything to do with reality. Try putting them
back to (say) seq_page_cost=1 and random_page_cost=2.

> So, finally, to my questions :
> - Is it normal that PostgreSQL is this off base on these queries (sorry I
> don't have the plans, if they are required I'll do my best to get some, but
> they really are the two obvious plans for this kind of query). What could
> make it choose the hash join for too small batch tables ?

No point in speculating without plans.

> - Is changing the 2 costs the way to go ?

Not the way you have.

> - Is there a way to tell postgreSQL that it's more costly to sort than it
> thinks ? (instead of telling it that fetching data from disk doesn't cost
> anything).

That's what the configuration settings do. But if you put a couple of them way
off from reality, it'll be pure chance if it gets any estimates right.

> Here are the other non-default values from my configuration :
>
> shared_buffers = 2GB
> work_mem = 64MB

Set this *much* higher when you are running your bulk imports. You can
do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total
memory used).
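
Something like this in the session doing the bulk import (a sketch):

SET work_mem = '512MB';
-- run the bulk INSERT ... SELECT here
RESET work_mem;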

> maintenance_work_mem = 256MB
> max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> effective_cache_size = 800MB

See other emails on this one.

> default_statistics_target = 1000

Probably don't need this for all columns, but it won't cause problems
with these queries.

--
   Richard Huxton
   Archonet Ltd

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Tuesday 14 July 2009 10:15:21, you wrote:
> Marc Cousin wrote:
> >> Your effective_cache_size is really small for the system you seem to
> >> have - its the size of IO caching your os is doing and uses no resources
> >> itself. And 800MB of that on a system with that amount of data seems a
> >> bit unlikely ;-)
> >>
> >> Using `free` you can see the amount of io caching your OS is doing atm.
> >> in the 'cached' column.
> >>
> >> That possibly might tip some plans in a direction you prefer.
> >>
> >> What kind of machine are you running this on?
> >
> > I played with this parameter too, and it didn't influence the plan.
> > Anyway, the doc says it's the OS cache available for one query,
>
> No they don't. I'm guessing you're getting mixed up with work_mem.

I'm not (from the docs) :
effective_cache_size (integer)
 Sets the planner's assumption about the effective size of the disk cache that
is available to a single query

I trust you, of course, but then I think maybe this should be rephrased in the
docs, because I understood it the way I said... I always had a doubt about
this sentence, and that's why I tried both 800MB and 8GB for this parameter.

>
>  > and there may be a lot of
> >
> > insert queries at the same time, so I chose to be conservative with this
> > value. I tried it with 8GB too, the plans were the same.
> >
> > The OS cache is around 8-10GB by the way.
>
> That's what you need to set effective_cache_size to then.
Ok, but that doesn't change anything for this query (I had a doubt about this
parameter and tried with both 800MB and 8GB).

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Tuesday 14 July 2009 10:23:25, Richard Huxton wrote:
> Marc Cousin wrote:
> > Temporarily I moved the problem at a bit higher sizes of batch by
> > changing random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel
> > like an apprentice sorcerer with this, as I told postgreSQL that fetching
> > rows from disk are much cheaper than they are. These values are, I think,
> > completely abnormal.
>
> They certainly don't have anything to do with reality. Try putting them
> back to (say) seq_page_cost=1 and random_page_cost=2.

That's the first thing I tried (it seemed more sensible), and it didn't work. I
can't put them back to those values for more than one test query; the server
was really dying before I changed the settings.

>
> > So, finally, to my questions :
> > - Is it normal that PostgreSQL is this off base on these queries (sorry I
> > don't have the plans, if they are required I'll do my best to get some,
> > but they really are the two obvious plans for this kind of query). What
> > could make it choose the hash join for too small batch tables ?
>
> No point in speculating without plans.

Ok, I'll try to have them tomorrow.

>
> > - Is changing the 2 costs the way to go ?
>
> Not the way you have.
That's what I thought, and the reason I posted :)

>
> > - Is there a way to tell postgreSQL that it's more costly to sort than it
> > thinks ? (instead of telling it that fetching data from disk doesn't cost
> > anything).
>
> That's what the configuration settings do. But if you put a couple way
> off from reality it'll be pure chance if it gets any estimates right.
>
> > Here are the other non-default values from my configuration :
> >
> > shared_buffers = 2GB
> > work_mem = 64MB
>
> Set this *much* higher when you are running your bulk imports. You can
> do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total
> memory used).

I'll try that. But anyhow, I've got much better performance when not doing the
hash join. I'll get back with the plans as soon as possible.

>
> > maintenance_work_mem = 256MB
> > max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> > effective_cache_size = 800MB
>
> See other emails on this one.
>
> > default_statistics_target = 1000
>
> Probably don't need this for all columns, but it won't cause problems
> with these queries.

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
This mail contains the requested plans:

I've run them with the different configurations, since I had gone to the effort
of setting up the whole thing :)
Stats were updated between all runs. Each timing is for the first run of the
query (that's what we have in production with bacula).
And I added the executor stats, just in case...

By the way, I think I should mention that the whole thing runs over DRBD, but
with 2 gigabit links between the master and the slave.
And I tried deactivating replication when things got really slow (despooling
in 24 hours); it changed nothing (sorts were a bit faster, around 20%).
The server has 12 GB of RAM and 1 quad-core Xeon E5335.

PostgreSQL starts to hash filename a bit later than what I said in the first
mail, because the table has become bigger (it was around 30-40 million rows
last time I did the tests).

This is the query (temp_mc is the table I've created to do my tests ...):

explain ANALYZE SELECT batch.FileIndex,
              batch.JobId,
              Path.PathId,
              Filename.FilenameId,
              batch.LStat,
              batch.MD5
  FROM temp_mc AS batch
  JOIN Path ON (batch.Path = Path.Path)
  JOIN Filename ON (batch.Name = Filename.Name);

++++++++++++++++++++++++++++++++++++++++++++++++++++
Plan 1
around 1 million records to insert, seq_page_cost 1, random_page_cost 4

LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       380.143452 elapsed 79.000938 user 44.386774 system sec
!       [415.785985 user 155.733732 sys total]
!       15848728/12934936 [24352752/50913184] filesystem blocks in/out
!       0/44188 [86/987512] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       93812/40706 [405069/184511] voluntary/involuntary context switches
! buffer usage stats:
!       Shared blocks:     877336 read,          0 written, buffer hit rate = 6.75%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written

 Hash Join  (cost=3923929.71..5131377.91 rows=1286440 width=91) (actual time=234021.194..380018.709 rows=1286440 loops=1)
   Hash Cond: (batch.name = filename.name)
   ->  Hash Join  (cost=880140.87..1286265.62 rows=1286440 width=102) (actual time=23184.959..102400.782 rows=1286440 loops=1)
         Hash Cond: (batch.path = path.path)
         ->  Seq Scan on temp_mc batch  (cost=0.00..49550.40 rows=1286440 width=189) (actual time=0.007..342.396 rows=1286440 loops=1)
         ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
               ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
   ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
         ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
 Total runtime: 380136.601 ms

++++++++++++++++++++++++++++++++++++++++++++++++++++
Plan 2
the same insert, with seq_page_cost to 0.01 and random_page_cost to 0.02

DETAIL:  ! system usage stats:
!       42.378039 elapsed 28.277767 user 12.192762 system sec
!       [471.865489 user 180.499280 sys total]
!       0/4072368 [24792848/59059032] filesystem blocks in/out
!       0/0 [86/989858] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       1061/9131 [429738/200320] voluntary/involuntary context switches
! buffer usage stats:
!       Shared blocks:     251574 read,          0 written, buffer hit rate = 96.27%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
LOG:  duration: 42378.373 ms  statement:
                                                                  QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=381840.21..1012047.92 rows=1286440 width=91) (actual time=20284.387..42242.955 rows=1286440 loops=1)
   Hash Cond: (batch.path = path.path)
   ->  Nested Loop  (cost=0.00..583683.91 rows=1286440 width=178) (actual time=0.026..10333.636 rows=1286440 loops=1)
         ->  Seq Scan on temp_mc batch  (cost=0.00..13231.26 rows=1286440 width=189) (actual time=0.008..380.361 rows=1286440 loops=1)
         ->  Index Scan using filename_name_idx on filename  (cost=0.00..0.43 rows=1 width=35) (actual time=0.006..0.007 rows=1 loops=1286440)
               Index Cond: (filename.name = batch.name)
   ->  Hash  (cost=170049.89..170049.89 rows=16746972 width=92) (actual time=20280.729..20280.729 rows=16732049 loops=1)
         ->  Seq Scan on path  (cost=0.00..170049.89 rows=16746972 width=92) (actual time=0.005..4560.872 rows=16732049 loops=1)
 Total runtime: 42371.362 ms


The thing is that this query is ten times faster, but that's not the main
point: this query stays reasonably fast even when there are 20 of them running
simultaneously. Of course, as it's faster, it also has less of a tendency to
pile up than the other one does.

When I get 10-20 of the first one running at the same time, the queries get
extremely slow (I guess they are fighting for access to the sort disk, because
I see a lot of smaller IOs instead of the big, nice IOs I see when only one of
these queries runs). The IO subsystem seems to degrade very badly when there
is a lot of concurrent activity on this server.
For instance, last weekend we had two 8-million-record backups running
simultaneously, with the hash join plan. It took 24 hours for them to complete.
If they had been alone on the server, it would have taken around 1 hour for each of them.


Of course, with these smaller cost values, there is still a batch size at which the plan goes back to the first one.

++++++++++++++++++++++++++++++++++++++++++++++++++++
Plan 3
seq_page_cost to 1, random_page_cost to 2. Plan is the same as Plan 1.

-------------------------------------------------------------------------------------
 Hash Join  (cost=3923961.69..5131416.88 rows=1286440 width=91)
   Hash Cond: (batch.name = filename.name)
   ->  Hash Join  (cost=880144.31..1286270.06 rows=1286440 width=102)
         Hash Cond: (batch.path = path.path)
         ->  Seq Scan on temp_mc batch  (cost=0.00..49550.40 rows=1286440 width=189)
         ->  Hash  (cost=425488.36..425488.36 rows=16747036 width=92)
               ->  Seq Scan on path  (cost=0.00..425488.36 rows=16747036 width=92)
   ->  Hash  (cost=1436989.50..1436989.50 rows=79105350 width=35)
         ->  Seq Scan on filename  (cost=0.00..1436989.50 rows=79105350 width=35)
(9 rows)

++++++++++++++++++++++++++++++++++++++++++++++++++++
Plan 4:
seq_page_cost to 1, random_page_cost back to 4, work_mem raised to 512MB. Same plan as Plan 1.
The estimated cost hasn't changed. Is this normal?

-------------------------------------------------------------------------------------
 Hash Join  (cost=3923961.69..5131416.88 rows=1286440 width=91)
   Hash Cond: (batch.name = filename.name)
   ->  Hash Join  (cost=880144.31..1286270.06 rows=1286440 width=102)
         Hash Cond: (batch.path = path.path)
         ->  Seq Scan on temp_mc batch  (cost=0.00..49550.40 rows=1286440 width=189)
         ->  Hash  (cost=425488.36..425488.36 rows=16747036 width=92)
               ->  Seq Scan on path  (cost=0.00..425488.36 rows=16747036 width=92)
   ->  Hash  (cost=1436989.50..1436989.50 rows=79105350 width=35)
         ->  Seq Scan on filename  (cost=0.00..1436989.50 rows=79105350 width=35)
(9 rows)

Maybe this one would scale a bit better, as there would be fewer sort files? I
couldn't execute it and get reliable times (sorry, the production period has
started). If necessary, I can run it again tomorrow. I had to cancel the query
after more than 15 minutes, to let the server do its regular work.



There are other things I am thinking of: maybe it would be better to have the
sort space on another (and not DRBD'ed) RAID set? We have quite a cheap setup
right now for the database, and I think maybe this would help it scale better.
I can get a filesystem in another volume group, which is not used that much
for now.

Anyway, thanks for all the ideas you could have on this.

Marc.

Re: Very big insert/join performance problem (bacula)

From
Alvaro Herrera
Date:
Marc Cousin wrote:

> There are other things I am thinking of : maybe it would be better to have sort
> space on another (and not DBRD'ded) raid set ? we have a quite cheap setup right
> now for the database, and I think maybe this would help scale better. I can get
> a filesystem in another volume group, which is not used that much for now.

You know, that's the first thing that came to me when I read you're using
DRBD.  Have you tried setting temp_tablespaces to a non-replicated disk?
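
Roughly like this (a sketch; the tablespace name and path are made up, and the
SET can go per-session or in postgresql.conf):

CREATE TABLESPACE temp_local LOCATION '/non_drbd_disk/pgsql_tmp';
SET temp_tablespaces = 'temp_local';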

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Wednesday 15 July 2009 15:45:01, Alvaro Herrera wrote:
> Marc Cousin escribió:
> > There are other things I am thinking of : maybe it would be better to
> > have sort space on another (and not DBRD'ded) raid set ? we have a quite
> > cheap setup right now for the database, and I think maybe this would help
> > scale better. I can get a filesystem in another volume group, which is
> > not used that much for now.
>
> You know, that's the first thing it came to me when I read you're using
> DRDB.  Have you tried setting temp_tablespace to a non-replicated disk?

I wish I could, easily. I'm not entitled to tune the database, only to give
directives. I've given this one, but I don't know when it will be done. I'll
keep you informed on this one, but I don't have my hopes too high.

As mentioned before, I tried deactivating DRBD (still using the DRBD device,
but not connected to the other node, so it has almost no effect). It didn't
change much (performance was around 20% better).

Anyway, the thing is that:
- big sorts kill my machine when there are more than 5 of them running. I
think it is a system problem (RAID, filesystem, Linux tuning; I don't really
know, I'll have to dig into this, but it will be complicated, for human
reasons :) )
- the plan with nested loops is faster anyway, and I think it's because only a
small fraction of filename and path is actually used (most files backed up
have the same name or path, as we back up 600 machines with mostly 2 OSes,
Linux and Windows), so the hot parts of these 2 tables are extremely likely to
be in the database or Linux cache (the buffer hit rate was 97% in the example
provided). Moreover, the first two queries of the insert procedure fill the
cache for us...




Re: Very big insert/join performance problem (bacula)

From
Devin Ben-Hur
Date:
Marc Cousin wrote:
> This mail contains the asked plans :
> Plan 1
> around 1 million records to insert, seq_page_cost 1, random_page_cost 4

>          ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual time=23184.196..23184.196 rows=16732049 loops=1)
>                ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
> 
>    ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual time=210831.840..210831.840 rows=79094418 loops=1)
>          ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)

This doesn't address the cost driving plan question, but I think it's a
bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
some terrible bloat on filename that's not present on path?  If that seq
scan time on filename were proportionate to path this plan would
complete about two minutes faster (making it only 6 times slower instead
of 9 :).

--
-Devin

Re: Very big insert/join performance problem (bacula)

From
Scott Carey
Date:

On 7/15/09 4:56 PM, "Devin Ben-Hur" <dbenhur@whitepages.com> wrote:

> Marc Cousin wrote:
>> This mail contains the asked plans :
>> Plan 1
>> around 1 million records to insert, seq_page_cost 1, random_page_cost 4
>
>>          ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92) (actual
>> time=23184.196..23184.196 rows=16732049 loops=1)
>>                ->  Seq Scan on path  (cost=0.00..425486.72 rows=16746972
>> width=92) (actual time=0.004..7318.850 rows=16732049 loops=1)
>
>>    ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual
>> time=210831.840..210831.840 rows=79094418 loops=1)
>>          ->  Seq Scan on filename  (cost=0.00..1436976.15 rows=79104615
>> width=35) (actual time=46.324..148887.662 rows=79094418 loops=1)
>
> This doesn't address the cost driving plan question, but I think it's a
> bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
> while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
> ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
> some terrible bloat on filename that's not present on path?  If that seq
> scan time on filename were proportionate to path this plan would
> complete about two minutes faster (making it only 6 times slower instead
> of 9 :).


Bloat is possible.  This can be checked with VACUUM VERBOSE on the table.
Postgres has a habit of getting its table files fragmented too under certain
use cases.
Additionally, some of the table pages may have been cached in one use case
and not in another.
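
For example (the table name is taken from your plans; pgstattuple is a contrib
module, so this is only a sketch of the check):

VACUUM VERBOSE filename;
-- or, with the pgstattuple contrib module installed:
SELECT * FROM pgstattuple('public.filename');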
>
> --
> -Devin


Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Thursday 16 July 2009 01:56:37, Devin Ben-Hur wrote:
> Marc Cousin wrote:
> > This mail contains the asked plans :
> > Plan 1
> > around 1 million records to insert, seq_page_cost 1, random_page_cost 4
> >
> >          ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92)
> > (actual time=23184.196..23184.196 rows=16732049 loops=1) ->  Seq Scan on
> > path  (cost=0.00..425486.72 rows=16746972 width=92) (actual
> > time=0.004..7318.850 rows=16732049 loops=1)
> >
> >    ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35) (actual
> > time=210831.840..210831.840 rows=79094418 loops=1) ->  Seq Scan on
> > filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual
> > time=46.324..148887.662 rows=79094418 loops=1)
>
> This doesn't address the cost driving plan question, but I think it's a
> bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
> while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
> ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
> some terrible bloat on filename that's not present on path?  If that seq
> scan time on filename were proportionate to path this plan would
> complete about two minutes faster (making it only 6 times slower instead
> of 9 :).
Much simpler than that, I think: a bigger percentage of path than of filename
is used all the time. The database used is the production database, so there
were other insert queries running a few minutes before I got this plan.

But I'll give it a look today and come back with bloat and cache information
on these 2 tables.

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Thursday 16 July 2009 07:20:18 Marc Cousin wrote:
> On Thursday 16 July 2009 01:56:37, Devin Ben-Hur wrote:
> > Marc Cousin wrote:
> > > This mail contains the asked plans :
> > > Plan 1
> > > around 1 million records to insert, seq_page_cost 1, random_page_cost 4
> > >
> > >          ->  Hash  (cost=425486.72..425486.72 rows=16746972 width=92)
> > > (actual time=23184.196..23184.196 rows=16732049 loops=1) ->  Seq Scan
> > > on path  (cost=0.00..425486.72 rows=16746972 width=92) (actual
> > > time=0.004..7318.850 rows=16732049 loops=1)
> > >
> > >    ->  Hash  (cost=1436976.15..1436976.15 rows=79104615 width=35)
> > > (actual time=210831.840..210831.840 rows=79094418 loops=1) ->  Seq Scan
> > > on filename  (cost=0.00..1436976.15 rows=79104615 width=35) (actual
> > > time=46.324..148887.662 rows=79094418 loops=1)
> >
> > This doesn't address the cost driving plan question, but I think it's a
> > bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs,
> > while a seqscan of 79M 35-byte rows takes 149secs.  It's about 4:1 row
> > ratio, less than 2:1 byte ratio, but a 20:1 time ratio.  Perhaps there's
> > some terrible bloat on filename that's not present on path?  If that seq
> > scan time on filename were proportionate to path this plan would
> > complete about two minutes faster (making it only 6 times slower instead
> > of 9 :).
>
> Much simpler than that I think : there is a bigger percentage of path that
> is used all the time than of filename. The database used is the production
> database, so there were other insert queries running a few minutes before I
> got this plan.
>
> But I'll give it a look today and come back with bloat and cache
> information on these 2 tables.

Here are the stats for filename :

SELECT * from pgstattuple('public.filename');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 5308719104 |    79338344 | 4717466438 |         88.86 |                0 |              0 |                  0 |   11883396 |         0.22

So I guess it's not bloated.

I checked the cache: the times displayed before were with path in the cache.
filename couldn't stay in the cache, as it's too big.

Re: Very big insert/join performance problem (bacula)

From
"Kevin Grittner"
Date:
Marc Cousin <cousinmarc@gmail.com> wrote:

> the hot parts of these 2 tables are extremely likely to be in the
> database or linux cache (buffer hit rate was 97% in the example
> provided). Moreover, the first two queries of the insert procedure
> fill the cache for us...

This would be why the optimizer does the best job estimating the
relative costs of various plans when you set the random_page_cost and
seq_page_cost very low.

-Kevin

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Thursday 16 July 2009 22:07:25, Kevin Grittner wrote:
> Marc Cousin <cousinmarc@gmail.com> wrote:
> > the hot parts of these 2 tables are extremely likely to be in the
> > database or linux cache (buffer hit rate was 97% in the example
> > provided). Moreover, the first two queries of the insert procedure
> > fill the cache for us...
>
> This would be why the optimizer does the best job estimating the
> relative costs of various plans when you set the random_page_cost and
> seq_page_cost very low.
>
> -Kevin


Ok, so to sum it up, should I keep these values (I hate doing this :) )?
Would there be a way to approximately evaluate them with regard to the expected
buffer hit ratio of the query?


Re: Very big insert/join performance problem (bacula)

From
"Kevin Grittner"
Date:
Marc Cousin <cousinmarc@gmail.com> wrote:

> to sum it up, should I keep these values (I hate doing this :) ) ?

Many people need to set the random_page_cost and/or seq_page_cost to
reflect the overall effect of caching on the active portion of the
data.  We set our fully-cached databases to 0.1 for both.  Databases
with less caching usually wind up at 2 and 1.  We have one database
which does best at 0.5 and 0.3.  My advice is to experiment and try to
find a pair of settings which works well for most or all of your
queries.  If you have a few which need a different setting, you can
set a special value right before running the query, but I've always
been able to avoid that (thankfully).
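
For example (a sketch of the per-query approach, using the values from this
thread):

BEGIN;
SET LOCAL seq_page_cost = 0.01;
SET LOCAL random_page_cost = 0.02;
-- run the big despool INSERT ... SELECT here
COMMIT;  -- SET LOCAL settings revert automatically at the end of the transaction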

> Would there be a way to approximately evaluate them regarding to
> the expected buffer hit ratio of the query ?

Nothing query-specific except setting them on the connection right
before the query (and setting them back or discarding the connection
afterward).  Well, that and making sure that effective_cache_size
reflects reality.

-Kevin

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
On Thursday 16 July 2009 23:54:54, Kevin Grittner wrote:
> Marc Cousin <cousinmarc@gmail.com> wrote:
> > to sum it up, should I keep these values (I hate doing this :) ) ?
>
> Many people need to set the random_page_cost and/or seq_page_cost to
> reflect the overall affect of caching on the active portion of the
> data.  We set our fully-cached databases to 0.1 for both.  Databases
> with less caching usually wind up at 2 and 1.  We have one database
> which does best at 0.5 and 0.3.  My advice is to experiment and try to
> find a pair of settings which works well for most or all of your
> queries.  If you have a few which need a different setting, you can
> set a special value right before running the query, but I've always
> been able to avoid that (thankfully).
>
> > Would there be a way to approximately evaluate them regarding to
> > the expected buffer hit ratio of the query ?
>
> Nothing query-specific except setting them on the connection right
> before the query (and setting them back or discarding the connection
> afterward).  Well, that and making sure that effective_cache_size
> reflects reality.
>
> -Kevin


OK, thanks a lot.

One last thing:

As mentioned in another mail in the thread (from Richard Huxton), I find this
sentence in the documentation a bit misleading:

effective_cache_size (integer)
 Sets the planner's assumption about the effective size of the disk cache that
is available to a single query

I don't really know what 'a single query' means here. I interpreted it as
'divide it by the number of queries typically running in parallel on the
database'. Maybe it should be rephrased? (I may not be the only one
misunderstanding it.)

Re: Very big insert/join performance problem (bacula)

From
Devin Ben-Hur
Date:
Marc Cousin wrote:
> On Thursday 16 July 2009 22:07:25, Kevin Grittner wrote:
>> Marc Cousin <cousinmarc@gmail.com> wrote:
>>> the hot parts of these 2 tables are extremely likely to be in the
>>> database or linux cache (buffer hit rate was 97% in the example
>>> provided). Moreover, the first two queries of the insert procedure
>>> fill the cache for us...
>
> Ok, so to sum it up, should I keep these values (I hate doing this :) ) ?
> Would there be a way to approximately evaluate them regarding to the expected
> buffer hit ratio of the query ?

cached_buffer_cost = 0.01
effective_page_cost =
      ((1 - expected_cache_hit_ratio) * standard_page_cost)
    + (expected_cache_hit_ratio * cached_buffer_cost)

If your assumption is only about these queries in particular, rather
than applicable across the board, you should set the page_costs just for
this query and reset them or close the connection after.
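
For example, with the 97% buffer hit rate you mentioned and the default
page costs (a rough sketch of the arithmetic, nothing more):

SELECT (1 - 0.97) * 4 + 0.97 * 0.01 AS effective_random_page_cost;  -- ≈ 0.13
SELECT (1 - 0.97) * 1 + 0.97 * 0.01 AS effective_seq_page_cost;     -- ≈ 0.04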

--
-Devin

Re: Very big insert/join performance problem (bacula)

From
"Kevin Grittner"
Date:
Marc Cousin <cousinmarc@gmail.com> wrote:

> As mentionned in another mail from the thread (from Richard Huxton),
> I felt this message in the documentation a bit misleading :
>
> effective_cache_size (integer)
>  Sets the planner's assumption about the effective size of the disk
>  cache that is available to a single query
>
> I don't really know what the 'a single query' means. I interpreted
> that as 'divide it by the amount of queries typically running in
> parallel on the database'. Maybe it should be rephrased ? (I may not
> be the one misunderstanding it).

I'm afraid I'll have to let someone else speak to that; I only have a
vague sense of its impact.  I've generally gotten good results setting
that to the available cache space on the machine.  If I'm running
multiple database clusters on one machine, I tend to hedge a little
and set it lower to allow for some competition.

-Kevin

Re: Very big insert/join performance problem (bacula)

From
Robert Haas
Date:
On Thu, Jul 16, 2009 at 6:30 PM, Kevin
Grittner<Kevin.Grittner@wicourts.gov> wrote:
> Marc Cousin <cousinmarc@gmail.com> wrote:
>
>> As mentionned in another mail from the thread (from Richard Huxton),
>> I felt this message in the documentation a bit misleading :
>>
>> effective_cache_size (integer)
>>  Sets the planner's assumption about the effective size of the disk
>>  cache that is available to a single query
>>
>> I don't really know what the 'a single query' means. I interpreted
>> that as 'divide it by the amount of queries typically running in
>> parallel on the database'. Maybe it should be rephrased ? (I may not
>> be the one misunderstanding it).
>
> I'm afraid I'll have to let someone else speak to that; I only have a
> vague sense of its impact.  I've generally gotten good results setting
> that to the available cache space on the machine.  If I'm running
> multiple database clusters on one machine, I tend to hedge a little
> and set it lower to allow for some competition.

It really has very little impact.  It only affects index scans, and
even then only if effective_cache_size is less than the size of the
table.

Essentially, when this kicks in, it models the effect that if you are
index scanning a table much larger than the size of your cache, you
might have to reread some blocks that you previously read in during
*that same index scan*.

...Robert

Re: Very big insert/join performance problem (bacula)

From
Marc Cousin
Date:
> It really has very little impact.  It only affects index scans, and
> even then only if effective_cache_size is less than the size of the
> table.
>
> Essentially, when this kicks in, it models the effect that if you are
> index scanning a table much larger than the size of your cache, you
> might have to reread some blocks that you previously read in during
> *that same index scan*.


Ok, thanks for clearing that up for me. Still, I think the docs could be
improved on this point (sorry to be a bit obsessed with that, but I'm one of
the French translators, so I like the docs to be perfect :) )

Re: Very big insert/join performance problem (bacula)

From
Robert Haas
Date:
On Fri, Jul 24, 2009 at 1:13 AM, Marc Cousin<cousinmarc@gmail.com> wrote:
>> It really has very little impact.  It only affects index scans, and
>> even then only if effective_cache_size is less than the size of the
>> table.
>>
>> Essentially, when this kicks in, it models the effect that if you are
>> index scanning a table much larger than the size of your cache, you
>> might have to reread some blocks that you previously read in during
>> *that same index scan*.
>
> Ok, thanks for clearing that up for me. Still, I think the doc could be
> improved on this point (sorry to be a bit obsessed with that, but I'm one of
> the french translators, so I like the doc to be perfect :) )

Yes, I agree.  I was confused for quite a long time, too, until I read
the code.  I think many people think this value is much more important
than it really is.

(That having been said, I have no current plans to write such a doc
patch myself.)

...Robert

Re: Very big insert/join performance problem (bacula)

From
"Eric Comeau"
Date:
>>> It really has very little impact. It only affects index scans, and
>>> even then only if effective_cache_size is less than the size of the
>> table.
>>>
>>> Essentially, when this kicks in, it models the effect that if you are
>>> index scanning a table much larger than the size of your cache, you
>>> might have to reread some blocks that you previously read in during
>>> *that same index scan*.
>>
>> Ok, thanks for clearing that up for me. Still, I think the doc could be
>> improved on this point (sorry to be a bit obsessed with that, but I'm one
>> of
>> the french translators, so I like the doc to be perfect :) )
>
>Yes, I agree.  I was confused for quite a long time, too, until I read
>the code.  I think many people think this value is much more important
>than it really is.
>
>(That having been said, I have no current plans to write such a doc
>patch myself.)
>
>...Robert

How about adding a comment to the wiki performance page....
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server