Re: [HACKERS] Re: INSERT/UPDATE waiting (another example) - Mailing list pgsql-sql

From Tom Lane
Subject Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)
Date
Msg-id 2314.926269093@sss.pgh.pa.us
In response to Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)  (Wayne Piekarski <wayne@senet.com.au>)
Responses Re: [HACKERS] Re: INSERT/UPDATE waiting (another example)  (Wayne Piekarski <wayne@senet.com.au>)
List pgsql-sql
Wayne Piekarski <wayne@senet.com.au> writes:
> Currently, I start up postmaster with -B 192, which I guess puts it below
> the value of 256 which causes problems. Apart from when I got past 256
> buffers, does the patch fix anything else that might be causing problems?

Yes: if you have more than 256 active locks then you also will have
problems with the unpatched code.  I don't know how to relate that to
any easily-measured external information, but since you have a bunch of
concurrently running backends it seems possible that you are running
into locktable problems.  Anyway I do urge you to apply the patch and
see if things get better.

> however we have 5 tables which get vacuumed at midnight each day, we drop
> all the indexes, vacuum, then recreate. If we don't do the index thing,
> the vacuum can take tens of minutes, which is not acceptable

Yup, it takes an unreasonable amount of time to vacuum an index for a
table in which a lot of rows have been deleted.  The drop/recreate hack
is a good workaround for now.  (Has anyone looked into why vacuum is so
slow in this case?)
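(For reference, that workaround is plain SQL, repeated for each index on
the table; a rough sketch, with the index and column names only guessed
from the plan you quoted later in your message:

    DROP INDEX services_snum_inv_index;
    VACUUM services;
    CREATE INDEX services_snum_inv_index ON services (snum);

Drop all of a table's indexes, vacuum, then recreate them.)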

> For large tables, when I perform joins, I repeatedly get hash table out of
> memory errors. So I have two tables, one called unix, with 20000 rows, and
> another called services, with 80000 rows - I am producing a result which
> contains about 20000 rows in it as well, so there is lots of data moving
> around.

Yes, the hashtable code needs work.  As a short-term workaround, you
might try disabling hashjoins entirely, which you can do by passing
the debug option "-fh" ("forbid hash").  For example,

    setenv PGOPTIONS "-fh"
    psql dbase

The optimizer will then usually choose mergejoins, which are reasonable
in performance if you have indexes on the columns being joined on.
(There ought to be a SET variable that controls this, but there isn't.)
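
You can check that the setting took effect by EXPLAINing the query from
a session started with that PGOPTIONS value; the top node of the plan
should then be a Merge Join rather than a Hash Join.  (The query below
is only illustrative, with the join column guessed from the index names
in the plan you quoted.)

    EXPLAIN SELECT *
      FROM unix, services
      WHERE unix.snum = services.snum;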

> In most cases, the problem occurs when the optimiser mistakenly chooses to
> use seq scan rather than index scan. To get around these problems, we
> initially tried increasing the -B value to larger values 

-B doesn't have any direct influence on the optimizer's choices, AFAIR.

> set COST_INDEX = '0'; set COST_HEAP = '99999999';

> The optimiser then used index scan for almost anything where possible, the
> explain output looked really expensive, but the queries actually executed
> properly even with small -B values. So this is what I do to make these big
> queries work. There are a few cases where the above set statements
> actually cause hash table out of memory as well, so you set them back to
> the defaults and then it usually works ok :)

Again, that doesn't directly prevent the optimizer from using hash
joins, it just skews the cost estimates so that index scans will be used
in preference to sequential scans, whereupon you get silly plans like
the one you quoted:

> Hash Join  (cost=30000000.00 size=43361 width=20)
>   ->  Index Scan using unix_snum_inv_index on unix
> (cost=20311001006080.00 size=20311 width=16)
>   ->  Hash  (cost=0.00 size=0 width=0)
>         ->  Index Scan using services_snum_inv_index on services
> (cost=43360999964672.00 size=43361 width=4)

This is silly because hash join doesn't care whether its inputs are
sorted or not --- the extra cost of scanning the index is just being
wasted here.  (Unless you have WHERE conditions that can be combined
with the index to allow not scanning the whole table.  From the
size estimates it looks like that might be happening for services,
but not for the unix table, so the index scan on unix is definitely
a waste of time.)

Using an index scan *is* a good idea for merge join, on the other hand,
because merge join requires sorted inputs.  (So if you don't have a
suitable index, the plan will have to include an explicit sort step
before the join.)
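
Judging from the plan you quoted you already have indexes on snum on
both sides, but if a join column lacks an index, creating one is cheap
to try before resorting to the cost-variable tricks (names below are
only illustrative):

    CREATE INDEX unix_snum_index ON unix (snum);
    CREATE INDEX services_snum_index ON services (snum);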

> One other problem related to the shared memory buffers is every so often,
> the postmaster will die with shared memory errors, and device full.

Er, could you quote the exact error reports?  This is too vague to allow
any conclusions.

> Quick question: when you do a query with a join why does the hash table
> code need to use shared memory? Can't it do the join within its own memory
> space?

It doesn't use shared memory.  It's just that for historical reasons,
the amount of private memory allocated for a hashjoin table is the same
as the amount of shared memory allocated for buffers (ie, the -B
switch).  I've been thinking of changing it to be driven by the -S
switch instead, since that seems to make more sense.
        regards, tom lane

