Re: - Mailing list pgsql-performance

From
Subject Re:
Date
Msg-id 50235.216.80.95.13.1038861943.squirrel@www.l-i-e.com
Whole thread Raw
In response to Re:  (Rod Taylor <rbt@rbt.ca>)
Responses Re:
List pgsql-performance
>> I don't understand why PostgreSQL sometimes chooses not to use the
>> existing INDEXes to do an index scan instead of sequential scan --
>> Does it really think sequential will be faster, or does it eliminate
>> an index scan
>
> Yes, and it's generally right.
>
>> because there won't be enough hard drive or swap space to do it?
>
> Nope.  Simply because of time it takes to read from the disk.  An index
> scan makes ~ 1 read per tuple and sequential scans make one per page
> (gross simplification).

Hmmm.  An "index" is apparently nothing like I expected it to be...

Here I thought it would be some quick hash-table small data-set lookup
with a reference to the OID -- and that most of the hash tables could just
be loaded in one fell swoop.

Oh well.

>> Currently, full text search queries take on the order of 2 minutes to
>> execute.
>> We need them to be happening in 5 seconds, if at all possible.
>
> How about a couple of explains of the queries.

Explains were posted previously, but I'll do a couple more.

At its simplest, this takes 30 seconds:

explain select article.* from article where lower(text) like '%einstein%';
NOTICE:  QUERY PLAN:

Seq Scan on article  (cost=0.00..1155.01 rows=1 width=216)

Or, slightly more complex:

explain SELECT DISTINCT *, 0 + (0 + 10 * (lower(title) like '%einstein%')
::int + 10 * (lower(author_flattened) like '%einstein%') ::int + 30 *
(lower(subject_flattened) like '%einstein%') ::int + 30 * (lower(text)
LIKE '%einstein%') ::int + 9 * (substring(lower(title), 1, 20) like
'%einstein%') ::int + 25 * (substring(lower(text), 1, 20) LIKE
'%einstein%') ::int ) AS points FROM article WHERE TRUE AND (FALSE OR
(lower(title) like '%einstein%') OR (lower(author_flattened) like
'%einstein%') OR (lower(subject_flattened) like '%einstein%') OR
(lower(text) LIKE '%einstein%') ) ORDER BY points desc, volume, number,
article.article LIMIT 10, 0;
NOTICE:  QUERY PLAN:

Limit  (cost=1418.03..1418.08 rows=1 width=216)
  ->  Unique  (cost=1418.03..1418.08 rows=1 width=216)
        ->  Sort  (cost=1418.03..1418.03 rows=1 width=216)
              ->  Seq Scan on article  (cost=0.00..1418.02 rows=1 width=216)


> What kind of tuning have
> you done in postgresql.conf.

None.  Never really understood what that one memory setting would affect...

And the rest of the options seemed to be about logging output (which I
also can't seem to crank up to the level of getting query analysis out).

I RTFM, but actually comprehending what was written ... :-^

>  Whats your hardware like?

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU family      1400MHz
stepping        : 1
cpu MHz         : 1406.005
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse
bogomips        : 2804.94

        total:    used:    free:  shared: buffers:  cached:
Mem:  921235456 736669696 184565760   749568 75321344 592257024
Swap: 2097143808 15368192 2081775616
MemTotal:       899644 kB
MemFree:        180240 kB
MemShared:         732 kB
Buffers:         73556 kB
Cached:         573896 kB
SwapCached:       4480 kB
Active:         433776 kB
Inact_dirty:    182208 kB
Inact_clean:     36680 kB
Inact_target:   229376 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       899644 kB
LowFree:        180240 kB
SwapTotal:     2047992 kB
SwapFree:      2032984 kB



> Have you
> partitioned the data to separate disks in any way?

No, except when attempting to do the PostgreSQL contrib/fulltextindex we
clustered the _fti table by loading it in word order.

> Are you doing mostly (all?) reads?  Some writes?  Perhaps clustering?

Mostly reads.
Some writes by:
  Admin fixing typos, adding new articles
  Nightly cron jobs to "flatten" large-scale JOINs into text contatenations
  (We could get rid of that and go back to the JOINs, now that we've
figured out that it's really the full text search that's killing us, not
the JOINs)

> Is this on 7.2 or 7.3?

7.1.3

> What is the Locale? C or en_US or something
> else?

AFAIK, we didn't do anything to alter the locale from whatever the default
would be...




pgsql-performance by date:

Previous
From: Rod Taylor
Date:
Subject: Re:
Next
From: Hannu Krosing
Date:
Subject: Re: