Thread: UdmSearch: tables vs indices ...

UdmSearch: tables vs indices ...

From
The Hermit Hacker
Date:
We've almost got UdmSearch up and running, and I'm noticing something odd:

-rw-------  1 pgsql  pgsql   204800 Jan  6 00:05 url
-rw-------  1 pgsql  pgsql  1622016 Jan  6 00:05 word_url

url is:

CREATE TABLE "url" (       "rec_id" int4 DEFAULT nextval('next_url_id') PRIMARY KEY,       "status" int4 NOT NULL
DEFAULT0,       "url" character varying(128) NOT NULL,       "content_type" character varying(32) NOT NULL DEFAULT '',
    "last_modified" character varying(32) NOT NULL DEFAULT '',       "title" character varying(128) NOT NULL DEFAULT
'',      "text" character varying(255) NOT NULL DEFAULT '',       "size" int4 NOT NULL DEFAULT 0,       "indexed" int4
NOTNULL DEFAULT 0,       "last_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT',
"next_index_time"datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT',       "referrer" int4 NOT NULL DEFAULT 0,
  "tag" int4 NOT NULL DEFAULT 0,       "hops" int4 NOT NULL DEFAULT 0,       "keywords" character varying(255) NOT NULL
DEFAULT'',       "description" character varying(100) NOT NULL DEFAULT '',       "crc" character varying(33) NOT NULL
DEFAULT'');
 

and word_url is:

CREATE  INDEX "word_url" on "dict" using btree ( "word" "varchar_ops", "url_id" "int4_ops" );

=============

is it just me, or does an index ~6x the size of the data itself look
"odd"?

Its an older v6.5.0 database (haven't had time to upgrade *sigh*), so if
explains it, so be it...I'll do an upgrade ASAP...but if that doesn't?

Thanks...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] UdmSearch: tables vs indices ...

From
Mike Mascari
Date:
The Hermit Hacker wrote:
> 
> We've almost got UdmSearch up and running, and I'm noticing something odd:
> 
> -rw-------  1 pgsql  pgsql   204800 Jan  6 00:05 url
> -rw-------  1 pgsql  pgsql  1622016 Jan  6 00:05 word_url
> 
> url is:
> 
> CREATE TABLE "url" (
>         "rec_id" int4 DEFAULT nextval('next_url_id') PRIMARY KEY,
>         "status" int4 NOT NULL DEFAULT 0,
>         "url" character varying(128) NOT NULL,
>         "content_type" character varying(32) NOT NULL DEFAULT '',
>         "last_modified" character varying(32) NOT NULL DEFAULT '',
>         "title" character varying(128) NOT NULL DEFAULT '',
>         "text" character varying(255) NOT NULL DEFAULT '',
>         "size" int4 NOT NULL DEFAULT 0,
>         "indexed" int4 NOT NULL DEFAULT 0,
>         "last_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT',
>         "next_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT',
>         "referrer" int4 NOT NULL DEFAULT 0,
>         "tag" int4 NOT NULL DEFAULT 0,
>         "hops" int4 NOT NULL DEFAULT 0,
>         "keywords" character varying(255) NOT NULL DEFAULT '',
>         "description" character varying(100) NOT NULL DEFAULT '',
>         "crc" character varying(33) NOT NULL DEFAULT '');
> 
> and word_url is:
> 
> CREATE  INDEX "word_url" on "dict" using btree ( "word" "varchar_ops", "url_id" "int4_ops" );
> 
> =============
> 
> is it just me, or does an index ~6x the size of the data itself look
> "odd"?
> 
> Its an older v6.5.0 database (haven't had time to upgrade *sigh*), so if
> explains it, so be it...I'll do an upgrade ASAP...but if that doesn't?
> 
> Thanks...


According to your CREATE INDEX statement, word_url is on the
table dict, not url. Is dict a large dictionary of some sort?

Mike


Re: [HACKERS] UdmSearch: tables vs indices ...

From
The Hermit Hacker
Date:
On Thu, 6 Jan 2000, Mike Mascari wrote:

> According to your CREATE INDEX statement, word_url is on the
> table dict, not url. Is dict a large dictionary of some sort?

Damn...ya, thanks for pointing what should have been obvious :(  dict is
~10Meg and growing, word_url is now 7meg *sigh*

okay...ignore that one :(




Re: [HACKERS] UdmSearch: tables vs indices ...

From
Vince Vielhaber
Date:
On Thu, 6 Jan 2000, The Hermit Hacker wrote:

> 
> We've almost got UdmSearch up and running, and I'm noticing something odd:
> 
> -rw-------  1 pgsql  pgsql   204800 Jan  6 00:05 url
> -rw-------  1 pgsql  pgsql  1622016 Jan  6 00:05 word_url

Here's what I have on the test system I'm working with.  The apache
docs are the only thing in it (or that should be in it).

-rw-------   1 postgres  postgres  1671168 Dec 15 08:35 url
-rw-------   1 postgres  postgres   278528 Dec 15 08:35 url_crc
-rw-------   1 postgres  postgres   106496 Dec 15 08:35 url_pkey
-rw-------   1 postgres  postgres   335872 Dec 15 08:35 url_url
-rw-------   1 postgres  postgres  1179648 Dec 15 08:35 word_url

Vince.
-- 
==========================================================================
Vince Vielhaber -- KA8CSH    email: vev@michvhf.com    http://www.pop4.net  128K ISDN: $24.95/mo or less - 56K Dialup:
$17.95/moor less at Pop4       Online Campground Directory    http://www.camping-usa.com      Online Giftshop
Superstore   http://www.cloudninegifts.com
 
==========================================================================





Re: [HACKERS] UdmSearch: tables vs indices ...

From
The Hermit Hacker
Date:
On Thu, 6 Jan 2000, Vince Vielhaber wrote:

> On Thu, 6 Jan 2000, The Hermit Hacker wrote:
> 
> > 
> > We've almost got UdmSearch up and running, and I'm noticing something odd:
> > 
> > -rw-------  1 pgsql  pgsql   204800 Jan  6 00:05 url
> > -rw-------  1 pgsql  pgsql  1622016 Jan  6 00:05 word_url
> 
> Here's what I have on the test system I'm working with.  The apache
> docs are the only thing in it (or that should be in it).
> 
> -rw-------   1 postgres  postgres  1671168 Dec 15 08:35 url
> -rw-------   1 postgres  postgres   278528 Dec 15 08:35 url_crc
> -rw-------   1 postgres  postgres   106496 Dec 15 08:35 url_pkey
> -rw-------   1 postgres  postgres   335872 Dec 15 08:35 url_url
> -rw-------   1 postgres  postgres  1179648 Dec 15 08:35 word_url

Here is *just* http://www.postgresql.org/docs:

-rw-------  1 pgsql  pgsql   3039232 Jan  6 06:02 url
-rw-------  1 pgsql  pgsql  35602432 Jan  6 06:02 dict
-rw-------  1 pgsql  pgsql    303104 Jan  6 06:02 url_pkey
-rw-------  1 pgsql  pgsql    376832 Jan  6 06:02 url_crc
-rw-------  1 pgsql  pgsql   1294336 Jan  6 06:02 url_url
-rw-------  1 pgsql  pgsql  27385856 Jan  6 06:02 word_url
-rw-------  1 pgsql  pgsql      8192 Jan  6 06:01 next_url_id

They are generating what I think is a very very weird looking query on the
tables that appears to be just hanging the whole thing...can someone
explain to me what *this* would do:
    sum(case dict.word when '$t' then 1 else 0 end)

I'm trying to get more details out of Alexander, since I'm guessing that
the query itself could possibly be done cleaner, but they acknowledge that
their PostgreSQL knowledge tends to be rather "sparse", at best :)

More as it becomes available ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] UdmSearch: tables vs indices ...

From
Tom Lane
Date:
The Hermit Hacker <scrappy@hub.org> writes:
> explain to me what *this* would do:

>         sum(case dict.word when '$t' then 1 else 0 end)

Looks to me like it generates the same result as
select count(*) where dict.word = '$t';
        regards, tom lane