Thread: UdmSearch: tables vs indices ...
We've almost got UdmSearch up and running, and I'm noticing something odd: -rw------- 1 pgsql pgsql 204800 Jan 6 00:05 url -rw------- 1 pgsql pgsql 1622016 Jan 6 00:05 word_url url is: CREATE TABLE "url" ( "rec_id" int4 DEFAULT nextval('next_url_id') PRIMARY KEY, "status" int4 NOT NULL DEFAULT0, "url" character varying(128) NOT NULL, "content_type" character varying(32) NOT NULL DEFAULT '', "last_modified" character varying(32) NOT NULL DEFAULT '', "title" character varying(128) NOT NULL DEFAULT '', "text" character varying(255) NOT NULL DEFAULT '', "size" int4 NOT NULL DEFAULT 0, "indexed" int4 NOTNULL DEFAULT 0, "last_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT', "next_index_time"datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT', "referrer" int4 NOT NULL DEFAULT 0, "tag" int4 NOT NULL DEFAULT 0, "hops" int4 NOT NULL DEFAULT 0, "keywords" character varying(255) NOT NULL DEFAULT'', "description" character varying(100) NOT NULL DEFAULT '', "crc" character varying(33) NOT NULL DEFAULT''); and word_url is: CREATE INDEX "word_url" on "dict" using btree ( "word" "varchar_ops", "url_id" "int4_ops" ); ============= is it just me, or does an index ~6x the size of the data itself look "odd"? Its an older v6.5.0 database (haven't had time to upgrade *sigh*), so if explains it, so be it...I'll do an upgrade ASAP...but if that doesn't? Thanks... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote: > > We've almost got UdmSearch up and running, and I'm noticing something odd: > > -rw------- 1 pgsql pgsql 204800 Jan 6 00:05 url > -rw------- 1 pgsql pgsql 1622016 Jan 6 00:05 word_url > > url is: > > CREATE TABLE "url" ( > "rec_id" int4 DEFAULT nextval('next_url_id') PRIMARY KEY, > "status" int4 NOT NULL DEFAULT 0, > "url" character varying(128) NOT NULL, > "content_type" character varying(32) NOT NULL DEFAULT '', > "last_modified" character varying(32) NOT NULL DEFAULT '', > "title" character varying(128) NOT NULL DEFAULT '', > "text" character varying(255) NOT NULL DEFAULT '', > "size" int4 NOT NULL DEFAULT 0, > "indexed" int4 NOT NULL DEFAULT 0, > "last_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT', > "next_index_time" datetime NOT NULL DEFAULT 'Thu Dec 31 20:00:00 1970 GMT', > "referrer" int4 NOT NULL DEFAULT 0, > "tag" int4 NOT NULL DEFAULT 0, > "hops" int4 NOT NULL DEFAULT 0, > "keywords" character varying(255) NOT NULL DEFAULT '', > "description" character varying(100) NOT NULL DEFAULT '', > "crc" character varying(33) NOT NULL DEFAULT ''); > > and word_url is: > > CREATE INDEX "word_url" on "dict" using btree ( "word" "varchar_ops", "url_id" "int4_ops" ); > > ============= > > is it just me, or does an index ~6x the size of the data itself look > "odd"? > > Its an older v6.5.0 database (haven't had time to upgrade *sigh*), so if > explains it, so be it...I'll do an upgrade ASAP...but if that doesn't? > > Thanks... According to your CREATE INDEX statement, word_url is on the table dict, not url. Is dict a large dictionary of some sort? Mike
On Thu, 6 Jan 2000, Mike Mascari wrote: > According to your CREATE INDEX statement, word_url is on the > table dict, not url. Is dict a large dictionary of some sort? Damn...ya, thanks for pointing what should have been obvious :( dict is ~10Meg and growing, word_url is now 7meg *sigh* okay...ignore that one :(
On Thu, 6 Jan 2000, The Hermit Hacker wrote: > > We've almost got UdmSearch up and running, and I'm noticing something odd: > > -rw------- 1 pgsql pgsql 204800 Jan 6 00:05 url > -rw------- 1 pgsql pgsql 1622016 Jan 6 00:05 word_url Here's what I have on the test system I'm working with. The apache docs are the only thing in it (or that should be in it). -rw------- 1 postgres postgres 1671168 Dec 15 08:35 url -rw------- 1 postgres postgres 278528 Dec 15 08:35 url_crc -rw------- 1 postgres postgres 106496 Dec 15 08:35 url_pkey -rw------- 1 postgres postgres 335872 Dec 15 08:35 url_url -rw------- 1 postgres postgres 1179648 Dec 15 08:35 word_url Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net 128K ISDN: $24.95/mo or less - 56K Dialup: $17.95/moor less at Pop4 Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================
On Thu, 6 Jan 2000, Vince Vielhaber wrote: > On Thu, 6 Jan 2000, The Hermit Hacker wrote: > > > > > We've almost got UdmSearch up and running, and I'm noticing something odd: > > > > -rw------- 1 pgsql pgsql 204800 Jan 6 00:05 url > > -rw------- 1 pgsql pgsql 1622016 Jan 6 00:05 word_url > > Here's what I have on the test system I'm working with. The apache > docs are the only thing in it (or that should be in it). > > -rw------- 1 postgres postgres 1671168 Dec 15 08:35 url > -rw------- 1 postgres postgres 278528 Dec 15 08:35 url_crc > -rw------- 1 postgres postgres 106496 Dec 15 08:35 url_pkey > -rw------- 1 postgres postgres 335872 Dec 15 08:35 url_url > -rw------- 1 postgres postgres 1179648 Dec 15 08:35 word_url Here is *just* http://www.postgresql.org/docs: -rw------- 1 pgsql pgsql 3039232 Jan 6 06:02 url -rw------- 1 pgsql pgsql 35602432 Jan 6 06:02 dict -rw------- 1 pgsql pgsql 303104 Jan 6 06:02 url_pkey -rw------- 1 pgsql pgsql 376832 Jan 6 06:02 url_crc -rw------- 1 pgsql pgsql 1294336 Jan 6 06:02 url_url -rw------- 1 pgsql pgsql 27385856 Jan 6 06:02 word_url -rw------- 1 pgsql pgsql 8192 Jan 6 06:01 next_url_id They are generating what I think is a very very weird looking query on the tables that appears to be just hanging the whole thing...can someone explain to me what *this* would do: sum(case dict.word when '$t' then 1 else 0 end) I'm trying to get more details out of Alexander, since I'm guessing that the query itself could possibly be done cleaner, but they acknowledge that their PostgreSQL knowledge tends to be rather "sparse", at best :) More as it becomes available ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes: > explain to me what *this* would do: > sum(case dict.word when '$t' then 1 else 0 end) Looks to me like it generates the same result as select count(*) where dict.word = '$t'; regards, tom lane