Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching - Mailing list pgsql-general

From Pierre-Frédéric Caillaud
Subject Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching
Date
Msg-id opsdu638jlcq72hf@musicbox
Whole thread Raw
In response to Index on TEXT versus CHAR(32)... fast exact TEXT matching  (Jon Lapham <lapham@jandr.org>)
List pgsql-general
> CREATE TABLE a (id SERIAL, thetext TEXT, thetext_md5 CHAR(32));
> SELECT id FROM a WHERE thetext_m5d=md5('Some other text');

    From your explanations :
    - you need an exact match text->id
    - you already know the text

    Then, why bother with the TEXT column ?
    I have no idea on the performance of pg text indexes, but I do know that
a table which fits in RAM, or an index which fits in RAM, is faster than
when it does not.

    Try this :

create table my_searches (
    id        serial primary key,
    text_md5    char[16] (or even a binary string)
);

    Then create a table my_text (id,full_text) with appropriate foreign key
(to get a 1<->1 relationship between your tables) because you may still
want to remember the text. Index it on id of course.

    My point is that your search table no longer has the big text field, it
has only constant size fields, (thus constant rowsize) and it is much
smaller. The index will also be smaller and maybe fit in RAM. You then :

    SELECT id FROM my_text WHERE id in (SELECT id FROM my_searches WHERE
text_md5 = (your md5)) AND full_text=(your text);

    Unless a MD5 collision occurs, which should be rare, the subquery will
return 1 result and the first SELECT will just do an index lookup on ID.

    Now, ok, I'm stupid, I reimplemented the Hash index...

    I'd be interested in your benchmarks of your real data with :
    - table with full text + id, with btree index on text
    - table with full text + id, with hash index on text
    - the scheme I just came up with



pgsql-general by date:

Previous
From: Scott Ribe
Date:
Subject: Re: How to determine a database is intact?
Next
From: Pierre-Frédéric Caillaud
Date:
Subject: Re: How to determine a database is intact?