Re: query very slow when enable_seqscan=on - Mailing list pgsql-bugs

From Tomasz Ostrowski
Subject Re: query very slow when enable_seqscan=on
Date
Msg-id 20060704144406.GA21511@batory.org.pl
Whole thread Raw
In response to Re: query very slow when enable_seqscan=on  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: query very slow when enable_seqscan=on
Re: query very slow when enable_seqscan=on
List pgsql-bugs
On Tue, 04 Jul 2006, Tom Lane wrote:

> I think the real problem here is that regex matching is the wrong
> tool for the job.  Have you looked into a full-text index
> (tsearch2)?

So much to do with so little time...

I've briefly looked into it but:

- it's complicated;

- it is not needed - basic scan is good enough for the amount of data
  we have (if a sane query plan is chosen by a database);

- we have data in many languages (including based on cyryllic
  alphabet) - languages which use different forms of the same word
  based on context, for example:
    Warszawa
    Warszawy
    Warszawie
    Warszawê
    Warszaw±
    Warszawo
  All of the above could be translated to "Warsaw". So we need to
  support matching parts of words ("warszaw"), which I haven't seen
  in tsearch2 (maybe I've overlooked). We also have words, which
  different forms look like this: "stó³" "stole" "sto³u" (Polish for
  "table") - when we need to find it we'd need to list every possible
  form (about 10) or use a regex like: 'st[oó][l³]'.

> With something like that, the index operator has at least got the
> correct conceptual model, ie, looking for indexed words.  I'm not sure
> if they have any decent statistical support for it :-( but in theory
> that seems doable, whereas regex estimation will always be a crapshoot.

So why estimate regex expressions if there is no estimation possible?
Let's set this estimate to be pessimistic (match everything or
everything not null) and it will choose better plans. At least until
somebody will figure out better approach.

Pozdrawiam
Tometzky
--
Best of prhn - najzabawniejsze teksty polskiego UseNet-u
http://prhn.dnsalias.org/
  Chaos zawsze pokonuje porz±dek, gdy¿ jest lepiej zorganizowany.
                                              [ Terry Pratchett ]

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: ALTER TYPE ... USING(NULL) / NOT NULL violation
Next
From: tomas@tuxteam.de
Date:
Subject: Re: query very slow when enable_seqscan=on