Thread: Best Fit SQL query statement

Best Fit SQL query statement

From

Kiran

Date:

10 August 2007, 02:30:05

All,

Could anyone  help me in writing Best Fit SQL statement.
Suppose we have table t1 with coloumn t1 (text) with following rows.
98456
98457
9845
9846
984
985
98
99


and if I query on 98456 the result must be 98456,
However if I query on 98455 the result must be 9845
and If I query 9849 the result must be 984

Regards,
Kiran

Re: Best Fit SQL query statement

From

hubert depesz lubaczewski

Date:

10 August 2007, 04:59:57

On Mon, Aug 06, 2007 at 01:57:07AM -0700, Kiran wrote:
> Could anyone  help me in writing Best Fit SQL statement.
> Suppose we have table t1 with coloumn t1 (text) with following rows.
> 98456
> 98457
> 9845
> 9846
> 984
> 985
> 98
> 99
> and if I query on 98456 the result must be 98456,
> However if I query on 98455 the result must be 9845
> and If I query 9849 the result must be 984

select t1.t1 from t1 where '98456' like t1.t1||'%' order by length(t1.t1) desc limit 1;

should be ok.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Best Fit SQL query statement

From

"Fernando Hevia"

Date:

10 August 2007, 16:40:21

Hi Depesz,

I was curious about your solution for Best Fit since I had mine working in a
function with a loop:
 ... FOR v_len IN REVERSE v_max..v_min LOOP   v_prefix := substring(v_destino, 1, v_len);
   SELECT * INTO v_result    FROM numeracion   WHERE prefijo = v_prefix;
   IF FOUND THEN      RETURN :v_result;   END IF; END LOOP; ...

Found your query is shorter and clearer, problem is I couldn't have it use
an index. Thought it was a locale issue but adding a 2nd index with
varchar_pattern_ops made no difference.
In result, it turned out to be too slow in comparison to the function. Am I
missing something?

--- DDL ---

rd=# show lc_collate;lc_collate
-------------en_US.UTF-8
(1 row)

rd=# show client_encoding;client_encoding
-----------------SQL_ASCII
(1 row)

rd=# show server_encoding;server_encoding
-----------------SQL_ASCII
(1 row)

rd=# \d numeracion                Table "public.numeracion"  Column    |            Type             |   Modifiers
-------------+-----------------------------+---------------cod_oper    | integer                     |servicio    |
text                       | not nullmodalidad   | text                        | not nulllocalidad   | text
          | not nullindicativo  | text                        | not nullbloque      | text                        | not
nullresolucion | text                        |fecha       | date                        | not nullprefijo     | text
                   | not nulllargo       | integer                     |fecha_carga | timestamp without time zone |
defaultnow()
 
Indexes:   "pk_numeracion" PRIMARY KEY, btree (prefijo)   "idx_numeracion_prefijo" btree (prefijo varchar_pattern_ops)
Foreign-key constraints:   "fk_numeracion_operadores_cod_oper" FOREIGN KEY (cod_oper) REFERENCES
operadores(cod_oper)

rd=# set enable_seqscan = off;
SET

rd=# explain select prefijo
rd-#     FROM numeracion
rd-#     WHERE '3514269565' LIKE prefijo || '%'
rd-#     ORDER BY LENGTH(prefijo) DESC
rd-#     LIMIT 1;                                QUERY PLAN
----------------------------------------------------------------------------
Limit  (cost=100001077.54..100001077.54 rows=1 width=89)  ->  Sort  (cost=100001077.54..100001077.91 rows=151 width=89)
      Sort Key: length(prefijo)        ->  Seq Scan on numeracion  (cost=100000000.00..100001072.07
 
rows=151 width=89)              Filter: ('3514269565'::text ~~ (prefijo || '%'::text))

Why I am getting these monstrous costs? Table had been vacuumed full just
before running the explain plan. It has ~31k rows.

Any hindsight will be greatly appreciated.
Regards,
Fernando.



-----Mensaje original-----
De: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
En nombre de hubert depesz lubaczewski
Enviado el: Viernes, 10 de Agosto de 2007 05:00
Para: Kiran
CC: pgsql-sql@postgresql.org
Asunto: Re: [SQL] Best Fit SQL query statement

On Mon, Aug 06, 2007 at 01:57:07AM -0700, Kiran wrote:
> Could anyone  help me in writing Best Fit SQL statement.
> Suppose we have table t1 with coloumn t1 (text) with following rows.
> 98456
> 98457
> 9845
> 9846
> 984
> 985
> 98
> 99
> and if I query on 98456 the result must be 98456,
> However if I query on 98455 the result must be 9845
> and If I query 9849 the result must be 984

select t1.t1 from t1 where '98456' like t1.t1||'%' order by length(t1.t1)
desc limit 1;

should be ok.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
               http://www.postgresql.org/about/donate

Re: Best Fit SQL query statement

From

hubert depesz lubaczewski

Date:

10 August 2007, 20:00:13

On Fri, Aug 10, 2007 at 04:40:34PM -0300, Fernando Hevia wrote:
> Found your query is shorter and clearer, problem is I couldn't have it use
> an index. Thought it was a locale issue but adding a 2nd index with
> varchar_pattern_ops made no difference.
> In result, it turned out to be too slow in comparison to the function. Am I
> missing something?
> rd=# explain select prefijo
> rd-#     FROM numeracion
> rd-#     WHERE '3514269565' LIKE prefijo || '%'
> rd-#     ORDER BY LENGTH(prefijo) DESC
> rd-#     LIMIT 1;

unfortunatelly this query will be hard to optimize.
i guess that functional approach will be the fastest, but you can try
with something like this:

select prefijo
from numeracion
where prefijo in (   select substr('3514269565',1,i)   from generate_series(1, length('3514269565')) i
)
order by length(prefijo) desc LIMIT 1;

it should be faster then the previous approach, but it will most
probably not be as fast as function.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Best Fit SQL query statement

From

"Rodrigo De León"

Date:

10 August 2007, 22:13:51

On 8/10/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> unfortunatelly this query will be hard to optimize.

Uh, how about

SELECT MAX(t1) FROM t1WHERE '9849' LIKE t1 || '%';

Re: Best Fit SQL query statement

From

hubert depesz lubaczewski

Date:

11 August 2007, 06:06:56

On Fri, Aug 10, 2007 at 08:13:46PM -0500, Rodrigo De León wrote:
> On 8/10/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> > unfortunatelly this query will be hard to optimize.
> 
> Uh, how about
> 
> SELECT MAX(t1)
>   FROM t1
>  WHERE '9849' LIKE t1 || '%';

it will not help much as the main burden is the where clause, which is
not indexable.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Best Fit SQL query statement

From

"Fernando Hevia"

Date:

14 August 2007, 10:59:47

De: hubert depesz lubaczewski [mailto:depesz@depesz.com] 

>>On Fri, Aug 10, 2007 at 04:40:34PM -0300, Fernando Hevia wrote:
>> Found your query is shorter and clearer, problem is I couldn't have it
use
>> an index. Thought it was a locale issue but adding a 2nd index with
>> varchar_pattern_ops made no difference.
>> In result, it turned out to be too slow in comparison to the function. Am
I
>> missing something?
>> rd=# explain select prefijo
>> rd-#     FROM numeracion
>> rd-#     WHERE '3514269565' LIKE prefijo || '%'
>> rd-#     ORDER BY LENGTH(prefijo) DESC
>> rd-#     LIMIT 1;

> unfortunatelly this query will be hard to optimize.
> i guess that functional approach will be the fastest, but you can try
> with something like this:
>
> select prefijo
> from numeracion
> where prefijo in (
>     select substr('3514269565',1,i)
>     from generate_series(1, length('3514269565')) i
> )
> order by length(prefijo) desc LIMIT 1;
>
>it should be faster then the previous approach, but it will most
>probably not be as fast as function.

Actually, I find this variant nearly as fast as the function. The
generate_series can be limited to known minimum and maximum prefix lengths
in order to speed up the query a bit more.

Works quite well.

Cheers,
Fernando.