Thread: INDEX and JOINs

INDEX and JOINs

From

Reg Me Please

Date:

26 October 2007, 07:34:12

Hi all.
I have a setup like this:

CREATE TABLE T_FIELDS (
  TABL_ID TEXT NOT NULL,
  COLU_ID TEXT NOT NULL,
  FIEL_ID TEXT PRIMARY KEY,
  UNIQUE( TABL_ID,COLU_ID ) ); -- < 200 ROWS

CREATE TABLE T_DATA (
  ITEM_ID INT8 NOT NULL,
  FIEL_ID TEXT NOT NULL REFERENCES T_FIELDS,
  DATA_T TEXT NOT NULL,
  PRIMARY( FIEL_ID,ITEM_ID  ) ); -- > 10M ROWS

When I run

SELECT * FROM T_DATA WHERE FIEL_ID='TABL.FIEL';

it's very fast (of course!). But when I run:

SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';

it's very slow.
The EXPLAIN says that in the second case it has to do a sequential
scan on T_DATA. And this explains the timing.
Is there a way to avoid such a behaviour by acting on indexes?

Thanks.

Re: INDEX and JOINs

From

Martijn van Oosterhout

Date:

26 October 2007, 08:05:20

On Fri, Oct 26, 2007 at 12:34:06PM +0200, Reg Me Please wrote:
> it's very fast (of course!). But when I run:
>
> SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';
>
> it's very slow.
> The EXPLAIN says that in the second case it has to do a sequential
> scan on T_DATA. And this explains the timing.
> Is there a way to avoid such a behaviour by acting on indexes?

Firstly, have you run ANALYZE recently. Secondly, you'll have to show
us the output of EXPLAIN ANALYZE if you want some useful help.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: INDEX and JOINs

From

Reg Me Please

Date:

26 October 2007, 08:10:50

Il Friday 26 October 2007 13:05:10 Martijn van Oosterhout ha scritto:
> On Fri, Oct 26, 2007 at 12:34:06PM +0200, Reg Me Please wrote:
> > it's very fast (of course!). But when I run:
> >
> > SELECT * FROM T_DATA NATURAL JOIN T_FIELDS WHERE TABL_ID='TABL';
> >
> > it's very slow.
> > The EXPLAIN says that in the second case it has to do a sequential
> > scan on T_DATA. And this explains the timing.
> > Is there a way to avoid such a behaviour by acting on indexes?
>
> Firstly, have you run ANALYZE recently. Secondly, you'll have to show
> us the output of EXPLAIN ANALYZE if you want some useful help.
>
> Have a nice day,

Yes, I'm often runing analyze while trying to sort this kind of
things out.

This is the output:

prove=# explain analyze SELECT * from t_dati natural left join t_campi where
tabe_id='CONTE';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual
time=1.929..57713.305 rows=92 loops=1)
   Hash Cond: (t_dati.camp_id = t_campi.camp_id)
   ->  Seq Scan on t_dati  (cost=0.00..326851.72 rows=14010172 width=73)
(actual time=0.028..43814.946 rows=14011712 loops=1)
   ->  Hash  (cost=3.91..3.91 rows=3 width=33) (actual time=0.129..0.129
rows=3 loops=1)
         ->  Seq Scan on t_campi  (cost=0.00..3.91 rows=3 width=33) (actual
time=0.040..0.121 rows=3 loops=1)
               Filter: (tabe_id = 'CONTE'::text)
 Total runtime: 57713.449 ms

(I translated the table and column names. The substance is the same.)

Re: INDEX and JOINs

From

Martijn van Oosterhout

Date:

26 October 2007, 08:56:26

On Fri, Oct 26, 2007 at 01:10:42PM +0200, Reg Me Please wrote:
> prove=# explain analyze SELECT * from t_dati natural left join t_campi where
> tabe_id='CONTE';
>                                                           QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=3.95..382140.91 rows=274709 width=91) (actual
> time=1.929..57713.305 rows=92 loops=1)
>    Hash Cond: (t_dati.camp_id = t_campi.camp_id)

Neither of the columns are indexed according to the schema you sent so
that's the problem. Or you broke something while "translating".

> (I translated the table and column names. The substance is the same.)

Try not translating, and we might get somewhere...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: INDEX and JOINs

From

Reg Me Please

Date:

26 October 2007, 09:40:40

Il Friday 26 October 2007 13:56:20 Martijn van Oosterhout ha scritto:
> On Fri, Oct 26, 2007 at 01:10:42PM +0200, Reg Me Please wrote:
> > prove=# explain analyze SELECT * from t_dati natural left join t_campi
> > where tabe_id='CONTE';
> >                                                           QUERY PLAN
> > -------------------------------------------------------------------------
> >----------------------------------------------------- Hash Join
> > (cost=3.95..382140.91 rows=274709 width=91) (actual
> > time=1.929..57713.305 rows=92 loops=1)
> >    Hash Cond: (t_dati.camp_id = t_campi.camp_id)
>
> Neither of the columns are indexed according to the schema you sent so
> that's the problem. Or you broke something while "translating".
>
> > (I translated the table and column names. The substance is the same.)
>
> Try not translating, and we might get somewhere...
>
> Have a nice day,


prove=# \d t_dati
                                       Tabella "public.t_dati"
    Colonna    |           Tipo           |                       Modificatori
---------------+--------------------------+----------------------------------------------------------
 elem_id       | bigint                   | not null
 camp_id       | text                     | not null
 dato_t        | text                     | not null
 dato_r        | double precision         |
 dato_validita | timestamp with time zone | not null
default '-infinity'::timestamp with time zone
 dato_scadenza | timestamp with time zone | not null
default 'infinity'::timestamp with time zone
 dato_flag     | boolean                  | not null default true
 dato_data     | timestamp with time zone | not null default now()
 dato_id       | bigint                   | not null default
nextval('t_dati_dato_id_seq'::regclass)
Indici:
    "t_dati_pkey" PRIMARY KEY, btree (dato_id)
    "i_dati_0" btree (elem_id)
    "i_dati_1" btree (camp_id)
    "i_dati_2" btree (dato_t text_pattern_ops)
    "i_dati_3" btree (dato_flag, dato_validita, dato_scadenza)
    "i_dati_4" btree (dato_data)
Vincoli di integrità referenziale
    "t_dati_camp_id_fkey" FOREIGN KEY (camp_id) REFERENCES t_campi(camp_id)

prove=# \d t_campi
   Tabella "public.t_campi"
 Colonna | Tipo | Modificatori
---------+------+--------------
 tabe_id | text | not null
 colo_id | text | not null
 camp_id | text | not null
Indici:
    "t_campi_pkey" PRIMARY KEY, btree (camp_id)
    "i_t_campi_0" btree (tabe_id)
Vincoli di integrità referenziale
    "t_campi_colo_id_fkey" FOREIGN KEY (colo_id) REFERENCES t_colonne(colo_id)
    "t_campi_tabe_id_fkey" FOREIGN KEY (tabe_id) REFERENCES t_tabelle(tabe_id)

They seems to be indexed.

Re: INDEX and JOINs

From

Martijn van Oosterhout

Date:

26 October 2007, 10:02:34

On Fri, Oct 26, 2007 at 02:39:28PM +0200, Reg Me Please wrote:
> > >----------------------------------------------------- Hash Join
> > > (cost=3.95..382140.91 rows=274709 width=91) (actual
> > > time=1.929..57713.305 rows=92 loops=1)
> > >    Hash Cond: (t_dati.camp_id = t_campi.camp_id)
>
> prove=# \d t_dati

Ok, my suggestion would be to run it with "enable_seqscan=off" and if
that is indeed faster, then try reducing random_page_cost.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: INDEX and JOINs

From

Tom Lane

Date:

26 October 2007, 10:18:16

Reg Me Please <regmeplease@gmail.com> writes:
>>> (cost=3.95..382140.91 rows=274709 width=91) (actual
>>> time=1.929..57713.305 rows=92 loops=1)
>>> Hash Cond: (t_dati.camp_id = t_campi.camp_id)

I think the reason it doesn't want to use an indexed join is the large
estimate of the number of join result rows.  You need to try to get that
number down to something nearer the reality.  Increasing the statistics
target for the larger table might help.

            regards, tom lane

Re: INDEX and JOINs

From

Reg Me Please

Date:

27 October 2007, 03:38:09

Il Friday 26 October 2007 15:18:04 Tom Lane ha scritto:
> Reg Me Please <regmeplease@gmail.com> writes:
> >>> (cost=3.95..382140.91 rows=274709 width=91) (actual
> >>> time=1.929..57713.305 rows=92 loops=1)
> >>> Hash Cond: (t_dati.camp_id = t_campi.camp_id)
>
> I think the reason it doesn't want to use an indexed join is the large
> estimate of the number of join result rows.  You need to try to get that
> number down to something nearer the reality.  Increasing the statistics
> target for the larger table might help.
>
>             regards, tom lane

How can I "Increasing the statistics target for the larger table"?
I'ìm sorry for asking, but I'm not that deep into RDBMS.

Thanks.

Re: INDEX and JOINs

From

Andreas Kretschmer

Date:

27 October 2007, 03:45:29

Reg Me Please <regmeplease@gmail.com> schrieb:

> How can I "Increasing the statistics target for the larger table"?
> I'ìm sorry for asking, but I'm not that deep into RDBMS.

alter table <table> alter column <column> SET STATISTICS <value>;

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: INDEX and JOINs

From

Reg Me Please

Date:

27 October 2007, 04:07:39

Il Saturday 27 October 2007 08:51:09 Andreas Kretschmer ha scritto:
> Reg Me Please <regmeplease@gmail.com> schrieb:
> > How can I "Increasing the statistics target for the larger table"?
> > I'ìm sorry for asking, but I'm not that deep into RDBMS.
>
> alter table <table> alter column <column> SET STATISTICS <value>;
>
> Andreas

How can I display the statistics for a table/column/index/whatever applies?

Thanks again.

Re: INDEX and JOINs

From

Andreas Kretschmer

Date:

27 October 2007, 07:36:19

Reg Me Please <regmeplease@gmail.com> schrieb:

> Il Saturday 27 October 2007 08:51:09 Andreas Kretschmer ha scritto:
> > Reg Me Please <regmeplease@gmail.com> schrieb:
> > > How can I "Increasing the statistics target for the larger table"?
> > > I'ìm sorry for asking, but I'm not that deep into RDBMS.
> >
> > alter table <table> alter column <column> SET STATISTICS <value>;
> >
> > Andreas
>
> How can I display the statistics for a table/column/index/whatever applies?

select attstattarget from pg_attribute where attname = <column> and attrelid=<table>::regclass;

for instance:
select attstattarget from pg_attribute where attname = 'ts' and attrelid='foo'::regclass;

-1 means the default statistics value.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°