Thread: Indexing question

Indexing question

From
"Alexander Priem"
Date:
Hi all,
 
I have some tables (which can get pretty large) in which I want to record 'current' data as well as 'historical' data. This table has fields 'deleted' and 'deleteddate' (among other fields, of course). The field 'deleted' is false be default. Every record that I want to delete gets the value true for 'deleted' and 'deleteddate' is set to the date of deletion.
 
Since these tables are used a lot by queries that only use 'current' data, I have created a view with a where clause 'Where not deleted'. Also, I have indexed field 'deleted'.
 
I did this this because I read somewhere that fields that can contain NULL values will NOT be indexed.
 
Is this true?
 
Or could I ditch the 'deleted' field and just set 'deleteddate' to NULL by default and to a DATE in the case of a deleted record? I could then index the field 'deleteddate' and create a view with where clause 'Where deleteddate is null'.
 
Would this give the same performance as my current solution (with an indexed boolean field 'deleted') ?
 
I cannot test this myself at the moment as I am still in a design phase and do not have a real server available yet...
 
Thanks in advance,

Alexander Priem

Re: Indexing question

From
"Christopher Kings-Lynne"
Date:
Remember to consider partial indexes:
 
eg. CREATE INDEX ON table (col) WHERE deletedate IS NOT NULL
 
Chris
----- Original Message -----
Sent: Friday, August 29, 2003 2:52 PM
Subject: [PERFORM] Indexing question

Hi all,
 
I have some tables (which can get pretty large) in which I want to record 'current' data as well as 'historical' data. This table has fields 'deleted' and 'deleteddate' (among other fields, of course). The field 'deleted' is false be default. Every record that I want to delete gets the value true for 'deleted' and 'deleteddate' is set to the date of deletion.
 
Since these tables are used a lot by queries that only use 'current' data, I have created a view with a where clause 'Where not deleted'. Also, I have indexed field 'deleted'.
 
I did this this because I read somewhere that fields that can contain NULL values will NOT be indexed.
 
Is this true?
 
Or could I ditch the 'deleted' field and just set 'deleteddate' to NULL by default and to a DATE in the case of a deleted record? I could then index the field 'deleteddate' and create a view with where clause 'Where deleteddate is null'.
 
Would this give the same performance as my current solution (with an indexed boolean field 'deleted') ?
 
I cannot test this myself at the moment as I am still in a design phase and do not have a real server available yet...
 
Thanks in advance,

Alexander Priem

Re: Indexing question

From
Tomasz Myrta
Date:
> Hi all,
>
> I have some tables (which can get pretty large) in which I want to
> record 'current' data as well as 'historical' data. This table has
> fields 'deleted' and 'deleteddate' (among other fields, of course). The
> field 'deleted' is false be default. Every record that I want to delete
> gets the value true for 'deleted' and 'deleteddate' is set to the date
> of deletion.
>
> Since these tables are used a lot by queries that only use 'current'
> data, I have created a view with a where clause 'Where not deleted'.
> Also, I have indexed field 'deleted'.

<cut>
I think the best choice for your case is using conditional indexes. It
should be much better than indexing 'deleted' field. I don't know on
which exactly fields you have to create this index - you have to check
it by yourself - what do you have in "where" clause?

Example:
create index some_index on your_table(id_field) where not deleted;


Regards,
Tomasz Myrta


Re: Indexing question

From
"Alexander Priem"
Date:
So if I understand correctly I could ditch the 'deleted' field entirely and
use just the 'deleteddate' field. This 'deleteddate' field would be NULL by
default. It would contain a date value if the record is considered
'deleted'.

The index would be 'create index a on tablename(deleteddate) where
deleteddate is null'.

I could then access 'current' records with a view like 'create view x_view
as select * from tablename where deleteddate is null'.

Is that correct? This would be the best performing solution for this kind of
thing, I think (theoretically at least)?

Kind regards,
Alexander Priem.



----- Original Message -----
From: "Tomasz Myrta" <jasiek@klaster.net>
To: "Alexander Priem" <ap@cict.nl>
Cc: <pgsql-performance@postgresql.org>
Sent: Friday, August 29, 2003 9:03 AM
Subject: Re: [PERFORM] Indexing question


> > Hi all,
> >
> > I have some tables (which can get pretty large) in which I want to
> > record 'current' data as well as 'historical' data. This table has
> > fields 'deleted' and 'deleteddate' (among other fields, of course). The
> > field 'deleted' is false be default. Every record that I want to delete
> > gets the value true for 'deleted' and 'deleteddate' is set to the date
> > of deletion.
> >
> > Since these tables are used a lot by queries that only use 'current'
> > data, I have created a view with a where clause 'Where not deleted'.
> > Also, I have indexed field 'deleted'.
>
> <cut>
> I think the best choice for your case is using conditional indexes. It
> should be much better than indexing 'deleted' field. I don't know on
> which exactly fields you have to create this index - you have to check
> it by yourself - what do you have in "where" clause?
>
> Example:
> create index some_index on your_table(id_field) where not deleted;
>
>
> Regards,
> Tomasz Myrta
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


Re: Indexing question

From
Tomasz Myrta
Date:
> So if I understand correctly I could ditch the 'deleted' field entirely and
> use just the 'deleteddate' field. This 'deleteddate' field would be NULL by
> default. It would contain a date value if the record is considered
> 'deleted'.
>
> The index would be 'create index a on tablename(deleteddate) where
> deleteddate is null'.
>
> I could then access 'current' records with a view like 'create view x_view
> as select * from tablename where deleteddate is null'.
>
> Is that correct? This would be the best performing solution for this kind of
> thing, I think (theoretically at least)?
>
> Kind regards,
> Alexander Priem.

Near, but not exactly. You don't need field deleted - it's true.

Your example:
create index a on tablename(deleteddate) where deleteddate is null
we can translate to:
create index a on tablename(NULL) where deleteddate is null
which doesn't make too much sense.

Check your queries. You probably have something like this:
select * from tablename where not deleted and xxx

Create your index to match xxx clause - if xxx is "some_id=13", then
create your index as:
create index on tablename(some_id) where deleteddate is null;

Regards,
Tomasz Myrta


Re: Indexing question

From
"Christopher Kings-Lynne"
Date:
> So if I understand correctly I could ditch the 'deleted' field entirely
and
> use just the 'deleteddate' field. This 'deleteddate' field would be NULL
by
> default. It would contain a date value if the record is considered
> 'deleted'.
>
> The index would be 'create index a on tablename(deleteddate) where
> deleteddate is null'.
>
> I could then access 'current' records with a view like 'create view x_view
> as select * from tablename where deleteddate is null'.
>
> Is that correct? This would be the best performing solution for this kind
of
> thing, I think (theoretically at least)?

Yes, I think it would be best.  Definitely better than your current
solution.

Cheers,

Chris


Re: Indexing question

From
"Alexander Priem"
Date:
I think I understand what you mean   :)

Let's see if that's true :

The entire table WAS like this: (just one example table, I have many more)

create table orderadvice (
orad_id serial primary key,
orad_name varchar(25) unique not null,
orad_description varchar(50) default null,
orad_value integer not null default 0,
orad_value_quan integer references quantity (quan_id) not null default 0,
orad_deleted boolean not null default false,
orad_deleteddate date default null,
orad_deletedby integer references systemuser (user_id) default null )
without oids;

Indexes were like this:

create index orad_deleted_index on orderadvice (orad_deleted);
(orad_id and orad_name indexed implicitly in the create table statement)

A view on this table:

create view orderadvice_edit as select
orad_id,orad_name,orad_description,orad_value,orad_value_quan from
orderadvice where not orad_deleted;

Most queries on this view would be like 'select * from orderadvice_edit
where orad_id=100' or 'select * from orderadvice_edit order by orad_name'.

How about the following script. Would it be better, given the type of
queries that would take place on this table?

create table orderadvice (
orad_id serial primary key,
orad_name varchar(25) not null,
orad_description varchar(50) default null,
orad_value integer not null default 0,
orad_value_quan integer references quantity (quan_id) not null default 0,
orad_deleteddate date default null,
orad_deletedby integer references systemuser (user_id) default null )
without oids;

create index orad_id_index on orderadvice (orad_id) where orad_deleteddate
is null;
create index orad_name_index on orderadvice (orad_name) where
orad_deleteddate is null;

create view orderadvice_edit as select
orad_id,orad_name,orad_description,orad_value,orad_value_quan from
orderadvice where orad_deleteddate is null;

Would queries like 'select * from orderadvice_edit where orad_id=100' or
'select * from orderadvice_edit order by orad_name' both use one of these
two partial indexes, given enough records are present in the table?

There would be a double index on the primary key this way, right?

Thanks for your advice so far,
Alexander Priem.






----- Original Message -----
From: "Tomasz Myrta" <jasiek@klaster.net>
To: "Alexander Priem" <ap@cict.nl>
Cc: <pgsql-performance@postgresql.org>
Sent: Friday, August 29, 2003 9:59 AM
Subject: Re: [PERFORM] Indexing question


> > So if I understand correctly I could ditch the 'deleted' field entirely
and
> > use just the 'deleteddate' field. This 'deleteddate' field would be NULL
by
> > default. It would contain a date value if the record is considered
> > 'deleted'.
> >
> > The index would be 'create index a on tablename(deleteddate) where
> > deleteddate is null'.
> >
> > I could then access 'current' records with a view like 'create view
x_view
> > as select * from tablename where deleteddate is null'.
> >
> > Is that correct? This would be the best performing solution for this
kind of
> > thing, I think (theoretically at least)?
> >
> > Kind regards,
> > Alexander Priem.
>
> Near, but not exactly. You don't need field deleted - it's true.
>
> Your example:
> create index a on tablename(deleteddate) where deleteddate is null
> we can translate to:
> create index a on tablename(NULL) where deleteddate is null
> which doesn't make too much sense.
>
> Check your queries. You probably have something like this:
> select * from tablename where not deleted and xxx
>
> Create your index to match xxx clause - if xxx is "some_id=13", then
> create your index as:
> create index on tablename(some_id) where deleteddate is null;
>
> Regards,
> Tomasz Myrta
>


Re: Indexing question

From
Tomasz Myrta
Date:
> create index orad_id_index on orderadvice (orad_id) where orad_deleteddate
> is null;
> create index orad_name_index on orderadvice (orad_name) where
> orad_deleteddate is null;
>
> create view orderadvice_edit as select
> orad_id,orad_name,orad_description,orad_value,orad_value_quan from
> orderadvice where orad_deleteddate is null;
>
> Would queries like 'select * from orderadvice_edit where orad_id=100' or
> 'select * from orderadvice_edit order by orad_name' both use one of these
> two partial indexes, given enough records are present in the table?
>
> There would be a double index on the primary key this way, right?

It looks much better now. I'm not sure about the second index. Probably
it will be useless, because you sort ALL records with deleteddtata is
null. Maybe the first index will be enough.

I'm not sure what to do with doubled index on a primary key field.

Regards,
Tomasz Myrta


Re: Indexing question

From
"Alexander Priem"
Date:
The first index is for sorting on orad_id, the second one for sorting on
orad_name. The first one would be useful for queries like 'select * from
orderadvice_edit where orad_id=100', the second one for queries like 'select
* from orderadvice_edit order by orad_name'. Right?

Does anyone know whether it is bad practise to have two indexes on the
primary key of a table? (one 'primary key' index and one partial index)




----- Original Message -----
From: "Tomasz Myrta" <jasiek@klaster.net>
To: "Alexander Priem" <ap@cict.nl>
Cc: <pgsql-performance@postgresql.org>
Sent: Friday, August 29, 2003 10:57 AM
Subject: Re: [PERFORM] Indexing question


> > create index orad_id_index on orderadvice (orad_id) where
orad_deleteddate
> > is null;
> > create index orad_name_index on orderadvice (orad_name) where
> > orad_deleteddate is null;
> >
> > create view orderadvice_edit as select
> > orad_id,orad_name,orad_description,orad_value,orad_value_quan from
> > orderadvice where orad_deleteddate is null;
> >
> > Would queries like 'select * from orderadvice_edit where orad_id=100' or
> > 'select * from orderadvice_edit order by orad_name' both use one of
these
> > two partial indexes, given enough records are present in the table?
> >
> > There would be a double index on the primary key this way, right?
>
> It looks much better now. I'm not sure about the second index. Probably
> it will be useless, because you sort ALL records with deleteddtata is
> null. Maybe the first index will be enough.
>
> I'm not sure what to do with doubled index on a primary key field.
>
> Regards,
> Tomasz Myrta
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend


Re: Indexing question

From
Tom Lane
Date:
"Alexander Priem" <ap@cict.nl> writes:
> Does anyone know whether it is bad practise to have two indexes on the
> primary key of a table? (one 'primary key' index and one partial index)

It's a little unusual, but if you get enough performance boost from it
to justify the maintenance cost of the extra index, then I can't see
anything wrong with it.

The "if" is worth checking though.  I missed the start of this thread,
but what percentage of your rows do you expect to have null deleteddate?
Unless it's a pretty small percentage, I'm unconvinced that the extra
indexes will be worth their cost.

            regards, tom lane

Re: Indexing question

From
"Alexander Priem"
Date:
Well, the intention is to hold every record that ever existed in the table.
Therefore, records do not get deleted, but they get a date in the
deleteddate field. This way, we can track what changes were made to the
table(s).

So if a record gets 'deleted', the field 'deleted' is set to today's date.
If a record gets 'updated', a new record is made containing the new data,
and the old record is marked as 'deleted'.

So the percentage of 'deleted' records will grow with time, if you
understand what I mean.



----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Alexander Priem" <ap@cict.nl>
Cc: "Tomasz Myrta" <jasiek@klaster.net>; <pgsql-performance@postgresql.org>
Sent: Friday, August 29, 2003 4:00 PM
Subject: Re: [PERFORM] Indexing question


> "Alexander Priem" <ap@cict.nl> writes:
> > Does anyone know whether it is bad practise to have two indexes on the
> > primary key of a table? (one 'primary key' index and one partial index)
>
> It's a little unusual, but if you get enough performance boost from it
> to justify the maintenance cost of the extra index, then I can't see
> anything wrong with it.
>
> The "if" is worth checking though.  I missed the start of this thread,
> but what percentage of your rows do you expect to have null deleteddate?
> Unless it's a pretty small percentage, I'm unconvinced that the extra
> indexes will be worth their cost.
>
> regards, tom lane


Re: Indexing question

From
Richard Ellis
Date:
On Fri, Aug 29, 2003 at 05:13:52PM +0200, Alexander Priem wrote:
> Well, the intention is to hold every record that ever existed in the table.
> Therefore, records do not get deleted, but they get a date in the
> deleteddate field. This way, we can track what changes were made to the
> table(s).
>
> So if a record gets 'deleted', the field 'deleted' is set to today's date.
> If a record gets 'updated', a new record is made containing the new data,
> and the old record is marked as 'deleted'.
>
> So the percentage of 'deleted' records will grow with time, if you
> understand what I mean.

Did you consider a two table implimentation.  1 table "live_table"
containing the non-deleted records, a second table "deleted_table"
containing the deleted records, along with the "deleted_date" field.  Keep
the two in sync column type/number wise, and use a before delete trigger
function on "live_table" to actually insert a copy of the deleted row plus
"deleted_date" into "deleted_table" before performing the delete on
"live_table".

You could also use a before update trigger to keep old copies of updated
records in the same way.

Then you would only incur the performance loss of scanning/etc. the deleted
records when you actually need to pull up deleted plus live records.


Re: Indexing question

From
"Rudi Starcevic"
Date:
Hi,

> I have some tables (which can get pretty large) in which I want to
> record 'current' data as well as 'historical' data.

Another solution can be using a trigger and function to record every
transaction to a 'logging' table.
This way you'll have one 'current' table and one 'historical' table
The 'historical' table will contain every transaction recorded from
the current table.

Regards
Rudi.