Thread: Indexing question
Hi all,
I have some tables (which can get pretty large) in which I want to record 'current' data as well as 'historical' data. This table has fields 'deleted' and 'deleteddate' (among other fields, of course). The field 'deleted' is false be default. Every record that I want to delete gets the value true for 'deleted' and 'deleteddate' is set to the date of deletion.
Since these tables are used a lot by queries that only use 'current' data, I have created a view with a where clause 'Where not deleted'. Also, I have indexed field 'deleted'.
I did this this because I read somewhere that fields that can contain NULL values will NOT be indexed.
Is this true?
Or could I ditch the 'deleted' field and just set 'deleteddate' to NULL by default and to a DATE in the case of a deleted record? I could then index the field 'deleteddate' and create a view with where clause 'Where deleteddate is null'.
Would this give the same performance as my current solution (with an indexed boolean field 'deleted') ?
I cannot test this myself at the moment as I am still in a design phase and do not have a real server available yet...
Thanks in advance,
Alexander Priem
Remember to consider partial indexes:
eg. CREATE INDEX ON table (col) WHERE deletedate IS NOT NULL
Chris
----- Original Message -----From: Alexander PriemSent: Friday, August 29, 2003 2:52 PMSubject: [PERFORM] Indexing questionHi all,I have some tables (which can get pretty large) in which I want to record 'current' data as well as 'historical' data. This table has fields 'deleted' and 'deleteddate' (among other fields, of course). The field 'deleted' is false be default. Every record that I want to delete gets the value true for 'deleted' and 'deleteddate' is set to the date of deletion.Since these tables are used a lot by queries that only use 'current' data, I have created a view with a where clause 'Where not deleted'. Also, I have indexed field 'deleted'.I did this this because I read somewhere that fields that can contain NULL values will NOT be indexed.Is this true?Or could I ditch the 'deleted' field and just set 'deleteddate' to NULL by default and to a DATE in the case of a deleted record? I could then index the field 'deleteddate' and create a view with where clause 'Where deleteddate is null'.Would this give the same performance as my current solution (with an indexed boolean field 'deleted') ?I cannot test this myself at the moment as I am still in a design phase and do not have a real server available yet...Thanks in advance,
Alexander Priem
> Hi all, > > I have some tables (which can get pretty large) in which I want to > record 'current' data as well as 'historical' data. This table has > fields 'deleted' and 'deleteddate' (among other fields, of course). The > field 'deleted' is false be default. Every record that I want to delete > gets the value true for 'deleted' and 'deleteddate' is set to the date > of deletion. > > Since these tables are used a lot by queries that only use 'current' > data, I have created a view with a where clause 'Where not deleted'. > Also, I have indexed field 'deleted'. <cut> I think the best choice for your case is using conditional indexes. It should be much better than indexing 'deleted' field. I don't know on which exactly fields you have to create this index - you have to check it by yourself - what do you have in "where" clause? Example: create index some_index on your_table(id_field) where not deleted; Regards, Tomasz Myrta
So if I understand correctly I could ditch the 'deleted' field entirely and use just the 'deleteddate' field. This 'deleteddate' field would be NULL by default. It would contain a date value if the record is considered 'deleted'. The index would be 'create index a on tablename(deleteddate) where deleteddate is null'. I could then access 'current' records with a view like 'create view x_view as select * from tablename where deleteddate is null'. Is that correct? This would be the best performing solution for this kind of thing, I think (theoretically at least)? Kind regards, Alexander Priem. ----- Original Message ----- From: "Tomasz Myrta" <jasiek@klaster.net> To: "Alexander Priem" <ap@cict.nl> Cc: <pgsql-performance@postgresql.org> Sent: Friday, August 29, 2003 9:03 AM Subject: Re: [PERFORM] Indexing question > > Hi all, > > > > I have some tables (which can get pretty large) in which I want to > > record 'current' data as well as 'historical' data. This table has > > fields 'deleted' and 'deleteddate' (among other fields, of course). The > > field 'deleted' is false be default. Every record that I want to delete > > gets the value true for 'deleted' and 'deleteddate' is set to the date > > of deletion. > > > > Since these tables are used a lot by queries that only use 'current' > > data, I have created a view with a where clause 'Where not deleted'. > > Also, I have indexed field 'deleted'. > > <cut> > I think the best choice for your case is using conditional indexes. It > should be much better than indexing 'deleted' field. I don't know on > which exactly fields you have to create this index - you have to check > it by yourself - what do you have in "where" clause? > > Example: > create index some_index on your_table(id_field) where not deleted; > > > Regards, > Tomasz Myrta > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
> So if I understand correctly I could ditch the 'deleted' field entirely and > use just the 'deleteddate' field. This 'deleteddate' field would be NULL by > default. It would contain a date value if the record is considered > 'deleted'. > > The index would be 'create index a on tablename(deleteddate) where > deleteddate is null'. > > I could then access 'current' records with a view like 'create view x_view > as select * from tablename where deleteddate is null'. > > Is that correct? This would be the best performing solution for this kind of > thing, I think (theoretically at least)? > > Kind regards, > Alexander Priem. Near, but not exactly. You don't need field deleted - it's true. Your example: create index a on tablename(deleteddate) where deleteddate is null we can translate to: create index a on tablename(NULL) where deleteddate is null which doesn't make too much sense. Check your queries. You probably have something like this: select * from tablename where not deleted and xxx Create your index to match xxx clause - if xxx is "some_id=13", then create your index as: create index on tablename(some_id) where deleteddate is null; Regards, Tomasz Myrta
> So if I understand correctly I could ditch the 'deleted' field entirely and > use just the 'deleteddate' field. This 'deleteddate' field would be NULL by > default. It would contain a date value if the record is considered > 'deleted'. > > The index would be 'create index a on tablename(deleteddate) where > deleteddate is null'. > > I could then access 'current' records with a view like 'create view x_view > as select * from tablename where deleteddate is null'. > > Is that correct? This would be the best performing solution for this kind of > thing, I think (theoretically at least)? Yes, I think it would be best. Definitely better than your current solution. Cheers, Chris
I think I understand what you mean :) Let's see if that's true : The entire table WAS like this: (just one example table, I have many more) create table orderadvice ( orad_id serial primary key, orad_name varchar(25) unique not null, orad_description varchar(50) default null, orad_value integer not null default 0, orad_value_quan integer references quantity (quan_id) not null default 0, orad_deleted boolean not null default false, orad_deleteddate date default null, orad_deletedby integer references systemuser (user_id) default null ) without oids; Indexes were like this: create index orad_deleted_index on orderadvice (orad_deleted); (orad_id and orad_name indexed implicitly in the create table statement) A view on this table: create view orderadvice_edit as select orad_id,orad_name,orad_description,orad_value,orad_value_quan from orderadvice where not orad_deleted; Most queries on this view would be like 'select * from orderadvice_edit where orad_id=100' or 'select * from orderadvice_edit order by orad_name'. How about the following script. Would it be better, given the type of queries that would take place on this table? create table orderadvice ( orad_id serial primary key, orad_name varchar(25) not null, orad_description varchar(50) default null, orad_value integer not null default 0, orad_value_quan integer references quantity (quan_id) not null default 0, orad_deleteddate date default null, orad_deletedby integer references systemuser (user_id) default null ) without oids; create index orad_id_index on orderadvice (orad_id) where orad_deleteddate is null; create index orad_name_index on orderadvice (orad_name) where orad_deleteddate is null; create view orderadvice_edit as select orad_id,orad_name,orad_description,orad_value,orad_value_quan from orderadvice where orad_deleteddate is null; Would queries like 'select * from orderadvice_edit where orad_id=100' or 'select * from orderadvice_edit order by orad_name' both use one of these two partial indexes, given enough records are present in the table? There would be a double index on the primary key this way, right? Thanks for your advice so far, Alexander Priem. ----- Original Message ----- From: "Tomasz Myrta" <jasiek@klaster.net> To: "Alexander Priem" <ap@cict.nl> Cc: <pgsql-performance@postgresql.org> Sent: Friday, August 29, 2003 9:59 AM Subject: Re: [PERFORM] Indexing question > > So if I understand correctly I could ditch the 'deleted' field entirely and > > use just the 'deleteddate' field. This 'deleteddate' field would be NULL by > > default. It would contain a date value if the record is considered > > 'deleted'. > > > > The index would be 'create index a on tablename(deleteddate) where > > deleteddate is null'. > > > > I could then access 'current' records with a view like 'create view x_view > > as select * from tablename where deleteddate is null'. > > > > Is that correct? This would be the best performing solution for this kind of > > thing, I think (theoretically at least)? > > > > Kind regards, > > Alexander Priem. > > Near, but not exactly. You don't need field deleted - it's true. > > Your example: > create index a on tablename(deleteddate) where deleteddate is null > we can translate to: > create index a on tablename(NULL) where deleteddate is null > which doesn't make too much sense. > > Check your queries. You probably have something like this: > select * from tablename where not deleted and xxx > > Create your index to match xxx clause - if xxx is "some_id=13", then > create your index as: > create index on tablename(some_id) where deleteddate is null; > > Regards, > Tomasz Myrta >
> create index orad_id_index on orderadvice (orad_id) where orad_deleteddate > is null; > create index orad_name_index on orderadvice (orad_name) where > orad_deleteddate is null; > > create view orderadvice_edit as select > orad_id,orad_name,orad_description,orad_value,orad_value_quan from > orderadvice where orad_deleteddate is null; > > Would queries like 'select * from orderadvice_edit where orad_id=100' or > 'select * from orderadvice_edit order by orad_name' both use one of these > two partial indexes, given enough records are present in the table? > > There would be a double index on the primary key this way, right? It looks much better now. I'm not sure about the second index. Probably it will be useless, because you sort ALL records with deleteddtata is null. Maybe the first index will be enough. I'm not sure what to do with doubled index on a primary key field. Regards, Tomasz Myrta
The first index is for sorting on orad_id, the second one for sorting on orad_name. The first one would be useful for queries like 'select * from orderadvice_edit where orad_id=100', the second one for queries like 'select * from orderadvice_edit order by orad_name'. Right? Does anyone know whether it is bad practise to have two indexes on the primary key of a table? (one 'primary key' index and one partial index) ----- Original Message ----- From: "Tomasz Myrta" <jasiek@klaster.net> To: "Alexander Priem" <ap@cict.nl> Cc: <pgsql-performance@postgresql.org> Sent: Friday, August 29, 2003 10:57 AM Subject: Re: [PERFORM] Indexing question > > create index orad_id_index on orderadvice (orad_id) where orad_deleteddate > > is null; > > create index orad_name_index on orderadvice (orad_name) where > > orad_deleteddate is null; > > > > create view orderadvice_edit as select > > orad_id,orad_name,orad_description,orad_value,orad_value_quan from > > orderadvice where orad_deleteddate is null; > > > > Would queries like 'select * from orderadvice_edit where orad_id=100' or > > 'select * from orderadvice_edit order by orad_name' both use one of these > > two partial indexes, given enough records are present in the table? > > > > There would be a double index on the primary key this way, right? > > It looks much better now. I'm not sure about the second index. Probably > it will be useless, because you sort ALL records with deleteddtata is > null. Maybe the first index will be enough. > > I'm not sure what to do with doubled index on a primary key field. > > Regards, > Tomasz Myrta > > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
"Alexander Priem" <ap@cict.nl> writes: > Does anyone know whether it is bad practise to have two indexes on the > primary key of a table? (one 'primary key' index and one partial index) It's a little unusual, but if you get enough performance boost from it to justify the maintenance cost of the extra index, then I can't see anything wrong with it. The "if" is worth checking though. I missed the start of this thread, but what percentage of your rows do you expect to have null deleteddate? Unless it's a pretty small percentage, I'm unconvinced that the extra indexes will be worth their cost. regards, tom lane
Well, the intention is to hold every record that ever existed in the table. Therefore, records do not get deleted, but they get a date in the deleteddate field. This way, we can track what changes were made to the table(s). So if a record gets 'deleted', the field 'deleted' is set to today's date. If a record gets 'updated', a new record is made containing the new data, and the old record is marked as 'deleted'. So the percentage of 'deleted' records will grow with time, if you understand what I mean. ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Alexander Priem" <ap@cict.nl> Cc: "Tomasz Myrta" <jasiek@klaster.net>; <pgsql-performance@postgresql.org> Sent: Friday, August 29, 2003 4:00 PM Subject: Re: [PERFORM] Indexing question > "Alexander Priem" <ap@cict.nl> writes: > > Does anyone know whether it is bad practise to have two indexes on the > > primary key of a table? (one 'primary key' index and one partial index) > > It's a little unusual, but if you get enough performance boost from it > to justify the maintenance cost of the extra index, then I can't see > anything wrong with it. > > The "if" is worth checking though. I missed the start of this thread, > but what percentage of your rows do you expect to have null deleteddate? > Unless it's a pretty small percentage, I'm unconvinced that the extra > indexes will be worth their cost. > > regards, tom lane
On Fri, Aug 29, 2003 at 05:13:52PM +0200, Alexander Priem wrote: > Well, the intention is to hold every record that ever existed in the table. > Therefore, records do not get deleted, but they get a date in the > deleteddate field. This way, we can track what changes were made to the > table(s). > > So if a record gets 'deleted', the field 'deleted' is set to today's date. > If a record gets 'updated', a new record is made containing the new data, > and the old record is marked as 'deleted'. > > So the percentage of 'deleted' records will grow with time, if you > understand what I mean. Did you consider a two table implimentation. 1 table "live_table" containing the non-deleted records, a second table "deleted_table" containing the deleted records, along with the "deleted_date" field. Keep the two in sync column type/number wise, and use a before delete trigger function on "live_table" to actually insert a copy of the deleted row plus "deleted_date" into "deleted_table" before performing the delete on "live_table". You could also use a before update trigger to keep old copies of updated records in the same way. Then you would only incur the performance loss of scanning/etc. the deleted records when you actually need to pull up deleted plus live records.
Hi, > I have some tables (which can get pretty large) in which I want to > record 'current' data as well as 'historical' data. Another solution can be using a trigger and function to record every transaction to a 'logging' table. This way you'll have one 'current' table and one 'historical' table The 'historical' table will contain every transaction recorded from the current table. Regards Rudi.