Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views - Mailing list pgsql-general
From | Benjamin Arai |
---|---|
Subject | Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views |
Date | |
Msg-id | 22B8D5DC-456F-4337-A1D9-B2C9125CA71E@benjaminarai.com Whole thread Raw |
In response to | Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views ("Joshua D. Drake" <jd@commandprompt.com>) |
List | pgsql-general |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Since I am using tsearch2 on the table I think there is going to be a significant performance hit - e.g., I partition by batch (batches are not separated by date, they are essentially random subsets of a much larger data-set). I am querying this database using tsearch2, so I am assuming all tables are going to be queried each time since the text is not partition by any specific constraint - e.g., >R goes to table 1 and <=R goes to table 2. Benjamin On Aug 25, 2007, at 11:18 AM, Joshua D. Drake wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Benjamin Arai wrote: >> As stated in the previous email if I use partitioning then queries >> will >> be executed sequentially - i.e., instead of log(n) it would be (# >> partitions) * log(n). Right? > > The planner will consider every relevant partition during the > execution. > Which may be a performance hit, it may not be. It depends on many > factors. In general however, partitioning when done correctly is a > performance benefit and a maintenance benefit. > > Sincerely, > > Joshua D. Drake > > >> >> Benjamin >> >> On Aug 25, 2007, at 9:18 AM, Joshua D. Drake wrote: >> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >> >>> Benjamin Arai wrote: >>>> This kind of disappointing, I was hoping there was more that >>>> could be >>>> done. >>>> >>>> There has to be another way to do incremental indexing without >>>> loosing >>>> that much performance. >> >>> What makes you think you are loosing performance by using >>> partitioning? >> >>> Joshua D. Drake >> >>>> >>>> Benjamin >>>> >>>> On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote: >>>> >>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>> Hash: SHA1 >>>> >>>>> Brandon Shalton wrote: >>>>>> Benjamin, >>>>>> >>>>>> >>>>>>> >>>>>>> In order to avoid the re-indexing I was thinking of instead >>>>>>> creating >>>>>>> a new >>>>>>> table each month (building its indexes and etc) and accessing >>>>>>> them >>>>>>> all >>>>>>> through a view. This way I only have to index the new data each >>>>>>> month. >>>>>>> >>>>>> >>>>>> Take a look at bizgres.org (based on postgres). >>>>>> >>>>>> They have a parent-child structure. >>>>>> >>>>>> The way i use it, is I have about 30M records a day that are >>>>>> inserted >>>>>> into the database. >>>>>> >>>>>> Each day is a "child" table to the "parent". >>>>>> >>>>>> so example: >>>>>> >>>>>> the parent table is called "logfile" >>>>>> >>>>>> each day, is a child, with the structure like "logfile_YYMMDD" >>>>>> >>>>>> the "child" inherits the table structure of the parent, such >>>>>> that you >>>>>> could query the child table name directly, or you run the query >>>>>> against >>>>>> the parent (ie. logfile table) and get all the data. >>>>>> >>>>>> the indexes are done on a per table basis, so new data that comes >>>>>> in, is >>>>>> a lesser amount, and doesn't require re-indexing. >>>> >>>> >>>>> PostgreSQL can do all of this too. >>>> >>>>> Sincerely, >>>> >>>>> Joshua D. Drake >>>> >>>>>> >>>>>> >>>>>> example: >>>>>> >>>>>> select * from logfile_070825 where datafield = 'foo' >>>>>> >>>>>> if i knew i wanted to specifically go into that child, or: >>>>>> >>>>>> select * from logfile where datafield = 'foo' >>>>>> >>>>>> and all child tables are searched and results merged. You can >>>>>> perform >>>>>> any kind of sql query and field structures are you normally do. >>>>>> >>>>>> the downside is that the queries are run sequentially. >>>>>> >>>>>> so if you had 100 child tables, each table is queried via >>>>>> indexes, >>>>>> then >>>>>> results are merged. >>>>>> >>>>>> but, this approach does allow me to dump alot of data in, without >>>>>> having >>>>>> the re-indexing issues you are facing. >>>>>> >>>>>> at some point, you could roll up the days, in to weekly child >>>>>> tables, >>>>>> then monthly tables, etc. >>>>>> >>>>>> I believe Bizgres has a new version of their system that does >>>>>> parallel >>>>>> queries which would certainly speed things up. >>>>>> >>>>>> For your documents, you can do it by the day it was checked >>>>>> in, or >>>>>> maybe >>>>>> you have some other way of logically grouping, but the parent/ >>>>>> child >>>>>> table structure really helped to solve my problem of adding in >>>>>> millions >>>>>> of records each day. >>>>>> >>>>>> The closest thing in mysql is using merge tables, which is not >>>>>> really >>>>>> practical when it comes time to do the joins to the tables. >>>>>> >>>>>> -brandon >>>>>> >>>>>> http://www.t3report.com - marketing intelligence for online >>>>>> marketing >>>>>> and affiliate programs >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ---------------------------(end of >>>>>> broadcast)--------------------------- >>>>>> TIP 6: explain analyze is your friend >>>>>> >>>> >>>> >>>>> - -- >>>> >>>>> === The PostgreSQL Company: Command Prompt, Inc. === >>>>> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 >>>>> PostgreSQL solutions since 1997 http://www.commandprompt.com/ >>>>> UNIQUE NOT NULL >>>>> Donate to the PostgreSQL Project: >>>>> http://www.postgresql.org/about/donate >>>>> PostgreSQL Replication: http://www.commandprompt.com/products/ >>>> >>>>> >> >> >>> - ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 6: explain analyze is your friend >> >> >> >>> - -- >> >>> === The PostgreSQL Company: Command Prompt, Inc. === >>> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 >>> PostgreSQL solutions since 1997 http://www.commandprompt.com/ >>> UNIQUE NOT NULL >>> Donate to the PostgreSQL Project: http://www.postgresql.org/about/ >>> donate >>> PostgreSQL Replication: http://www.commandprompt.com/products/ >> >>> > > > - -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 > PostgreSQL solutions since 1997 http://www.commandprompt.com/ > UNIQUE NOT NULL > Donate to the PostgreSQL Project: http://www.postgresql.org/about/ > donate > PostgreSQL Replication: http://www.commandprompt.com/products/ > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.6 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFG0HJ5ATb/zqfZUUQRAuEdAJwNwsr/XCsr85tElSVbRVMUHME+PACglbJK > gj5cZgOtgEEjUPph0jpsOcw= > =u7Ox > -----END PGP SIGNATURE----- > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQIVAwUBRtCHvPyqRf6YpodNAQLDuBAAp+dg1MHZy+hjZ8Zk9OQTbeADgJqGPpi9 G7+y3iyaaqOF66TC52P7OaqO6nPhoNCNMCxwztnASyxpftGD5yJ4AZTSGcbAsWB9 cO5mE1mgbngZNPnRLypeJ81hyE6bniNNL7xXSq9LB8wRMczFZwGVZT66+lMIFjvv 0OrbAcSNUFqddky7EFm8gx6A2FNIzSdFB0dNbzpKwEOTnCHKvh+O99sAr/LB7mmL Hj/wzeQKrWbDAB3+N9rczivZ03DvYAGbUY9qBfNj7Y9YL3iu/Q+Oy4bHtI6d/a7B wepol2xe1sYEtQ+R3yMPXFte0483n8XIdXxa412ZSIEBfLxHzV6M7JTbPtgWwE+9 7xvyYbO7xQL9N/P8ZGg75eEqXtUrepGmJG0Y30qF5sNdMG0pWoz1bzDjSLNCnylq JwsO8p1EHNPnPRqotwZZSfLUW16eREqLaOrSC84gIw5Q6zAMZe/k2ckzzHKPGB1c sckaQROcgK4Lu9ywjRjBjNqclOMasf0MCrsDVMQE/wnh4GoDL/PAyEOqnlpvJ+cx k4kmOrEz5GRZQehHUI7CdejFwZ32sAB+nV2r8zDW9FSxgoRoFvtE2hooJ9orv0IU 1F8TeBdifVP/Ef/lHAHs6IqEH45y72WqrWFZsIdU1PDe0MyfgMaOBwdwXNeZqky/ IF5SMKbl9yA= =F9Oq -----END PGP SIGNATURE-----
pgsql-general by date: