Thread: Convert literature string via Regular Expressions
Hi all,
I'm having difficulty splitting the following literature strings into their parts so that they can be inserted into the database.
Here are 2 example strings:
Hauser, M., Geller-Grimm, F. (1995): Bestimmungsschlüssel für die Weibchen der deutschen Sphegina-Arten (Diptera, Syrphidae). [Key to distinguish the females of the Sphegina species known from Germany (Diptera, Syrphidae).] - Entomology 2(1/2), 3-19. London.
Mazánek, L., Láska, P., Bicik, V. (1999): Two new Palaearctic species of Eupeodes similar to E. bucculatus (Diptera, Syrphidae) [] - Volucella 4, 1-9. Stuttgart.
Pattern is like this:
Author(s) (year): Title in German or English. [If filled, then the preceding title is the German one and this is its English translation.] - Source issue, pages. City.
Author:
Year:
Title EN or DE:
Title EN:
Source:
Issue:
Pages:
Press City:
I tried something like this:
preg_match ("/^[..something..]+/", $string, $regs);
echo ("Author: ".$regs[1]."<br />");
echo ("Year: ".$regs[2]."<br />");
echo ("Title EN or DE: ".$regs[3]."<br />");
echo ("Title EN: ".$regs[4]."<br />");
echo ("Source: ".$regs[5]."<br />");
echo ("Issue: ".$regs[6]."<br />");
echo ("Pages: ".$regs[7]."<br />");
echo ("Press City: ".$regs[8]."<br />");
But I'm having problems with the spaces and the parentheses that I somehow can't use in the matching...
Any idea how to split the string in the appropriate parts?
Many thanks,
Bastiaan
--
Bastiaan Wakkie <bastiaaw@dds.nl>
www.syrphidae.com
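On the trouble with spaces and parentheses mentioned above: in a PCRE pattern a plain ( ) pair creates a capture group, while \( and \) match literal parentheses, and \s matches one whitespace character. The following is only a minimal sketch of that idea, pulling just the author list and the year out of the start of the first example string. The pattern is an illustration, not a full solution for all eight fields.

```php
<?php
// Start of the first example string from the post above.
$string = 'Hauser, M., Geller-Grimm, F. (1995): Bestimmungsschlüssel für die Weibchen der deutschen Sphegina-Arten (Diptera, Syrphidae).';

// (.*?)      capture group 1: as few characters as possible (the author list)
// \s         one whitespace character
// \( ... \)  literal parentheses around the year
// (\d{4})    capture group 2: exactly four digits
if (preg_match('/^(.*?)\s\((\d{4})\):\s/', $string, $regs)) {
    echo "Author: " . $regs[1] . "\n";
    echo "Year: " . $regs[2] . "\n";
}
```

Using single quotes around the pattern keeps PHP from interpreting the backslashes before the regex engine sees them.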
Hey,
I figured it out in the end! Thanks anyway.
Here is the code, for anyone who is interested:
<?php
$filename = "Literature.txt";
$handle = fopen ($filename, "r");
$line = 0;
// fgets() returns false at end of file, which ends the loop
while (($buffer = fgets($handle, 1024)) !== false) {
    $line++;
    echo ("<hr><p>".$buffer."</p>");
    // Capture groups: 1 author, 2 year, 3 title (DE or EN), 4 English title,
    // 5 source, 6 issue, 7 pages, 8 press city.
    // A literal hyphen inside [...] is written as \- so it is not read as a range.
    $match = preg_match ("/^([\wÜÖäüßéñóíá,.\-\s]*)\(([\d]{4})\)*:\s([\)\(ÜÖäüßéñóíá,.:&?!\-\w\s]*)\[([\)\(ÜÖäüßéñóíá,.:&?!\-\w\s]*)\]\s-\s([\s\w]*)\s([\(\/\)\d]*),\s([\d-]*)\.\s([\w]*)/", $buffer, $regs);
    if ($match) {
        echo ("<p>Matched till now in line $line:<br>------------------------> <i>".$regs[0]."</i></p>");
        echo ("<table border=\"2\"><tr><td>Author:</td><td><i>".$regs[1]."</i></td></tr>");
        echo ("<tr><td>Year:</td><td><i>".$regs[2]."</i></td></tr>");
        echo ("<tr><td>Title EN or DE:</td><td><i>".$regs[3]."</i></td></tr>");
        echo ("<tr><td>Title EN:</td><td><i>".$regs[4]."</i></td></tr>");
        echo ("<tr><td>Source:</td><td><i>".$regs[5]."</i></td></tr>");
        echo ("<tr><td>Issue:</td><td><i>".$regs[6]."</i></td></tr>");
        echo ("<tr><td>Pages:</td><td><i>".$regs[7]."</i></td></tr>");
        echo ("<tr><td>Press City:</td><td><i>".$regs[8]."</i></td></tr></table>");
    } else {
        echo "<div style=\"color:red\">String did not match!</div>";
    }
}
fclose ($handle);
?>
Cool, eh? I'm starting to like regular expressions. So now I can happily import 1000 new rows without any problems.
bye,
Bastiaan
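As a possible variation on the code above, here is a sketch that matches on the delimiters between the fields instead of listing every allowed letter, so accented characters need no special handling. The function name parseReference and the group names are made up for illustration and are not part of the original code. The pattern assumes every line follows the "Author(s) (year): Title [EN title] - Source issue, pages. City." layout exactly, and it was written with only the two example strings in mind.

```php
<?php
// Hypothetical helper: returns the eight fields as an array keyed by name,
// or null when the line does not follow the expected layout.
function parseReference(string $line): ?array
{
    // The x modifier allows whitespace and # comments inside the pattern.
    // "~" is used as delimiter so that "/" can appear unescaped.
    $pattern = '~^
        (?<author>.+?)\s                # author list, up to the space before "(year)"
        \((?<year>\d{4})\):\s           # four-digit year in literal parentheses
        (?<title>.+?)\s                 # original title (German or English)
        \[(?<title_en>[^\]]*)\]\s-\s    # English translation, may be empty: []
        (?<source>.+?)\s                # journal name
        (?<issue>[\d()/]+),\s           # issue, e.g. 4 or 2(1/2)
        (?<pages>\d+-\d+)\.\s           # page range
        (?<city>[^.]+)\.                # press city, up to the next full stop
    ~x';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // preg_match returns each group under its name and its number; keep the names.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Usage with the second example string from the original post.
$ref = parseReference('Mazánek, L., Láska, P., Bicik, V. (1999): Two new Palaearctic species of Eupeodes similar to E. bucculatus (Diptera, Syrphidae) [] - Volucella 4, 1-9. Stuttgart.');
print_r($ref);
```

Named groups make the later database insert less error-prone than counting $regs[1] to $regs[8], at the cost of a pattern that is stricter about the punctuation between fields.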
On Mon, 2003-11-10 at 16:50, Bastiaan Wakkie wrote:
Any idea how to split the string in the appropriate parts?