I am open to any scripting language that can do the job. I have a bunch of HTML files that need data extracted from them and then removed. The number of characters is unknown, the data spans several lines, and the start and end can be in the middle of a line. The data is delimited with <TPANE ... /TPANE>. Let the suggestions begin!
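The core of the task can be sketched in a few lines of PHP. This assumes the blocks look exactly like `<TPANE>...</TPANE>`; adjust the pattern if the opening tag carries attributes. `extract_tpane()` is a hypothetical helper name, and the `i` (case-insensitive) flag is an added assumption, since HTML tag names may appear in either case.

```php
<?php
// Hypothetical helper: returns [text with all <TPANE>...</TPANE> blocks
// removed, list of the removed blocks].
function extract_tpane(string $html): array
{
    // ".*?" is lazy, so each match ends at the first closing tag;
    // "s" lets "." match newlines; "i" ignores the case of the tag name.
    $pattern = '/<TPANE>.*?<\/TPANE>/si';
    preg_match_all($pattern, $html, $matches);
    $clean = preg_replace($pattern, '', $html);
    return [$clean, $matches[0]];
}

// Sample input: blocks start and end mid-line and one spans two lines
$html = "a<TPANE>one\ntwo</TPANE>b\nc<TPANE>three</TPANE>d";
[$clean, $blocks] = extract_tpane($html);
echo $clean, "\n";
print_r($blocks);
```

Using the same pattern for both `preg_match_all()` and `preg_replace()` guarantees that exactly the blocks that were extracted are the ones removed.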
You can do this with PHP; I wrote a little script. In this example I assume that the script, the source folder (which contains your HTML files) and the destination folder (where you'll find the cleaned files and the extracted content) are in the same directory. Adapt the script to your needs.

```php
<?php
// Basic configuration
$path['source']      = getcwd() . "/source/";
$path['destination'] = getcwd() . "/destination/";

$delimiter['start']     = "<TPANE>";
$delimiter['end']       = "</TPANE>";
$delimiter['extracted'] = "\n";

$extension['clean']     = ".clean";
$extension['extracted'] = ".extracted";

// Check directories
if (!is_dir($path['source'])) {
    exit("The source directory does not exist!");
}
if (!is_dir($path['destination'])) {
    exit("The destination directory does not exist!");
}
if (!is_writable($path['destination'])) {
    exit("The destination directory is not writeable - check the permissions!");
}

// Pattern: start delimiter, then as few characters as possible (lazy .*?)
// up to the first end delimiter. The "s" modifier lets "." match newlines,
// so a block may span several lines; preg_quote() escapes "<", ">" and "/".
$pattern = "/" . preg_quote($delimiter['start'], "/") . ".*?"
         . preg_quote($delimiter['end'], "/") . "/s";

if ($dir = opendir($path['source'])) {
    // Compare with false explicitly so a file named "0" does not stop the loop
    while (false !== ($file = readdir($dir))) {
        if (is_file($path['source'] . $file)) {
            // Get the contents
            $content['original'] = file_get_contents($path['source'] . $file);

            // Find out what to extract
            preg_match_all($pattern, $content['original'], $content['extracted']);

            if (!empty($content['extracted'][0])) {
                // Remove every matched block from the original
                $content['clean'] = str_replace($content['extracted'][0], "", $content['original']);
                // Write the cleaned content into the destination directory
                file_put_contents($path['destination'] . $file . $extension['clean'], $content['clean']);
                // Write the extracted blocks, separated by newlines, into the destination directory
                file_put_contents($path['destination'] . $file . $extension['extracted'],
                                  implode($delimiter['extracted'], $content['extracted'][0]));
            } else {
                // Nothing to clean - copy the original content into the destination directory
                file_put_contents($path['destination'] . $file . $extension['clean'], $content['original']);
                // Record that there was nothing to extract
                file_put_contents($path['destination'] . $file . $extension['extracted'], "There was nothing to extract");
            }
        }
    }
    closedir($dir);
} else {
    exit("The source directory could not be opened!");
}
?>
```

You can run the script from the command line. Start it from the directory that contains the source and destination folders, because both paths are built from getcwd():

```
php %scriptname%
```

Best regards,
Olli

EDIT: Fixed a few bugs; it was a bit late last night.
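When matching "everything between two delimiters", it is easy to write the middle part as a negated character class such as `[^(</TPANE>)]*`. That excludes individual characters, not the closing tag as a string. The sketch below compares that form with a greedy `.*` and a lazy `.*?`, all using the `s` modifier so matches can cross line breaks. The sample HTML is made up for illustration.

```php
<?php
// Two blocks: the first contains characters that also occur in "</TPANE>"
$html = "x<TPANE>Total: 5 <b>items</b></TPANE>y<TPANE>ok</TPANE>z";

// Negated character class: [^(<\/TPANE>)] excludes the single characters
// ( < / T P A N E > ) rather than the string "</TPANE>", so any block whose
// content contains one of them (here "T" and the inner tags) is skipped.
$classCount = preg_match_all('/<TPANE>[^(<\/TPANE>)]*<\/TPANE>/s', $html, $m1);

// Greedy ".*" runs to the LAST "</TPANE>", merging both blocks and the "y".
$greedyCount = preg_match_all('/<TPANE>.*<\/TPANE>/s', $html, $m2);

// Lazy ".*?" stops at the FIRST "</TPANE>" after each start tag.
$lazyCount = preg_match_all('/<TPANE>.*?<\/TPANE>/s', $html, $m3);

printf("class: %d, greedy: %d, lazy: %d\n", $classCount, $greedyCount, $lazyCount);
```

Only the lazy form finds both blocks and nothing else, which is why it is the usual choice for delimiter-to-delimiter extraction when the content may contain arbitrary characters.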
Thanks, I will give it a try. It didn't occur to me that PHP could be run from the command line; I'll see what is available for MS Vista.