deleting characters from many lines

Discussion in 'Programming/Scripts' started by chipsafts, Apr 16, 2008.

  1. chipsafts

    chipsafts New Member

    I am open to any scripting language that can do the job.

    I have a bunch of HTML files that need data extracted and deleted from them.
    • number of characters is unknown
    • spans several lines
    • start and end can be in the middle of a line
    • data is delimited with <TPANE ... /TPANE>

    Let the suggestions begin :)
     
  2. o.meyer

    o.meyer New Member Moderator

    You can do this with PHP - I wrote a little script. In this example I assume that the script, the source folder (which contains your HTML files) and the destination folder (where you'll find the cleaned files and the extracted content) are all in the same directory. Adapt the script to your needs.

    Code:
    <?php
    
    // Basic configuration
    $path['source'] 	= getcwd()."/source/";
    $path['destination'] 	= getcwd()."/destination/";
    $delimiter['start'] 	= "<TPANE>";
    $delimiter['end'] 	= "<\/TPANE>";
    $delimiter['extracted'] = "\n";
    $extension['clean']	= ".clean";
    $extension['extracted']	= ".extracted";
    
    // Check directories
    if (!is_dir($path['source'])) exit("The source directory does not exist!");
    if (!is_dir($path['destination'])) exit("The destination directory does not exist!");
    if (!is_writable($path['destination'])) exit("The destination directory is not writable - check the permissions!");
    
    if ($dir = opendir($path['source'])) {
    
    	// Compare against false explicitly so a file named "0" does not end the loop
    	while (false !== ($file = readdir($dir))) {
    		
    		if (is_file($path['source'].$file)) {
    		
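    			// Get the contents
    			$content['original'] = file_get_contents($path['source'].$file);
    			
    			// Find out what to extract: a lazy match from the start delimiter to the
    			// first end delimiter; the "s" modifier lets "." match across line breaks.
    			// (A negated class like [^(...)] excludes single characters, not the whole delimiter.)
    			preg_match_all("/{$delimiter['start']}.*?{$delimiter['end']}/s", $content['original'], $content['extracted']);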
    		
    			if(isset($content['extracted'][0][0])) {
    
    				// Clean the contents
    				$content['clean'] = str_replace($content['extracted'][0], "", $content['original']);
    
    				// Write the cleaned content into the destination directory
    				file_put_contents($path['destination'].$file.$extension['clean'], $content['clean']);
    
    				// Write the extracted content into the destination directory
    				file_put_contents($path['destination'].$file.$extension['extracted'], implode($delimiter['extracted'], $content['extracted'][0]));
    			}
    			else {
    				// Nothing to clean - write the original content into the destination directory
    				file_put_contents($path['destination'].$file.$extension['clean'], $content['original']);
    
    				// Write the extracted content into the destination directory
    				file_put_contents($path['destination'].$file.$extension['extracted'], "There was nothing to extract");
    			}
    		}
    	}
    	closedir($dir);
    }
    else exit("The source directory could not be opened!");
    
    ?>
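    To see what the extraction step does in isolation, here is a minimal sketch that runs on an in-memory string instead of a file. The sample text and the names $sample, $pattern, $matches and $clean are made up for illustration, and it assumes the blocks are delimited by a literal <TPANE> and </TPANE>, as in the configuration above. A lazy .*? together with the "s" modifier lets a match start mid-line, run across line breaks and stop at the first closing delimiter.

```php
<?php

// Hypothetical sample: the first block starts and ends mid-line and spans three lines.
$sample = "keep A <TPANE>drop 1\ndrop 2\ndrop 3</TPANE> keep B\n"
        . "keep C <TPANE>drop 4</TPANE> keep D\n";

// preg_quote() escapes the delimiters, including the "/" used as the regex delimiter.
$start   = preg_quote("<TPANE>", "/");
$end     = preg_quote("</TPANE>", "/");
$pattern = "/{$start}.*?{$end}/s";

// Collect every delimited block, then strip all of them from the text.
preg_match_all($pattern, $sample, $matches);
$clean = preg_replace($pattern, "", $sample);

echo count($matches[0]), " block(s) extracted\n";
echo $clean;
```

    Only the delimited blocks are removed; the text around them, including the spaces on either side of each block, is left untouched.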
    You can execute the script from the command line:

    Code:
    php %scriptname%
    Best regards,

    Olli

    EDIT: Fixed a few bugs - it was a bit late last night :rolleyes:
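    If the opening tag in your files carries attributes (the original post writes the delimiter as <TPANE ... /TPANE>, which could be read that way), a literal <TPANE> start delimiter will not match. One possible adaptation, purely as an assumption about the input, is to allow anything up to the first ">" in the opening tag. The function name split_tpane and the sample string are hypothetical, and the array destructuring needs PHP 7.1 or later.

```php
<?php

// Hypothetical helper: returns [cleaned text, list of extracted blocks].
// Assumes the opening tag may carry attributes, e.g. <TPANE id="x">.
function split_tpane(string $html): array
{
    // "<TPANE", a word boundary, anything except ">", then ">";
    // then a lazy match across lines ("s" modifier) up to the first </TPANE>.
    $pattern = '/<TPANE\b[^>]*>.*?<\/TPANE>/s';

    preg_match_all($pattern, $html, $matches);
    $clean = preg_replace($pattern, '', $html);

    return [$clean, $matches[0]];
}

[$clean, $blocks] = split_tpane("a<TPANE id=\"x\">1\n2</TPANE>b<TPANE>3</TPANE>c");
echo $clean, "\n";                 // cleaned text
echo implode("\n", $blocks), "\n"; // extracted blocks, one after another
```

    The match is case-sensitive, so lowercase tags would need the "i" modifier added to the pattern.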
     
    Last edited: Apr 17, 2008
  3. chipsafts

    chipsafts New Member

    Thanks, I will give it a try.
    It didn't occur to me that PHP could be run from the command line. I'll see what is out there for MS Vista.
     
