
deleting characters from many lines


chipsafts
16th April 2008, 20:23
I am open to any scripting language that can do the job.

I have a bunch of HTML files that need data extracted and deleted from them.

the number of characters is unknown
the data spans several lines
the start and end can be in the middle of a line
the data is delimited with <TPANE ... /TPANE>


Let the suggestions begin :)

o.meyer
17th April 2008, 05:19
You can do this with PHP - I wrote a little script. In this example I assume that the script, the source folder (which contains your HTML files) and the destination folder (where you'll find the cleaned files and the extracted content) are in the same directory. Adapt the script to your needs.

<?php

// Basic configuration
$path['source'] = getcwd()."/source/";
$path['destination'] = getcwd()."/destination/";
$delimiter['start'] = "<TPANE>";
$delimiter['end'] = "</TPANE>";
$delimiter['extracted'] = "\n";
$extension['clean'] = ".clean";
$extension['extracted'] = ".extracted";

// Check directories
if (!is_dir($path['source'])) exit("The source directory does not exist!");
if (!is_dir($path['destination'])) exit("The destination directory does not exist!");
if (!is_writable($path['destination'])) exit("The destination directory is not writeable - check the permissions!");

// Build the pattern: preg_quote() makes the delimiters literal, the lazy .*?
// stops at the first end delimiter, and the "s" modifier lets a match span lines
$pattern = "/".preg_quote($delimiter['start'], "/").".*?".preg_quote($delimiter['end'], "/")."/s";

if ($dir = opendir($path['source'])) {

    // Compare with false explicitly so that a file named "0" does not end the loop
    while (($file = readdir($dir)) !== false) {

        if (is_file($path['source'].$file)) {

            // Get the contents
            $content['original'] = file_get_contents($path['source'].$file);

            // Find out what to extract
            preg_match_all($pattern, $content['original'], $content['extracted']);

            if (isset($content['extracted'][0][0])) {

                // Clean the contents
                $content['clean'] = str_replace($content['extracted'][0], "", $content['original']);

                // Write the cleaned content into the destination directory
                file_put_contents($path['destination'].$file.$extension['clean'], $content['clean']);

                // Write the extracted content into the destination directory
                file_put_contents($path['destination'].$file.$extension['extracted'], implode($delimiter['extracted'], $content['extracted'][0]));
            }
            else {
                // Nothing to clean - write the original content into the destination directory
                file_put_contents($path['destination'].$file.$extension['clean'], $content['original']);

                // Note in the .extracted file that nothing was found
                file_put_contents($path['destination'].$file.$extension['extracted'], "There was nothing to extract");
            }
        }
    }
    closedir($dir);
}
else exit("The source directory could not be opened!");

?>

You can run the script from the command line:

php %scriptname%
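
If you want to check the matching on a small input before running it over real files, something like the following might help. It is only a sketch: the helper name extract_blocks and the sample string are made up for illustration, and it assumes the blocks are written as <TPANE> ... </TPANE> without attributes on the opening tag.

```php
<?php
// Hypothetical helper (not part of the script above): returns the delimited
// blocks found in $html and the text that remains once they are removed.
function extract_blocks(string $html, string $start, string $end): array
{
    // preg_quote() makes the delimiters literal; ".*?" is lazy, so each match
    // stops at the first end delimiter; the "s" modifier lets "." match newlines.
    $pattern = '/' . preg_quote($start, '/') . '.*?' . preg_quote($end, '/') . '/s';

    preg_match_all($pattern, $html, $matches);
    $clean = preg_replace($pattern, '', $html);

    return ['extracted' => $matches[0], 'clean' => $clean];
}

// Made-up sample: the first block starts mid-line and spans two lines,
// the second block sits entirely inside one line.
$sample = "<p>keep <TPANE>drop\nTHIS</TPANE> me</p>\n<p>a<TPANE>x</TPANE>b</p>\n";

$result = extract_blocks($sample, '<TPANE>', '</TPANE>');

echo $result['clean'];
print_r($result['extracted']);
```

If the opening tag carries attributes in your files, the start delimiter would need to be a pattern rather than a fixed string, so treat this as a starting point only.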

Best regards,

Olli

EDIT: Fixed a few bugs - it was a bit late last night :rolleyes:

chipsafts
9th May 2008, 19:20
Thanks, I will give it a try.
It didn't occur to me that PHP could be run from the command line; I'll see what is out there for MS Vista.