I am scraping a site that has two address boxes, each of which may differ slightly from the other.
One of the addresses is like this:
ONE MICROSOFT WAY
REDMOND WA 98052-6399
425-882-8080
And the other is like this:
ONE MICROSOFT WAY
REDMOND WA 98052-6399
I save the entire string for both (there are HTML tags around them in the original, but I didn't think they were necessary to illustrate my point), and then split it on those HTML tags. This means each line (e.g. ONE MICROSOFT WAY) ends up as a separate value.
What I want to do is check whether the two addresses contain duplicates. The problem is that the two addresses are separate values in the outer array, and each inner array splits its address line by line.
So basically, is there a way to check for duplicate values?
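If each address is already an array of plain-text lines, PHP's built-in `array_intersect()` and `array_diff()` can compare them directly. This is a minimal sketch, assuming the lines have already been extracted as plain strings; `$mailing`, `$business` and `normalizeLines()` are hypothetical names, and the trim/uppercase normalization is an assumption about what counts as a "minor difference".

```php
<?php
// Hypothetical arrays: each address already split into one line per element
$mailing  = array('ONE MICROSOFT WAY', 'REDMOND WA 98052-6399');
$business = array('ONE MICROSOFT WAY', 'REDMOND WA 98052-6399', '425-882-8080');

// Assumption: stray whitespace and letter case should not prevent a match
function normalizeLines(array $lines): array
{
    return array_map(function ($line) {
        return strtoupper(trim($line));
    }, $lines);
}

// Lines present in both addresses (array_values() drops the preserved keys)
$common = array_values(array_intersect(normalizeLines($mailing), normalizeLines($business)));

// Lines present only in the business address
$onlyBusiness = array_values(array_diff(normalizeLines($business), normalizeLines($mailing)));

// Treat the mailing address as a duplicate if every one of its lines is shared
$isDuplicate = count($common) === count($mailing);

print_r($common);
print_r($onlyBusiness);
var_dump($isDuplicate); // bool(true)
```

`array_intersect()` keeps the keys of its first argument, which is why the results go through `array_values()` before being compared or printed.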
Here is sample data:
<div class="mailer">
Mailing Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
</div>
<div class="mailer">
Business Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
<span class="mailerAddress">425-882-8080</span>
</div>
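Rather than splitting the raw string on tags, one option is PHP's DOM extension, which can return one array of lines per address box. This is a minimal sketch, assuming the DOM extension is enabled and that the class names are exactly `mailer` and `mailerAddress`; the markup string is my own reconstruction of the sample, with each `div` closed.

```php
<?php
// Reconstructed sample markup (assumption: each div is properly closed)
$html = '<div class="mailer">Mailing Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
</div>
<div class="mailer">Business Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
<span class="mailerAddress">425-882-8080</span>
</div>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// scraped HTML often triggers them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// One inner array per mailer box, one element per address line
$addresses = array();
foreach ($xpath->query('//div[@class="mailer"]') as $box) {
    $lines = array();
    // Relative query: only the spans inside the current box
    foreach ($xpath->query('.//span[@class="mailerAddress"]', $box) as $span) {
        $lines[] = trim($span->textContent);
    }
    $addresses[] = $lines;
}

print_r($addresses);
```

Each entry of `$addresses` holds only the text of the address lines, so the "Mailing Address" and "Business Address" labels are not mixed in with the data being compared.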
I have used the following method to clean data.
First, find the pattern. For example, if array A is array('Hello', 'World')
and array B is array('Hello World'),
you can bring A into the same shape as B with: if (count($a) > 1) $a = $a[0] . ' ' . $a[1];
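The `count() > 1` check handles only two lines. A minimal sketch of a more general version, using `implode()` to join any number of lines after stripping tags; `flattenAddress()` is a hypothetical helper name.

```php
<?php
// Join an address with any number of lines into one normalized string, so
// that array('Hello', 'World') and array('Hello World') compare as equal
function flattenAddress(array $lines): string
{
    $clean = array();
    foreach ($lines as $line) {
        // Drop HTML tags and surrounding whitespace from each line
        $clean[] = trim(strip_tags($line));
    }
    return implode(' ', $clean);
}

$a = flattenAddress(array('Hello', 'World'));
$b = flattenAddress(array('Hello World'));
var_dump($a === $b); // bool(true)
```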
As for your case, let's say each line is wrapped in HTML tags and stored as a separate element, with each address in its own array. Is that right?
If you can post sample data, I'd be glad to adapt this. For now, I'll use made-up data in the sample code below:
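<?php
$sampleData = array(
    array('<p>ONE MICROSOFT WAY</p>', 'REDMOND'),
    array('<p>ONE MICROSOFT WAY</p>', 'REDMOND', 'Number'),
);

$cleanData = array();
foreach ($sampleData as $value) {
    // Build a key from the first two lines with tags and whitespace removed
    $newKey = trim(strip_tags($value[0])) . ' ' . trim(strip_tags($value[1]));
    // Rows that produce the same key overwrite each other
    $cleanData[$newKey] = $value;
}
?>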
The point is that entries with the same key overwrite one another (the last one wins), so you end up with unique keys, each storing one unique value.
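Because plain assignment lets the last duplicate win, it matters which row you want to keep. This sketch reuses the sample data (joining the key parts with a space, which is an assumption) and contrasts plain assignment with an `isset()` check that keeps the first row instead; `$lastWins` and `$firstWins` are illustrative names.

```php
<?php
$sampleData = array(
    array('<p>ONE MICROSOFT WAY</p>', 'REDMOND'),
    array('<p>ONE MICROSOFT WAY</p>', 'REDMOND', 'Number'),
);

$lastWins  = array();
$firstWins = array();
foreach ($sampleData as $value) {
    $key = trim(strip_tags($value[0])) . ' ' . trim(strip_tags($value[1]));
    // Plain assignment: a later row with the same key replaces the earlier one
    $lastWins[$key] = $value;
    // isset() check: only the first row with a given key is stored
    if (!isset($firstWins[$key])) {
        $firstWins[$key] = $value;
    }
}

echo count($lastWins), ' ', count($lastWins['ONE MICROSOFT WAY REDMOND']), "\n";   // 1 3
echo count($firstWins), ' ', count($firstWins['ONE MICROSOFT WAY REDMOND']), "\n"; // 1 2
```

Both approaches collapse the two rows into one; they differ only in whether the three-line or the two-line version survives.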
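Another example: if you wanted to remove duplicate email addresses from data stored in a CSV file or an array:
<?php
// $data is assumed to be an array of rows, each with an 'email' key
$cleanData = array();
foreach ($data as $value) {
    // Rows sharing an email address overwrite each other
    $cleanData[$value['email']] = $value;
}
?>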
As simple as that: $cleanData should now contain no rows with duplicate email addresses.
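That fragment matches emails exactly, so `Alice@Example.com` and `alice@example.com` would both survive. A self-contained sketch with hypothetical rows, assuming that letter case and surrounding whitespace should be ignored (strictly, the part before `@` may be case-sensitive, so lowercasing it is a judgment call):

```php
<?php
// Hypothetical rows, e.g. as read from a CSV file
$data = array(
    array('name' => 'Alice', 'email' => 'alice@example.com'),
    array('name' => 'Alice B.', 'email' => ' Alice@Example.com '),
    array('name' => 'Bob', 'email' => 'bob@example.com'),
);

$cleanData = array();
foreach ($data as $value) {
    // Assumption: addresses differing only in case or surrounding
    // whitespace are the same address
    $key = strtolower(trim($value['email']));
    // Later rows with the same normalized email replace earlier ones
    $cleanData[$key] = $value;
}

// Reindex if the email keys are not needed afterwards
$rows = array_values($cleanData);
echo count($rows), "\n"; // 2
```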