Most efficient way to compare/match two large arrays?

Developer https://www.devze.com 2023-01-18 19:48 Source: Internet
I am writing a very process-intensive function in PHP that needs to be as optimized as it can get for speed, as it can take up to 60 seconds to complete in extreme cases. This is my situation:

I am trying to match an array of people to an XML list of jobs. Each person has keywords that I have already analyzed, delimited by spaces. The jobs come from a large XML file.

It's currently set up like this:

$matches = array();
foreach($people as $person){
    foreach($jobs as $job){
        foreach($person['keywords'] as $keyword){
            $count = substr_count($job->title, $keyword);
            if($count > 0) $matches[$job->title] = $count;
        }
    }
}

I do the keywords loop a few times with different categories. It does what I need it to do, but it feels very sloppy and the process can take a very, very long time depending on the number of people/jobs.

Is there a more efficient, or faster, way of doing this?


$matches = array();
foreach($people as $person){
    foreach($jobs as $job){
        foreach($person['keywords'] as $keyword){
            $count = substr_count($job->title, $keyword);
            if($count > 0) $matches[$job->title] = $count;
        }
    }
}

Truthfully, your method is a bit sloppy, but I assume that's because you have some specially formatted data that you have to work around? Beyond the sloppiness, I also see some data being lost in the way you're processing things, which I don't think was intentional.

I see that you're not just checking "is the keyword in the job title" but "how many times is the keyword in the job title", and then storing that count. So for the job title "friendly friend of the friend company", the keyword "friend" shows up 3 times, and thus $matches["friendly friend of the friend company"] = 3. Because $matches is declared once, before you begin your $people foreach loop, and is keyed only by job title, you keep overwriting this value any time a later keyword matches the same title. In other words, if the first person has the keyword "friend", then $matches["friendly friend of the friend company"] is set to 3. If the second person then has the keyword "friendly", this value is overwritten and $matches["friendly friend of the friend company"] now equals 1.
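
To make the overwrite concrete, here is a minimal sketch. The title is the example from the paragraph above; the two keywords are fed in the order described (both are illustrative values, not your real data).

```php
<?php
// Illustrative values only: the example title and the two keywords discussed above.
$title = 'friendly friend of the friend company';
$matches = array();

// Keywords arrive in this order, e.g. from two different people.
foreach (array('friend', 'friendly') as $keyword) {
    $count = substr_count($title, $keyword);
    if ($count > 0) {
        $matches[$title] = $count; // plain assignment replaces the earlier value
    }
}

var_dump($matches[$title]); // int(1): the 3 stored for "friend" has been overwritten
```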

I think what you wanted to do was count how many people have a keyword which is contained in the job title. In this case, rather than counting how many times $keyword appears in $job->title, you should just see if it appears, and respond accordingly.

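$matches = array();
foreach($people as $person){
    foreach($jobs as $job){
        $title = (string) $job->title; /* cast in case the title is an XML element */
        foreach($person['keywords'] as $keyword){
            if(strpos($title, $keyword) !== FALSE){ /* "If $keyword exists in $job->title" */
                $matches[$title] = ($matches[$title] ?? 0) + 1; /* Increment "number of people who match" */
                break; /* count each person at most once per job */
            }
        }
    }
}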
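
The strict !== FALSE comparison matters because strpos returns the offset of the match, and an offset of 0 is falsy in PHP. A minimal sketch, reusing the example title from earlier in this answer:

```php
<?php
// Illustrative value only: the example title used earlier in this answer.
$title = 'friendly friend of the friend company';

$position = strpos($title, 'friend'); // the match starts at offset 0
$loose    = (bool) $position;         // a loose truthiness check treats 0 as "not found"
$strict   = ($position !== false);    // the strict check correctly reports a match

var_dump($position, $loose, $strict);
```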

Another possibility is that you wanted to know how many keywords a given person had which matched a given job title. In this case you'd want a separate array per person. This is done with a slight modification.

$matches = array();
foreach($people as $id => $person){ /* key by index: $person is an array and cannot be an array key */
    $matches[$id] = array();
    foreach($jobs as $job){
        $title = (string) $job->title; /* cast in case the title is an XML element */
        foreach($person['keywords'] as $keyword){
            if(strpos($title, $keyword) !== FALSE) /* "If $keyword exists in $job->title" */
                $matches[$id][$title] = ($matches[$id][$title] ?? 0) + 1; /* Increment "number of keywords which match" */
        }
    }
}

Or, alternatively, you could go back to counting how many times each keyword occurs, since per person this is actually a meaningful value ("how well does the job match"):

$matches = array();
foreach($people as $id => $person){ /* key by index: $person is an array and cannot be an array key */
    $matches[$id] = array();
    foreach($jobs as $job){
        $title = (string) $job->title; /* cast in case the title is an XML element */
        foreach($person['keywords'] as $keyword){
            if($count = substr_count($title, $keyword)) /* if(0) = false */
                $matches[$id][$title] = ($matches[$id][$title] ?? 0) + $count; /* Increase "number of keyword occurrences" by $count */
        }
    }
}

Essentially, before tackling the problem of making your loop more efficient, you need to figure out what your loop is really trying to accomplish. Once you have, your best bet for increasing efficiency is to cut the number of loop iterations to a minimum and to use built-in functions wherever possible, since these are implemented in C (compiled code, which runs faster than the equivalent interpreted PHP).
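
One way to cut iterations, sketched under assumptions: if all you need is a per-title total, you can first tally how many people hold each distinct keyword, then scan each title once per distinct keyword instead of once per person. The sample $people and $titles below are hypothetical, and the comparison is made case-insensitive here, which the original loop is not.

```php
<?php
// Hypothetical sample data; the real $people and job titles come from your own code.
$people = array(
    array('keywords' => array('php', 'mysql')),
    array('keywords' => array('php', 'linux')),
);
$titles = array('Senior PHP developer', 'Linux administrator');

// Tally how many people hold each distinct (lower-cased) keyword.
$holders = array();
foreach ($people as $person) {
    $unique = array_unique(array_map('strtolower', $person['keywords']));
    foreach ($unique as $keyword) {
        $holders[$keyword] = ($holders[$keyword] ?? 0) + 1;
    }
}

// Scan each title once per distinct keyword instead of once per person.
$matches = array();
foreach ($titles as $title) {
    $haystack = strtolower($title);
    foreach ($holders as $keyword => $peopleCount) {
        // Numeric keywords become integer array keys, so cast back to string.
        if (strpos($haystack, (string) $keyword) !== false) {
            $matches[$title] = ($matches[$title] ?? 0) + $peopleCount;
        }
    }
}

print_r($matches);
```

Note that this total counts matching person/keyword pairs, so a person with two matching keywords contributes twice. The inner loop now runs jobs × distinct keywords times rather than people × jobs × keywords times, which helps most when many people share keywords.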


You could use an index of the words in the job titles to make the lookup more efficient:

$jobsByWords = array();
foreach ($jobs as $job) {
    preg_match_all('/\w+/', strtolower($job->title), $words);
    foreach ($words[0] as $word) {
        if (!isset($jobsByWords[$word])) $jobsByWords[$word] = array();
        $jobsByWords[$word][] = $job; /* objects are handles already, so no reference is needed */
    }
}

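Then you just iterate over the people and check whether each keyword is in the index. Note that this matches whole words only, so the keyword "friend" no longer matches the title word "friendly":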

$matches = array();
foreach ($people as $person) {
    foreach ($person['keywords'] as $keyword) {
        $keyword = strtolower($keyword);
        if (isset($jobsByWords[$keyword])) {
            foreach ($jobsByWords[$keyword] as $job) {
                $matches[(string) $job->title] = true;
            }
        }
    }
}
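
For reference, here is a self-contained sketch of the same word-index idea that can be run as is. The sample $jobs and $people (including the names "alice" and "bob") are hypothetical stand-ins for the parsed XML and your real array, and the per-person result layout is just one possible choice.

```php
<?php
// Hypothetical sample data standing in for the parsed XML and the people array.
$jobs = array(
    (object) array('title' => 'PHP Developer'),
    (object) array('title' => 'Friendly Support Agent'),
);
$people = array(
    'alice' => array('keywords' => array('php', 'support')),
    'bob'   => array('keywords' => array('friend')),
);

// Build the index: lower-cased word => list of job titles containing that word.
$titlesByWord = array();
foreach ($jobs as $job) {
    $title = (string) $job->title;
    preg_match_all('/\w+/', strtolower($title), $words);
    foreach (array_unique($words[0]) as $word) {
        $titlesByWord[$word][] = $title;
    }
}

// Look each keyword up directly instead of scanning every title.
$matches = array();
foreach ($people as $name => $person) {
    $matches[$name] = array();
    foreach ($person['keywords'] as $keyword) {
        $keyword = strtolower($keyword);
        foreach ($titlesByWord[$keyword] ?? array() as $title) {
            $matches[$name][$title] = ($matches[$name][$title] ?? 0) + 1;
        }
    }
}

print_r($matches);
```

With this data, "alice" matches both titles once each, while "bob" matches nothing, because the index only contains the whole word "friendly". If you need substring matches, the index alone is not enough and you would have to fall back to scanning.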