How does Bing video search extract video from so many different websites?

Developer https://www.devze.com 2022-12-29 21:44 Source: Internet
Are they decompiling the Flash or something like that? I can't imagine how they have done it.


Just speculation, but they could look at what the Flash SWF file connects to (i.e., find the FLV URL from the HTTP request the SWF makes). Once they have that, they could do one of two things:

1) Queue the URL for a process that: i) downloads the FLV, ii) trims it to 10 seconds, iii) adds a fade in/fade out, iv) saves the result.

or

2) Connect directly to the FLV each time using the original URL and play only 10 seconds, adding effects like the fade in/fade out on top of the video.

I'm doubtful they'd use the second method, though, as it could cause annoying traffic spikes on other people's servers and could increase lag. The first method lets the Bing servers cache the videos and host them in one reliable location that's dedicated to video streaming.
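
To make the first method concrete, here is a minimal sketch of the "trim to 10 seconds and add fades" step. It assumes the ffmpeg command-line tool is used for the processing, which is purely my assumption (nothing suggests Bing actually uses it); the function name build_preview_command and the example URL are hypothetical. The function only builds the argument list, so it can be inspected without ffmpeg installed.

```python
def build_preview_command(src_url, out_path, length=10.0, fade=1.0):
    """Return an ffmpeg argument list that cuts a short preview with fades.

    Assumption: ffmpeg is the processing tool, and the source has both a
    video and an audio track. Nothing here is known to be Bing's pipeline.
    """
    if length <= 2 * fade:
        raise ValueError("preview must be longer than both fades combined")

    fade_out_start = length - fade
    # Video fade in at the start, fade out over the final `fade` seconds.
    video_filter = (
        f"fade=t=in:st=0:d={fade},"
        f"fade=t=out:st={fade_out_start}:d={fade}"
    )
    # Matching audio fades so the sound does not cut off abruptly.
    audio_filter = (
        f"afade=t=in:st=0:d={fade},"
        f"afade=t=out:st={fade_out_start}:d={fade}"
    )
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src_url,           # ffmpeg can read directly from an HTTP URL
        "-t", str(length),       # keep only the first `length` seconds
        "-vf", video_filter,
        "-af", audio_filter,
        out_path,
    ]
```

With ffmpeg available, the list could be passed to subprocess.run(cmd, check=True); the saved file is then what a cache server would host.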

Update

Come to think of it, there's another method to do this:

I know that in PHP you can decompile a compiled SWF on the fly. It's rather quick, and it would be an easy way to extract any URLs. Of course, Microsoft wouldn't be using PHP, but I'm pretty sure they have an equivalent library written in C++ (I'm fairly sure they use C++).
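
As a rough illustration of pulling URLs out of a SWF, here is a sketch in Python. It is not a real decompiler: it only undoes the zlib compression used by "CWS" files (the first 8 header bytes stay uncompressed) and scans the raw bytes for strings that look like FLV URLs. URLs that the ActionScript assembles at runtime would be missed. The names swf_body and find_flv_urls are made up for this example.

```python
import re
import zlib

# Crude pattern: an http(s) URL made of printable ASCII that ends in .flv,
# optionally followed by a query string.
FLV_URL = re.compile(
    rb"https?://[\x21-\x7e]+?\.flv(?:\?[\x21-\x7e]*)?", re.IGNORECASE
)


def swf_body(data: bytes) -> bytes:
    """Return the SWF payload after the 8-byte header, decompressed if needed."""
    signature = data[:3]
    if signature == b"FWS":      # uncompressed SWF
        return data[8:]
    if signature == b"CWS":      # zlib-compressed after the first 8 bytes
        return zlib.decompress(data[8:])
    # "ZWS" (LZMA) and anything else is out of scope for this sketch.
    raise ValueError(f"unsupported SWF signature: {signature!r}")


def find_flv_urls(data: bytes) -> list:
    """Scan the SWF payload for embedded strings that look like FLV URLs."""
    matches = FLV_URL.finditer(swf_body(data))
    return sorted({m.group(0).decode("ascii") for m in matches})
```

Calling find_flv_urls(open("player.swf", "rb").read()) on a downloaded file (the filename is hypothetical) would return any literal FLV URLs stored in it.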

But even if they were looking for HTTP requests to an FLV, they'd probably have a crawler running in a lightweight "browser." The browser would need to render the Flash so that it makes the HTTP request, and it would then log all outbound requests. This isn't too difficult a task if you're running your own server: you can just have a background process that sits there and scours the logs looking for FLV requests.

Creating your own browser to do this may sound daunting, but it's actually pretty simple(ish). In C# you could make an HTTP request to a URL, scan the document for any links, queue the links, request each link, and loop that way (making sure you don't revisit links you've already visited). In PHP, you could use cURL to fetch the URL and do the same.

Any time you find a SWF link, you add it to a different queue, one whose worker renders the Flash (or decompiles it) and finds any links to FLV URLs, and then you queue those as needed.
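
The crawl loop described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not how Bing does it: the names LinkCollector and crawl are made up, and the fetch function is passed in as a parameter (it should return the page's HTML, or None on failure) so the loop can be tried without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects (attribute, value) pairs that may hold a URL."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # href: ordinary links; src/data/value: <embed>, <object>, <param>
            if value and name in ("href", "src", "data", "value"):
                self.found.append((name, value))


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. Returns (pages visited, SWF URLs to process)."""
    page_queue = deque([start_url])
    seen = {start_url}           # never queue the same URL twice
    visited, swf_queue = [], []

    while page_queue and len(visited) < max_pages:
        url = page_queue.popleft()
        html = fetch(url)
        if html is None:         # fetch failed or page unknown
            continue
        visited.append(url)

        parser = LinkCollector()
        parser.feed(html)
        for name, raw in parser.found:
            link = urldefrag(urljoin(url, raw))[0]   # absolute, no #fragment
            parts = urlparse(link)
            if parts.scheme not in ("http", "https") or link in seen:
                continue
            if parts.path.lower().endswith(".swf"):
                seen.add(link)
                swf_queue.append(link)    # handled later by a separate worker
            elif name == "href":
                seen.add(link)
                page_queue.append(link)   # ordinary page: keep crawling
    return visited, swf_queue
```

For a real run, fetch could wrap urllib.request.urlopen and decode the response body; the SWF queue would then feed whatever process renders or decompiles the Flash to find the FLV URLs.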
