Detect and parse embedded video in html?

开发者 https://www.devze.com 2023-01-05 09:01 Source: Internet
I am working on a project which requires me to detect and extract the embed code of videos on a web page.

I know the <object> tag is used to embed videos. However, the specification says that it can also be used for other things, such as images.

So how do I deterministically know that an <object> tag contains a video? Or is there some other way to find this out?


Historically, the <object> tag was intended as a way to embed media such as video and audio in an HTML document. But as web video evolved, it turned out that you can't provide a reasonable user experience without integrating video controls into your web app, so the de facto standard for embedding video in an HTML page became embedding a Flash player (using <embed> or <object>) and accessing the video from within that Flash presentation. (In HTML5 you have the <video> element for that purpose, but I guess you don't have that kind of control over the HTML files you need to process.)
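To make the embedding styles above concrete, here is a minimal Python sketch that collects <object>, <embed> and <video> elements from a page and flags the ones that declare Flash. The names EmbedCollector and looks_like_flash are made up for this illustration, and the check only looks at the type, data and src attributes; older markup that passes the movie URL through a nested <param> element is not handled.

```python
from html.parser import HTMLParser

FLASH_MIME = "application/x-shockwave-flash"


class EmbedCollector(HTMLParser):
    """Collect <object>, <embed> and <video> start tags with their attributes."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # list of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("object", "embed", "video"):
            # Attribute values may be None (e.g. <video controls>).
            self.candidates.append((tag, {k: v or "" for k, v in attrs}))


def looks_like_flash(tag, attrs):
    """True if an <object>/<embed> declares Flash or points at a .swf URL."""
    if tag not in ("object", "embed"):
        return False
    url = attrs.get("data") or attrs.get("src") or ""
    path = url.split("?", 1)[0].lower()  # ignore the query string
    return attrs.get("type", "").lower() == FLASH_MIME or path.endswith(".swf")
```

Feeding a page through `EmbedCollector().feed(html)` and filtering `candidates` with `looks_like_flash` gives you the Flash embeds on the page. It says nothing yet about whether any of them actually plays video.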

So usually, when you see an <object> element used for playing video, the object being referenced is actually an SWF, a Flash presentation, which runs its own code that links to the video file. But a Flash presentation may or may not contain a video, and it can contain many other things as well. So if you want to detect videos in <object>s, your options are:

  1. Keep a list of all SWF files/URLs that are known to be video players. This method is the easiest, but bear in mind that you will have a lot of false negatives.
  2. Programmatically render the HTML you're parsing in a sandboxed browser, and detect the video from the screen capture. This is probably a huge effort, but will solve your problem perfectly.
  3. Download and decompile the SWF files referenced by the object tags, and implement a heuristic to figure out whether they contain an embedded video. I say heuristic because an SWF is basically a program, and if you could find a deterministic method to know whether a program plays video, you might as well try to decide whether the program halts.
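
Options 1 and 3 can be sketched roughly as follows, in Python. Everything here is an assumption for illustration: the host names in KNOWN_PLAYER_HOSTS are placeholders, the function names are made up, and the byte search for "NetStream" (the ActionScript class commonly used for streaming video) is just one crude heuristic that will give both false positives and false negatives. It does not decompile anything; it only unpacks the SWF body and looks for the string.

```python
import zlib
from urllib.parse import urlparse

# Hypothetical allow-list; fill it with player hosts you have verified yourself.
KNOWN_PLAYER_HOSTS = {"video.example.com", "player.example.org"}


def is_known_video_player(url, hosts=KNOWN_PLAYER_HOSTS):
    """Option 1: True if the URL's host is on the allow-list or a subdomain of it."""
    host = urlparse(url).hostname or ""  # None for relative URLs
    return any(host == h or host.endswith("." + h) for h in hosts)


def swf_mentions_video(data):
    """Option 3 (very rough): does the SWF body contain the string 'NetStream'?

    Returns None when the bytes are not an SWF this sketch can unpack.
    """
    signature = data[:3]
    if signature == b"FWS":      # uncompressed SWF
        body = data[8:]
    elif signature == b"CWS":    # zlib-compressed after the 8-byte header
        body = zlib.decompress(data[8:])
    else:                        # e.g. LZMA-compressed "ZWS", or not an SWF at all
        return None
    return b"NetStream" in body
```

A reasonable way to combine them is to trust the allow-list first and fall back to the byte heuristic only for unknown players, treating its answer as a hint rather than a verdict.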