How do I extract values from an HTML page fetched into a string with cURL?

Developer https://www.devze.com 2023-01-24 17:40 Source: Internet
I am using PHP / cURL to fetch an HTML page into a string, and I then need to extract the following data and plot a graph from it.

The data I want looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />

  <title></title>
</head>

<body>
  <table>
    <tbody>
      <tr>
        <td>
          <h3>Income</h3>
        </td>
      </tr>

      <tr>
        <td>Operating income</td>

        <td class="numericalColumn">22,922.00</td>

        <td class="numericalColumn">21,507.30</td>

        <td class="numericalColumn">17,492.60</td>

        <td class="numericalColumn">13,683.90</td>

        <td class="numericalColumn">10,227.12</td>
      </tr>

      <tr>
        <td>
          <h3>Expenses</h3>
        </td>
      </tr>

      <tr>
        <td>Material consumed</td>

        <td class="numericalColumn">4,029.40</td>

        <td class="numericalColumn">3,442.60</td>

        <td class="numericalColumn">2,952.30</td>

        <td class="numericalColumn">1,889.00</td>

        <td class="numericalColumn">1,367.67</td>
      </tr>

      <tr>
        <td>Manufacturing expenses&nbsp;</td>

        <td class="numericalColumn">2,213.20</td>

        <td class="numericalColumn">1,841.80</td>

        <td class="numericalColumn">299.80</td>

        <td class="numericalColumn">120.50</td>

        <td class="numericalColumn">1,020.70</td>
      </tr>

      <tr>
        <td>Personnel expenses</td>

        <td class="numericalColumn">9,062.80</td>

        <td class="numericalColumn">9,249.80</td>

        <td class="numericalColumn">7,409.10</td>

        <td class="numericalColumn">5,768.20</td>

        <td class="numericalColumn">4,279.03</td>
      </tr>

      <tr>
        <td>Selling expenses</td>

        <td class="numericalColumn">378.10</td>

        <td class="numericalColumn">308.40</td>

        <td class="numericalColumn">532.10</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">171.05</td>
      </tr>

      <tr>
        <td>Adminstrative expenses</td>

        <td class="numericalColumn">1,737.00</td>

        <td class="numericalColumn">1,906.00</td>

        <td class="numericalColumn">2,583.70</td>

        <td class="numericalColumn">2,651.70</td>

        <td class="numericalColumn">904.78</td>
      </tr>

      <tr>
        <td>Expenses capitalised</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Cost of sales</td>

        <td class="numericalColumn">17,420.50</td>

        <td class="numericalColumn">16,748.60</td>

        <td class="numericalColumn">13,777.00</td>

        <td class="numericalColumn">10,429.40</td>

        <td class="numericalColumn">7,743.22</td>
      </tr>

      <tr>
        <td>Operating profit</td>

        <td class="numericalColumn">5,501.50</td>

        <td class="numericalColumn">4,758.70</td>

        <td class="numericalColumn">3,715.60</td>

        <td class="numericalColumn">3,254.50</td>

        <td class="numericalColumn">2,483.90</td>
      </tr>

      <tr>
        <td>Other recurring income</td>

        <td class="numericalColumn">434.20</td>

        <td class="numericalColumn">468.20</td>

        <td class="numericalColumn">326.90</td>

        <td class="numericalColumn">288.70</td>

        <td class="numericalColumn">113.59</td>
      </tr>

      <tr>
        <td>Adjusted PBDIT</td>

        <td class="numericalColumn">5,935.70</td>

        <td class="numericalColumn">5,226.90</td>

        <td class="numericalColumn">4,042.50</td>

        <td class="numericalColumn">3,543.20</td>

        <td class="numericalColumn">2,597.49</td>
      </tr>

      <tr>
        <td>Financial expenses</td>

        <td class="numericalColumn">108.40</td>

        <td class="numericalColumn">196.80</td>

        <td class="numericalColumn">116.80</td>

        <td class="numericalColumn">7.20</td>

        <td class="numericalColumn">3.13</td>
      </tr>

      <tr>
        <td>Depreciation&nbsp;</td>

        <td class="numericalColumn">579.60</td>

        <td class="numericalColumn">533.60</td>

        <td class="numericalColumn">456.00</td>

        <td class="numericalColumn">359.80</td>

        <td class="numericalColumn">292.26</td>
      </tr>

      <tr>
        <td>Other write offs</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Adjusted PBT</td>

        <td class="numericalColumn">5,247.70</td>

        <td class="numericalColumn">4,496.50</td>

        <td class="numericalColumn">3,469.70</td>

        <td class="numericalColumn">3,176.20</td>

        <td class="numericalColumn">2,302.10</td>
      </tr>

      <tr>
        <td>Tax charges&nbsp;</td>

        <td class="numericalColumn">790.80</td>

        <td class="numericalColumn">574.10</td>

        <td class="numericalColumn">406.40</td>

        <td class="numericalColumn">334.10</td>

        <td class="numericalColumn">286.10</td>
      </tr>

      <tr>
        <td>Adjusted PAT</td>

        <td class="numericalColumn">4,456.90</td>

        <td class="numericalColumn">3,922.40</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,016.00</td>
      </tr>

      <tr>
        <td>Non recurring items</td>

        <td class="numericalColumn">441.10</td>

        <td class="numericalColumn">-948.60</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">38.33</td>
      </tr>

      <tr>
        <td>Other non cash adjustments</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-33.85</td>
      </tr>

      <tr>
        <td>Reported net profit</td>

        <td class="numericalColumn">4,898.00</td>

        <td class="numericalColumn">2,973.80</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,020.48</td>
      </tr>

      <tr>
        <td>Earnigs before appropriation</td>

        <td class="numericalColumn">4,898.00</td>

        <td class="numericalColumn">2,973.80</td>

        <td class="numericalColumn">3,063.30</td>

        <td class="numericalColumn">2,842.10</td>

        <td class="numericalColumn">2,020.48</td>
      </tr>

      <tr>
        <td>Equity dividend</td>

        <td class="numericalColumn">880.90</td>

        <td class="numericalColumn">586.00</td>

        <td class="numericalColumn">876.50</td>

        <td class="numericalColumn">873.70</td>

        <td class="numericalColumn">712.88</td>
      </tr>

      <tr>
        <td>Preference dividend</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>

        <td class="numericalColumn">-</td>
      </tr>

      <tr>
        <td>Dividend tax</td>

        <td class="numericalColumn">128.30</td>

        <td class="numericalColumn">99.60</td>

        <td class="numericalColumn">148.90</td>

        <td class="numericalColumn">126.80</td>

        <td class="numericalColumn">99.98</td>
      </tr>

      <tr>
        <td>Retained earnings</td>

        <td class="numericalColumn">3,888.80</td>

        <td class="numericalColumn">2,288.20</td>

        <td class="numericalColumn">2,037.90</td>

        <td class="numericalColumn">1,841.60</td>

        <td class="numericalColumn">1,207.62</td>
      </tr>
    </tbody>
  </table>
</body>
</html>

I want to extract each row, for example "Manufacturing expenses", together with the values for all the years listed on that row. How do I go about this?

I found something like preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match); but that doesn't get the values I want.


If I understood your question correctly, you want something like this. I wrote it myself, so if you need any clarification I'd be happy to help.

Cheers!
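
A minimal sketch of this kind of extraction, assuming PHP's built-in DOM extension (DOMDocument and DOMXPath) is available. extractRows() is a hypothetical helper name used only for illustration; it is not the code originally posted with this answer. It maps each row label to the list of cell strings on that row and skips single-cell heading rows such as "Income":

```php
<?php
// Sketch only: extractRows() is a hypothetical helper, not a library function.
function extractRows(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = $xpath->query('./td', $tr);
        if ($cells->length < 2) {
            continue; // heading rows such as "Income" have a single cell
        }
        // The first cell is the label; &nbsp; decodes to U+00A0, so replace it before trimming.
        $label = trim(str_replace("\u{00A0}", ' ', $cells->item(0)->textContent));
        $values = [];
        for ($i = 1; $i < $cells->length; $i++) {
            $values[] = trim($cells->item($i)->textContent);
        }
        $rows[$label] = $values;
    }
    return $rows;
}
```

With the string returned by cURL passed in as $html, $rows['Manufacturing expenses'] would then hold the yearly figures for that row as strings, still containing thousands separators and "-" placeholders.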


You can use libraries like PHP Simple HTML DOM Parser to extract data from HTML/XHTML.
http://simplehtmldom.sourceforge.net/manual.htm

An example:

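// Requires the PHP Simple HTML DOM Parser library linked above
require_once 'simple_html_dom.php';

$pageDom = str_get_html( $rawHtmlData );
foreach( $pageDom->find( 'td' ) as $tblElem )
{
    // Look for the label cell of the row you want
    if( FALSE !== stristr( $tblElem->innertext, 'Manufacturing expenses' ) )
    {
        // The yearly figures are the numericalColumn cells in the same row
        foreach( $tblElem->parent()->find( 'td.numericalColumn' ) as $valueElem )
        {
            echo $valueElem->plaintext, "\n";
        }
    }
}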
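
Since the goal is to plot a graph, the extracted cell text still has to be turned into numbers. The following is a rough sketch of that step. toNumber() is a hypothetical helper name, and treating a lone "-" as "no data" (null) is an assumption about what the table means, not something the page states:

```php
<?php
// Sketch only: toNumber() is a hypothetical helper for illustration.
function toNumber(string $cell): ?float
{
    $cell = trim($cell);
    // Assumption: an empty cell or a lone "-" means no value was reported.
    if ($cell === '' || $cell === '-') {
        return null;
    }
    // Strip thousands separators such as the comma in "22,922.00" before casting.
    return (float) str_replace(',', '', $cell);
}

// Example: convert one row of cell strings into a numeric series.
$series = array_map('toNumber', ['2,213.20', '1,841.80', '299.80', '120.50', '1,020.70']);
```

Negative figures such as "-948.60" keep their sign, because only a cell consisting solely of "-" is treated as missing.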