I am stuck on how to create a proper CSV from an HTML table. I am using HtmlAgilityPack to read the HTML from a string and create an HtmlDocument. Then I use XPath to loop through the rows and columns.
The problem is that I am unable to determine the correct row and column (x, y) for a particular cell.
Example HTML:
<html>
<body>
<table border="1">
<tr>
<td rowspan="2">
100
</td>
<td>
200
</td>
<td colspan="2">
300
</td>
</tr>
<tr>
<td colspan="2">
400
</td>
<td>
600
</td>
</tr>
<tr>
<td>
400
</td>
<td>
500
</td>
<td>
600
</td>
</tr>
</table>
</body>
</html>
When I open it in Excel and save it as CSV, I get the desired output, which is:
100,200,300,
,400,,600
400,500,600,
Can someone help me create the same output in .NET, respecting the rowspan and colspan?
Thanks! Dex
You don't need to know which row and column you are on. All you need to do is add a "," for each new column you find and a line break every time you reach the end of a row.
If you treat the document as XML, you just iterate over all TR nodes, adding a line break when you reach the end of each node's child list, and within each TR node iterate over its TD nodes, adding a "," between them where necessary.
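One caveat: a straight walk over TR and TD nodes writes one field per TD, so it would not by itself reproduce the blank fields that Excel emits for spanned cells. A possible way to get them is to track which grid positions are already occupied. The following is only a minimal sketch, assuming HtmlAgilityPack is referenced and C# 7 tuples are available; `TableToCsv`, `Convert` and `Escape` are hypothetical names, and nested tables are not handled (their rows would be picked up by `.//tr`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TableToCsv
{
    // Hypothetical helper: converts the first <table> in the HTML to CSV text.
    public static string Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return string.Empty;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = table.SelectNodes(".//tr");
        if (rows == null) return string.Empty;

        // grid[(row, col)] holds the cell text; positions covered by a span hold "".
        var grid = new Dictionary<(int Row, int Col), string>();
        int width = 0;

        for (int r = 0; r < rows.Count; r++)
        {
            var cells = rows[r].SelectNodes("./td|./th");
            if (cells == null) continue;

            int c = 0;
            foreach (var cell in cells)
            {
                // Skip positions already claimed by a rowspan from an earlier row.
                while (grid.ContainsKey((r, c))) c++;

                int rowspan = Math.Max(1, cell.GetAttributeValue("rowspan", 1));
                int colspan = Math.Max(1, cell.GetAttributeValue("colspan", 1));
                string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();

                // The text goes in the top-left position; the rest of the span stays blank.
                for (int dr = 0; dr < rowspan; dr++)
                    for (int dc = 0; dc < colspan; dc++)
                        grid[(r + dr, c + dc)] = (dr == 0 && dc == 0) ? text : string.Empty;

                c += colspan;
                width = Math.Max(width, c);
            }
        }

        // Emit every row padded to the widest row so all lines have the same field count.
        var lines = new List<string>();
        for (int r = 0; r < rows.Count; r++)
        {
            var fields = Enumerable.Range(0, width)
                .Select(col => grid.TryGetValue((r, col), out var v) ? Escape(v) : string.Empty);
            lines.Add(string.Join(",", fields));
        }
        return string.Join(Environment.NewLine, lines);
    }

    // Quote a field only when it contains a comma, a quote, or a line break.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

Calling `TableToCsv.Convert(html)` on the example table from the question should give the same three lines that Excel produces, including the trailing empty field on the first and third rows.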