HTML Manipulation Using HTML Agility Pack

by Technology Crowds Web & Mobile Application Developer
When HTML requirements are dynamic, manipulating HTML content to suit each client's audience and needs becomes a routine part of the job. The HtmlNode class in HTML Agility Pack exposes four important properties for reading or changing content on the fly: InnerHtml, InnerText, OuterHtml, and ParentNode. Each is described below, along with a few of the methods used alongside them.
Inner HTML
InnerHtml is a public property, so it can be accessed from anywhere. It gets or sets the HTML markup between the opening and closing tags of the node it is called on. When read, it returns the content as a string. InnerHtml is a member of the HtmlAgilityPack.HtmlNode class.
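using System;
using HtmlAgilityPack;

var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";

// Parse the HTML string into a document.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select every <p> that is a direct child of <body>.
// SelectNodes returns null when nothing matches.
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");

foreach (var node in htmlNodes)
{
    // Print the markup inside each <p>, without the <p> tags themselves.
    Console.WriteLine(node.InnerHtml);
}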

Output

This is <b>C#, ASP.Net</b> paragraph
This is <b>HTML Agility Pack</b> sample
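
InnerHtml can be assigned as well as read. The following is a minimal sketch of the setter, assuming the same HtmlAgilityPack package as the example above; the sample markup and the replacement string are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("<body><p>This is <b>old</b> content</p></body>");

// SelectSingleNode returns null when nothing matches, so guard before use.
var paragraph = htmlDoc.DocumentNode.SelectSingleNode("//body/p");
if (paragraph != null)
{
    // Assigning InnerHtml replaces the children of <p> with the parsed markup.
    paragraph.InnerHtml = "This is <i>new</i> content";

    // Reading it back shows the new markup; the <p> tags are not included.
    Console.WriteLine(paragraph.InnerHtml);
}
```

The null check matters because a query that matches nothing yields null rather than an empty result.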

Inner Text

InnerText is also a public property and returns a string. Use it when all you want is the text between the opening and closing tags of an element, with the tags of any child elements stripped out. That makes it a convenient way to read an element's text dynamically. Like InnerHtml, it is a member of HtmlAgilityPack.HtmlNode.
using System;
using HtmlAgilityPack;

var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";

// Parse the HTML string into a document.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select every <p> that is a direct child of <body>.
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");

foreach (var node in htmlNodes)
{
    // Print only the text of each <p>; the <b> tags are dropped.
    Console.WriteLine(node.InnerText);
}

Output

This is C#, ASP.Net paragraph
This is HTML Agility Pack sample
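
InnerText removes tags, but in the library versions this sketch assumes it leaves character entities such as &amp; as written. The sketch below shows one way to decode them with the library's HtmlEntity.DeEntitize helper; the sample markup is made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("<body><p>Fish &amp; <b>chips</b></p></body>");

var paragraph = htmlDoc.DocumentNode.SelectSingleNode("//body/p");
if (paragraph != null)
{
    // InnerText drops the <b> tags but keeps the text they wrap.
    string text = paragraph.InnerText;
    Console.WriteLine(text);

    // HtmlEntity.DeEntitize converts entities such as &amp; to characters.
    Console.WriteLine(HtmlEntity.DeEntitize(text));
}
```

Decoding is worth doing before the text is compared, searched, or shown to a user outside an HTML context.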

Outer Html

OuterHtml returns the element itself together with its contents: the opening tag, everything inside it, and the closing tag. It resembles InnerHtml, but the difference matters. InnerHtml stops at the element's contents, while OuterHtml includes the element's own tags. It is a public property that returns a string, and it is also a member of HtmlAgilityPack.HtmlNode.
using System;
using HtmlAgilityPack;

var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";

// Parse the HTML string into a document.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select every <p> that is a direct child of <body>.
var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/p");

foreach (var node in htmlNodes)
{
    // Print each <p> element in full, including its own tags.
    Console.WriteLine(node.OuterHtml);
}

Output

<p>This is <b>C#, ASP.Net</b> paragraph</p>
<p>This is <b>HTML Agility Pack</b> sample</p>
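
The difference between the two properties is easiest to see on a single node. A minimal sketch, assuming the same HtmlAgilityPack package and a one-paragraph sample document invented for illustration:

```csharp
using System;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("<body><p>This is <b>HTML Agility Pack</b> sample</p></body>");

var paragraph = htmlDoc.DocumentNode.SelectSingleNode("//body/p");
if (paragraph != null)
{
    // Contents only: no surrounding <p> tags.
    Console.WriteLine("InnerHtml: " + paragraph.InnerHtml);

    // The element itself: opening <p>, contents, closing </p>.
    Console.WriteLine("OuterHtml: " + paragraph.OuterHtml);
}
```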

Parent Node

ParentNode returns the parent of the node it is called on. It is useful whenever you need to know which element contains a given node. Because it returns the parent node, its type is HtmlNode. It too is a member of HtmlAgilityPack.HtmlNode.
using System;
using HtmlAgilityPack;

var html =
@"<body>
<h1>.Net Core</h1>
<p>This is <b>C#, ASP.Net</b> paragraph</p>
<h1>.Net Core with Angular</h1>
<p>This is <b>HTML Agility Pack</b> sample</p>
</body>";

// Parse the HTML string into a document.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select the first <h1> that is a direct child of <body>.
var node = htmlDoc.DocumentNode.SelectSingleNode("//body/h1");

// Step up one level and print the name of the containing element.
HtmlNode parentNode = node.ParentNode;
Console.WriteLine(parentNode.Name);

Output

body
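
ParentNode can also be followed repeatedly to walk from a nested element up to the top of the document. The sketch below assumes the same HtmlAgilityPack package and a small sample document invented for illustration; it should print node names from the innermost element outward.

```csharp
using System;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("<body><p>This is <b>C#, ASP.Net</b> paragraph</p></body>");

// Start from the innermost element, <b>.
HtmlNode current = htmlDoc.DocumentNode.SelectSingleNode("//body/p/b");

// Follow ParentNode until it is null, which happens above the document root.
while (current != null)
{
    Console.WriteLine(current.Name);
    current = current.ParentNode;
}
```

Looping on a null check also covers the case where the query matches nothing, since the loop body then never runs.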


Created on Aug 7th 2020 08:46. Viewed 122 times.