Analyzing Web Page Content

Nov 8, 2018

PowerShell comes with a built-in web client, the Invoke-WebRequest cmdlet, which can retrieve HTML content for you. For simple web page analysis, use the -UseBasicParsing parameter. This gets you the raw HTML content as well as, for example, lists of all embedded links and images:

$url = "http://powershellmagazine.com"
$page = Invoke-WebRequest -URI $url -UseBasicParsing

$page.Content | Out-GridView -Title Content
$page.Links | Select-Object href, outerHTML | Out-GridView -Title Links
$page.Images | Select-Object src, outerHTML | Out-GridView -Title Images
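The link list returned by basic parsing usually mixes absolute URLs, relative paths, and duplicates. As a minimal sketch, the hypothetical helper below (Get-UniqueLinkTarget is not a built-in command, and its name is only an illustration) keeps the absolute http/https targets and removes duplicates. It assumes each link object exposes an href property, as the objects in $page.Links do:

```powershell
# Hypothetical helper: reduce a list of link objects to unique absolute URLs.
function Get-UniqueLinkTarget {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Links
    )

    $Links |
        Where-Object { $_.href -match '^https?://' } |  # drop relative and non-web links
        Select-Object -ExpandProperty href |            # keep only the URL string
        Sort-Object -Unique                             # sort and remove duplicates
}

# Usage (requires network access):
# $page = Invoke-WebRequest -Uri 'http://powershellmagazine.com' -UseBasicParsing
# Get-UniqueLinkTarget -Links $page.Links | Out-GridView -Title 'Unique links'
```

Because the helper only looks at the href property, it works the same way on links retrieved with or without -UseBasicParsing.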

If you omit the -UseBasicParsing parameter in Windows PowerShell, the cmdlet internally uses the Internet Explorer document object model and can return more detailed information, such as the visible text of each link:

$url = "http://powershellmagazine.com"
$page = Invoke-WebRequest -URI $url 

$page.Links | Select-Object InnerText, href | Out-GridView -Title Links 
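With the link text available, you can search links by what the reader sees rather than by URL. The following is a small sketch under stated assumptions: Find-LinkByText is a hypothetical helper name, not a built-in command, and it expects link objects that carry the InnerText and href properties shown above (so the page must have been retrieved without -UseBasicParsing):

```powershell
# Hypothetical helper: return links whose visible text matches a regular expression.
function Find-LinkByText {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Links,

        [Parameter(Mandatory)]
        [string] $Pattern
    )

    $Links |
        Where-Object { $_.InnerText -match $Pattern } |  # -match is case-insensitive
        Select-Object InnerText, href
}

# Usage (requires network access and full parsing):
# $page = Invoke-WebRequest -Uri 'http://powershellmagazine.com'
# Find-LinkByText -Links $page.Links -Pattern 'PowerShell' | Out-GridView -Title Matches
```

Links without any text, such as image-only links, are skipped because an empty InnerText never matches a non-empty pattern.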

Note that unless you specify -UseBasicParsing, Invoke-WebRequest requires that Internet Explorer is set up and has been launched at least once.
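If a script has to run on machines where Internet Explorer may not be ready, one option is to try full parsing first and fall back to basic parsing when that fails. This is only a sketch: Get-WebPage is a hypothetical wrapper name, and the catch block reacts to any error, not just a missing Internet Explorer engine:

```powershell
# Hypothetical wrapper: try full parsing first, then retry with -UseBasicParsing.
function Get-WebPage {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Uri
    )

    try {
        # Full parsing; -ErrorAction Stop makes failures catchable.
        Invoke-WebRequest -Uri $Uri -ErrorAction Stop
    }
    catch {
        # Any failure ends up here, including network errors. The retry uses
        # basic parsing; if it fails as well, that error reaches the caller.
        Write-Verbose "Full parsing failed: $($_.Exception.Message)"
        Invoke-WebRequest -Uri $Uri -UseBasicParsing -ErrorAction Stop
    }
}

# Usage (requires network access):
# $page = Get-WebPage -Uri 'http://powershellmagazine.com' -Verbose
# $page.Links | Select-Object href | Out-GridView -Title Links
```

A page returned by the fallback path only offers the basic properties, so code that consumes the result should not rely on InnerText being present.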
