Accessing Website Content

by Oct 30, 2018

PowerShell can usually retrieve the raw HTML of a web page with a single call to Invoke-WebRequest. A script can then process that HTML however it likes, for example by scraping information from it with regular expressions:

$url = "https://www.tagesschau.de"
$w = Invoke-WebRequest -Uri $url -UseBasicParsing
$w.Content
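
To illustrate the scraping step, here is a minimal sketch that pulls all link targets out of an HTML string with a regular expression. The sample markup in $html is made up for this example and merely stands in for $w.Content; the pattern is deliberately simple and would miss links written with single quotes or no quotes at all.

```powershell
# Sample HTML standing in for $w.Content (hypothetical markup, not the real site)
$html = @'
<html><body>
<a href="/inland/index.html">Inland</a>
<a href="/ausland/index.html">Ausland</a>
<a href="https://example.com/wetter">Wetter</a>
</body></html>
'@

# Match every href="..." attribute and capture its value in a named group
$pattern = 'href="(?<link>[^"]+)"'

# Turn each match into the plain captured string
$links = [regex]::Matches($html, $pattern) |
    ForEach-Object { $_.Groups['link'].Value }

$links
```

With real content, replace $html with $w.Content. When -UseBasicParsing is used, the response object also exposes a Links property, which may save you from writing the pattern yourself.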

However, some websites build their content dynamically with client-side script code. In that case, Invoke-WebRequest does not return the full HTML that you see in a browser, because it never runs those scripts. To get at the rendered HTML anyway, you need to employ a real web browser. One simple approach is to use the built-in Internet Explorer via its COM interface:

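# Start a hidden Internet Explorer instance via COM
$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate($url)
# Wait until the page has finished loading (ReadyState 4 = complete)
do
{
   Start-Sleep -Milliseconds 200
} while ($ie.ReadyState -ne 4)

# Read the rendered HTML, then close the browser process
$ie.Document.body.innerHTML
$ie.Quit()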
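
If you need this more than once, the same idea can be wrapped in a function. The following is only a sketch: the name Get-RenderedHtml, the -TimeoutSeconds parameter, and its default of 30 seconds are assumptions made for this example, and it requires a Windows machine where the InternetExplorer.Application COM object is still available. It adds a timeout so the loop cannot wait forever, and it closes the browser even when something goes wrong.

```powershell
function Get-RenderedHtml
{
    param
    (
        [Parameter(Mandatory)]
        [string]$Url,

        # Give up after this many seconds (hypothetical default)
        [int]$TimeoutSeconds = 30
    )

    # Assumes the Internet Explorer COM object is installed (Windows only)
    $ie = New-Object -ComObject InternetExplorer.Application
    try
    {
        $ie.Navigate($Url)
        $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

        # ReadyState 4 means the document has finished loading
        while ($ie.Busy -or $ie.ReadyState -ne 4)
        {
            if ((Get-Date) -gt $deadline)
            {
                throw "Timed out after $TimeoutSeconds seconds loading $Url"
            }
            Start-Sleep -Milliseconds 200
        }

        # Emit the HTML of the rendered page body
        $ie.Document.body.innerHTML
    }
    finally
    {
        # Close the hidden browser process even if an error occurred
        $ie.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($ie)
    }
}
```

You would call it like this: Get-RenderedHtml -Url 'https://www.tagesschau.de'. Keep in mind that ReadyState 4 only signals that the document itself has loaded; scripts that fetch data afterwards may still be running, so some pages might need an extra short delay before you read the HTML.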
