[Delphi] Get the Raw HTML Source from TWebBrowser

TWebBrowser is Delphi’s wrapper for IWebBrowser2, so the technique covered here also applies to other languages that support COM.

The object returned by IWebBrowser2.Document implements IHTMLDocument, IHTMLDocument2 and IHTMLDocument3. None of these interfaces, however, provides a method to get the source. The closest you can get is the outerHTML property of the IHTMLElement that represents the ‘html’ tag, for example:

source := ((((WebBrowser1.Document as IHTMLDocument2).all.tags('html') as IHTMLElementCollection).item(0, '')) as IHTMLElement).outerHTML;

With this method, however, the HTML you get is regenerated by MSHTML from the DOM tree. It is not the original HTML file you downloaded from the server, and all the tags come out in ugly UPPERCASE.
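If the DOM-generated markup is good enough, IHTMLDocument3 offers a somewhat shorter route than the tags('html') lookup: its documentElement property is the IHTMLElement for the root ‘html’ element. The sketch below assumes Delphi’s imported MSHTML unit, and GetDomHTML is a hypothetical helper name. It has exactly the same limitation described above, since outerHTML is still MSHTML’s serialization of the DOM.

```pascal
// Needs MSHTML, SHDocVw and SysUtils in the unit's uses clause.
// GetDomHTML is a hypothetical helper, not part of any library.
function GetDomHTML(WB: TWebBrowser): string;
var
  doc3: IHTMLDocument3;
  root: IHTMLElement;
begin
  Result := '';
  // Supports returns False when Document is nil (nothing loaded yet)
  // or when the document does not implement IHTMLDocument3.
  if not Supports(WB.Document, IHTMLDocument3, doc3) then
    Exit;
  root := doc3.documentElement; // the root <html> element
  if Assigned(root) then
    Result := root.outerHTML; // DOM-generated markup, not the raw file
end;
```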

To get the raw source, we have to turn to another interface, IPersistStreamInit, which provides support for stream-based persistence. Its Save method writes the raw source to a stream, which we can then read into a string. (Here I took advantage of Delphi’s various wrapper and helper classes. Sometimes you can’t help but feel sorry that VCL and Delphi are nearly dead.)

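// Needs ActiveX, Classes, SysUtils and SHDocVw in the unit's uses clause.
function GetRawHTML(WB: TWebBrowser): string;
var
   persist: IPersistStreamInit;
   adapter: IStream;
   buffer: TStringStream;
begin
   Result := '';
   // Document is nil until a page is loaded; Supports also copes with a
   // document that does not implement IPersistStreamInit.
   if not Supports(WB.Document, IPersistStreamInit, persist) then
      Exit;
   buffer := TStringStream.Create('');
   try
      // Hold the IStream wrapper in a variable so its reference count is
      // managed properly while Save writes the raw source into it.
      adapter := TStreamAdapter.Create(buffer, soReference);
      if Succeeded(persist.Save(adapter, True)) then
         Result := buffer.DataString;
      adapter := nil;
   finally
      FreeAndNil(buffer);
   end;
end;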
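Because Save only needs an IStream, the same call can target any TStream wrapped in a TStreamAdapter. The sketch below assumes the same units as GetRawHTML and writes the raw bytes straight to disk through a TFileStream, which avoids any byte-to-string conversion. SaveRawHTMLToFile is a hypothetical name.

```pascal
// Needs ActiveX, Classes, SysUtils and SHDocVw in the unit's uses clause.
// SaveRawHTMLToFile is a hypothetical helper, not part of any library.
function SaveRawHTMLToFile(WB: TWebBrowser; const FileName: string): Boolean;
var
  persist: IPersistStreamInit;
  adapter: IStream;
  fileStream: TFileStream;
begin
  Result := False;
  // Bail out before touching the disk if there is no persistable document.
  if not Supports(WB.Document, IPersistStreamInit, persist) then
    Exit;
  fileStream := TFileStream.Create(FileName, fmCreate);
  try
    // soReference: the adapter does not free fileStream; we do below.
    adapter := TStreamAdapter.Create(fileStream, soReference);
    Result := Succeeded(persist.Save(adapter, True));
    adapter := nil; // release the wrapper before the file stream is freed
  finally
    fileStream.Free;
  end;
end;
```

Call either helper only once the page has finished loading, for example from the OnDocumentComplete event. Before the first page loads, Document may be nil, in which case the helpers simply return '' or False.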

More about IPersistStreamInit on MSDN:
http://msdn.microsoft.com/en-us/library/ms682273(VS.85).aspx