Reading the internet

Web queries are one of the most useful and coolest things for me in Excel. I use them *a lot*, because they allow me to combine the front end of Excel with back end information stored in different web servers.

However, sometimes they can be a little limiting. There’s the option of automating Internet Explorer (which I try to use as the ultimate resource, because it’s the slowest option from what I’ve seen), but there are some different alternatives as well.

The XMLHTTP object (which you can use in early binding by adding a reference to ‘Microsoft XML, vX.X’, where X.X is one of the available versions (I have versions 2.6, 3.0 and 5.0 installed on this computer). I’ve used 3.0 (for no particular reason), and I’m not sure what the difference between the versions is, but that one seems to work for what I’ve needed so far.

Ok, so, let’s get the links from a web page using a POST method. One page that I know uses this is MrExcel’s message board (I tried using this blog’s search page, but I got an error…). The sub that I used is this:

Sub TestReadHTML()
   ‘We’ll search the MrExcel board here
  Dim Url As String
   Dim HtmlBody As String
   Dim Col As Collection

   ‘The url of the page
  Url = “http://www.mrexcel.com/board2/search.php”
   Url = Url & “?search_keywords=querytable+refresh”
   
   ‘Get the html document
  HtmlBody = UseXMLHTTP(Url)
   
   ‘You can parse the HTML here, for example, retrieve all the links
  Set Col = New Collection
   Do While InStr(1, HtmlBody, “<a href”) > 0
      HtmlBody = Mid$(HtmlBody, InStr(1, HtmlBody, “</a><a href”))
     
      Col.Add Left$(HtmlBody, InStr(1, HtmlBody, “</a>”) + 3)
     
      HtmlBody = Mid$(HtmlBody, InStr(1, HtmlBody, “</a>”) + 3)
   Loop
   
   MsgBox “Found “ & Col.Count & ” links !”
   
End Sub

Of course, the function that really matters here is the UseXMLHTTP, which follows:

Function UseXMLHTTP(ByVal Url As String) As String
   Dim oHttp As Object
   Dim Parameters As String
   Dim i As Long
   On Error Resume Next
   ‘create the object, error out if not found in the machine
  Set oHttp = CreateObject(“Microsoft.XMLHTTP”)

   If Not oHttp Is Nothing Then
      ‘Make sure we have some parameters in the URL
     i = InStr(1, Url, “?”)
      If i > 0 Then
         ‘Extract the parameters from the URL
        Parameters = Mid$(Url, i + 1)
         
         ‘Now remove them from the URL, so we can send it to the XMLHTTP object
        ‘I guess we could receive two variables, but I’m lazy tonight <g>
        Url = Left$(Url, i – 1)
         
         ‘Use GET or POST as the first parameter of the .Open method (depends on the webpage)
        oHttp.Open “POST”, Url, False
         
         ‘Using the .setRequestHeader you can add more headers as needed
        oHttp.setRequestHeader “Content-Type”, “application/x-www-form-urlencoded”
         
         ‘Send the parameters (similar to the .Refresh method of the querytable)
        ‘it’s here where the page is actually loaded
        oHttp.Send CStr(Parameters)
         
         ‘Get the response back
        ‘You can check the .status or the .statusText of the oHttp object, to see if the
        ‘request was succesful or not
        If oHttp.statusText = “OK” Then
            UseXMLHTTP = oHttp.responseText
         End If
      End If
     
      ‘Destroy the object
     Set oHttp = Nothing
   End If
End Function

Reading the HTML body is one of the things that you can do (most useful to me when the webpage returns a text file, because you don’t have to mess around with the HTML), and is very simple to do so. But there are two other things that you can do. One, is read an XML document directly from the request, using the .responseXML method, like I showed here. I’ve been using this feature (since I finally decided to jump in the XML train…) and I really enjoy the simplicity that this provides. You get everything in one single object.

The other thing is that you can (at least in theory, because I haven’t tried it live…) is that you can retrieve a file using the .responseStream method, that could be sent from an ASP’s IO.Stream (keep in mind that I’m starting to talk about stuff that I don’t really know here, ok ?)

Posted in Uncategorized

10 thoughts on “Reading the internet

  1. Juan
    Very cool, Your explanination was spot on and simple enough for an xlm challanged person such as my self
    Thanks for taking the time to post this.

  2. Very cool Jake, thanks!

    This “should” mean too that I could download the file asynchronously as well ? (if I set the third parameter of the .Open to True ?

    Right now I’m using the URLDownloadToFile, which I understand supports asynch downloads as well, but the implementation seems to be impossible.

    I guess I’ll try that ! thanks again!

  3. Hi,

    Just trying to follow this, looks very useful but I can’t yet get it to function…When I paste the code into a module I’m getting “an invalid procedure call” error at this line:

    HtmlBody = Mid$(HtmlBody, InStr(1, HtmlBody, “

  4. Graham

    Replace this line

    HtmlBody = Mid$(HtmlBody, InStr(1, HtmlBody, “</a><a href”))

    with

    HtmlBody = Mid$(HtmlBody, InStr(1, HtmlBody, “<a href”))

  5. Another method to download a file from the internet with XMLHTTP – but without ADO – below

    Sub GetXLs()
    Dim WebUrlStr As String, LocalFile As String
    Dim oXMLHTTP As Object, bArray() As Byte, hfile As Integer
    Dim newWB As Workbook

    ‘credit to Matt Vidas from Expert-Exchange & VBAeXpress

    Set oXMLHTTP = CreateObject(“Microsoft.XMLHTTP”)

    ‘download the file from the web to the hardrive
    WebUrlStr = “http://members.iinet.net.au/~brettdj/EE/Ltt.xls”
    LocalFile = “c:Ltt.xls”
    oXMLHTTP.Open “GET”, WebUrlStr, False
    oXMLHTTP.send
    bArray = oXMLHTTP.ResponseBody
    hfile = 1
    Open LocalFile For Binary As #hfile
    Put #hfile, , bArray
    Close #hfile

    Set newWB = Workbooks.Open(LocalFile)
    End Sub

    Cheers

    Dave

  6. Does anyone know how to simply download an html file from the internet? I just want to be able to download html from any website to use in excel… The methods posted above don’t seem to work for me when I change the URL to something like Yahoo.com

Leave a Reply

Your email address will not be published. Required fields are marked *