How to Extract Information From a Website Using AppleScript

In previous tutorials we learned how to make AppleScript open a web page, fill out forms on a web page, and click buttons on web pages. Today we are going to learn how to extract data from web pages using AppleScript!

In a later tutorial I will show how to put all of this together to build a fully automated application that collects and/or inputs multiple pieces of data from one or more websites.

For this first example we are going to use Google Chrome’s inspect element tool and grab the first line of a Google search result.

If you have not read my previous two tutorials on clicking and inputting data, this will not make much sense. Please view these first!

 

As in the previous examples, we are going to need either an ID, Class, or Name, or, if all else fails, the tag that contains the information that we want.

First, go to the web page that you would like to grab information from. [Screenshot of the web page]

We are going to grab the first headline for "how to peel a banana" and pull it into AppleScript. Right-click on the element you want to grab and click on inspect element to bring up the source code.

If you are following along, the code should look something like this: [Screenshot of the inspected source code]

It looks like we do not have an ID or Name to go off of, so we will have to use Class.

Grabbing Data from a Website Using Class

First, paste this code into the top of your AppleScript Doc…

 

to getInputByClass(theClass, num) -- defines a function with two inputs, theClass and num
	tell application "Safari" -- tells AS that we are going to use Safari
		set input to do JavaScript "document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1 -- uses JavaScript to set the variable input to the information we want
	end tell
	return input -- tells the function to return the value of the variable input
end getInputByClass

 

Now that we have our data scraper function we can take our first stab at pulling the info. Enter the following code in your AppleScript doc to get the data…

 

getInputByClass("r", 0)

In this instance the 0 indicates which headline we would like to pull. Counting starts at zero: the first result is 0, the second is 1, the third is 2, and so on.
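Because the index simply counts matching elements from zero, the same call can be repeated for every match on the page. The sketch below is one way to do that, not part of the original tutorial: countByClass is a hypothetical helper name, and it assumes Safari's front document is already showing the search results and that Safari is allowed to run JavaScript from AppleScript.

```applescript
-- Hypothetical helper (not from the tutorial): asks the page how many
-- elements carry the given class name.
to countByClass(theClass)
	tell application "Safari"
		set theCount to do JavaScript "document.getElementsByClassName('" & theClass & "').length;" in document 1
	end tell
	return theCount as integer
end countByClass

-- Same handler as in the tutorial, repeated so this sketch stands alone.
to getInputByClass(theClass, num)
	tell application "Safari"
		set input to do JavaScript "document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1
	end tell
	return input
end getInputByClass

-- Collect the HTML of every element with class "r", index 0 upward.
set headlines to {}
repeat with i from 0 to ((countByClass("r")) - 1)
	set end of headlines to getInputByClass("r", i)
end repeat
return headlines
```

Asking for the count first keeps the loop from requesting an index that does not exist on the page.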

When we try out our code, we get the headline along with its surrounding HTML. [Screenshot of the result]

Hmm, it is good that we have the information that we want, but we also picked up a lot of the HTML. To get rid of the HTML we are going to use AppleScript's text item delimiters.
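If text item delimiters are new to you, this small warm-up may help. It is only an illustrative sketch: the sample string and the variable names are made up, and nothing here touches Safari. Setting the delimiters tells AppleScript where to cut a piece of text into "text items".

```applescript
set sample to "<b>Hello</b>" -- made-up sample text

-- Remember the current delimiters so they can be restored afterwards.
set savedDelimiters to AppleScript's text item delimiters

-- Cut the text at every "<b>": the result is {"", "Hello</b>"}.
set AppleScript's text item delimiters to "<b>"
set partsAfterOpen to text items of sample

-- Cut the second piece at "</b>" and keep what comes before it: "Hello".
set AppleScript's text item delimiters to "</b>"
set inner to text item 1 of (item 2 of partsAfterOpen)

-- Always put the delimiters back the way you found them.
set AppleScript's text item delimiters to savedDelimiters
return inner
```

The function in the next step follows the same idea: cut at the text just before what you want, then cut at the text just after it.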

Enter the following code into the top of your AppleScript doc…

to extractText(searchText, startText2, endText)
	set tid to AppleScript's text item delimiters -- remember the current delimiters
	set AppleScript's text item delimiters to startText2
	set afterStart to (text items 2 thru -1 of searchText) as text -- everything after the first startText2
	set AppleScript's text item delimiters to endText
	set finalText to text item 1 of afterStart -- everything before the next endText
	set AppleScript's text item delimiters to tid -- restore the original delimiters
	return finalText
end extractText

We can use this function to pull out the text that sits between two pieces of HTML. First we need to set our grabbed text to a variable.

set theText to getInputByClass("r", 0)

Next we set up the call to our function. This function takes three parameters.

The first is searchText. This is going to be what we retrieved from our getInputByClass function, or theText above.

The second is startText2, the text that comes right before the information we want to extract, and the third is endText, the text that comes directly after it. To find them we need to look at our code. I’ll explain…

This is the result of our getInputByClass function:

"<a href="http://www.instructables.com/id/The-correct-way-to-peel-a-banana/" onmousedown="return rwt(this,'','','','2','AFQjCNEPeA-Fa4A9BF4tiZnULdYQoDAvmA','tWzuP2eryJt6miXyiNuHRQ','0CCIQFjAB','','',event)">The correct way to peel a banana – Instructables</a>"

We want to extract “The correct way to peel a banana – Instructables”, which is between

<a href="http://www.instructables.com/id/The-correct-way-to-peel-a-banana/" onmousedown="return rwt(this,'','','','2','AFQjCNEPeA-Fa4A9BF4tiZnULdYQoDAvmA','tWzuP2eryJt6miXyiNuHRQ','0CCIQFjAB','','',event)">

and

</a>

We are going to set startText2 to "> (a double quote followed by a greater-than sign), which is the last part of:

<a href="http://www.instructables.com/id/The-correct-way-to-peel-a-banana/" onmousedown="return rwt(this,'','','','2','AFQjCNEPeA-Fa4A9BF4tiZnULdYQoDAvmA','tWzuP2eryJt6miXyiNuHRQ','0CCIQFjAB','','',event)">

And then we set endText to:

</a>

Which comes directly after the information that we want.

Now we enter this into our AppleScript. Note the backslash: a double quote inside an AppleScript string has to be escaped as \".

set theResult to extractText(theText, "\">", "</a>")

and get the result:

"The correct way to peel a banana – Instructables"
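When only the visible text is needed, another option is to let the browser strip the HTML for you. The sketch below is an alternative under assumptions rather than the method used in this tutorial: getTextByClass is a hypothetical name, and it relies on the standard DOM property innerText instead of innerHTML.

```applescript
-- Hypothetical variant of getInputByClass: asks for the element's visible
-- text (innerText) instead of its HTML (innerHTML), so no tags come back.
to getTextByClass(theClass, num)
	tell application "Safari"
		set input to do JavaScript "document.getElementsByClassName('" & theClass & "')[" & num & "].innerText;" in document 1
	end tell
	return input
end getTextByClass

-- For the first search result this should give just the headline text.
set theResult to getTextByClass("r", 0)
```

The delimiter approach is still worth knowing, because it also works when the value you want sits inside the markup itself, such as the address in an href attribute.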

See below for other ways to grab data. If you read the previous tutorials mentioned at the beginning of this article, you will understand how to use the code below.

Grabbing Data from a Website Using ID

 

to getInputById(theId)
	tell application "Safari"
		set input to do JavaScript "document.getElementById('" & theId & "').innerHTML;" in document 1
	end tell
	return input
end getInputById

Grabbing Data from a Website Using Tag

 

to getInputByTag(theTag, num) -- defines a function with two inputs, theTag and num
	tell application "Safari" -- tells AS that we are going to use Safari
		set input to do JavaScript "document.getElementsByTagName('" & theTag & "')[" & num & "].innerHTML;" in document 1
	end tell
	return input
end getInputByTag

 

Grabbing Data from a Website Using Name

 

to getInputByName(theName, num) -- defines a function with two inputs, theName and num
	tell application "Safari" -- tells AS that we are going to use Safari
		set input to do JavaScript "document.getElementsByName('" & theName & "')[" & num & "].innerHTML;" in document 1
	end tell
	return input
end getInputByName
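All of the handlers above assume the element actually exists on the page. If it does not, the JavaScript fails and AppleScript may hand back missing value or nothing at all. The following is a minimal sketch of a more defensive ID lookup, under a few assumptions: getInputByIdOrEmpty is a hypothetical name, "main-title" is a made-up ID, and Safari is allowed to run JavaScript from AppleScript (on recent versions this is a setting in Safari's Develop menu).

```applescript
-- Hypothetical guarded variant of getInputById: returns "" instead of
-- failing when no element has the requested ID.
to getInputByIdOrEmpty(theId)
	tell application "Safari"
		-- The JavaScript checks that the element exists before reading innerHTML.
		set input to do JavaScript "(function () { var el = document.getElementById('" & theId & "'); return el ? el.innerHTML : ''; })();" in document 1
	end tell
	if input is missing value then return ""
	return input
end getInputByIdOrEmpty

set theText to getInputByIdOrEmpty("main-title") -- "main-title" is a made-up ID
if theText is "" then
	display dialog "No element with that ID was found on the page."
else
	display dialog theText
end if
```

The same guard can be adapted to the class, tag, and name handlers by checking the indexed element before reading its innerHTML.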

 

 

 

 

Samuel

19 Responses to “How to Extract Information From a Website Using AppleScript”

  • Hi,

    So I managed to do the getInputById section, but when I tried to copy and paste the extractText code, it came up with an error:

    Syntax error
    expected “,” but found identifier.

    I believe that this was at the “of beginningToEnd” area. Any suggestions?

    • Hey Kay,

      If you send me your AppleScript, or even just the snippet of code you believe is the issue, I can take a look for you.

      Or more specifically, what are you putting here:

      extractText(?,?,?)

  • How can I use AppleScript to get information from a list of URLs? I need to retrieve one specific piece of information, the URL of the img, for each URL. There are many thousands of them, so I can’t proceed manually.

    • Hey Michael,

      Can you send me an example of what you are looking to do? I sent you an email so we can talk more in depth.

  • Hello there,

    Would it be possible to go more in depth with this? This would be very useful for work but I’m trying to extract text from a simple table and output it in a certain format so that I can paste it into another program.

    • Hey Ryan,

      I sent you an email. Shoot me an example and I’ll see how we can make it work for you.

    • Hey Samuel, this is brilliant. I have pretty much combined all this information together and have it all working. I was hoping there is a way to use the extracted information to add it to an input box on the site. Sounds spammy, I know, but to explain a little: we joined a wedding photographer directory of sorts where couples enquire about availability. To inform them of our availability we have to load up a site and input all our date over and over again, including a randomly generated number. Help very much appreciated!

  • Great scripts! Exactly what I've been looking for.
    I had many problems with extractText, so I changed it to this:
    to extractText2(searchInText, textPre, textPost)
    set tid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to textPre
    set searchInText to second text item of searchInText
    set AppleScript's text item delimiters to textPost
    set searchInText to first text item of searchInText
    set finalText to searchInText
    set AppleScript's text item delimiters to tid
    return finalText
    end extractText2

    • Thanks Ignacio! That part of the script can be very finicky… I’m glad you figured this out and shared it.

  • Hi, great Tutorial.
    Is it possible to get the data for multiple nums?

    • Hey Has, thanks! Yes, just run the code twice, for example:

      set numOne to getInputByClass("r", 0)
      set numTwo to getInputByClass("r", 1)

      or you can use a repeat loop:

      set x to 0

      repeat 3 times
      set someVar to getInputByClass("r", x)

      -- Enter code here for what you want to do with someVar…

      set x to x + 1 -- moves to the next number
      end repeat

  • When I do the first part it says missing value. Why?

    thanks!

    • There could be a number of different reasons. Please email me your code and the source code and I’ll take a look.

  • Hi Samuel,

    I’m new to JavaScript and am having trouble getting this code to work on El Capitan & Chrome 52. When I step through the code, it looks like Safari isn’t finding any elements with the classname. However, if I manually call getElementsByTagName with the appropriate tag and index in Chrome’s JavaScript console, I get exactly what I’m looking for. Any thoughts? Here’s the entire AppleScript document:

    use AppleScript version "2.4" -- Yosemite (10.10) or later
    use scripting additions

    to getInputByClass(theClass, num) -- defines a function with two inputs, theClass and num
    tell application "Google Chrome" --tells AS that we are going to use Safari
    activate first tab of first window
    set input to execute javascript "document.body.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" -- uses JavaScript to set the variable input to the information we want
    end tell
    return input --tells the function to return the value of the variable input
    end getInputByClass

    getInputByClass("moduletable", 1)

    Thanks for any insight you can provide!

    • sam@cubemg.com
      8 years ago

      Hey Brad,
      You have to use Safari; this code will not work with Chrome.

