CS 1705 Library

cs1705.web
Class OldWebBot

java.lang.Object
  extended by cs1705.web.OldWebBot

Deprecated. This class is being replaced by WebBot, which you should use instead.

public class OldWebBot
extends Object

This class represents a robot that knows how to walk through a web page and identify headings and links.

Version:
2007.07.13
Author:
Stephen Edwards

Nested Class Summary
static class OldWebBot.HeadingDescriptor
          Deprecated. Represents the content of an HTML heading entry.
 
Constructor Summary
OldWebBot()
          Deprecated. Creates a new OldWebBot that is not yet viewing any web page.
OldWebBot(File file)
          Deprecated. Creates a new OldWebBot for a given file.
OldWebBot(String url)
          Deprecated. Creates a new OldWebBot for a given URL.
OldWebBot(URI uri)
          Deprecated. Creates a new OldWebBot for a given URI.
OldWebBot(URL url)
          Deprecated. Creates a new OldWebBot for a given URL.
 
Method Summary
 void advanceToNextHeading()
          Deprecated. Advance the robot forward in the current document until it is looking at (or standing on) the next HTML heading element it can find.
 void advanceToNextLink()
          Deprecated. Advance the robot forward in the current document until it is looking at (or standing on) the next HTML anchor containing an href attribute that it can find.
 void echoCurrentElementText()
          Deprecated. Echo the text of the current HTML element (heading, link, etc.) to the robot's default output channel.
 void echoPageTitle()
          Deprecated. Echo the current web page title to the robot's default output channel.
 String getCurrentElementText()
          Deprecated. Get the text of the current HTML element on this web page--i.e., the title of a heading or the text associated with a link.
 int getHeadingLevel()
          Deprecated. Get the heading level (1-6) of the current heading on this web page.
 List<OldWebBot.HeadingDescriptor> getHeadings()
          Deprecated. Get a list of all headings in the current document.
 List<OldWebBot.HeadingDescriptor> getHeadingsToLevel(int level)
          Deprecated. Get a list of all headings in the current document with a level less than or equal to the value specified.
 List<URI> getLinks()
          Deprecated. Get a list of all links in the current document.
 List<URI> getLinksOffServer()
          Deprecated. Get a list of all links in the current document that refer to pages on other servers.
 List<URI> getLinksToOtherPages()
          Deprecated. Get a list of all links in the current document that refer to other web pages.
 URI getLinkURI()
          Deprecated. Get the URI of the current link on this web page.
 PrintWriterWithHistory getOutputChannel()
          Deprecated. Get the output channel where this bot is sending its output.
 String getPageContent()
          Deprecated. Get the current web page's entire content as a string.
 int getPagePhraseCount()
          Deprecated. Get a count of the number of times the set phrase of interest occurs in the current page.
 double getPagePhraseFrequency()
          Deprecated. Get the frequency of the phrase of interest in the current page.
 String getPageTitle()
          Deprecated. Get the title of the current web page.
 URL getPageURL()
          Deprecated. Get the URL for the current web page.
 boolean hasPreviousPage()
          Deprecated. Check to see if this bot previously visited a different page that it can now return to.
 boolean hasVisitedPage(File file)
          Deprecated. Check whether this robot has visited this page before.
 boolean hasVisitedPage(URI uri)
          Deprecated. Check whether this robot has visited this page before.
 boolean hasVisitedPage(URL url)
          Deprecated. Check whether this robot has visited this page before.
 boolean isLookingAtEndOfPage()
          Deprecated. Has the robot advanced through all the contents (headings and links) on the current page? Also returns true if isViewingWebPage() returns false.
 boolean isLookingAtHeading()
          Deprecated. Is the robot looking at (or standing on) an HTML heading element on the current page?
 boolean isLookingAtLink()
          Deprecated. Is the robot looking at (or standing on) an HTML anchor containing an href attribute (that is, a link to another web page) on the current page?
 boolean isViewingWebPage()
          Deprecated. Is the robot currently viewing a real web page with readable contents? Normally, this would be true, but may be false if the bot has not been given a web page to start on, or if it has been given a malformed or nonexistent URL address, or even if the server for the targeted page is not available.
 void jumpToLinkedPage()
          Deprecated. Causes the bot to temporarily leave the current page and hop over to the page at the end of the current link.
 void jumpToPage(File file)
          Deprecated. Causes the bot to temporarily leave the current page and hop over to the specified file.
 void jumpToPage(String url)
          Deprecated. Causes the bot to temporarily leave the current page and hop over to the page specified by the URL (as a string).
 void jumpToPage(URI uri)
          Deprecated. Causes the bot to temporarily leave the current page and hop over to the page specified by the URI.
 void jumpToPage(URL url)
          Deprecated. Causes the bot to temporarily leave the current page and hop over to the page specified by the URL.
 void jumpToThisHTML(String html)
          Deprecated. Causes the bot to temporarily leave the current page and hop over to a specific HTML string provided as a parameter.
 boolean linkGoesToAnotherPage()
          Deprecated. Check whether the URL of the current link on this web page refers to a different page, or just another location within the current page.
 boolean linkGoesToAnotherServer()
          Deprecated. Check whether the URL of the current link on this web page refers to a page on a separate server, or simply another location on the same server.
 int numberOfPreviousPages()
          Deprecated. How deep is the stack of previous pages that this robot can return to? Each time the robot jumps to a new page, it remembers its previous page so you can returnToPreviousPage().
 PrintWriterWithHistory out()
          Deprecated. Get the output channel where this bot is sending its output.
 boolean outputIsHtml()
          Deprecated. Check whether this robot's output should be treated as plain text, or as HTML markup.
 void returnToPreviousPage()
          Deprecated. Causes the bot to leave the current page and return to the page it was previously visiting, at the location where it left off.
 void returnToStartOfPage()
          Deprecated. Moves the robot back to the start of the current page.
 void run()
          Deprecated. Execute this robot's built-in sequence of steps.
 void setOutputChannel(PrintWriter output)
          Deprecated. Tell this bot where to send its output.
 void setOutputIsHtml(boolean value)
          Deprecated. Set whether this robot's output should be treated as plain text, or as HTML markup.
 void setPhraseOfInterest(String phrase)
          Deprecated. Set the key phrase of interest to look for in documents.
 String toString()
          Deprecated. Get a printable summary of this robot.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

OldWebBot

public OldWebBot()
Deprecated. 
Creates a new OldWebBot that is not yet viewing any web page.


OldWebBot

public OldWebBot(URI uri)
Deprecated. 
Creates a new OldWebBot for a given URI.

Parameters:
uri - The web page where the robot will start.

OldWebBot

public OldWebBot(URL url)
Deprecated. 
Creates a new OldWebBot for a given URL.

Parameters:
url - The web page where the robot will start.

OldWebBot

public OldWebBot(String url)
Deprecated. 
Creates a new OldWebBot for a given URL, supplied as a string.

Parameters:
url - The web page where the robot will start.

OldWebBot

public OldWebBot(File file)
Deprecated. 
Creates a new OldWebBot for a given file.

Parameters:
file - The web page where the robot will start.

Method Detail

isViewingWebPage

public boolean isViewingWebPage()
Deprecated. 
Is the robot currently viewing a real web page with readable contents? Normally, this would be true, but may be false if the bot has not been given a web page to start on, or if it has been given a malformed or nonexistent URL address, or even if the server for the targeted page is not available.

Returns:
True if the robot is currently viewing a real web page with readable contents

isLookingAtEndOfPage

public boolean isLookingAtEndOfPage()
Deprecated. 
Has the robot advanced through all the contents (headings and links) on the current page? Also returns true if isViewingWebPage() returns false.

Returns:
True if the robot has advanced over all the headings and links in the current document, or false if there are more headings and/or links to visit.

returnToStartOfPage

public void returnToStartOfPage()
Deprecated. 
Moves the robot back to the start of the current page. Requires the bot to be viewing a web page.


getPageTitle

public String getPageTitle()
Deprecated. 
Get the title of the current web page. Requires the bot to be viewing a web page.

Returns:
The page's title, or null if the page has no title.

echoPageTitle

public void echoPageTitle()
Deprecated. 
Echo the current web page title to the robot's default output channel. Requires the bot to be viewing a web page.


getPageURL

public URL getPageURL()
Deprecated. 
Get the URL for the current web page. Requires the bot to be viewing a web page.

Returns:
The page's URL, if it exists.

toString

public String toString()
Deprecated. 
Get a printable summary of this robot.

Overrides:
toString in class Object
Returns:
A printable summary of this robot

getPageContent

public String getPageContent()
Deprecated. 
Get the current web page's entire content as a string. Requires the bot to be viewing a web page.

Returns:
The page's content

setPhraseOfInterest

public void setPhraseOfInterest(String phrase)
Deprecated. 
Set the key phrase of interest to look for in documents. This string will be interpreted as a case-insensitive regular expression.

Parameters:
phrase - a regular expression

getPagePhraseCount

public int getPagePhraseCount()
Deprecated. 
Get a count of the number of times the set phrase of interest occurs in the current page. Requires the bot to be viewing a web page, and that the phrase of interest has been set.

Returns:
The number of occurrences of the phrase of interest in the current web page
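
As a rough illustration of what counting a case-insensitive regular expression can look like, here is a minimal sketch using java.util.regex. The class and method names (PhraseCountSketch, countOccurrences) are hypothetical and not part of the cs1705 library, and the sketch assumes matches are counted without overlap, which this page does not specify.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the cs1705 library.
class PhraseCountSketch {

    // Counts non-overlapping matches of the phrase, treated as a
    // case-insensitive regular expression, in the page content.
    static int countOccurrences(String phrase, String pageContent) {
        Matcher matcher =
            Pattern.compile(phrase, Pattern.CASE_INSENSITIVE).matcher(pageContent);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String page = "<h1>Web Bots</h1><p>A web bot walks a page. WEB bots are fun.</p>";
        // "web bots?" matches "Web Bots", "web bot", and "WEB bots".
        System.out.println(countOccurrences("web bots?", page));
    }
}
```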

getPagePhraseFrequency

public double getPagePhraseFrequency()
Deprecated. 
Get the frequency of the phrase of interest in the current page. This is a number between 0 and 1 that approximates the fraction of the page that is made up by the target phrase. It is calculated by taking the size of all the occurrences of the target phrase in the document and dividing by the document's total size.

Note that this number tends to be small, since even interesting phrases usually constitute only a small fraction of a page with any interesting amount of information in it. However, it does provide a relative measure of how many times a phrase has been used, normalized by the size of the document.

Requires the bot to be viewing a web page, and that the phrase of interest has been set.

Returns:
The frequency of the phrase of interest in the current web page
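
The calculation described above (total size of all occurrences divided by the document's total size) might be sketched as follows. This is an assumption-laden illustration: it measures size in characters and works on whatever string it is given, whereas this page does not say which units the library uses or whether markup is included. PhraseFrequencySketch and frequency are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the cs1705 library.
class PhraseFrequencySketch {

    // Fraction of the page (in characters) covered by matches of the phrase,
    // treated as a case-insensitive regular expression.
    static double frequency(String phrase, String pageContent) {
        if (pageContent.isEmpty()) {
            return 0.0; // avoid dividing by zero on an empty page
        }
        Matcher matcher =
            Pattern.compile(phrase, Pattern.CASE_INSENSITIVE).matcher(pageContent);
        int matchedChars = 0;
        while (matcher.find()) {
            matchedChars += matcher.end() - matcher.start();
        }
        return (double) matchedChars / pageContent.length();
    }

    public static void main(String[] args) {
        // Two 2-character matches in an 8-character page.
        System.out.println(frequency("ab", "abxxABxx"));
    }
}
```

Because the result is normalized by page length, it can be compared across pages of different sizes, which a raw count cannot.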

isLookingAtHeading

public boolean isLookingAtHeading()
Deprecated. 
Is the robot looking at (or standing on) an HTML heading element on the current page?

Returns:
True if the robot is positioned at a heading, or false otherwise.

advanceToNextHeading

public void advanceToNextHeading()
Deprecated. 
Advance the robot forward in the current document until it is looking at (or standing on) the next HTML heading element it can find. If there are no more headings in the document, it will end up looking at the end of the page. Requires the bot to be viewing a web page.
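
The advance-until-end pattern described here can be modeled with a small stand-in class. The sketch below is not the cs1705 implementation: HeadingWalkSketch is a hypothetical class that represents a page as a list of tag names, and it only mirrors the documented behavior that advancing past the last heading leaves the bot at the end of the page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the bot's cursor; not the cs1705 implementation.
class HeadingWalkSketch {
    // Tag names of the page's headings and links, in document order.
    private final List<String> elements;
    // Index of the current element; -1 means "before the first element".
    private int position = -1;

    HeadingWalkSketch(String... tags) {
        this.elements = Arrays.asList(tags);
    }

    private static boolean isHeading(String tag) {
        return tag.matches("h[1-6]");
    }

    boolean isLookingAtEndOfPage() {
        return position >= elements.size();
    }

    boolean isLookingAtHeading() {
        return position >= 0 && !isLookingAtEndOfPage()
            && isHeading(elements.get(position));
    }

    // Step forward once (unless already at the end), then keep going
    // until the cursor reaches a heading or the end of the page.
    void advanceToNextHeading() {
        do {
            if (position < elements.size()) {
                position++;
            }
        } while (position < elements.size() && !isHeading(elements.get(position)));
    }

    public static void main(String[] args) {
        HeadingWalkSketch bot = new HeadingWalkSketch("h1", "a", "h2", "a");
        int headings = 0;
        bot.advanceToNextHeading();
        while (!bot.isLookingAtEndOfPage()) {
            headings++;
            bot.advanceToNextHeading();
        }
        System.out.println(headings + " headings found");
    }
}
```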


getHeadings

public List<OldWebBot.HeadingDescriptor> getHeadings()
Deprecated. 
Get a list of all headings in the current document. This method is designed to make it easy to write foreach-style loops over page headings. Requires the bot to be viewing a web page.

Returns:
A list of OldWebBot.HeadingDescriptor objects describing the headings in the page.

getHeadingsToLevel

public List<OldWebBot.HeadingDescriptor> getHeadingsToLevel(int level)
Deprecated. 
Get a list of all headings in the current document with a level less than or equal to the value specified. This method is designed to make it easy to write foreach-style loops over page headings. Requires the bot to be viewing a web page.

Parameters:
level - Only include headings at this level or above (i.e., numerically less than or equal to this number)
Returns:
A list of OldWebBot.HeadingDescriptor objects describing the headings in the page with levels less than or equal to the specified level.
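
To make the "this level or above" rule concrete, the sketch below filters a list of headings by level. The Heading class is a hypothetical stand-in for OldWebBot.HeadingDescriptor, whose actual fields and accessors are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the cs1705 library.
class HeadingFilterSketch {

    // Stand-in for OldWebBot.HeadingDescriptor (assumed shape).
    static class Heading {
        final int level;   // 1-6, as in <h1> through <h6>
        final String text;

        Heading(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    // Keeps headings whose level is numerically <= the given level,
    // preserving document order.
    static List<Heading> headingsToLevel(List<Heading> all, int level) {
        List<Heading> result = new ArrayList<>();
        for (Heading h : all) {
            if (h.level <= level) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Heading> all = Arrays.asList(
            new Heading(1, "Overview"),
            new Heading(2, "Setup"),
            new Heading(3, "Details"),
            new Heading(2, "Usage"));
        // A foreach-style loop over the top two heading levels.
        for (Heading h : headingsToLevel(all, 2)) {
            System.out.println("h" + h.level + ": " + h.text);
        }
    }
}
```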

echoCurrentElementText

public void echoCurrentElementText()
Deprecated. 
Echo the text of the current HTML element (heading, link, etc.) to the robot's default output channel. Requires the bot to be viewing an existing HTML element on the current web page.


getCurrentElementText

public String getCurrentElementText()
Deprecated. 
Get the text of the current HTML element on this web page--i.e., the title of a heading or the text associated with a link. Requires the bot to be looking at an element on the current web page.

Returns:
The text of the current element (a heading's title or a link's text).

getHeadingLevel

public int getHeadingLevel()
Deprecated. 
Get the heading level (1-6) of the current heading on this web page. Requires the bot to be looking at a heading element on the current web page.

Returns:
The heading's level.

isLookingAtLink

public boolean isLookingAtLink()
Deprecated. 
Is the robot looking at (or standing on) an HTML anchor containing an href attribute (that is, a link to another web page) on the current page?

Returns:
True if the robot is positioned at a link, or false otherwise.

advanceToNextLink

public void advanceToNextLink()
Deprecated. 
Advance the robot forward in the current document until it is looking at (or standing on) the next HTML anchor containing an href attribute that it can find. If there are no more links in the document, it will end up looking at the end of the page. Requires the bot to be viewing a web page.


getLinkURI

public URI getLinkURI()
Deprecated. 
Get the URI of the current link on this web page. Requires the bot to be looking at a link (anchor) element on the current web page.

Returns:
The link's destination.

linkGoesToAnotherPage

public boolean linkGoesToAnotherPage()
Deprecated. 
Check whether the URL of the current link on this web page refers to a different page, or just another location within the current page. Requires the bot to be looking at a link (anchor) element on the current web page.

Returns:
True if the link refers to a different page
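
One plausible way to tell "a different page" from "another location within the current page" is to resolve the link against the current page and compare the two URIs with their fragments removed. The sketch below uses only java.net.URI; SamePageLinkSketch and its methods are hypothetical names, and the library's actual comparison rule is not documented here.

```java
import java.net.URI;

// Hypothetical illustration; not part of the cs1705 library.
class SamePageLinkSketch {

    // Drops the "#fragment" part of a URI, if any.
    static URI withoutFragment(URI uri) {
        String text = uri.toString();
        int hash = text.indexOf('#');
        return hash < 0 ? uri : URI.create(text.substring(0, hash));
    }

    // True if the link, resolved against the current page, points at a
    // different document rather than an anchor inside the current one.
    static boolean goesToAnotherPage(URI currentPage, URI link) {
        URI target = currentPage.resolve(link);
        return !withoutFragment(target).equals(withoutFragment(currentPage));
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/docs/index.html");
        System.out.println(goesToAnotherPage(page, URI.create("#section2")));
        System.out.println(goesToAnotherPage(page, URI.create("other.html")));
    }
}
```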

linkGoesToAnotherServer

public boolean linkGoesToAnotherServer()
Deprecated. 
Check whether the URL of the current link on this web page refers to a page on a separate server, or simply another location on the same server. Requires the bot to be looking at a link (anchor) element on the current web page.

Returns:
True if the link refers to a page located on a different server
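
A same-server check can be approximated by comparing host names after resolving the link against the current page. This is only a sketch under that assumption: the library may also consider ports or schemes, which this page does not say. OffServerLinkSketch is a hypothetical name.

```java
import java.net.URI;

// Hypothetical illustration; not part of the cs1705 library.
class OffServerLinkSketch {

    // True if the resolved link has a host that differs (ignoring case)
    // from the current page's host. Links with no host at all, such as
    // relative links or mailto: links, are treated as not off-server.
    static boolean goesToAnotherServer(URI currentPage, URI link) {
        URI target = currentPage.resolve(link);
        String here = currentPage.getHost();
        String there = target.getHost();
        return there != null && !there.equalsIgnoreCase(here);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/a/index.html");
        System.out.println(goesToAnotherServer(page, URI.create("/b/page.html")));
        System.out.println(goesToAnotherServer(page, URI.create("http://other.org/x")));
    }
}
```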

getLinks

public List<URI> getLinks()
Deprecated. 
Get a list of all links in the current document. This method is designed to make it easy to write foreach-style loops over links. Requires the bot to be viewing a web page.

Returns:
A list of URI objects describing the links in the page.

getLinksToOtherPages

public List<URI> getLinksToOtherPages()
Deprecated. 
Get a list of all links in the current document that refer to other web pages. This is a subset of those returned by getLinks(), with any links to other locations within the same page filtered out. This method is designed to make it easy to write foreach-style loops over links. Requires the bot to be viewing a web page.

Returns:
A list of URI objects describing the links in the page that refer to other web pages.

getLinksOffServer

public List<URI> getLinksOffServer()
Deprecated. 
Get a list of all links in the current document that refer to pages on other servers. This is a subset of those returned by getLinks(), with any links to pages on the same server as the current page filtered out. This method is designed to make it easy to write foreach-style loops over links. Requires the bot to be viewing a web page.

Returns:
A list of URI objects describing the links in the page that refer to pages on other servers.

jumpToLinkedPage

public void jumpToLinkedPage()
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to the page at the end of the current link. The bot will "remember" where it came from, keeping track of past pages in a stack. After working with the other page, you can use returnToPreviousPage() to come back to the point where you left off. Requires the bot to be looking at a link (anchor) element on the current web page.


returnToPreviousPage

public void returnToPreviousPage()
Deprecated. 
Causes the bot to leave the current page and return to the page it was previously visiting, at the location where it left off. The previous page is the one that was most recently "remembered", or alternatively, the one on top of the stack of previous pages that have been visited. Use this method in conjunction with jumpToLinkedPage() to explore multiple pages. Requires the bot to have some previous page to return to.


hasPreviousPage

public boolean hasPreviousPage()
Deprecated. 
Check to see if this bot previously visited a different page that it can now return to. Is the stack of previous pages empty or not?

Returns:
True if there is at least one previous page on the stack of previous visited pages, or false if there are none.

numberOfPreviousPages

public int numberOfPreviousPages()
Deprecated. 
How deep is the stack of previous pages that this robot can return to? Each time the robot jumps to a new page, it remembers its previous page so you can returnToPreviousPage(). These previous pages are remembered on a stack, and this method allows you to determine how deep this stack is--that is, how many times you can repeatedly call returnToPreviousPage() successfully.

Returns:
The depth of the previous page stack. This result is zero if the robot is on a page, but has not yet jumped to any others, or -1 if there is no current page at all.
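
The stack behavior described for jumping and returning, including the 0 and -1 return values above, can be modeled in a few lines. PageHistorySketch is a hypothetical class that tracks page names only; it does not model the bot's position within a page, and it is not the cs1705 implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the previous-page stack; not the cs1705 implementation.
class PageHistorySketch {
    private final Deque<String> previousPages = new ArrayDeque<>();
    private String currentPage; // null means "no current page"

    // Remember the current page (if any), then move to the new one.
    void jumpToPage(String page) {
        if (currentPage != null) {
            previousPages.push(currentPage);
        }
        currentPage = page;
    }

    boolean hasPreviousPage() {
        return !previousPages.isEmpty();
    }

    // Pop the most recently remembered page and make it current.
    void returnToPreviousPage() {
        if (previousPages.isEmpty()) {
            throw new IllegalStateException("no previous page to return to");
        }
        currentPage = previousPages.pop();
    }

    // -1 with no current page, otherwise the depth of the stack.
    int numberOfPreviousPages() {
        return currentPage == null ? -1 : previousPages.size();
    }

    String getCurrentPage() {
        return currentPage;
    }

    public static void main(String[] args) {
        PageHistorySketch history = new PageHistorySketch();
        history.jumpToPage("index.html");
        history.jumpToPage("about.html");
        System.out.println(history.numberOfPreviousPages());
        history.returnToPreviousPage();
        System.out.println(history.getCurrentPage());
    }
}
```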

jumpToPage

public void jumpToPage(URI uri)
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to the page specified by the URI. The bot will "remember" where it came from, keeping track of past pages in a stack. After working with the other page, you can use returnToPreviousPage() to come back to the point where you left off.

Parameters:
uri - The new page to jump to

jumpToPage

public void jumpToPage(URL url)
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to the page specified by the URL. The bot will "remember" where it came from, keeping track of past pages in a stack. After working with the other page, you can use returnToPreviousPage() to come back to the point where you left off.

Parameters:
url - The new page to jump to

jumpToPage

public void jumpToPage(File file)
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to the specified file. The bot will "remember" where it came from, keeping track of past pages in a stack. After working with the other page, you can use returnToPreviousPage() to come back to the point where you left off.

Parameters:
file - The new page to jump to

jumpToPage

public void jumpToPage(String url)
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to the page specified by the URL (as a string). The bot will "remember" where it came from, keeping track of past pages in a stack. After working with the other page, you can use returnToPreviousPage() to come back to the point where you left off.

Parameters:
url - The new page to jump to

jumpToThisHTML

public void jumpToThisHTML(String html)
Deprecated. 
Causes the bot to temporarily leave the current page and hop over to a specific HTML string provided as a parameter. Instead of reading web content from the internet, the text you pass in will be used instead. The bot will "remember" where it was before, keeping track of past pages in a stack. After working with the provided HTML content you pass in, you can use returnToPreviousPage() to come back to the point where you left off in the previous page.

Parameters:
html - A string containing an HTML document to treat as if it came from the web

hasVisitedPage

public boolean hasVisitedPage(URI uri)
Deprecated. 
Check whether this robot has visited this page before.

Parameters:
uri - The page to check
Returns:
True if this robot has previously visited (or is currently on) the given web page

hasVisitedPage

public boolean hasVisitedPage(URL url)
Deprecated. 
Check whether this robot has visited this page before.

Parameters:
url - The page to check
Returns:
True if this robot has previously visited (or is currently on) the given web page

hasVisitedPage

public boolean hasVisitedPage(File file)
Deprecated. 
Check whether this robot has visited this page before.

Parameters:
file - The page to check
Returns:
True if this robot has previously visited (or is currently on) the given web page
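
The three hasVisitedPage overloads suggest that pages are tracked under one common key. The sketch below assumes that key is a normalized java.net.URI, which is a guess rather than something this page states. VisitedPagesSketch and recordVisit are hypothetical names.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of the cs1705 library.
class VisitedPagesSketch {
    private final Set<URI> visited = new HashSet<>();

    // Record a page under its normalized URI ("a/../b" becomes "b").
    void recordVisit(URI uri) {
        visited.add(uri.normalize());
    }

    boolean hasVisitedPage(URI uri) {
        return visited.contains(uri.normalize());
    }

    // A URL that cannot be expressed as a URI cannot have been recorded.
    boolean hasVisitedPage(URL url) {
        try {
            return hasVisitedPage(url.toURI());
        } catch (URISyntaxException e) {
            return false;
        }
    }

    boolean hasVisitedPage(File file) {
        return hasVisitedPage(file.toURI());
    }

    public static void main(String[] args) {
        VisitedPagesSketch bot = new VisitedPagesSketch();
        bot.recordVisit(URI.create("http://example.com/a/../b.html"));
        System.out.println(bot.hasVisitedPage(URI.create("http://example.com/b.html")));
    }
}
```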

setOutputChannel

public void setOutputChannel(PrintWriter output)
Deprecated. 
Tell this bot where to send its output. Whenever you tell the bot to echo content or headings, they will go to this destination. By default, output goes to the standard output channel, but you can change the destination here.

Parameters:
output - The output channel to send messages to

getOutputChannel

public PrintWriterWithHistory getOutputChannel()
Deprecated. 
Get the output channel where this bot is sending its output.

Returns:
The current output channel for this bot

out

public PrintWriterWithHistory out()
Deprecated. 
Get the output channel where this bot is sending its output. This is just a short convenience synonym for getOutputChannel().

Returns:
The current output channel for this bot

outputIsHtml

public boolean outputIsHtml()
Deprecated. 
Check whether this robot's output should be treated as plain text, or as HTML markup. The default is false (treat as plain text).

Returns:
True if the output should be treated as HTML markup

setOutputIsHtml

public void setOutputIsHtml(boolean value)
Deprecated. 
Set whether this robot's output should be treated as plain text, or as HTML markup.

Parameters:
value - True if the output should be treated as HTML markup, false if it should be treated as plain text

run

public void run()
Deprecated. 
Execute this robot's built-in sequence of steps. The default sequence is to do nothing, but subclasses can override this method to add their own behaviors. These behaviors will be automatically run if the robot is attached to a RobotViewer.


Last updated: Wed, Apr 1, 2009 • 12:29 AM EDT

Copyright © 2009 Virginia Tech.