CS 1705 Library
java.lang.Object
  cs1705.web.WebBot
public class WebBot
This class represents a robot that knows how to walk through a web page and identify headings and links. It automatically transforms "messy" real-world HTML into conforming XHTML as it visits pages, so all tag matching and other support should presume XHTML conventions.
| Constructor Summary | |
|---|---|
| WebBot() | Creates a new WebBot that is not yet viewing any web page. |
| WebBot(String url) | Creates a new WebBot for a given URL. |
| Method Summary | |
|---|---|
| void | advanceToNextHeading() - Advance the robot forward in the current document until it is looking at (or standing on) the next HTML heading element it can find. |
| void | advanceToNextLink() - Advance the robot forward in the current document until it is looking at (or standing on) the next HTML anchor containing an href attribute that it can find. |
| void | echoCurrentElementText() - Echo the text of the current HTML element (heading, link, etc.) to the robot's default output channel. |
| void | echoPageTitle() - Echo the current web page title to the robot's default output channel. |
| HtmlElement | getCurrentElement() - Get the HTML element of interest that the robot is currently standing on. |
| String | getCurrentElementText() - Get the text of the current HTML element on this web page--i.e., the title of a heading or the text associated with a link. |
| int | getHeadingLevel() - Get the heading level (1-6) of the current heading on this web page. |
| List<HtmlHeadingElement> | getHeadings() - Get a list of all headings in the current document. |
| List<HtmlHeadingElement> | getHeadingsToLevel(int level) - Get a list of all headings in the current document with a level less than or equal to the value specified. |
| List<URI> | getLinks() - Get a list of all links in the current document. |
| List<URI> | getLinksOffServer() - Get a list of all links in the current document that refer to pages on other servers. |
| List<URI> | getLinksToOtherPages() - Get a list of all links in the current document that refer to other web pages. |
| URI | getLinkURI() - Get the URI of the current link on this web page. |
| PrintWriterWithHistory | getOutputChannel() - Get the output channel where this bot is sending its output. |
| String | getPageTitle() - Get the title of the current web page. |
| URL | getPageURL() - Get the URL for the current web page. |
| boolean | hasPreviousPage() - Check to see if this bot previously visited a different page that it can now return to. |
| boolean | hasVisitedPage(URI uri) - Check whether this robot has visited this page before. |
| boolean | hasVisitedPage(URL url) - Check whether this robot has visited this page before. |
| boolean | isLookingAtEndOfPage() - Has the robot advanced through all the contents (headings and links) on the current page? Will also return true if isViewingWebPage() returns false. |
| boolean | isLookingAtHeading() - Is the robot looking at (or standing on) an HTML heading element on the current page? |
| boolean | isLookingAtLink() - Is the robot looking at (or standing on) an HTML anchor containing an href attribute (that is, a link to another web page) on the current page? |
| boolean | isViewingWebPage() - Is the robot currently viewing a real web page with readable contents? Normally this is true, but it may be false if the bot has not been given a web page to start on, if it has been given a malformed or nonexistent URL, or if the server for the targeted page is not available. |
| void | jumpToLinkedPage() - Causes the bot to temporarily leave the current page and hop over to the page at the end of the current link. |
| void | jumpToPage(String url) - Causes the bot to temporarily leave the current page and hop over to the page specified by the URL (as a string). |
| void | jumpToPage(URI uri) - Causes the bot to temporarily leave the current page and hop over to the page specified by the URI. |
| void | jumpToPage(URL url) - Causes the bot to temporarily leave the current page and hop over to the page specified by the URL. |
| void | jumpToThisHTML(String html) - Causes the bot to temporarily leave the current page and hop over to a specific HTML string provided as a parameter. |
| boolean | linkGoesToAnotherPage() - Check whether the URL of the current link on this web page refers to a different page, or just another location within the current page. |
| boolean | linkGoesToAnotherServer() - Check whether the URL of the current link on this web page refers to a page on a separate server, or simply another location on the same server. |
| int | numberOfPreviousPages() - How deep is the stack of previous pages that this robot can return to? Each time the robot jumps to a new page, it remembers its previous page so you can returnToPreviousPage(). |
| PrintWriterWithHistory | out() - Get the output channel where this bot is sending its output. |
| boolean | outputIsHtml() - Check whether this robot's output should be treated as plain text, or as HTML markup. |
| URI | resolveURIFromPage(String uri) - Get a fully-resolved URI from a (possibly relative) string URI, such as the value of an anchor's href or an img's src attribute. |
| void | returnToPreviousPage() - Causes the bot to leave the current page and return to the page it was previously visiting, at the location where it left off. |
| void | returnToStartOfPage() - Moves the robot back to the start of the current page. |
| void | run() - Execute this robot's built-in sequence of steps. |
| void | setOutputChannel(PrintWriter output) - Tell this bot where to send its output. |
| void | setOutputIsHtml(boolean value) - Set whether this robot's output should be treated as plain text, or as HTML markup. |
| String | toString() - Get a printable summary of this robot. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
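
The cursor-style methods in the summary (advanceToNextHeading(), isLookingAtHeading(), isLookingAtEndOfPage(), getCurrentElementText()) suggest a simple walking loop. The sketch below illustrates that loop shape only. Because the cs1705 library may not be on the classpath, it uses a small in-memory stand-in, `FakeBot`, which is hypothetical and merely borrows the method names. It also assumes the robot starts before the first heading and reports end-of-page once it advances past the last one; the real WebBot may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeadingWalkSketch {

    /** Hypothetical in-memory stand-in that borrows a few WebBot method names. */
    static final class FakeBot {
        private final List<String> headings;
        private int position = -1; // assumption: the bot starts before the first heading

        FakeBot(String... headings) {
            this.headings = Arrays.asList(headings);
        }

        void advanceToNextHeading() {
            if (position < headings.size()) {
                position++;
            }
        }

        boolean isLookingAtHeading() {
            return position >= 0 && position < headings.size();
        }

        boolean isLookingAtEndOfPage() {
            return position >= headings.size();
        }

        String getCurrentElementText() {
            return headings.get(position);
        }
    }

    /** Walk the page from the top and collect the text of every heading. */
    static List<String> collectHeadings(FakeBot bot) {
        List<String> result = new ArrayList<>();
        bot.advanceToNextHeading();
        while (!bot.isLookingAtEndOfPage()) {
            if (bot.isLookingAtHeading()) {
                result.add(bot.getCurrentElementText());
            }
            bot.advanceToNextHeading();
        }
        return result;
    }

    public static void main(String[] args) {
        FakeBot bot = new FakeBot("Introduction", "Installation", "Usage");
        for (String heading : collectHeadings(bot)) {
            System.out.println(heading);
        }
    }
}
```

With the real class, the same loop would presumably start from `new WebBot(url)` and could call echoCurrentElementText() instead of collecting strings.
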
| Constructor Detail |
|---|
public WebBot()
public WebBot(String url)
url - The web page where the robot will start.
| Method Detail |
|---|
public boolean isViewingWebPage()
public boolean isLookingAtEndOfPage()
Will also return true if isViewingWebPage() returns false.
public void returnToStartOfPage()
public String getPageTitle()
public void echoPageTitle()
public URL getPageURL()
public String toString()
toString in class Object
public HtmlElement getCurrentElement()
public boolean isLookingAtHeading()
public void advanceToNextHeading()
public List<HtmlHeadingElement> getHeadings()
Returns a list of HtmlHeadingElement objects describing the headings in the page.
public List<HtmlHeadingElement> getHeadingsToLevel(int level)
level - Only include headings at this level or above (i.e., numerically less than or equal to this number)
Returns a list of HtmlHeadingElement objects describing the headings in the page with levels less than or equal to the specified level.
public void echoCurrentElementText()
public String getCurrentElementText()
public int getHeadingLevel()
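As a rough illustration of what getHeadingsToLevel(int level) describes (keeping headings whose level is numerically less than or equal to the argument), the sketch below filters a list of headings and prints an indented outline. The `Heading` class is a hypothetical stand-in for HtmlHeadingElement, whose actual accessors are not shown on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class HeadingFilterSketch {

    /** Hypothetical stand-in for HtmlHeadingElement: a level (1-6) plus its text. */
    static final class Heading {
        final int level;
        final String text;

        Heading(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    /** Keep only headings at the given level or above (numerically <= level). */
    static List<Heading> headingsToLevel(List<Heading> all, int level) {
        return all.stream()
                .filter(h -> h.level <= level)
                .collect(Collectors.toList());
    }

    /** Build an indentation prefix of two spaces per nesting step below level 1. */
    static String indent(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < level; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Heading> all = Arrays.asList(
                new Heading(1, "Guide"),
                new Heading(2, "Setup"),
                new Heading(3, "Optional details"),
                new Heading(2, "Usage"));
        for (Heading h : headingsToLevel(all, 2)) {
            System.out.println(indent(h.level) + h.text);
        }
    }
}
```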
public boolean isLookingAtLink()
public void advanceToNextLink()
public URI getLinkURI()
public boolean linkGoesToAnotherPage()
public boolean linkGoesToAnotherServer()
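The two checks above can be approximated with plain `java.net.URI`. This is a minimal sketch of one plausible reading, not the library's implementation: it assumes a link stays on the same page when the addresses differ only by their `#fragment`, and that "another server" means a different host name. The class and helper names are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

class LinkTargetSketch {

    /** Drop the #fragment so locations within one page compare as equal. */
    static URI withoutFragment(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }

    /** True if the link leads to a different page than the current one. */
    static boolean goesToAnotherPage(URI page, URI link) {
        URI target = page.resolve(link);
        return !withoutFragment(page).equals(withoutFragment(target));
    }

    /** True if the link leads to a host other than the current page's host. */
    static boolean goesToAnotherServer(URI page, URI link) {
        String targetHost = page.resolve(link).getHost();
        return targetHost != null && !targetHost.equalsIgnoreCase(page.getHost());
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/docs/index.html");
        List<URI> links = Arrays.asList(
                URI.create("#section2"),
                URI.create("other.html"),
                URI.create("http://www.w3.org/"));
        // A foreach-style loop over links, classifying each one.
        for (URI link : links) {
            System.out.println(link
                    + " anotherPage=" + goesToAnotherPage(page, link)
                    + " anotherServer=" + goesToAnotherServer(page, link));
        }
    }
}
```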
public List<URI> getLinks()
Returns a list of URI objects describing the links in the page.
public List<URI> getLinksToOtherPages()
Like getLinks(), but with any links to other locations within the same page filtered out. This method is designed to make it easy to write foreach-style loops over links.
Requires the bot to be viewing a web page.
Returns a list of URI objects describing the links in the page.
public List<URI> getLinksOffServer()
Like getLinks(), but with any links to pages on the same server as the current page filtered out. This method is designed to make it easy to write foreach-style loops over links.
Requires the bot to be viewing a web page.
Returns a list of URI objects describing the links in the page.
public void jumpToLinkedPage()
Use returnToPreviousPage() to come back to the point where you left off.
Requires the bot to be looking at a link (anchor) element on the current web page.
public void returnToPreviousPage()
Use this together with jumpToLinkedPage() to explore multiple pages.
Requires the bot to have some previous page to return to.
public boolean hasPreviousPage()
public int numberOfPreviousPages()
Each time the robot jumps to a new page, it remembers its previous page so you can returnToPreviousPage(). These previous pages are remembered on a stack, and this method allows you to determine how deep this stack is--that is, how many times you can repeatedly call returnToPreviousPage() successfully.
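The stack behavior described above can be modeled in a few lines. This is only a sketch of the bookkeeping, using a hypothetical `PageHistorySketch` class that tracks page addresses as strings. The real WebBot also remembers the location within each page, and how it reacts to returnToPreviousPage() with an empty stack is not stated here; throwing an exception is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PageHistorySketch {
    private final Deque<String> previousPages = new ArrayDeque<>();
    private String currentPage;

    PageHistorySketch(String startPage) {
        this.currentPage = startPage;
    }

    /** Remember the current page on the stack, then move to the new one. */
    void jumpToPage(String url) {
        previousPages.push(currentPage);
        currentPage = url;
    }

    boolean hasPreviousPage() {
        return !previousPages.isEmpty();
    }

    /** Stack depth: how many times returnToPreviousPage() can succeed in a row. */
    int numberOfPreviousPages() {
        return previousPages.size();
    }

    /** Pop the most recently remembered page and make it current again. */
    void returnToPreviousPage() {
        if (previousPages.isEmpty()) {
            // Assumption: the sketch fails loudly; the library's behavior is not documented here.
            throw new IllegalStateException("no previous page to return to");
        }
        currentPage = previousPages.pop();
    }

    String currentPage() {
        return currentPage;
    }

    public static void main(String[] args) {
        PageHistorySketch history = new PageHistorySketch("http://example.com/");
        history.jumpToPage("http://example.com/a.html");
        history.jumpToPage("http://example.com/b.html");
        System.out.println("depth: " + history.numberOfPreviousPages());
        while (history.hasPreviousPage()) {
            history.returnToPreviousPage();
            System.out.println("back at: " + history.currentPage());
        }
    }
}
```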
public void jumpToPage(String url)
Use returnToPreviousPage() to come back to the point where you left off.
url - The new page to jump to
public void jumpToPage(URL url)
Use returnToPreviousPage() to come back to the point where you left off.
url - The new page to jump to
public void jumpToPage(URI uri)
Use returnToPreviousPage() to come back to the point where you left off.
uri - The new page to jump to
public void jumpToThisHTML(String html)
Use returnToPreviousPage() to come back to the point where you left off in the previous page.
html - A string containing an HTML document to treat as if it came from the web
public URI resolveURIFromPage(String uri)
uri - The URI to convert to absolute form
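To show what converting a relative reference to absolute form involves, here is a small sketch built on the standard `java.net.URI.resolve` method. It is not the library's code: the `resolveAgainstPage` helper is hypothetical and takes the page address explicitly, whereas resolveURIFromPage presumably uses the bot's current page.

```java
import java.net.URI;

class ResolveSketch {

    /** Resolve a possibly relative reference against the address of the page it appears on. */
    static URI resolveAgainstPage(URI pageAddress, String reference) {
        return pageAddress.resolve(reference);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/docs/index.html");
        // A path relative to the page's directory, e.g. an img src value.
        System.out.println(resolveAgainstPage(page, "images/logo.png"));
        // A path that climbs out of the page's directory.
        System.out.println(resolveAgainstPage(page, "../about.html"));
        // A fragment-only reference within the same page.
        System.out.println(resolveAgainstPage(page, "#top"));
        // An already absolute reference is returned unchanged.
        System.out.println(resolveAgainstPage(page, "http://other.example.org/"));
    }
}
```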
public boolean hasVisitedPage(URI uri)
uri - The page to check
public boolean hasVisitedPage(URL url)
url - The page to check
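One way to picture the visited-page check is a set of page addresses. The sketch below is an assumption-laden model, not the library's implementation: the `VisitedSketch` class is hypothetical, and it treats addresses that differ only by `#fragment` as the same page, which the real hasVisitedPage may or may not do.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class VisitedSketch {
    private final Set<URI> visited = new HashSet<>();

    /** Assumption: normalize the path and ignore the #fragment when identifying a page. */
    static URI pageKey(URI uri) {
        String s = uri.normalize().toString();
        int hash = s.indexOf('#');
        return hash < 0 ? URI.create(s) : URI.create(s.substring(0, hash));
    }

    void markVisited(URI uri) {
        visited.add(pageKey(uri));
    }

    boolean hasVisitedPage(URI uri) {
        return visited.contains(pageKey(uri));
    }

    public static void main(String[] args) {
        VisitedSketch tracker = new VisitedSketch();
        tracker.markVisited(URI.create("http://example.com/a.html#top"));
        System.out.println(tracker.hasVisitedPage(URI.create("http://example.com/a.html")));
        System.out.println(tracker.hasVisitedPage(URI.create("http://example.com/b.html")));
    }
}
```

A check like this is typically made before following a link, so that a multi-page exploration does not loop between pages that link to each other.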
public void setOutputChannel(PrintWriter output)
output - The output channel to send messages to
public PrintWriterWithHistory getOutputChannel()
public PrintWriterWithHistory out()
Same as getOutputChannel().
public boolean outputIsHtml()
public void setOutputIsHtml(boolean value)
value - True if the output should be treated as HTML markup, false if it should be treated as plain text
public void run()
RobotViewer.
Last updated: Wed, Apr 1, 2009 12:29 AM EDT