| Modifier and Type | Field and Description |
|---|---|
| boolean | success: Was this page read and initialized successfully? |
| URI | uri: This page's URL as a URI. |
| URL | url: This page's URL. |
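The page exposes its address both as a `URL` and as a `URI`. A minimal sketch of how the two relate, using only `java.net` from the JDK: `UrlUriDemo`, `toUri`, and `resolveLink` are hypothetical names for illustration and are not part of `Page`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlUriDemo {
    /** Convert a URL to its URI form (what the uri field is described as holding). */
    static URI toUri(URL url) throws URISyntaxException {
        return url.toURI();
    }

    /** Resolve a possibly relative href against the page's URI. */
    static URI resolveLink(URI pageUri, String href) {
        return pageUri.resolve(href);
    }

    public static void main(String[] args) throws Exception {
        URL url = URI.create("http://example.com/docs/index.html").toURL();
        URI uri = toUri(url);
        System.out.println(uri);
        // A relative link is resolved against the directory of the page.
        System.out.println(resolveLink(uri, "page2.html"));
    }
}
```

One reason to keep a `URI` alongside the `URL` is that `URI` supports relative resolution and cheap, purely textual comparison, which is convenient when following links.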
| Constructor and Description |
|---|
| Page(File file): Create a new page by reading it from a local file. |
| Page(String htmlContent): Create a new page by reading it from a given HTML string. |
| Page(URL url): Create a new page by reading it from the web. |
| Modifier and Type | Method and Description |
|---|---|
| void | dump(PrintStream outstream): Dump this page for diagnostic purposes. |
| String | getContent(): Get this document's entire content as a string. |
| Node | getDoc(): Get this document's entire content as a DOM tree. |
| List<HtmlHeadingElement> | getHeadings(int level): Get a list of the headings in this document. |
| List<URI> | getLinks(int kind): Get a list of the links in this document. |
| int | getPatternCount(): Get the number of times the WebBot.targetPhrase occurs in this page. |
| double | getPatternFrequency(): Get the frequency of the WebBot.targetPhrase, which approximates the size of all the occurrences of the target phrase in the document divided by the document's total size. |
| String | getTitle(): Get this document's title as a string. |
| List<HtmlElement> | xPathFindAll(String xpathQuery) |
| HtmlElement | xPathFindFirst(String xpathQuery) |
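The two XPath methods carry no description in the table. As a rough illustration of what "find all" and "find first" queries against a DOM tree look like, here is a sketch built on the JDK's own `javax.xml.xpath` API. It is not the implementation behind `Page`: the class `XPathSketch` and its methods are hypothetical, it parses only well-formed markup, and returning `null` when nothing matches is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XPathSketch {
    /** Parse well-formed markup into a DOM tree (messy real-world HTML needs a lenient parser). */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    /** Return every node matched by the query. */
    static List<Node> findAll(Document doc, String xpathQuery) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathQuery, doc, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(matches.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath query: " + xpathQuery, e);
        }
    }

    /** Return the first match, or null when nothing matches (an assumption of this sketch). */
    static Node findFirst(Document doc, String xpathQuery) {
        List<Node> all = findAll(doc, xpathQuery);
        return all.isEmpty() ? null : all.get(0);
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>Title</h1>"
                + "<a href=\"a.html\">A</a><a href=\"b.html\">B</a></body></html>");
        System.out.println(findAll(doc, "//a").size());             // prints 2
        System.out.println(findFirst(doc, "//h1").getTextContent()); // prints Title
    }
}
```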
public URL url
public URI uri
public boolean success
public Page(URL url)
url - The page's URL
public Page(File file)
file - The file to read from
public Page(String htmlContent)
htmlContent - The content to use for this page
public List<HtmlHeadingElement> getHeadings(int level)
level - The level of headings to get, where 0 selects all headings and 1-6 selects only headings whose level is <= the given number
public List<URI> getLinks(int kind)
kind - One of the constants ALL_LINKS, OTHER_PAGE_LINKS, or OTHER_SITE_LINKS, indicating which links to include in the returned list.
public String getTitle()
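The `kind` argument of `getLinks` selects a subset of the page's links, but this page does not spell out what each constant means. The sketch below shows one plausible reading using `java.net.URI`. Everything in it is an assumption: the numeric constant values, the class `LinkFilterSketch`, and the interpretation that OTHER_PAGE_LINKS excludes same-page fragment links while OTHER_SITE_LINKS keeps only links to a different host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinkFilterSketch {
    // Hypothetical values; the real constants' values are not given here.
    static final int ALL_LINKS = 0;
    static final int OTHER_PAGE_LINKS = 1;
    static final int OTHER_SITE_LINKS = 2;

    /** Strip the fragment so that page.html#top and page.html compare equal. */
    static URI withoutFragment(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }

    /** Decide whether an absolute link belongs to the requested kind (assumed semantics). */
    static boolean matches(URI page, URI link, int kind) {
        switch (kind) {
            case ALL_LINKS:
                return true;
            case OTHER_PAGE_LINKS:
                return !withoutFragment(link).equals(withoutFragment(page));
            case OTHER_SITE_LINKS:
                return link.getHost() != null && !link.getHost().equalsIgnoreCase(page.getHost());
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    /** Resolve each href against the page and keep those matching the kind. */
    static List<URI> filter(URI page, List<String> hrefs, int kind) {
        List<URI> result = new ArrayList<>();
        for (String href : hrefs) {
            URI link = page.resolve(href); // make relative links absolute
            if (matches(page, link, kind)) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/index.html");
        List<String> hrefs = Arrays.asList("#top", "about.html", "http://other.org/x");
        System.out.println(filter(page, hrefs, ALL_LINKS));
        System.out.println(filter(page, hrefs, OTHER_PAGE_LINKS));
        System.out.println(filter(page, hrefs, OTHER_SITE_LINKS));
    }
}
```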
public String getContent()
public Node getDoc()
public HtmlElement xPathFindFirst(String xpathQuery)
xpathQuery - An XPath query to run against the DOM tree
public List<HtmlElement> xPathFindAll(String xpathQuery)
xpathQuery - An XPath query to run against the DOM tree
public int getPatternCount()
Get the number of times the WebBot.targetPhrase occurs in this page.
Returns: the number of times the WebBot.targetPhrase occurred
public double getPatternFrequency()
Get the frequency of the WebBot.targetPhrase, which approximates the size of all the occurrences of the target phrase in the document divided by the document's total size.
Returns: the WebBot.targetPhrase frequency
public void dump(PrintStream outstream)
outstream - The output channel to dump on
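The description of getPatternFrequency() amounts to a ratio: characters covered by occurrences of the target phrase divided by the document's total characters. A minimal sketch of that arithmetic follows. The class `PatternStatsSketch` is hypothetical, and the choices to count non-overlapping, case-sensitive matches and to measure size in characters are assumptions, not the documented behavior of `Page`.

```java
final class PatternStatsSketch {
    /** Count non-overlapping, case-sensitive occurrences of phrase in content. */
    static int count(String content, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase would otherwise match everywhere
        }
        int count = 0;
        int from = 0;
        while ((from = content.indexOf(phrase, from)) >= 0) {
            count++;
            from += phrase.length(); // skip past this match
        }
        return count;
    }

    /** Characters covered by matches divided by total characters; 0 for empty content. */
    static double frequency(String content, String phrase) {
        if (content.isEmpty()) {
            return 0.0;
        }
        return (double) count(content, phrase) * phrase.length() / content.length();
    }

    public static void main(String[] args) {
        String content = "spam and spam";
        System.out.println(count(content, "spam"));     // prints 2
        System.out.println(frequency(content, "spam")); // 8 matched chars out of 13
    }
}
```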