Modifier and Type | Field and Description |
---|---|
boolean | success: Was this page read and initialized successfully? |
URI | uri: This page's URL as a URI. |
URL | url: This page's URL. |
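The `url` and `uri` fields carry the same address in two standard-library types. How `Page` derives one from the other is not documented here; the sketch below only shows the usual `java.net` conversions. The class `UrlUriSketch` and its helper names are hypothetical and are not part of this API.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of Page: converts between URL and URI.
public class UrlUriSketch {

    // Parse a string into a URL, or return null if it is not a valid absolute URL.
    public static URL toUrlOrNull(String address) {
        try {
            return URI.create(address).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            return null;
        }
    }

    // Convert a URL to a URI, or return null if the URL is not RFC 2396 compliant.
    public static URI toUriOrNull(URL url) {
        try {
            return url.toURI();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URL url = toUrlOrNull("https://example.com/index.html");
        System.out.println(toUriOrNull(url));
    }
}
```

Returning null on failure is a choice made for this sketch; code built on `Page` would more likely check the `success` field.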

Constructor and Description |
---|
Page(File file): Create a new page by reading it from a local file. |
Page(String htmlContent): Create a new page from a given HTML string. |
Page(URL url): Create a new page by reading it from the web. |

Modifier and Type | Method and Description |
---|---|
void | dump(PrintStream outstream): Dump this page for diagnostic purposes. |
String | getContent(): Get this document's entire content as a string. |
Node | getDoc(): Get this document's entire content as a DOM tree. |
List<HtmlHeadingElement> | getHeadings(int level): Get a list of the headings in this document. |
List<URI> | getLinks(int kind): Get a list of the links in this document. |
int | getPatternCount(): Get the number of times the WebBot.targetPhrase occurs in this page. |
double | getPatternFrequency(): Get the frequency of the WebBot.targetPhrase, which approximates the size of all the occurrences of the target phrase in the document divided by the document's total size. |
String | getTitle(): Get this document's title as a string. |
List<HtmlElement> | xPathFindAll(String xpathQuery) |
HtmlElement | xPathFindFirst(String xpathQuery) |
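
The descriptions of getPatternCount() and getPatternFrequency() can be read as a simple ratio: the characters covered by all occurrences of the phrase, divided by the document's length. The sketch below is one possible reading, not the library's actual implementation. The class `PatternStats`, its method names, and the choice to count non-overlapping, case-sensitive matches are all assumptions.

```java
// Hypothetical stand-in for the pattern statistics described above.
public class PatternStats {

    // Count non-overlapping, case-sensitive occurrences of phrase in content.
    public static int count(String content, String phrase) {
        if (content == null || phrase == null || phrase.isEmpty()) {
            return 0;
        }
        int occurrences = 0;
        int from = 0;
        while ((from = content.indexOf(phrase, from)) != -1) {
            occurrences++;
            from += phrase.length(); // skip past this match
        }
        return occurrences;
    }

    // Fraction of the content's characters covered by occurrences of phrase.
    public static double frequency(String content, String phrase) {
        int occurrences = count(content, phrase);
        if (occurrences == 0) {
            return 0.0; // also covers null or empty inputs
        }
        return (double) occurrences * phrase.length() / content.length();
    }

    public static void main(String[] args) {
        String content = "the web bot crawls the web";
        System.out.println(count(content, "web"));
        System.out.println(frequency(content, "web"));
    }
}
```

Under this reading the frequency always falls between 0.0 and 1.0, which makes pages of different sizes comparable.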
public URL url
public URI uri
public boolean success

public Page(URL url)
url - The page's URL

public Page(File file)
file - The file to read from

public Page(String htmlContent)
htmlContent - The content to use for this page

public List<HtmlHeadingElement> getHeadings(int level)
level - The level of headings to get, where 0 selects all headings and 1-6 select only headings whose level is <= the given number

public List<URI> getLinks(int kind)
kind - One of the constants ALL_LINKS, OTHER_PAGE_LINKS, or OTHER_SITE_LINKS, indicating which links to include in the returned list

public String getTitle()

public String getContent()

public Node getDoc()

public HtmlElement xPathFindFirst(String xpathQuery)
xpathQuery - An XPath query to run against the DOM tree

public List<HtmlElement> xPathFindAll(String xpathQuery)
xpathQuery - An XPath query to run against the DOM tree

public int getPatternCount()
Get the number of times the WebBot.targetPhrase occurs in this page.
Returns: the number of times the WebBot.targetPhrase occurred

public double getPatternFrequency()
Get the frequency of the WebBot.targetPhrase, which approximates the size of all the occurrences of the target phrase in the document divided by the document's total size.
Returns: the WebBot.targetPhrase frequency

public void dump(PrintStream outstream)
outstream - The output channel to dump on
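
The xPathFindFirst and xPathFindAll methods take an XPath query and run it against the page's DOM tree. As a rough illustration of that idea, the sketch below evaluates XPath queries with the standard `javax.xml.xpath` API over an `org.w3c.dom.Node`. It does not use `Page`, it returns text content rather than `HtmlElement` objects, and the class `XPathSketch` with its method names is hypothetical. It also assumes well-formed XML input, which real-world HTML often is not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Page: XPath queries over a W3C DOM tree.
public class XPathSketch {

    // Parse a well-formed XML (or XHTML) string into a DOM document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Return the text content of every node matching the query.
    public static List<String> findAllText(Node root, String xpathQuery) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathQuery, root, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath query: " + xpathQuery, e);
        }
    }

    // Return the text content of the first match, or null if nothing matches.
    public static String findFirstText(Node root, String xpathQuery) {
        List<String> all = findAllText(root, xpathQuery);
        return all.isEmpty() ? null : all.get(0);
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>A</h1><h2>B</h2><h1>C</h1></body></html>");
        System.out.println(findAllText(doc, "//h1"));
        System.out.println(findFirstText(doc, "//h2"));
    }
}
```

Returning null for a query with no matches is a choice made for this sketch; the source does not say what xPathFindFirst returns in that case.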