Concept
Sinobu provides functionality for handling HTML and XML documents.
It uses a built-in, lenient parser (often called a tag soup parser)
that can handle malformed HTML, similar to how web browsers do.
The core class for this is XML, which offers a jQuery-like
API for traversing and manipulating the document structure.
🍜 Tag Soup Friendly
Handles malformed HTML gracefully, tolerating missing or mismatched tags commonly found in real-world web content.
public void caseInconsistencyLowerUpper() {
XML root = parseAsHTML("<div>crazy</DIV>");
assert root.find("div").size() == 1;
}
🎯 CSS Selector Power
Leverages CSS selectors for efficient and flexible element selection, similar to JavaScript libraries like jQuery.
public void tag() {
XML xml = I.xml("""
<root>
<h1></h1>
<article/>
<article></article>
</root>
""");
assert xml.find("h1").size() == 1;
assert xml.find("article").size() == 2;
}
🛠️ jQuery-Like API
Provides a fluent and chainable API for easy DOM manipulation, inspired by the familiar jQuery syntax.
public void append() {
XML root = I.xml("<m><Q><P/></Q><Q><P/></Q></m>");
assert root.find("Q").append("<R/><R/>").find("R").size() == 4;
assert root.find("Q > R").size() == 4;
assert root.find("Q > R:first-child").size() == 0;
}
🌐 XML Ready
Supports not only HTML but also XML documents, providing a unified interface for structured data processing.
Reading
You can parse HTML/XML from various sources using the utility methods of the I class.
Sinobu automatically detects whether the input is a URL, file path, or raw string.
The parser is designed to be tolerant of errors and can handle common issues
found in real-world HTML, such as missing tags or incorrect nesting.
This makes it suitable for scraping or processing potentially messy markup.
I#xml(String)
I#xml(java.nio.file.Path)
I#xml(java.io.InputStream)
I#xml(java.io.Reader)
I#xml(Node)
public void htmlLiteral() {
XML root = I.xml("<html/>");
assert root.size() == 1;
assert root.name().equals("html");
}
The parser also accepts structurally invalid markup, such as mismatched tag case or incorrectly nested tags.
public void caseInconsistencyLowerUpper() {
XML root = parseAsHTML("<div>crazy</DIV>");
assert root.find("div").size() == 1;
}
public void slipOut() {
XML root = parseAsHTML("<a><b>crazy</a></b>");
assert root.find("a").text().equals("crazy");
assert root.find("b").text().equals("crazy");
}
Writing
You can serialize the XML structure back into a string representation or write it to an
Appendable (like Writer or StringBuilder).
Sinobu offers both compact and formatted output options.
The XML#toString() method provides a compact string representation without
extra formatting.
public void format() {
XML root = I.xml("<root><child/><child/></root>");
assert root.toString().equals(normalize("""
<root>
<child/>
<child/>
</root>
"""));
}
The XML#to(Appendable, String, String...) method allows for pretty-printing
the XML structure with configurable indentation for improved readability.
You can specify the indentation string (e.g., a tab character \t or spaces).
public void specialIndent() {
StringBuilder out = new StringBuilder();
I.xml("<root><child><nested/></child></root>").to(out, "*");
assert out.toString().equals(normalize("""
<root>
*<child>
**<nested/>
*</child>
</root>
"""));
}
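The effect of the indentation string can be reproduced with the JDK's own DOM classes. The following is a minimal sketch, not Sinobu's implementation: `IndentWriter` and its methods are hypothetical names, and the sketch ignores attributes and text nodes to keep the recursion visible. Each nesting level repeats the indent string once more.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK DOM, for illustration only.
class IndentWriter {

    /** Parses a well-formed XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the element and its element children, one per line, indented by depth. */
    static void write(Element e, Appendable out, String indent, int depth) throws IOException {
        out.append(indent.repeat(depth)).append('<').append(e.getTagName());
        boolean hasChild = false;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                if (!hasChild) {
                    out.append(">\n"); // close the start tag before the first child
                    hasChild = true;
                }
                write((Element) n, out, indent, depth + 1);
                out.append('\n');
            }
        }
        if (hasChild) {
            out.append(indent.repeat(depth)).append("</").append(e.getTagName()).append('>');
        } else {
            out.append("/>"); // no element children: self-closing form
        }
    }

    /** Convenience wrapper that formats into a String. */
    static String format(String xml, String indent) {
        StringBuilder out = new StringBuilder();
        try {
            write(parse(xml), out, indent, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling `IndentWriter.format("<root><child><nested/></child></root>", "*")` yields the same five-line layout as the specialIndent example, with `*` repeated once per nesting level.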
You can also specify tag names that should be treated as inline elements (no line
breaks around them), preserving the original formatting for elements like <span> or <a>.
public void inlineElement() {
StringBuilder out = new StringBuilder();
I.xml("<root><inline/></root>").to(out, "\t", "inline");
assert out.toString().equals("<root><inline/></root>");
}
By using the special prefix `&` before a tag name, you can also specify tags that
should always be treated as non-empty elements (always have a closing tag like
<script></script>, even if empty).
public void nonEmptyElement() {
StringBuilder out = new StringBuilder();
I.xml("<root/>").to(out, "\t", "&root");
assert out.toString().equals("<root></root>");
}
By passing null as the indent character, formatting (indentation and line breaks)
is disabled, similar to toString() but writing to an Appendable.
public void withoutFormat() {
StringBuilder out = new StringBuilder();
I.xml("<root><child/></root>").to(out, null);
assert out.toString().equals("<root><child/></root>");
}
CSS Selector
Sinobu leverages CSS selectors for querying elements within the document structure using
the XML#find(String) method. This provides a powerful and familiar way to select
nodes, similar to JavaScript's document.querySelectorAll.
This library supports many standard CSS3 selectors and includes some useful extensions.
public void tag() {
XML xml = I.xml("""
<root>
<h1></h1>
<article/>
<article></article>
</root>
""");
assert xml.find("h1").size() == 1;
assert xml.find("article").size() == 2;
}
public void attribute() {
XML xml = I.xml("""
<m>
<e A='one' B='one'/>
<e A='two' B='two'/>
</m>
""");
assert xml.find("[A]").size() == 2;
assert xml.find("[A=one]").size() == 1;
}
public void clazz() {
XML xml = I.xml("""
<root>
<p class="on"/>
<p class="on large"/>
<p class=""/>
<p no-class-attr="on"/>
</root>
""");
assert xml.find(".on").size() == 2;
assert xml.find(".large").size() == 1;
assert xml.find(".on.large").size() == 1;
}
public void mix() {
XML xml = I.xml("""
<m>
<ok/>
<ok>
<ok id="not"/>
<not>
<ok/>
</not>
</ok>
</m>
""");
assert xml.find("ok").size() == 4;
assert xml.find("not ok").size() == 1;
assert xml.find("ok > ok").size() == 1;
}
Combinators
Combinators define the relationship between selectors.
Combinator | Description | Notes |
---|---|---|
(Space) | Descendant | Default combinator between selectors |
> | Child | Direct children |
+ | Adjacent Sibling | Immediately following sibling |
~ | General Sibling | All following siblings |
< | Adjacent Previous Sibling | Sinobu Extension |
, | Selector List | Groups multiple selectors |
\| | Namespace Separator | Used with type and attribute selectors |
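For readers who know XPath, the standard combinators correspond closely to XPath axes. The sketch below uses only the JDK's `javax.xml.xpath` package, not Sinobu, and the class name `CombinatorAxes` is hypothetical. It runs against the same structure as the mix() example earlier in this document, so the counts can be compared with the `find` results shown there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates XPath equivalents of CSS combinators with the JDK.
class CombinatorAxes {

    // Same structure as the mix() example, written without whitespace.
    static final String DOC = "<m><ok/><ok><ok id='not'/><not><ok/></not></ok></m>";

    /** Returns how many nodes the XPath expression selects in DOC. */
    static int count(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(DOC)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under this mapping, `ok` corresponds to `//ok`, the descendant selector `not ok` to `//not//ok`, the child selector `ok > ok` to `//ok/ok`, the adjacent sibling selector `ok + ok` to `//ok/following-sibling::*[1][self::ok]`, and the general sibling selector `ok ~ not` to `//ok/following-sibling::not`.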
Basic Selectors
Basic selectors target elements based on their type, class, or ID.
Selector Type | Example | Description |
---|---|---|
Type | div | By element name |
Class | .warning | By class attribute |
ID | #main | By id attribute |
Universal | * | All elements |
Attribute Selectors
Attribute selectors target elements based on the presence or value of their attributes.
Selector Syntax | Description |
---|---|
[attr] | Elements with an attr attribute |
[attr=value] | Elements where attr equals value |
[attr~=value] | Elements where attr contains the word value |
[attr*=value] | Elements where attr contains substring value |
[attr^=value] | Elements where attr starts with value |
[attr$=value] | Elements where attr ends with value |
[ns:attr] | Elements with attribute attr in namespace ns |
[ns:attr=value] | Elements with namespaced attribute and value |
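The difference between the value operators is easiest to see when each one is written out as a plain string test. This is a sketch of the matching rules as the CSS Selectors specification defines them, not Sinobu's code; `AttrMatch` and its method names are hypothetical, and whether Sinobu treats empty values exactly the same way is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper spelling out the CSS attribute operators as string tests.
// A null "actual" means the element does not carry the attribute at all.
class AttrMatch {

    /** [attr=value]: exact match. */
    static boolean equalsValue(String actual, String value) {
        return actual != null && actual.equals(value);
    }

    /** [attr~=value]: value is one of the whitespace-separated words. */
    static boolean containsWord(String actual, String value) {
        return actual != null && !value.isEmpty()
                && Arrays.asList(actual.trim().split("\\s+")).contains(value);
    }

    /** [attr*=value]: value occurs anywhere; an empty value never matches. */
    static boolean containsSubstring(String actual, String value) {
        return actual != null && !value.isEmpty() && actual.contains(value);
    }

    /** [attr^=value]: attribute starts with value; an empty value never matches. */
    static boolean startsWith(String actual, String value) {
        return actual != null && !value.isEmpty() && actual.startsWith(value);
    }

    /** [attr$=value]: attribute ends with value; an empty value never matches. */
    static boolean endsWith(String actual, String value) {
        return actual != null && !value.isEmpty() && actual.endsWith(value);
    }
}
```

For an attribute value of `online`, the word test for `on` fails while the substring and prefix tests succeed, which is the usual source of confusion between `~=` and `*=`.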
Pseudo Class Selectors
Pseudo-classes select elements based on their state, position, or characteristics not reflected by simple selectors. The table includes standard pseudo-classes and Sinobu-specific extensions (marked as Sinobu Extension).
Pseudo-Class | Description | Notes |
---|---|---|
:first-child | First element among siblings | |
:last-child | Last element among siblings | |
:only-child | Element that is the only child | |
:first-of-type | First element of its type among siblings | |
:last-of-type | Last element of its type among siblings | |
:only-of-type | Element that is the only one of its type | |
:nth-child(n) | n-th element among siblings | keyword |
:nth-last-child(n) | n-th element among siblings, from last | keyword |
:nth-of-type(n) | n-th element of its type among siblings | keyword |
:nth-last-of-type(n) | n-th element of type among siblings from last | keyword |
:empty | Elements with no children (incl. text) | |
:not(selector) | Elements not matching the inner selector | |
:has(selector) | Elements having a descendant matching selector | |
:root | Document's root element | |
:contains(text) | Elements containing text directly | Sinobu Extension |
:parent | Parent element | Sinobu Extension |
Note: User interface state pseudo-classes (like :hover, :focus, :checked) are
generally not supported as they relate to browser interactions rather than static
document structure analysis.
Manipulation
The XML object provides a fluent API, similar to jQuery, for modifying the
document structure.
Important: These manipulation methods modify the underlying DOM structure directly;
the XML object itself is mutable in this regard. Explore the nested classes for
specific manipulation categories.
Adding Content
Methods for inserting new content (elements, text, or other XML structures) relative to the selected elements in the document.
Method Link | Description |
---|---|
XML#append(Object) | Insert content at the end of each element. |
XML#prepend(Object) | Insert content at the beginning of each element. |
XML#before(Object) | Insert content before each element. |
XML#after(Object) | Insert content after each element. |
XML#child(String) | Create and append a new child element. |
XML#child(String, Consumer) | Create, append, and configure a new child. |
public void append() {
XML root = I.xml("<m><Q><P/></Q><Q><P/></Q></m>");
assert root.find("Q").append("<R/><R/>").find("R").size() == 4;
assert root.find("Q > R").size() == 4;
assert root.find("Q > R:first-child").size() == 0;
}
public void prepend() {
String xml = "<m><Q><P/></Q><Q><P/></Q></m>";
XML e = I.xml(xml);
assert e.find("Q").prepend("<R/><R/>").find("R").size() == 4;
assert e.find("Q > R").size() == 4;
assert e.find("Q > R:first-child").size() == 2;
}
Removing Content
Methods for removing content or elements from the document.
Method Link | Description |
---|---|
XML#empty() | Remove all child nodes from elements. |
XML#remove() | Remove the selected elements from the DOM. |
public void empty() {
XML root = I.xml("<Q><P/><P/></Q>");
root.empty();
assert root.find("P").size() == 0;
}
public void remove() {
XML root = I.xml("<Q><S/><T/><S/></Q>");
assert root.find("*").remove().size() == 3;
assert root.find("S").size() == 0;
assert root.find("T").size() == 0;
}
Wrapping
Methods for wrapping selected elements in new markup structures.
Method Link | Description |
---|---|
XML#wrap(Object) | Wrap each selected element individually. |
XML#wrapAll(Object) | Wrap all elements together with one structure. |
public void wrap() {
String xml = "<m><Q/><Q/></m>";
XML e = I.xml(xml);
e.find("Q").wrap("<P/>");
assert e.find("P > Q").size() == 2;
assert e.find("P").size() == 2;
assert e.find("Q").size() == 2;
}
public void wrapAll() {
String xml = "<m><Q/><Q/></m>";
XML e = I.xml(xml);
e.find("Q").wrapAll("<P/>");
assert e.find("P > Q").size() == 2;
assert e.find("P").size() == 1;
assert e.find("Q").size() == 2;
}
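The difference between the two methods comes down to how many wrapper elements are created. The sketch below reproduces both behaviors with the JDK's `org.w3c.dom` API rather than Sinobu. `WrapSketch` is a hypothetical name, and the choice to place the single wrapper where the first selected element was is an assumption borrowed from jQuery's `wrapAll`, not something this document states about Sinobu.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only illustration of wrap versus wrapAll.
class WrapSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Snapshot of matching elements; the NodeList itself is live and changes as we edit. */
    static List<Element> byTag(Document doc, String tag) {
        NodeList live = doc.getElementsByTagName(tag);
        List<Element> copy = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            copy.add((Element) live.item(i));
        }
        return copy;
    }

    /** One new wrapper per target element. */
    static void wrap(Document doc, String target, String wrapper) {
        for (Element e : byTag(doc, target)) {
            Element w = doc.createElement(wrapper);
            e.getParentNode().replaceChild(w, e); // wrapper takes the target's place
            w.appendChild(e);                     // target becomes the wrapper's child
        }
    }

    /** A single wrapper, inserted before the first target, that receives all targets. */
    static void wrapAll(Document doc, String target, String wrapper) {
        List<Element> targets = byTag(doc, target);
        if (targets.isEmpty()) {
            return;
        }
        Element w = doc.createElement(wrapper);
        Element first = targets.get(0);
        first.getParentNode().insertBefore(w, first);
        for (Element e : targets) {
            w.appendChild(e); // appendChild moves the node out of its old parent
        }
    }

    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }
}
```

Applied to `<m><Q/><Q/></m>`, `wrap` produces two `P` elements and `wrapAll` produces one, matching the counts asserted in the examples above.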
Text Content
Methods for getting or setting the text content of elements. Text content represents the combined text of an element and its descendants.
Method Link | Description |
---|---|
XML#text() | Get the combined text content of elements. |
XML#text(String) | Set text content, replacing existing. |
public void textGet() {
String xml = "<Q>ss<P>a<i>a</i>a</P><P> b </P><P>c c</P>ss</Q>";
assert I.xml(xml).find("P").text().equals("aaa b c c");
}
public void textSet() {
String xml = "<Q><P>aaa</P></Q>";
XML e = I.xml(xml);
e.find("P").text("set");
assert e.find("P:contains(set)").size() == 1;
}
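"Combined text" here means the text of every selected element, including nested descendants, concatenated in document order. A rough JDK-only equivalent is shown below as a sketch; `TextSketch` is a hypothetical name, and it relies on `Node#getTextContent()` from `org.w3c.dom`, not on any Sinobu API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical JDK-only illustration of combined text content.
class TextSketch {

    /** Concatenates the text content of every element with the given tag name. */
    static String text(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            StringBuilder combined = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() already includes the text of nested descendants.
                combined.append(nodes.item(i).getTextContent());
            }
            return combined.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the document from the textGet example, the three `P` elements contribute `aaa`, ` b ` and `c c`, which concatenate to `aaa b c c`; the surrounding `ss` text belongs to `Q` and is not included.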
Attributes
Methods for managing element attributes (e.g., href, src, id). Since the
class attribute is frequently manipulated, dedicated helper methods are provided
for convenience.
Method Link | Description |
---|---|
XML#attr(String) | Get attribute value for the first element. |
XML#attr(String, Object) | Set attribute value; null removes attribute. |
XML#addClass(String...) | Add one or more classes. |
XML#removeClass(String...) | Remove one or more classes. |
XML#toggleClass(String) | Add or remove a class based on presence. |
XML#hasClass(String) | Check if any element has the class. |
public void attrGet() {
String xml = "<Q name='value' key='map'/>";
assert I.xml(xml).attr("name").equals("value");
assert I.xml(xml).attr("key").equals("map");
}
public void attrSet() {
String xml = "<Q name='value' key='map'/>";
XML e = I.xml(xml);
e.attr("name", "set");
assert e.attr("name").equals("set");
assert e.attr("key").equals("map");
assert e.attr("name", null).find("Q[name]").size() == 0;
}
public void addClass() {
assert I.xml("<a/>").addClass("add").attr("class").equals("add");
assert I.xml("<a class='base'/>").addClass("add").attr("class").equals("base add");
assert I.xml("<a class='base'/>").addClass("base").attr("class").equals("base");
assert I.xml("<a class='base'/>").addClass("add", "base", "ad").attr("class").equals("base add ad");
}
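The addClass assertions above imply that the class attribute behaves like an ordered set: existing names keep their position, new names are appended, and duplicates are ignored. The sketch below models that behavior on a plain string. `ClassList` is a hypothetical helper, not part of Sinobu, and the toggle rule (remove when present, add otherwise) is inferred from the table description.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a class attribute as an insertion-ordered set of names.
class ClassList {

    /** Splits an attribute value on whitespace; null means the attribute is absent. */
    private static Set<String> parse(String current) {
        Set<String> names = new LinkedHashSet<>();
        if (current != null) {
            for (String name : current.trim().split("\\s+")) {
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Appends each name that is not already present, preserving existing order. */
    static String addClass(String current, String... added) {
        Set<String> names = parse(current);
        for (String name : added) {
            names.add(name);
        }
        return String.join(" ", names);
    }

    /** Removes the name if present, otherwise appends it. */
    static String toggleClass(String current, String name) {
        Set<String> names = parse(current);
        if (!names.remove(name)) {
            names.add(name);
        }
        return String.join(" ", names);
    }
}
```

A `LinkedHashSet` is used so that insertion order is kept while duplicates are dropped, which is what produces `base add ad` for the last addClass assertion.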
Cloning
Method for duplicating elements, creating a deep copy.
Method Link | Description |
---|---|
XML#clone() | Create a deep copy of selected elements. |
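A deep copy duplicates the element together with all of its descendants, so later changes to the copy leave the original untouched. The sketch below demonstrates that property with the JDK's `Node#cloneNode(true)`. It does not use Sinobu's `XML#clone()`, and `CloneSketch` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical JDK-only illustration of deep-copy semantics.
class CloneSketch {

    /** Parses a well-formed XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies the element and its whole subtree; passing false would copy the element alone. */
    static Element deepCopy(Element original) {
        return (Element) original.cloneNode(true);
    }
}
```

After `Element copy = CloneSketch.deepCopy(original)`, removing a child from `copy` does not change the child count of `original`.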
Traversing
Navigate the DOM tree relative to the currently selected elements.
Most traversal methods return a new XML object containing the resulting elements,
allowing for method chaining without modifying the original selection (unless intended).
Explore the nested classes for specific traversal categories.
Filtering
Methods for filtering the current set of selected elements or finding new ones within the current context.
Method Link | Description |
---|---|
XML#first() | Reduce the set to the first element. |
XML#last() | Reduce the set to the last element. |
XML#find(String) | Find descendants matching the selector. |
public void first() {
XML xml = I.xml("""
<root>
<child1 class='a'/>
<child2 class='a'/>
<child3 class='a'/>
</root>
""");
XML found = xml.find(".a");
assert found.size() == 3;
XML first = found.first();
assert first.size() == 1;
assert first.name().equals("child1");
}
public void tag() {
XML xml = I.xml("""
<root>
<h1></h1>
<article/>
<article></article>
</root>
""");
assert xml.find("h1").size() == 1;
assert xml.find("article").size() == 2;
}
Iteration
The XML object implements Iterable, allowing easy iteration
over each selected DOM element individually using a standard Java for-each loop.
This is useful for processing each element in a selection.
void iterate() {
XML elements = I.xml("<div><p>1</p><p>2</p></div>").find("p");
for (XML p : elements) {
System.out.println(p.text());
}
}
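The loop above works because each step of the iteration yields a selection that holds exactly one element, so the same methods are available on the loop variable as on the whole set. The following is a minimal sketch of that design, using plain strings in place of DOM elements; `Selection` is a hypothetical class and does not reflect how Sinobu implements `XML`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical selection type whose iterator yields single-item selections.
final class Selection implements Iterable<Selection> {

    private final List<String> items;

    Selection(List<String> items) {
        this.items = List.copyOf(items);
    }

    /** Number of items in this selection. */
    int size() {
        return items.size();
    }

    /** Combined text of all items, in order. */
    String text() {
        return String.join("", items);
    }

    @Override
    public Iterator<Selection> iterator() {
        // Wrap each item in its own selection so callers get the full API per item.
        List<Selection> singles = new ArrayList<>();
        for (String item : items) {
            singles.add(new Selection(List.of(item)));
        }
        return singles.iterator();
    }
}
```

Iterating over `new Selection(List.of("1", "2"))` visits two selections of size one, whose `text()` values are `1` and `2`.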