Sinobu

Concept

Sinobu provides functionality for handling HTML and XML documents. It uses a built-in, lenient parser (often called a tag soup parser) that can handle malformed HTML, similar to how web browsers do. The core class for this is XML, which offers a jQuery-like API for traversing and manipulating the document structure.

🍜 Tag Soup Friendly

Handles malformed HTML gracefully, tolerating missing or mismatched tags commonly found in real-world web content.

public void caseInconsistencyLowerUpper() {
    XML root = parseAsHTML("<div>crazy</DIV>");

    assert root.find("div").size() == 1;
}

🎯 CSS Selector Power

Leverages CSS selectors for efficient and flexible element selection, similar to JavaScript libraries like jQuery.

public void tag() {
    XML xml = I.xml("""
            <root>
                <h1></h1>
                <article/>
                <article></article>
            </root>
            """);

    assert xml.find("h1").size() == 1;
    assert xml.find("article").size() == 2;
}

🛠️ jQuery-Like API

Provides a fluent and chainable API for easy DOM manipulation, inspired by the familiar jQuery syntax.

public void append() {
    XML root = I.xml("<m><Q><P/></Q><Q><P/></Q></m>");

    assert root.find("Q").append("<R/><R/>").find("R").size() == 4;
    assert root.find("Q > R").size() == 4;
    assert root.find("Q > R:first-child").size() == 0;
}

🌐 XML Ready

Supports not only HTML but also XML documents, providing a unified interface for structured data processing.

Reading

You can parse HTML/XML from various sources using the I class utility methods. Sinobu automatically detects whether the input is a URL, file path, or raw string. The parser is designed to be tolerant of errors and can handle common issues found in real-world HTML, such as missing tags or incorrect nesting. This makes it suitable for scraping or processing potentially messy markup.

public void htmlLiteral() {
    XML root = I.xml("<html/>");

    assert root.size() == 1;
    assert root.name().equals("html");
}

The parser also accepts structurally invalid markup, such as inconsistent tag case or improperly nested tags.

public void caseInconsistencyLowerUpper() {
    XML root = parseAsHTML("<div>crazy</DIV>");

    assert root.find("div").size() == 1;
}

public void slipOut() {
    XML root = parseAsHTML("<a><b>crazy</a></b>");

    assert root.find("a").text().equals("crazy");
    assert root.find("b").text().equals("crazy");
}

Writing

You can serialize the XML structure back into a string representation or write it to an Appendable (like Writer or StringBuilder). Sinobu offers both compact and formatted output options.

The XML#toString() method returns the string representation with default formatting: as the test below shows, each element is written on its own line and nested elements are indented.

public void format() {
    XML root = I.xml("<root><child/><child/></root>");

    assert root.toString().equals(normalize("""
            <root>
                <child/>
                <child/>
            </root>
            """));
}

The XML#to(Appendable, String, String...) method allows for pretty-printing the XML structure with configurable indentation for improved readability. You can specify the indentation string (e.g., a tab character \t or spaces).

public void specialIndent() {
    StringBuilder out = new StringBuilder();
    I.xml("<root><child><nested/></child></root>").to(out, "*");

    assert out.toString().equals(normalize("""
            <root>
            *<child>
            **<nested/>
            *</child>
            </root>
            """));
}
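To make the role of the indentation string concrete, here is a minimal sketch built only on the JDK's DOM parser. It is not Sinobu's implementation; the class `IndentSketch` and its `format` method are hypothetical names, and the sketch handles elements only (text, comments, and attributes are skipped for brevity). It shows the indentation string being repeated once per nesting level.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu: renders elements one per line.
class IndentSketch {

    /** Parses the XML text and renders its elements, indented by nesting depth. */
    static String format(String xml, String indent) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            write(root, indent, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    private static void write(Node element, String indent, int depth, StringBuilder out) {
        // The indentation string is repeated once per nesting level.
        String pad = indent.repeat(depth);
        String name = element.getNodeName();

        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // sketch limitation: text and comments are skipped
            }
            if (!hasChildElement) {
                out.append(pad).append('<').append(name).append('>');
                hasChildElement = true;
            }
            out.append('\n');
            write(child, indent, depth + 1, out);
        }

        if (hasChildElement) {
            out.append('\n').append(pad).append("</").append(name).append('>');
        } else {
            out.append(pad).append('<').append(name).append("/>");
        }
    }
}
```

Calling `IndentSketch.format("<root><child><nested/></child></root>", "*")` produces the same five lines as the specialIndent test above, with `*` before `<child>` and `**` before `<nested/>`.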

You can also specify tag names that should be treated as inline elements (no line breaks around them), preserving the original formatting for elements like <span> or <a>.

public void inlineElement() {
    StringBuilder out = new StringBuilder();
    I.xml("<root><inline/></root>").to(out, "\t", "inline");

    assert out.toString().equals("<root><inline/></root>");
}

By prefixing a tag name with the special character `&`, you can also mark tags that should always be written as non-empty elements (with an explicit closing tag such as <script></script>, even when empty).

public void nonEmptyElement() {
    StringBuilder out = new StringBuilder();
    I.xml("<root/>").to(out, "\t", "&root");

    assert out.toString().equals("<root></root>");
}

Passing null as the indentation string disables formatting (indentation and line breaks), so the compact form is written to the Appendable.

public void withoutFormat() {
    StringBuilder out = new StringBuilder();
    I.xml("<root><child/></root>").to(out, null);

    assert out.toString().equals("<root><child/></root>");
}

CSS Selector

Sinobu leverages CSS selectors for querying elements within the document structure using the XML#find(String) method. This provides a powerful and familiar way to select nodes, similar to JavaScript's document.querySelectorAll. This library supports many standard CSS3 selectors and includes some useful extensions.

public void tag() {
    XML xml = I.xml("""
            <root>
                <h1></h1>
                <article/>
                <article></article>
            </root>
            """);

    assert xml.find("h1").size() == 1;
    assert xml.find("article").size() == 2;
}

public void attribute() {
    XML xml = I.xml("""
            <m>
                <e A='one' B='one'/>
                <e A='two' B='two'/>
            </m>
            """);

    assert xml.find("[A]").size() == 2;
    assert xml.find("[A=one]").size() == 1;
}

public void clazz() {
    XML xml = I.xml("""
            <root>
                <p class="on"/>
                <p class="on large"/>
                <p class=""/>
                <p no-class-attr="on"/>
            </root>
            """);

    assert xml.find(".on").size() == 2;
    assert xml.find(".large").size() == 1;
    assert xml.find(".on.large").size() == 1;
}

public void mix() {
    XML xml = I.xml("""
            <m>
                <ok/>
                <ok>
                    <ok id="not"/>
                    <not>
                        <ok/>
                    </not>
                </ok>
            </m>
            """);

    assert xml.find("ok").size() == 4;
    assert xml.find("not ok").size() == 1;
    assert xml.find("ok > ok").size() == 1;
}

Combinators

Combinators define the relationship between selectors.

Combinator  Description                Notes
(Space)     Descendant                 Default combinator between selectors
>           Child                      Direct children
+           Adjacent Sibling           Immediately following sibling
~           General Sibling            All following siblings
<           Adjacent Previous Sibling  Sinobu Extension
,           Selector List              Groups multiple selectors
|           Namespace Separator        Used with type and attribute selectors
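
One way to reason about the standard combinators is through their XPath 1.0 counterparts. The sketch below uses only the JDK's javax.xml APIs, not Sinobu, and says nothing about how Sinobu evaluates selectors internally; `CombinatorSketch` and `count` are hypothetical names. The Sinobu-specific `<` combinator and the namespace separator are left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu.
class CombinatorSketch {

    // CSS selector   XPath 1.0 counterpart used in this sketch
    // m a            //m//a
    // m > a          //m/a
    // a + b          //a/following-sibling::*[1][self::b]
    // a ~ b          //a/following-sibling::b
    // b, c           //b | //c

    /** Counts the nodes that an XPath expression selects in the given XML text. */
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + xpath, e);
        }
    }
}
```

For the document `<m><a/><b/><c/><b><a/></b></m>`, the descendant form `//m//a` selects two elements while the child form `//m/a` selects one, which is the same distinction the (Space) and `>` rows describe.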

Basic Selectors

Basic selectors target elements based on their type, class, or ID.

Selector Type  Example   Description
Type           div       By element name
Class          .warning  By class attribute
ID             #main     By id attribute
Universal      *         All elements

Attribute Selectors

Attribute selectors target elements based on the presence or value of their attributes.

Selector Syntax  Description
[attr]           Elements with an attr attribute
[attr=value]     Elements where attr equals value
[attr~=value]    Elements where attr contains the word value
[attr*=value]    Elements where attr contains substring value
[attr^=value]    Elements where attr starts with value
[attr$=value]    Elements where attr ends with value
[ns:attr]        Elements with attribute attr in namespace ns
[ns:attr=value]  Elements with namespaced attribute and value
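
The operator semantics in this table can be restated as plain string checks. The following is a minimal sketch that follows standard CSS behavior (for example, an empty comparison value never matches the word and substring operators); it is not taken from Sinobu, whose edge-case handling may differ. `AttributeMatchSketch` and `matches` are hypothetical names, and namespaced attributes are not covered.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Sinobu.
class AttributeMatchSketch {

    /**
     * @param value    the attribute value, or null when the attribute is absent
     * @param op       one of "", "=", "~=", "*=", "^=", "$=" ("" means presence only)
     * @param expected the value written in the selector (ignored for "")
     */
    static boolean matches(String value, String op, String expected) {
        if (value == null) {
            return false; // no attribute, no match for any operator
        }
        switch (op) {
            case "":   // [attr]
                return true;
            case "=":  // [attr=value]
                return value.equals(expected);
            case "~=": // [attr~=value]: one of the whitespace-separated words
                return !expected.isEmpty()
                        && Arrays.asList(value.trim().split("\\s+")).contains(expected);
            case "*=": // [attr*=value]: substring
                return !expected.isEmpty() && value.contains(expected);
            case "^=": // [attr^=value]: prefix
                return !expected.isEmpty() && value.startsWith(expected);
            case "$=": // [attr$=value]: suffix
                return !expected.isEmpty() && value.endsWith(expected);
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }
}
```

With the value `on large`, the word operator accepts `large` but rejects `arg`, whereas the substring operator accepts both.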

Pseudo Class Selectors

Pseudo-classes select elements based on their state, position, or characteristics not reflected by simple selectors. The table includes standard pseudo-classes and Sinobu-specific extensions (marked as Sinobu Extension).

Pseudo-Class          Description                                    Notes
:first-child          First element among siblings
:last-child           Last element among siblings
:only-child           Element that is the only child
:first-of-type        First element of its type among siblings
:last-of-type         Last element of its type among siblings
:only-of-type         Element that is the only one of its type
:nth-child(n)         n-th element among siblings                    keyword
:nth-last-child(n)    n-th element among siblings, from last         keyword
:nth-of-type(n)       n-th element of its type among siblings        keyword
:nth-last-of-type(n)  n-th element of type among siblings from last  keyword
:empty                Elements with no children (incl. text)
:not(selector)        Elements not matching the inner selector
:has(selector)        Elements having a descendant matching selector
:root                 Document's root element
:contains(text)       Elements containing text directly              Sinobu Extension
:parent               Parent element                                 Sinobu Extension

Note: User interface state pseudo-classes (like :hover, :focus, :checked) are generally not supported as they relate to browser interactions rather than static document structure analysis.
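
For readers who know XPath, several of the structural pseudo-classes have direct XPath 1.0 counterparts. The sketch below relies only on the JDK's javax.xml APIs and is an illustration of the standard semantics, not of Sinobu's internals; `PseudoClassSketch` and `names` are hypothetical names, and the Sinobu extensions are not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu.
class PseudoClassSketch {

    // CSS selector        XPath 1.0 counterpart used in this sketch
    // r > :first-child    /r/*[not(preceding-sibling::*)]
    // r > :last-child     /r/*[not(following-sibling::*)]
    // r > :nth-child(2)   /r/*[count(preceding-sibling::*) = 1]
    // :empty              //*[not(node())]
    // r > :not(b)         /r/*[not(self::b)]
    // :has(d)             //*[.//d]

    /** Returns the names of the elements that the XPath expression selects. */
    static List<String> names(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + xpath, e);
        }
    }
}
```

Against `<r><a/><b/><c><d/></c></r>`, the `:nth-child(2)` counterpart selects only `b`, and the `:has(d)` counterpart selects both `r` and `c`, because `d` is a descendant of each.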

Manipulation

The XML object provides a fluent API, similar to jQuery, for modifying the document structure.

Important: These manipulation methods modify the underlying DOM structure directly; the XML object itself is mutable in this regard. Explore the nested classes for specific manipulation categories.

Adding Content

Methods for inserting new content (elements, text, or other XML structures) relative to the selected elements in the document.

Method                       Description
XML#append(Object)           Insert content at the end of each element.
XML#prepend(Object)          Insert content at the beginning of each element.
XML#before(Object)           Insert content before each element.
XML#after(Object)            Insert content after each element.
XML#child(String)            Create and append a new child element.
XML#child(String, Consumer)  Create, append, and configure a new child.

public void append() {
    XML root = I.xml("<m><Q><P/></Q><Q><P/></Q></m>");

    assert root.find("Q").append("<R/><R/>").find("R").size() == 4;
    assert root.find("Q > R").size() == 4;
    assert root.find("Q > R:first-child").size() == 0;
}

public void prepend() {
    String xml = "<m><Q><P/></Q><Q><P/></Q></m>";

    XML e = I.xml(xml);
    assert e.find("Q").prepend("<R/><R/>").find("R").size() == 4;
    assert e.find("Q > R").size() == 4;
    assert e.find("Q > R:first-child").size() == 2;
}

Removing Content

Methods for removing content or elements from the document.

Method        Description
XML#empty()   Remove all child nodes from elements.
XML#remove()  Remove the selected elements from the DOM.

public void empty() {
    XML root = I.xml("<Q><P/><P/></Q>");
    root.empty();

    assert root.find("P").size() == 0;
}

public void remove() {
    XML root = I.xml("<Q><S/><T/><S/></Q>");
    assert root.find("*").remove().size() == 3;

    assert root.find("S").size() == 0;
    assert root.find("T").size() == 0;
}

Wrapping

Methods for wrapping selected elements with new HTML structures.

Method               Description
XML#wrap(Object)     Wrap each selected element individually.
XML#wrapAll(Object)  Wrap all elements together with one structure.

public void wrap() {
    String xml = "<m><Q/><Q/></m>";

    XML e = I.xml(xml);
    e.find("Q").wrap("<P/>");

    assert e.find("P > Q").size() == 2;
    assert e.find("P").size() == 2;
    assert e.find("Q").size() == 2;
}

public void wrapAll() {
    String xml = "<m><Q/><Q/></m>";

    XML e = I.xml(xml);
    e.find("Q").wrapAll("<P/>");

    assert e.find("P > Q").size() == 2;
    assert e.find("P").size() == 1;
    assert e.find("Q").size() == 2;
}
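
The difference between the two wrapping styles can also be spelled out on a plain W3C DOM. This is a rough sketch using only JDK classes, assuming jQuery-like semantics (for wrapAll, the single wrapper is placed where the first target was); it is not Sinobu's implementation, and `WrapSketch` with its methods is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu.
class WrapSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    /** Takes a snapshot, because getElementsByTagName returns a live list. */
    static List<Element> select(Document doc, String tag) {
        NodeList live = doc.getElementsByTagName(tag);
        List<Element> copy = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            copy.add((Element) live.item(i));
        }
        return copy;
    }

    /** One new wrapper per target element. */
    static void wrap(Document doc, String tag, String wrapperTag) {
        for (Element target : select(doc, tag)) {
            Element wrapper = doc.createElement(wrapperTag);
            target.getParentNode().replaceChild(wrapper, target); // wrapper takes the target's place
            wrapper.appendChild(target);
        }
    }

    /** A single wrapper shared by all target elements. */
    static void wrapAll(Document doc, String tag, String wrapperTag) {
        List<Element> targets = select(doc, tag);
        if (targets.isEmpty()) {
            return;
        }
        Element wrapper = doc.createElement(wrapperTag);
        Element first = targets.get(0);
        first.getParentNode().insertBefore(wrapper, first);
        for (Element target : targets) {
            wrapper.appendChild(target); // appendChild moves the node out of its old position
        }
    }

    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }
}
```

Applied to `<m><Q/><Q/></m>`, `wrap` leaves two P elements and `wrapAll` leaves one, matching the element counts asserted in the tests above.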

Text Content

Methods for getting or setting the text content of elements. Text content represents the combined text of an element and its descendants.

Method            Description
XML#text()        Get the combined text content of elements.
XML#text(String)  Set text content, replacing existing.

public void textGet() {
    String xml = "<Q>ss<P>a<i>a</i>a</P><P> b </P><P>c c</P>ss</Q>";

    assert I.xml(xml).find("P").text().equals("aaa b c c");
}

public void textSet() {
    String xml = "<Q><P>aaa</P></Q>";

    XML e = I.xml(xml);
    e.find("P").text("set");

    assert e.find("P:contains(set)").size() == 1;
}
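
To see why the textGet test yields "aaa b c c", it can help to reproduce the concatenation with the JDK's own DOM API. The sketch below is an illustration under that assumption, not Sinobu code; `TextContentSketch` and `text` are hypothetical names. It appends the text content of every matching element in document order, including text held by descendants such as the nested `<i>`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu.
class TextContentSketch {

    /** Concatenates the text content of every element with the given tag name. */
    static String text(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tag);
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < matches.getLength(); i++) {
                // getTextContent covers the element's own text and that of its descendants
                text.append(matches.item(i).getTextContent());
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }
}
```

The three P elements contribute "aaa", " b ", and "c c" respectively, and whitespace inside each element is preserved rather than trimmed.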

Attributes

Methods for managing element attributes (e.g., href, src, id). Since the class attribute is frequently manipulated, dedicated helper methods are provided for convenience.

Method                      Description
XML#attr(String)            Get attribute value for the first element.
XML#attr(String, Object)    Set attribute value; null removes attribute.
XML#addClass(String...)     Add one or more classes.
XML#removeClass(String...)  Remove one or more classes.
XML#toggleClass(String)     Add or remove a class based on presence.
XML#hasClass(String)        Check if any element has the class.

public void attrGet() {
    String xml = "<Q name='value' key='map'/>";

    assert I.xml(xml).attr("name").equals("value");
    assert I.xml(xml).attr("key").equals("map");
}

public void attrSet() {
    String xml = "<Q name='value' key='map'/>";

    XML e = I.xml(xml);
    e.attr("name", "set");

    assert e.attr("name").equals("set");
    assert e.attr("key").equals("map");
    assert e.attr("name", null).find("Q[name]").size() == 0;
}

public void addClass() {
    assert I.xml("<a/>").addClass("add").attr("class").equals("add");
    assert I.xml("<a class='base'/>").addClass("add").attr("class").equals("base add");
    assert I.xml("<a class='base'/>").addClass("base").attr("class").equals("base");
    assert I.xml("<a class='base'/>").addClass("add", "base", "ad").attr("class").equals("base add ad");
}
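
The class helpers can be thought of as operations on an ordered set of names stored in the class attribute. The following sketch models that idea on plain strings; it is not Sinobu's code, `ClassListSketch` is a hypothetical name, and the remove and toggle behavior is assumed by analogy with jQuery. The addClass results agree with the assertions in the addClass test above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Sinobu: models a class attribute as an ordered set.
class ClassListSketch {

    /** Splits a class attribute value into names, keeping first-seen order. */
    static Set<String> parse(String attr) {
        Set<String> names = new LinkedHashSet<>();
        if (attr != null) {
            for (String name : attr.trim().split("\\s+")) {
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    static String addClass(String attr, String... added) {
        Set<String> names = parse(attr);
        for (String name : added) {
            names.add(name); // names already present keep their position
        }
        return String.join(" ", names);
    }

    static String removeClass(String attr, String... removed) {
        Set<String> names = parse(attr);
        for (String name : removed) {
            names.remove(name);
        }
        return String.join(" ", names);
    }

    static String toggleClass(String attr, String name) {
        Set<String> names = parse(attr);
        if (!names.remove(name)) {
            names.add(name); // the name was absent, so toggling adds it
        }
        return String.join(" ", names);
    }

    static boolean hasClass(String attr, String name) {
        return parse(attr).contains(name);
    }
}
```

Because the set keeps insertion order and ignores duplicates, `addClass("base", "add", "base", "ad")` yields `base add ad`.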

Cloning

Method for duplicating elements, creating a deep copy.

Method       Description
XML#clone()  Create a deep copy of selected elements.

Traversing

Navigate the DOM tree relative to the currently selected elements. Most traversal methods return a new XML object containing the resulting elements, so calls can be chained without changing the original selection. Explore the nested classes for specific traversal categories.

Filtering

Methods for filtering the current set of selected elements or finding new ones within the current context.

Method            Description
XML#first()       Reduce the set to the first element.
XML#last()        Reduce the set to the last element.
XML#find(String)  Find descendants matching the selector.

public void first() {
    XML xml = I.xml("""
            <root>
                <child1 class='a'/>
                <child2 class='a'/>
                <child3 class='a'/>
            </root>
            """);
    XML found = xml.find(".a");
    assert found.size() == 3;

    XML first = found.first();
    assert first.size() == 1;
    assert first.name().equals("child1");
}

public void tag() {
    XML xml = I.xml("""
            <root>
                <h1></h1>
                <article/>
                <article></article>
            </root>
            """);

    assert xml.find("h1").size() == 1;
    assert xml.find("article").size() == 2;
}

Tree Navigation

Methods for navigating the DOM tree relative to the current elements, including moving vertically (up to parents, down to children) and horizontally (sideways to siblings).

Method            Description
XML#parent()      Get the direct parent of each element in the current set. Duplicates are removed.
XML#children()    Get the direct children of each element in the set.
XML#firstChild()  Get the first direct child of each element in the set.
XML#lastChild()   Get the last direct child of each element in the set.
XML#prev()        Get the immediately preceding sibling of each element in the set.
XML#next()        Get the immediately following sibling of each element in the set.

public void children() {
    XML root1 = I.xml("""
            <root>
                <first/>
                <center/>
                <last/>
            </root>
            """);
    assert root1.children().size() == 3;

    XML root2 = I.xml("""
            <root>
                text<first/>is
                <child>
                    <center/>
                </child>
                ignored<last/>!!
            </root>
            """);
    assert root2.children().size() == 3;

    XML root3 = I.xml("<root/>");
    assert root3.children().size() == 0;
}

public void next() {
    XML root = I.xml("""
            <root>
                <first/>
                <center/>
                text is ignored
                <last/>
            </root>
            """);

    XML next1 = root.find("first").next();
    assert next1.name().equals("center");

    XML next2 = root.find("center").next();
    assert next2.name().equals("last");

    XML next3 = root.find("last").next();
    assert next3.size() == 0;
}
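
As the tests above show, sibling navigation works on elements and skips text nodes. On a raw W3C DOM the same behavior has to be written out by hand, which the following JDK-only sketch does. It is not Sinobu's implementation; `SiblingSketch` and `nextElementName` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Sinobu.
class SiblingSketch {

    /**
     * Returns the name of the element that follows the first element with the
     * given tag name, or null when there is no such element.
     */
    static String nextElementName(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node start = doc.getElementsByTagName(tag).item(0);
            if (start == null) {
                return null;
            }
            for (Node s = start.getNextSibling(); s != null; s = s.getNextSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE) {
                    return s.getNodeName(); // text and comment siblings are skipped
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }
}
```

For `<root><first/><center/>text is ignored<last/></root>`, the element after `center` is `last`, even though a text node sits between them.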

Iteration

The XML object implements Iterable, allowing easy iteration over each selected DOM element individually using a standard Java for-each loop. This is useful for processing each element in a selection.

void iterate() {
    XML elements = I.xml("<div><p>1</p><p>2</p></div>").find("p");

    for (XML p : elements) {
        System.out.println(p.text());
    }
}