fix(deps): update dependency org.jsoup:jsoup to v1.21.1 #47

Open: wants to merge 1 commit into base: main


renovate[bot]

@renovate renovate bot commented Apr 25, 2025

This PR contains the following updates:

Package | Change
org.jsoup:jsoup (source) | 1.18.3 -> 1.21.1

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

jhy/jsoup (org.jsoup:jsoup)

v1.21.1

Changes
  • Removed previously deprecated methods. #​2317
  • Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class type) method instead. #2343
  • Deprecated Connection.Response#bufferUp() in favor of Connection.Response#readFully(), which can throw a checked IOException.
  • Deprecated the internal method Validate#ensureNotNull (replaced by the typed Validate#expectNotNull) and the protected HTML appenders in Attribute and Node.
  • If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.
Improvements
  • Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class nodeType) for direct node selection. #​2324
  • Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
  • Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
  • Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
  • Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
  • Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
  • Updated the default user-agent string to improve compatibility. #​2341
  • The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #​2326.
  • Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp(), with an explicit IOException. Similarly, added Connection.Response#readBody() over Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
  • When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #​2337
  • Changed Connection to prefer using the JDK's HttpClient over HttpURLConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340
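The attribute-escaping improvement listed above can be pictured with a minimal sketch in plain Java. This is not jsoup's serializer: the class and method names (AttributeEscapeSketch, escapeAttributeValue) are hypothetical, and the sketch only assumes a double-quoted attribute value in which &, ", < and > are replaced by entities.

```java
// Hypothetical sketch, NOT jsoup's implementation: shows the idea of escaping
// < and > (in addition to & and ") when writing a double-quoted attribute value.
final class AttributeEscapeSketch {

    // Returns the value with &, ", < and > replaced by character entities.
    static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An attribute value that tries to smuggle markup into the output.
        String title = "</style><img src=x onerror=alert(1)>";
        System.out.println("<p title=\"" + escapeAttributeValue(title) + "\">");
    }
}
```

With < and > escaped, the serialized attribute no longer contains raw tag delimiters, so markup hidden in an attribute value is less likely to be reinterpreted as elements if the output is parsed again in a different context.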
Bug Fixes
  • The contents of a script in a svg foreign context should be parsed as script data, not text. #​2320
  • Tag#isFormSubmittable() was updating the Tag's options. #​2323
  • The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #​2325
  • Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #​2332
  • When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #​2334

v1.20.1

Changes
  • To better follow the HTML5 spec and current browsers, the HTML parser no longer allows self-closing tags (<foo />)
    to close HTML elements by default. Foreign content (SVG, MathML), and content parsed with the XML parser, still
    supports self-closing tags. If you need specific HTML tags to support self-closing, you can register a custom tag via
    the TagSet configured in Parser.tagSet(), using Tag#set(Tag.SelfClose). Standard void tags (such as <img>,
    <br>, etc.) continue to behave as usual and are not affected by this
    change. #​2300.
  • The following internal components have been deprecated. If you do happen to be using any of these, please take the opportunity now to migrate away from them, as they will be removed in jsoup 1.21.1.
    • ChangeNotifyingArrayList, Document.updateMetaCharsetElement(), Document.updateMetaCharsetElement(boolean), HtmlTreeBuilder.isContentForTagData(String), Parser.isContentForTagData(String), Parser.setTreeBuilder(TreeBuilder), Tag.formatAsBlock(), Tag.isFormListed(), TokenQueue.addFirst(String), TokenQueue.chompTo(String), TokenQueue.chompToIgnoreCase(String), TokenQueue.consumeToIgnoreCase(String), TokenQueue.consumeWord(), TokenQueue.matchesAny(String...)
Functional Improvements
  • Rebuilt the HTML pretty-printer, to simplify and consolidate the implementation, improve consistency, support custom
    Tags, and provide a cleaner path for ongoing improvements. The specific HTML produced by the pretty-printer may be
    different from previous versions. #​2286.
  • Added the ability to define custom tags, and to modify properties of known tags, via the TagSet tag collection.
    Their properties can impact both the parse and how content is
    serialized (output as HTML or XML). #​2285.
  • Element.cssSelector() will prefer to return shorter selectors by using ancestor IDs when available and unique. E.g.
    #id > div > p instead of html > body > div > div > p #​2283.
  • Added Elements.deselect(int index), Elements.deselect(Object o), and Elements.deselectAll() methods to remove
    elements from the Elements list without removing them from the underlying DOM. Also added Elements.asList() method
    to get a modifiable list of elements without affecting the DOM. (Individual Elements remain linked to the
    DOM.) #​2100.
  • Added support for sending a request body from an InputStream with
    Connection.requestBodyStream(InputStream stream). #​1122.
  • The XML parser now supports scoped xmlns: prefix namespace declarations, and applies the correct namespace to Tags and
    Attributes. Also, added Tag#prefix(), Tag#localName(), Attribute#prefix(), Attribute#localName(), and
    Attribute#namespace() to retrieve these. #​2299.
  • CSS identifiers are now escaped and unescaped correctly to the CSS spec. Element#cssSelector() will emit
    appropriately escaped selectors, and the QueryParser supports those. Added Selector.escapeCssIdentifier() and
    Selector.unescapeCssIdentifier(). #​2297, #​2305
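To make the CSS identifier escaping item above more concrete, here is a minimal sketch that follows the CSSOM "serialize an identifier" rules as commonly described. It is an illustration only, not the code behind Selector.escapeCssIdentifier(), and jsoup's output may differ in detail; the class and method names are hypothetical.

```java
// Hypothetical sketch, NOT jsoup's implementation: escapes a string so it can be
// used as a CSS identifier, e.g. an id or class name inside a selector.
final class CssIdentifierEscapeSketch {

    static String escapeIdentifier(String ident) {
        StringBuilder out = new StringBuilder();
        int[] cps = ident.codePoints().toArray();
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            boolean digit = cp >= '0' && cp <= '9';
            if (cp == 0) {
                out.append('\uFFFD'); // NUL becomes the replacement character
            } else if (cp <= 0x1F || cp == 0x7F
                    || (i == 0 && digit)
                    || (i == 1 && digit && cps[0] == '-')) {
                // Control characters and leading digits: hex escape plus a space
                out.append('\\').append(Integer.toHexString(cp)).append(' ');
            } else if (i == 0 && cp == '-' && cps.length == 1) {
                out.append("\\-"); // a lone hyphen is not a valid identifier
            } else if (cp >= 0x80 || cp == '-' || cp == '_' || digit
                    || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
                out.appendCodePoint(cp); // safe as-is
            } else {
                out.append('\\').appendCodePoint(cp); // backslash-escape the rest
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("#" + escapeIdentifier("1st.item"));
    }
}
```

Under these rules an ID such as 1st.item cannot be written directly after # in a selector; the leading digit and the dot both need escaping before the selector parses as intended.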
Structure and Performance Improvements
  • Refactored the CSS QueryParser into a clearer recursive descent
    parser. #​2310.
  • CSS selectors with consecutive combinators (e.g. div >> p) will throw an explicit parse
    exception. #​2311.
  • Performance: reduced the shallow size of an Element from 40 to 32 bytes, and the NodeList from 32 to 24.
    #​2307.
  • Performance: reduced GC load of new StringBuilders when tokenizing input
    HTML. #​2304.
  • Made Parser instances threadsafe, so that inadvertent use of the same instance across threads will not lead to
    errors. For actual concurrency, use Parser#newInstance() per
    thread. #​2314.
Bug Fixes
  • Element names containing characters invalid in XML are now normalized to valid XML names when
    serializing. #​1496.
  • When serializing to XML, characters that are invalid in XML 1.0 should be removed (not
    encoded). #​1743.
  • When converting a Document to the W3C DOM in W3CDom, elements with an attribute in an undeclared namespace now
    get a declaration of xmlns:prefix="undefined". This allows subsequent serialization to XML via W3CDom.asString()
    to succeed. #​2087.
  • The StreamParser could emit the final elements of a document twice, due to how onNodeCompleted was fired when closing out the stack. #​2295.
  • When parsing with the XML parser and error tracking enabled, the trailing ? in <?xml version="1.0"?> would
    incorrectly emit an error. #​2298.
  • Calling Element#cssSelector() on an element with combining characters in the class or ID now produces the correct output. #​1984.
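The XML serialization fix above (removing, rather than encoding, characters that are invalid in XML 1.0) can be sketched in plain Java. The ranges below follow the Char production of the XML 1.0 specification; the class and method names are hypothetical and this is not jsoup's code.

```java
// Hypothetical sketch, NOT jsoup's implementation: drops code points that the
// XML 1.0 Char production does not allow, instead of trying to encode them.
final class XmlCharFilterSketch {

    // XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
                .filter(XmlCharFilterSketch::isValidXml10Char)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // NUL and backspace are not representable in XML 1.0; tab is.
        System.out.println(stripInvalidXmlChars("a\0b\bc\td"));
    }
}
```

Encoding such characters as numeric references (for example &#0;) would still produce a document that XML 1.0 parsers reject, which is why removal is the safer choice.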

v1.19.1

Changes
  • Added support for http/2 requests in Jsoup.connect(), when running on Java 11+, via the Java HttpClient
    implementation. #​2257.
    • In this version of jsoup, the default is to make requests via the HttpURLConnection implementation: use
      System.setProperty("jsoup.useHttpClient", "true"); to enable making requests via the HttpClient instead,
      which will enable http/2 support, if available. This will become the default in a later version of jsoup, so now is
      a good time to validate it.
    • If you are repackaging the jsoup jar in your deployment (i.e. creating a shaded- or a fat-jar), make sure to specify
      that as a Multi-Release
      JAR.
    • If the HttpClient impl is not available in your JRE, requests will continue to be made via
      HttpURLConnection (in http/1.1 mode).
  • Updated the minimum Android API Level validation from 10 to 21. As with previous jsoup versions, Android
    developers need to enable core library desugaring. The minimum Java version remains Java 8.
    #​2173
  • Removed previously deprecated class: org.jsoup.UncheckedIOException (replace with java.io.UncheckedIOException);
    changed the previously deprecated method Element Element#forEach(Consumer) to
    void Element#forEach(Consumer). #2246
  • Deprecated the methods Document#updateMetaCharsetElement(boolean) and Document#updateMetaCharsetElement(), as the
    setting had no effect. When Document#charset(Charset) is called, the document's meta charset or XML encoding
    instruction is always set. #​2247
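For anyone validating the HttpClient path described above, the following is a minimal sketch using only the JDK. The property name jsoup.useHttpClient comes from the release notes; the class and method names are hypothetical, and setting the property before the first request is made is an assumption, since the notes do not say when jsoup reads it.

```java
// Hypothetical sketch: toggles the jsoup.useHttpClient system property and checks
// whether the JDK's HttpClient class (Java 11+) is present on this runtime.
final class HttpClientToggleSketch {

    // True if java.net.http.HttpClient can be loaded; expected to be false on Java 8.
    static boolean httpClientAvailable() {
        try {
            Class.forName("java.net.http.HttpClient");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Assumption: call this before the first request so the setting is picked up.
    static void setUseHttpClient(boolean enabled) {
        System.setProperty("jsoup.useHttpClient", Boolean.toString(enabled));
    }

    public static void main(String[] args) {
        setUseHttpClient(true);
        System.out.println("jsoup.useHttpClient=" + System.getProperty("jsoup.useHttpClient"));
        System.out.println("HttpClient available: " + httpClientAvailable());
    }
}
```

The same switch can be passed on the command line as -Djsoup.useHttpClient=true or -Djsoup.useHttpClient=false, which avoids any ordering concerns inside the application.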
Improvements
  • When cleaning HTML with a Safelist that preserves relative links, the isValid() method will now consider these
    links valid. Additionally, the enforced attribute rel=nofollow will only be added to external links when configured
    in the safelist. #​2245
  • Added Element#selectStream(String query) and Element#selectStream(Evaluator) methods, that return a Stream of
    matching elements. Elements are evaluated and returned as they are found, and the stream can be
    terminated early. #​2092
  • Element objects now implement Iterable, enabling them to be used in enhanced for loops.
  • Added support for fragment parsing from a Reader via
    Parser#parseFragmentInput(Reader, Element, String). #​1177
  • Reintroduced CLI executable examples, in jsoup-examples.jar. #​1702
  • Optimized performance of selectors like #id .class (and other similar descendant queries) by around 4.6x, by better
    balancing the Ancestor evaluator's cost function in the query
    planner. #​2254
  • Removed the legacy parsing rules for <isindex> tags, which would autovivify a form element with labels. This is no
    longer in the spec.
  • Added Elements.selectFirst(String cssQuery) and Elements.expectFirst(String cssQuery), to select the first
    matching element from an Elements list. #​2263
  • When parsing with the XML parser, XML Declarations and Processing Instructions are directly handled, vs bouncing
    through the HTML parser's bogus comment handler. Serialization for non-doctype declarations no longer ends with a
    spurious !. #2275
  • When converting parsed HTML to XML or the W3C DOM, element names containing < are normalized to _ to ensure valid
    XML. For example, <foo<bar> becomes <foo_bar>, as XML does not allow < in element names, but HTML5
    does. #​2276
  • Reimplemented the HTML5 Adoption Agency Algorithm to the current spec. This handles mis-nested formatting / structural elements. #2278
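The element-name normalization mentioned above (<foo<bar> becoming <foo_bar>) can be approximated with a short sketch. It is deliberately simplified and is not jsoup's code: it applies ASCII-only XML name rules, passes non-ASCII characters through unchanged, and uses hypothetical names.

```java
// Hypothetical sketch, NOT jsoup's implementation: replaces characters that cannot
// appear in an XML element name with '_'. Only ASCII rules are checked here;
// characters at or above U+0080 are passed through unchanged (a simplification).
final class XmlNameNormalizeSketch {

    static String normalizeName(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            boolean startOk = letter || c == '_' || c == ':';
            boolean partOk = startOk || (c >= '0' && c <= '9') || c == '-' || c == '.';
            // The first character has stricter rules than the following ones.
            boolean ok = c >= 0x80 || (i == 0 ? startOk : partOk);
            out.append(ok ? c : '_');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<" + normalizeName("foo<bar") + ">");
    }
}
```

Mapping to a fixed placeholder keeps the converted document well-formed, at the cost that two different HTML names could collapse to the same XML name.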
Bug Fixes
  • If an element has a ; in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath
    queries could miss that element. Now, the attribute name is more completely
    normalized. #2244
  • For backwards compatibility, reverted the internal attribute key for doctype names to
    "name". #​2241
  • In Connection, skip cookies that have no name, rather than throwing a validation
    exception. #​2242
  • When running on JDK 1.8, the error java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
    could be thrown when calling Response#body() after parsing from a URL and the buffer size was
    exceeded. #​2250
  • For backwards compatibility, allow null InputStream inputs to Jsoup.parse(InputStream stream, ...), by returning
    an empty Document. #​2252
  • A template tag containing an li within an open li would be parsed incorrectly, as it was not recognized as a
    "special" tag (which have additional processing rules). Also, added the SVG and MathML namespace tags to the list of
    special tags. #​2258
  • A template tag containing a button within an open button would be parsed incorrectly, as the "in button scope"
    check was not aware of the template element. Corrected other instances including MathML and SVG elements,
    also. #​2271
  • An :nth-child selector with a negative digit-less step, such as :nth-child(-n+2), would be parsed incorrectly as a
    positive step, and so would not match as expected. #​1147
  • Calling doc.charset(charset) on an empty XML document would throw an
    IndexOutOfBoundsException. #​2266
  • Fixed a memory leak when reusing a nested StructuralEvaluator (e.g., a selector ancestor chain like A B C) by
    ensuring cache reset calls cascade to inner members. #​2277
  • Concurrent calls to doc.clone().append(html) were not supported. When a document was cloned, its Parser was not cloned but was a shallow copy of the original parser. #​2281
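The :nth-child fix above concerns expressions of the form an+b where the step has a sign but no digits, such as -n+2. The sketch below shows one way to parse and evaluate such expressions in plain Java; it is not jsoup's QueryParser, and the class, pattern, and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT jsoup's implementation: parses an+b expressions and
// keeps the sign of a digit-less step, so "-n+2" yields a = -1 rather than a = 1.
final class NthChildSketch {

    // Optional sign, optional digits, 'n', optional signed offset.
    private static final Pattern AN_PLUS_B = Pattern.compile("([+-]?)(\\d*)n([+-]\\d+)?");

    // Returns {a, b}; a bare integer such as "3" gives {0, 3}.
    static int[] parse(String expr) {
        String s = expr.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        if (s.equals("odd")) return new int[] {2, 1};
        if (s.equals("even")) return new int[] {2, 0};
        Matcher m = AN_PLUS_B.matcher(s);
        if (m.matches()) {
            int magnitude = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            int a = "-".equals(m.group(1)) ? -magnitude : magnitude;
            int b = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
            return new int[] {a, b};
        }
        return new int[] {0, Integer.parseInt(s)};
    }

    // True if the 1-based index equals a*n+b for some integer n >= 0.
    static boolean matches(int a, int b, int index) {
        if (a == 0) return index == b;
        int diff = index - b;
        return diff % a == 0 && diff / a >= 0;
    }

    public static void main(String[] args) {
        int[] ab = parse("-n+2");
        for (int i = 1; i <= 4; i++) {
            System.out.println("child " + i + " matches: " + matches(ab[0], ab[1], i));
        }
    }
}
```

With a = -1 and b = 2 only the first two children match; reading the step as +1 would instead match every child from the second onward, which is the kind of mismatch the bug fix describes.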

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/jsoup.version branch from 151963d to 3347862 Compare April 29, 2025 17:27
@renovate renovate bot changed the title fix(deps): update dependency org.jsoup:jsoup to v1.19.1 fix(deps): update dependency org.jsoup:jsoup to v1.20.1 Apr 29, 2025
@renovate renovate bot force-pushed the renovate/jsoup.version branch from 3347862 to d8ca3a5 Compare May 20, 2025 10:53
@renovate renovate bot force-pushed the renovate/jsoup.version branch from d8ca3a5 to 28243cc Compare June 23, 2025 06:54
@renovate renovate bot changed the title fix(deps): update dependency org.jsoup:jsoup to v1.20.1 fix(deps): update dependency org.jsoup:jsoup to v1.21.1 Jun 23, 2025