fix(deps): update dependency org.jsoup:jsoup to v1.21.1 #47

renovate · 2025-04-25T22:50:45Z

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| org.jsoup:jsoup (source) | `1.18.3` -> `1.21.1` | | | | |

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

Release Notes

jhy/jsoup (org.jsoup:jsoup)

`v1.21.1`

Changes

Removed previously deprecated methods. #2317
Deprecated the :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class type) method instead. #2343
Deprecated Connection.Response#bufferUp() in favor of Connection.Response#readFully(), which can throw a checked IOException.
Deprecated internal methods Validate#ensureNotNull (replaced by typed Validate#expectNotNull); protected HTML appenders from Attribute and Node.
If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.

Improvements

Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a comment whose text contains "prices". Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class nodeType) for direct node selection. #2324
Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
Updated the default user-agent string to improve compatibility. #2341
The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #2326.
Added Connection.Response#readFully() as a replacement for Connection.Response#bufferUp(), with an explicit IOException. Similarly, added Connection.Response#readBody() as an alternative to Connection.Response#body(). Deprecated Connection.Response#bufferUp(). #2327
When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
Changed Connection to prefer the JDK's HttpClient over HttpURLConnection, if available, to enable HTTP/2 support by default. Users can disable this via -Djsoup.useHttpClient=false. #2340
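
For projects that cannot pass a -D flag on the command line, the same switch can be set programmatically. The following is a minimal sketch using only the JDK: the property name `jsoup.useHttpClient` comes from the release notes, while the class and method names are hypothetical. When jsoup reads the property is not stated in the notes, so setting it early in application startup, before any request is made, is the conservative assumption.

```java
// Hypothetical helper; only the property name is taken from the release notes.
final class JsoupTransportConfig {
    static final String USE_HTTP_CLIENT = "jsoup.useHttpClient";

    /** Sets the JVM-wide system property and returns the value now in effect. */
    static String select(boolean useHttpClient) {
        System.setProperty(USE_HTTP_CLIENT, Boolean.toString(useHttpClient));
        return System.getProperty(USE_HTTP_CLIENT);
    }

    public static void main(String[] args) {
        // Same effect as launching the JVM with -Djsoup.useHttpClient=false.
        // Assumption: this runs before the application makes any jsoup request.
        System.out.println(USE_HTTP_CLIENT + "=" + select(false));
    }
}
```

Because this is a JVM-wide setting, it affects every jsoup caller in the process, which is worth keeping in mind when validating this upgrade in a shared service.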

Bug Fixes

The contents of a script in a svg foreign context should be parsed as script data, not text. #2320
Tag#isFormSubmittable() was updating the Tag's options. #2323
The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #2325
Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #2332
When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #2334
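
Among the v1.21.1 improvements above, the serializer change that escapes < and > in attribute values is the one most likely to alter output that tests compare byte for byte. The sketch below only illustrates the shape of that output. It is not jsoup's implementation, the class and method names are hypothetical, and the exact set of characters jsoup escapes beyond < and > is an assumption here.

```java
// Illustration only: a hand-written escaper, not jsoup's serializer.
final class AttributeEscapeSketch {

    /** Escapes characters that could end the attribute or be re-read as markup. */
    static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;  // escaped in attributes as of v1.21.1, per the notes
                case '>': out.append("&gt;"); break;  // escaped in attributes as of v1.21.1, per the notes
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String risky = "</style><img src=x onerror=alert(1)>";
        // The angle brackets no longer appear literally inside the attribute.
        System.out.println("<p title=\"" + escapeAttributeValue(risky) + "\"></p>");
    }
}
```

Snapshot tests that assert on serialized HTML containing literal < or > inside attribute values may need their expected strings updated after this upgrade.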

`v1.20.1`

Changes

To better follow the HTML5 spec and current browsers, the HTML parser no longer allows self-closing tags (<foo />)
to close HTML elements by default. Foreign content (SVG, MathML), and content parsed with the XML parser, still
supports self-closing tags. If you need specific HTML tags to support self-closing, you can register a custom tag via
the TagSet configured in Parser.tagSet(), using Tag#set(Tag.SelfClose). Standard void tags (such as <img>,
<br>, etc.) continue to behave as usual and are not affected by this
change. #2300.
The following internal components have been deprecated. If you do happen to be using any of these, please take the opportunity now to migrate away from them, as they will be removed in jsoup 1.21.1.
- ChangeNotifyingArrayList, Document.updateMetaCharsetElement(), Document.updateMetaCharsetElement(boolean), HtmlTreeBuilder.isContentForTagData(String), Parser.isContentForTagData(String), Parser.setTreeBuilder(TreeBuilder), Tag.formatAsBlock(), Tag.isFormListed(), TokenQueue.addFirst(String), TokenQueue.chompTo(String), TokenQueue.chompToIgnoreCase(String), TokenQueue.consumeToIgnoreCase(String), TokenQueue.consumeWord(), TokenQueue.matchesAny(String...)

Functional Improvements

Rebuilt the HTML pretty-printer, to simplify and consolidate the implementation, improve consistency, support custom
Tags, and provide a cleaner path for ongoing improvements. The specific HTML produced by the pretty-printer may be
different from previous versions. #2286.
Added the ability to define custom tags, and to modify properties of known tags, via the TagSet tag collection.
Their properties can impact both the parse and how content is
serialized (output as HTML or XML). #2285.
Element.cssSelector() will prefer to return shorter selectors by using ancestor IDs when available and unique. E.g.
#id > div > p instead of html > body > div > div > p #2283.
Added Elements.deselect(int index), Elements.deselect(Object o), and Elements.deselectAll() methods to remove
elements from the Elements list without removing them from the underlying DOM. Also added Elements.asList() method
to get a modifiable list of elements without affecting the DOM. (Individual Elements remain linked to the
DOM.) #2100.
Added support for sending a request body from an InputStream with
Connection.requestBodyStream(InputStream stream). #1122.
The XML parser now supports scoped xmlns: prefix namespace declarations, and applies the correct namespace to Tags and
Attributes. Also, added Tag#prefix(), Tag#localName(), Attribute#prefix(), Attribute#localName(), and
Attribute#namespace() to retrieve these. #2299.
CSS identifiers are now escaped and unescaped correctly to the CSS spec. Element#cssSelector() will emit
appropriately escaped selectors, and the QueryParser supports those. Added Selector.escapeCssIdentifier() and
Selector.unescapeCssIdentifier(). #2297, #2305

Structure and Performance Improvements

Refactored the CSS QueryParser into a clearer recursive descent
parser. #2310.
CSS selectors with consecutive combinators (e.g. div >> p) will throw an explicit parse
exception. #2311.
Performance: reduced the shallow size of an Element from 40 to 32 bytes, and the NodeList from 32 to 24.
#2307.
Performance: reduced GC load of new StringBuilders when tokenizing input
HTML. #2304.
Made Parser instances threadsafe, so that inadvertent use of the same instance across threads will not lead to
errors. For actual concurrency, use Parser#newInstance() per
thread. #2314.

Bug Fixes

Element names containing characters invalid in XML are now normalized to valid XML names when
serializing. #1496.
When serializing to XML, characters that are invalid in XML 1.0 should be removed (not
encoded). #1743.
When converting a Document to the W3C DOM in W3CDom, elements with an attribute in an undeclared namespace now
get a declaration of xmlns:prefix="undefined". This allows subsequent serialization to XML via W3CDom.asString()
to succeed. #2087.
The StreamParser could emit the final elements of a document twice, due to how onNodeCompleted was fired when closing out the stack. #2295.
When parsing with the XML parser and error tracking enabled, the trailing ? in <?xml version="1.0"?> would
incorrectly emit an error. #2298.
Calling Element#cssSelector() on an element with combining characters in the class or ID now produces the correct output. #1984.
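
The XML 1.0 fix above (#1743) removes invalid characters rather than encoding them. As a rough picture of what "invalid in XML 1.0" means, the sketch below filters a string against the `Char` production of the XML 1.0 specification. It is not jsoup's code, and the class and method names are hypothetical.

```java
// Illustration only: filters by the XML 1.0 Char production; not jsoup's implementation.
final class Xml10CharFilterSketch {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 does not allow, keeping the rest in order. */
    static String stripInvalid(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(Xml10CharFilterSketch::isValidXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 is a control character that XML 1.0 cannot represent, even as a character reference.
        System.out.println(stripInvalid("price\u0001list"));
    }
}
```

Removal is the sensible choice here because XML 1.0 forbids these characters even in escaped form, so encoding them would still produce a document that parsers reject.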

`v1.19.1`

Changes

Added support for HTTP/2 requests in Jsoup.connect(), when running on Java 11+, via the Java HttpClient
implementation. #2257.
- In this version of jsoup, the default is to make requests via the HttpURLConnection implementation: use
  System.setProperty("jsoup.useHttpClient", "true"); to make requests via the HttpClient instead,
  which will enable HTTP/2 support, if available. This will become the default in a later version of jsoup, so now is
  a good time to validate it.
- If you are repackaging the jsoup jar in your deployment (i.e. creating a shaded or fat jar), make sure to specify
  that as a Multi-Release JAR.
- If the HttpClient impl is not available in your JRE, requests will continue to be made via
  HttpURLConnection (in HTTP/1.1 mode).
Updated the minimum Android API Level validation from 10 to 21. As with previous jsoup versions, Android
developers need to enable core library desugaring. The minimum Java version remains Java 8.
#2173
Removed previously deprecated class: org.jsoup.UncheckedIOException (replace with java.io.UncheckedIOException);
changed the previously deprecated method Element Element#forEach(Consumer) to
void Element#forEach(Consumer), so it now returns void. #2246
Deprecated the methods Document#updateMetaCharsetElement(boolean) and Document#updateMetaCharsetElement(), as the
setting had no effect. When Document#charset(Charset) is called, the document's meta charset or XML encoding
instruction is always set. #2247

Improvements

When cleaning HTML with a Safelist that preserves relative links, the isValid() method will now consider these
links valid. Additionally, the enforced attribute rel=nofollow will only be added to external links when configured
in the safelist. #2245
Added Element#selectStream(String query) and Element#selectStream(Evaluator) methods, that return a Stream of
matching elements. Elements are evaluated and returned as they are found, and the stream can be
terminated early. #2092
Element objects now implement Iterable, enabling them to be used in enhanced for loops.
Added support for fragment parsing from a Reader via
Parser#parseFragmentInput(Reader, Element, String). #1177
Reintroduced CLI executable examples, in jsoup-examples.jar. #1702
Optimized performance of selectors like #id .class (and other similar descendant queries) by around 4.6x, by better
balancing the Ancestor evaluator's cost function in the query
planner. #2254
Removed the legacy parsing rules for <isindex> tags, which would autovivify a form element with labels. This is no
longer in the spec.
Added Elements.selectFirst(String cssQuery) and Elements.expectFirst(String cssQuery), to select the first
matching element from an Elements list. #2263
When parsing with the XML parser, XML Declarations and Processing Instructions are directly handled, vs bouncing
through the HTML parser's bogus comment handler. Serialization for non-doctype declarations no longer ends with a
spurious !. #2275
When converting parsed HTML to XML or the W3C DOM, element names containing < are normalized to _ to ensure valid
XML. For example, <foo<bar> becomes <foo_bar>, as XML does not allow < in element names, but HTML5
does. #2276
Reimplemented the HTML5 Adoption Agency Algorithm to the current spec. This handles mis-nested formatting / structural elements. #2278

Bug Fixes

If an element has a ; in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath
queries could miss that element. Now, the attribute name is more completely
normalized. #2244
For backwards compatibility, reverted the internal attribute key for doctype names to
"name". #2241
In Connection, skip cookies that have no name, rather than throwing a validation
exception. #2242
When running on JDK 1.8, the error java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
could be thrown when calling Response#body() after parsing from a URL and the buffer size was
exceeded. #2250
For backwards compatibility, allow null InputStream inputs to Jsoup.parse(InputStream stream, ...), by returning
an empty Document. #2252
A template tag containing an li within an open li would be parsed incorrectly, as it was not recognized as a
"special" tag (which have additional processing rules). Also, added the SVG and MathML namespace tags to the list of
special tags. #2258
A template tag containing a button within an open button would be parsed incorrectly, as the "in button scope"
check was not aware of the template element. Corrected other instances including MathML and SVG elements,
also. #2271
An :nth-child selector with a negative digit-less step, such as :nth-child(-n+2), would be parsed incorrectly as a
positive step, and so would not match as expected. #1147
Calling doc.charset(charset) on an empty XML document would throw an
IndexOutOfBoundsException. #2266
Fixed a memory leak when reusing a nested StructuralEvaluator (e.g., a selector ancestor chain like A B C) by
ensuring cache reset calls cascade to inner members. #2277
Concurrent calls to doc.clone().append(html) were not supported. When a document was cloned, its Parser was not cloned but was a shallow copy of the original parser. #2281
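
To see why the :nth-child(-n+2) fix above (#1147) matters, it helps to work through the an+b arithmetic: an element at 1-based position i matches when a*n + b = i for some integer n >= 0. The sketch below is a simplified stand-in for that rule, not jsoup's QueryParser. The class and method names are hypothetical, and keywords such as odd and even are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a simplified an+b parser and matcher; not jsoup's implementation.
final class NthChildSketch {
    private static final Pattern AN_PLUS_B = Pattern.compile("([+-]?)(\\d*)n(?:([+-]\\d+))?");

    /** Parses "an+b" into {a, b}. A sign with no digits, as in "-n", means a step of -1 or +1. */
    static int[] parse(String expr) {
        String s = expr.replace(" ", "");
        Matcher m = AN_PLUS_B.matcher(s);
        if (m.matches()) {
            int magnitude = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            int a = "-".equals(m.group(1)) ? -magnitude : magnitude;
            int b = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
            return new int[] {a, b};
        }
        // No "n" term: a plain index such as "3".
        return new int[] {0, Integer.parseInt(s)};
    }

    /** True if a*n + b == index for some integer n >= 0 (index is 1-based). */
    static boolean matches(int a, int b, int index) {
        if (a == 0) return index == b;
        int delta = index - b;
        return delta % a == 0 && delta / a >= 0;
    }

    public static void main(String[] args) {
        int[] ab = parse("-n+2"); // a = -1, b = 2
        StringBuilder hits = new StringBuilder();
        for (int index = 1; index <= 4; index++) {
            if (matches(ab[0], ab[1], index)) hits.append(index).append(' ');
        }
        System.out.println(hits.toString().trim()); // prints "1 2"
    }
}
```

With the step read correctly as -1, the selector matches the first two children. Read as +1, the same expression would match positions 2, 3, 4 and onward, which is the mismatch the bug report describes.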

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

- [ ] If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate bot force-pushed the renovate/jsoup.version branch from 151963d to 3347862 on April 29, 2025 17:27

renovate bot changed the title ~~fix(deps): update dependency org.jsoup:jsoup to v1.19.1~~ fix(deps): update dependency org.jsoup:jsoup to v1.20.1 Apr 29, 2025

renovate bot force-pushed the renovate/jsoup.version branch from 3347862 to d8ca3a5 on May 20, 2025 10:53

fix(deps): update dependency org.jsoup:jsoup to v1.21.1

28243cc

renovate bot force-pushed the renovate/jsoup.version branch from d8ca3a5 to 28243cc on June 23, 2025 06:54

renovate bot changed the title ~~fix(deps): update dependency org.jsoup:jsoup to v1.20.1~~ fix(deps): update dependency org.jsoup:jsoup to v1.21.1 Jun 23, 2025
