Skip to content

Conversation

Marcono1234
Copy link
Contributor

@Marcono1234 Marcono1234 commented Sep 26, 2024

Follow-up for #29 and #24 (comment)

Currently the Language constructor reinterprets the native language pointer using LIBRARY_ARENA; this is incorrect (?) because the language pointer in the parsers (at least those currently generated by the tree-sitter CLI) is not newly allocated memory on the heap, but instead points to static data in the native library.

So if the native library is unloaded (explicitly by the user by calling Arena#close, or implicitly by the garbage collector?) a JVM crash can occur because the scope of the language pointer is LIBRARY_ARENA instead of the scope of the native library, so the java.lang.foreign implementation does not notice that the address is not accessible anymore.

However, there is still another issue then: Linker uses the global scope for results of downcalls. So even though function has the correct scope (= scope of the native library), languagePointer has the wrong scope (= global scope). Setting the correct scope can unfortunately only be done if the Arena, which was used to load the library, is present, see also https://mail.openjdk.org/pipermail/panama-dev/2024-September/020636.html and https://bugs.openjdk.org/browse/JDK-8340641.

Therefore this pull request adds a Language#load overload with Arena parameter.


What do you think? I have marked this as draft for now because I am not completely sure about these changes, but I think it tries as good as possible to prevent JVM crashes when you unload the library while the language is still in use.

@ObserverOfTime
Copy link
Member

This breaks the regular constructor.

@Marcono1234
Copy link
Contributor Author

Marcono1234 commented Sep 27, 2024

Do you mean for the case where the user loaded the language pointer with Arena.ofConfined() but expected to be able to use the Language from other threads?
Maybe it is reasonable that this should not work, and it will be easy for users to solve, they just need to use any Arena other than ofConfined() to load the parser library, e.g. ofShared() or global().

Otherwise I don't see how this breaks the behavior of the constructor. As we noticed in the discussion related to #29, rather the current behavior is broken:
The language pointer points within static data in the parser library. Using reinterpret(LIBRARY_ARENA, ...) doesn't change anything about that. The address of the pointer is still the same, except that now the foreign API thinks the scope is LIBRARY_ARENA, while in reality the correct scope is the scope of the parser library1, whose lifespan might be shorter.
Due to this the JVM crashes when the user uses the Language when the parser library has already been unloaded, instead of just throwing an exception.

As side note: If you want to manually experiment with this, or include it in the unit tests, you probably have to make sure the parser library (e.g. libtree-sitter-java.so) has not been loaded yet. Otherwise if you have the library loaded multiple times it seems unloading (through Arena#close for the Arena of the SymbolLookup) is only effective if all instances of the library have been unloaded (?).

Footnotes

  1. Which the user hopefully set on the MemorySegment, respectively which the new load(SymbolLookup, Arena, String) overload added by this PR now does.

@Marcono1234
Copy link
Contributor Author

I also brought up the general issue with scopes in the foreign API and then specifically the jtreesitter situation on the JDK mailing list, where a JDK maintainer also mentioned that the current reinterpret(LIBRARY_ARENA, ...) usage is probably incorrect, see https://mail.openjdk.org/pipermail/panama-dev/2024-September/020640.html

Another interesting point mentioned there is that attaching the ts_language_delete cleanup action to the user provided MemorySegment might not be safe1: If the user creates multiple Language instances from the same MemorySegment, then ts_language_delete would be attached multiple times which would likely lead to a double-free.

Footnotes

  1. Though currently ts_language_delete is no-op so it might not actually cause any issues yet.

@ObserverOfTime
Copy link
Member

To prevent crashes, I think we should just properly document that the arena must be kept open.

I also brought up the general issue with scopes in the foreign API and then specifically the jtreesitter situation on the JDK mailing list, where a JDK maintainer also mentioned that the current reinterpret(LIBRARY_ARENA, ...) usage is probably incorrect, see https://mail.openjdk.org/pipermail/panama-dev/2024-September/020640.html

I replaced the reinterpret() usage with a java.lang.ref.Cleaner.

Another interesting point mentioned there is that attaching the ts_language_delete cleanup action to the user provided MemorySegment might not be safe1: If the user creates multiple Language instances from the same MemorySegment, then ts_language_delete would be attached multiple times which would likely lead to a double-free.

ts_language_delete is also used in other bindings (e.g. Rust) so I assume it's safe to call multiple times.

@Marcono1234
Copy link
Contributor Author

To prevent crashes, I think we should just properly document that the arena must be kept open.

Are you opposed to the changes in this pull request of setting the scope to the library scope though (in addition to documentation changes)? Just to provide an extra level of safety in case users accidentally disregard this and close the arena too early.


I replaced the reinterpret() usage with a java.lang.ref.Cleaner.

Hmmm ok, though I think this is not directly related to the problems mentioned here. My concern was not that ts_language_delete is called too late (especially if the current implementation is no-op anyway). And also with the Cleaner approach it seems ts_language_delete might be called for the same native address multiple times when the user creates multiple Language objects.


ts_language_delete is also used in other bindings (e.g. Rust) so I assume it's safe to call multiple times.

It seems the difference for the Rust bindings is that there in the common use case you provide the tree_sitter_... function pointer through LanguageFn (at least since tree-sitter/tree-sitter#3069) and then Language::new obtains the language pointer, whereas here for the Java bindings the user directly provides the language pointer.
So if TSLanguage were to contain heap allocated memory (and ts_language_delete performed free), then Rust's implementation would be safe since it obtained newly allocated memory whereas the Java implementation here cannot know whether the user used the same language pointer for multiple Language objects.

@Marcono1234
Copy link
Contributor Author

The Cleaner approach might also introduce another problem: asReadOnly() is called on the original language MemorySegment, which might return a distinct MemorySegment without any strong reference to the original one. And because the Cleaner is registered for the original MemorySegment, if the user does not hold a strong reference to it either, then ts_language_delete might be called while the Language is still in use (I think).

Registering the cleaner for this.self is probably not an option since that will then prevent it from ever being garbage collected, so I am in general not sure if the Cleaner approach can work here.
The Cleaner could be registered for the constructed Language object instead, but then there would still be a problem if the user creates multiple Language objects from the same MemorySegment.

@ObserverOfTime
Copy link
Member

Just to provide an extra level of safety in case users accidentally disregard this and close the arena too early.

We can't be responsible for low-level user errors if the correct process is documented.

It seems the difference for the Rust bindings is that there in the common use case you provide the tree_sitter_... function pointer through LanguageFn (at least since tree-sitter/tree-sitter#3069) and then Language::new obtains the language pointer, whereas here for the Java bindings the user directly provides the language pointer.

That's true. I'll see if I can change the implementation to receive a function handle.

@Marcono1234 Marcono1234 deleted the language-loading-crash branch December 28, 2024 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants