-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8368527: JMX: Add an MXBeans method to query GC CPU time #27537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back JonasNorlinder! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@JonasNorlinder The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/label add hotspot-gc |
@JonasNorlinder |
/label remove core-libs |
@AlanBateman |
This proposal will probably require discussion as to whether this is a property of a standard MXBean or a JDK-specific MXBean. It might be that GarbageCollectorMXBean is a better place for this. |
Thanks @AlanBateman. I did first consider To clarify what I mean with "exposing a sub-component at a time"; consider the following
Its output will be
So no
Additionally it includes a method to request a GC, which made me think that this could be a good fit for this method. That being said if my above observations are incorrect or there is a more appropriate place to put this method I'm happy to update the PR. |
Webrevs
|
I don't think this is appropriately placed in MemoryMXBean. I will discuss in the CSR request |
src/hotspot/share/include/jmm.h
Outdated
JMM_GC_COUNT = 10, /* Total number of collections */ | ||
JMM_JVM_UPTIME_MS = 11, /* The JVM uptime in milliseconds */ | ||
JMM_GC_CPU_TIME = 11, /* Total accumulated GC CPU time */ | ||
JMM_JVM_UPTIME_MS = 12, /* The JVM uptime in milliseconds */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit odd to me to change the existing define of UPTIME here.
OK it is not a public interface used between different versions, and people should not be mixing up jmm.h and management implementations... But usually we would just add the new definition? (items below are not strictly grouped by name) Maybe this is just a nit, and only makes cross-version comparisons easier.
Looks good overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it’s a useful function. (I would not have expected it in the memory bean but in the go bean, but I guess both is fine.) Some more comments inline
/** | ||
* Returns the CPU time used by all garbage collection threads. | ||
* | ||
* <p> This include time since genesis, so the value can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The “so the” is not too obvious to me. Maybe simplify it a bit “This includes time since genesis and counts activities even before the first collection cycle.”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ecki for pointing that out :) I make a simplification whenever the CSR discussion (https://bugs.openjdk.org/browse/JDK-8368529) have reached a consensus where this method should live.
* workers, VM operations on the VM thread and the string | ||
* deduplication thread (if enabled). This method returns | ||
* {@code -1} if the platform does not support this operation | ||
* or if called during shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitly state here that this is concurrent time as well as worker time spent during pauses but not accumulated pause times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to ensure that I read this correct, please correct any mistake below:
- concurrent time =
CPU time
- accumulate times =
wall-clock time
Is your suggestion that we should clarify that the method may only account for CPU time
and not wall-clock time
during pauses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s mostly about making clear it’s Not a measurement for pause time (no matter if wall clock or per thread), dont know how to best Formulate it. I think it’s needed sind typically pause times have been associated with GRC thread Timings. Maybe something like „the accounted CPU time can be spent concurrently with application threads or during pauses“ or something Like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the CSR discussion I proposed changing the method name to getTotalGcCpuTime()
as @kevinjwalls raised concerns about conflating it with per generation CPU timings. Would the renaming to getTotalGcCpuTime()
also solve your concern here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, genesis (Universe::genesis()?), is a VM detail that may be unclear. Do we mean:
"
Returns the CPU time used by all garbage collection threads.
This includes time for all driver threads, workers, VM operations
on the VM thread, and the string deduplication thread (if enabled).
Therefore the value can be non-zero even if no garbage collection cycles have occurred.
This method returns {@code -1} if the platform does not support this operation, or if called during shutdown.
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant genesis in the literal sense i.e. since thread creation not relating to Universe::genesis
. It may be non-zero since each there may do some initialization work and thus CPU time would be non-zero. However given @AlanBateman comment about being HotSpot VM specific should we avoid talking about these specific details here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"CPU time used by all garbage collection threads" or even "Accumulated CPU time..." would seem clear that we mean these threads, in this process. We can leave genesis out of it, whether it meant start of process or dawn of time. 8-)
Yes, might be better as just, "This may include some startup work and overhead, therefore the value can be non-zero even...".
* or if called during shutdown. | ||
* | ||
* @return the total CPU time for all garbage collection | ||
* threads in nanoseconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure did I miss the discussion, other methods like getTotalCompilationTime() return millis, is it ok or required to use new units here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed unfortunate that methods mix units but we are complementing, OperatingSystemMXBean.getProcessCpuTime()
which returns nanoseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s in Both names - especially with the Description - understandable what it means, I was just afraid that people could confuse concepts. Maybe it doesn’t really matter. I think in the gc specific ones the colateral doc was a bit more specific about concurrent times and pauses, in the past.
Thread t = new Thread(() -> { | ||
while (true) { | ||
long gcCpuTimeFromThread = mxMemoryBean.getGcCpuTime(); | ||
if (gcCpuTimeFromThread < -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the tests actually test if it is ever != -1 or if it is monotonically increasing or under which conditions (Gc type? Platform? Build flags?) it is unsupported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why there is no test for that is because -1
is a valid return value. If this test is run on a system that does not support CPU time from os or if it queried during shutdown when we are terminating threads we opt to return -1
for safety reasons. We cannot assume that OpenJDK is never run on a platform that does not support certain optional os
implementations.
The goal with this test is to verify the shutdown protection and the API. Additionally, since this API is piggybacking on other methods (effectively in os
) that are actually responsible for the CPU time functionality, I believe they would be responsible implement tests for what you suggest.
FWIW; I had a quick look and the tests that you are expecting might already be found in e.g. test/hotspot/jtreg/vmTestbase/nsk/monitoring/share/ThreadMonitor.java
, vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/TestDescription.java
Thanks @kevinjwalls for your review. I will update the order in |
@ecki That's great to hear 🚀! Thanks for taking your time to go through this and provide feedback/comments. |
The CSR is probably a bit premature as this one is going to require seeing if the existing MXBeans are the best place for this (at least some of the original modelling assumed STW collectors) or whether a new management interface is needed. It will need think about whether should be a standard or JDK-specific API. Right now, the draft API spec makes it sounds very HotSpot VM specific. If added to an existing interface then it will need to be a default method and specifies to have default behavior when not implemented. |
Sorry if I broke protocol (first time I submit a PR requiring a CSR). Are you saying that I should had sent out this suggestion on a mailing list for a pre-discussion before actually submitting the CSR?
Could you elaborate on what makes this HotSpot VM specific? I think driver threads and workers are a generic concept but I do see your point that VM operations and string deduplication tends to be more HotSpot VM specific. Is that what you meant? I added these specific pointers in an effort to be helpful for a user to understand what parts the metric could include (but for actual details they would need to go to the VM implementation). To avoid being HotSpot VM specific I prefaced with "In general, ...". Should I remove these specialized pointers?
Great point. Should have added a default behavior. Will include that in an updated PR (if we reach a consensus that we should add it to an existing interface). |
I don't think there are rules or a protocol around this. It's mostly just to avoid trying to get agreement on a direction in two places at the same time.
"genesis", "VM operations on the VM thread", ... are very implementation specific. The API docs, esp. the standard APIs, have to VM and independent of implementation. In some places you'll have usage of I think the main question for the proposal is which management interface is appropriate. The j.l.management API was designed a long time ago (JSR-174, Java 5) and the abstractions it defines (memory, memory manager, memory pools) mean there isn't an obvious place for attributes and operations like the attribute proposed here. On the surface it might seem that "the" GarbageCollectorMXBean is the right place but it's not a singleton, instead there are 2, 3, or 4 management interfaces for each GC. I assume this is why the proposal is currently to add an read-only attribute to MemoryMXBean. One idea to try is introducing a JDK-specific MemoryMXBean, meaning in com.sun.management with the other JDK-specific management interfaces. This would give you flexibility to introduce the notion of "GC threads" (the APIs don't know about non-Java threads), and reference OperatingSystemMXBean::getProcessCpuTime as this is also a JDK-specific management interfaces. Would you be able to try that out and see what issue come up? |
I introduced a JDK-specific MemoryMXBean in import java.lang.management.ManagementFactory;
import com.sun.management.MemoryMXBean;
import com.sun.management.OperatingSystemMXBean;
public class Main2 {
public static void main(String[] args) {
final MemoryMXBean mxMemoryBean = ManagementFactory.getPlatformMXBean(MemoryMXBean.class);
final OperatingSystemMXBean mxOSBean = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
System.out.println("getTotalGcCpuTime: " + mxMemoryBean.getTotalGcCpuTime());
System.out.println("getProcessCpuTime: " + mxOSBean.getProcessCpuTime());
}
} Given that I updated the language in the documentation: /**
* Returns the CPU time used by garbage collection.
*
* <p> CPU time used by all garbage collection. In
* general this includes time for all driver threads,
* workers, VM operations on the VM thread and the string
* deduplication thread (if enabled). May be non-zero even if no
* GC cycle occurred. This method returns {@code -1} if the
* platform does not support this operation or if called during
* shutdown.
*
* @return the total accumulated CPU time for garbage collection
* in nanoseconds.
*
* @since 26
*/ Since we include time on VM thread which strictly is not a garbage collection thread I think it is better to constrain it to "garbage collection" in general. Let me know what you think. |
@AlanBateman if we agree on the new approach, I assume that a CSR would no longer be needed? |
A JDK-specific management interface is a supported/documented interface so additions/changes will need a CSR. I would suggest ignoring for the CSR for a few days to give time for feedback here on the updated proposal. |
* deduplication thread (if enabled). May be non-zero even if no | ||
* GC cycle occurred. This method returns {@code -1} if the | ||
* platform does not support this operation or if called during | ||
* shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the proposal is a JDK-specific management interface it means that this method can say something about how it relates to, or how it might be used with, OperatingSystemMXBean::getProcessCpuTime.
I'm also wondering about ThreadMXBean::isThreadCpuTimeSupported. Is it possible for this to return false but getTotalGcCpuTime to return a value >= 0. The former concerns Java threads so it might not have any connection to this new method which concerns CPU usage by non-Java threads, but someone is bound to ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering about ThreadMXBean::isThreadCpuTimeSupported. Is it possible for this to return false but getTotalGcCpuTime to return a value >= 0.
ThreadMXBean::isThreadCpuTimeSupported
uses the value from os::is_thread_cpu_time_supported
:
jdk/src/hotspot/share/services/management.cpp
Lines 131 to 137 in 6431310
if (os::is_thread_cpu_time_supported()) { | |
_optional_support.isCurrentThreadCpuTimeSupported = 1; | |
_optional_support.isOtherThreadCpuTimeSupported = 1; | |
} else { | |
_optional_support.isCurrentThreadCpuTimeSupported = 0; | |
_optional_support.isOtherThreadCpuTimeSupported = 0; | |
} |
I ensure to always return -1
if thread cpu time is not supported as this is the contract that is promised by the documentation.
jdk/src/hotspot/share/services/management.cpp
Lines 894 to 896 in 6431310
if (!os::is_thread_cpu_time_supported()) { | |
return -1; | |
} |
So, no this is not possible. However, is_thread_cpu_time_enabled
is not being considered, but as you say, this method only concerns Java threads.
Thanks all for your suggestions. I have updated the CSR to align with the discussion in this PR. |
Questions that this change brings up to me: Is there a general understanding or a statement of what it means to be standard or JDK-specific in this area? How do we make it easy for somebody to add useful features in the right place... Particularly on this method about GC time, I don't really follow the reasoning for not accepting the method in java.lang.MemoryMXBean. The most generic Java API can admit that Java is Garbage-Collected, and that doing that work can take some CPU time. The originally proposed JavaDoc had some HotSpot-specific terms in it, which maybe did not fit. But they could still be valid if rephrased as a note of examples of the kinds of things that an implementation may need to do. I'm wondering if the com.sun.management... interfaces made sense as a way to add features outside of an original JSR spec. Maybe that was their intent. But are we still in that world, do we still need to do that? |
It's similar to standard APIs vs. JDK-specific APIs. A management interface with HotSpot VM or JDK-specific attributes or operations would be a candidate for the jdk.management module rather than the java.management module. I wasn't directly involved in JSR-174 but there were three VM vendors in the EG at the time. The management interfaces that were defined had to be feasible to implement on all. There was interest in further management interfaces and the Sun JDK defined its extensions in com.sun.management. The word "extension" is significant here because asking the platform MBeanServer for a standard MXBean (either directly or via the standard object name) provides access to additional attributes/operations defined by the extension. It might seem a bit strange to see csm.ThreadMXBean extends jlm.ThreadMXBean, but that was the pattern established back in JDK 5. A lookup of say "java.lang:type=Threading" gives access to csm.ThreadMXBean with the additional attributes and operations defined in the subclass. It may be that some of the attributes or operations in these extensions could be proposed for the standard management interfaces, not clear if this is worth doing. TBH, I think the bigger issue is that these management interfaces haven't evolved significantly, they pre-date fully concurrent GCs and other significant improvements. Note that are some management interfaces that are clearly HotSpot VM or JDK-specific, e.g. csm.HotSpotDiagnosticMXBean and jdk.management.VirtualThreadSchedulerMXBean. As regards the proposal here. I think the API docs will need work. The standard APIs don't know about non-Java threads. This is why it can't speak of "driver threads" and the "VM thread". It seems reasonable to introduce the notion that garbage collection consuming CPU cycle but anything about STW vs. concurrent GCs is more for an implNote. One of the goals, as I understand it, is to use it in conjunction with JDK-specific OperatingSystemMXBean "processCpuTime" attribute. It's possible to cross link in API docs but its relation to that attribute can't be normative if in a standard API. So I think focus on the specifying the attribute first. So far I think Jonas has demonstrated that it is feasible to implementation as either a standard or extension. |
A possible more generic api doc:
|
Or:
|
Hi all,
This PR augments the CPU time sampling measurement capabilities that a user can perform from Java code with the addition of
MemoryMXBean.getGcCpuTime()
. With this patch it will be possible for a user to measure process and GC CPU time during critical section or iterations in benchmarks to name a few. This new method complements the existingOperatingSystemMXBean.getProcessCpuTime()
for a refined understanding.CollectedHeap::gc_threads_do
may operate on terminated GC threads during shutdown, but thanks to JDK-8366865 by @walulyai we can piggyback on the newUniverse::is_shutting_down
. I have implemented a stress-testtest/jdk/java/lang/management/MemoryMXBean/GetGcCpuTime.java
that may identify reading CPU time of terminated threads. Synchronizing onUniverse::is_shutting_down
andHeap_lock
resolves this problem.FWIW; To my understanding we don't want to add a
Universe::is_shutting_down
check in gc_threads_do as this may introduce a performance penalty that is unacceptable, therefore we must be careful about the few places where external users call upon gc_threads_do and may race with a terminating VM.Tested: test/jdk/java/lang/management/MemoryMXBean/GetGcCpuTime.java, jdk/javax/management/mxbean hotspot/jtreg/vmTestbase/nsk/monitoring on Linux x64, Linux aarch64, Windows x64, macOS x64 and macOS aarch64 with release and fastdebug.
Progress
Issues
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27537/head:pull/27537
$ git checkout pull/27537
Update a local copy of the PR:
$ git checkout pull/27537
$ git pull https://git.openjdk.org/jdk.git pull/27537/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27537
View PR using the GUI difftool:
$ git pr show -t 27537
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27537.diff
Using Webrev
Link to Webrev Comment