Add UserGuide documentation for IMDReader
#430
base: develop
Pull Request Overview
This PR adds comprehensive UserGuide documentation for the IMDReader class, enabling users to perform real-time streaming analysis of molecular dynamics simulations using the Interactive Molecular Dynamics (IMD) v3 protocol. The documentation follows the template specified in Issue #427.
- Adds complete documentation for IMD streaming including setup, usage, and limitations
- Provides a comprehensive tutorial notebook with practical examples for different MD engines
- Integrates the new documentation into the existing documentation structure
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| doc/source/formats/reference/imd.rst | Complete IMD format documentation with setup instructions, usage examples, and API reference |
| doc/source/examples/other/streaming_imd.ipynb | Tutorial notebook demonstrating real-time streaming analysis with practical examples |
| doc/source/examples/other/README.rst | Updated to include the new streaming_imd tutorial in the documentation index |
MD Engine Configuration
-----------------------

We provide below example configurations for enabling IMDv3 streaming in popular MD engines.
Copilot (AI), Oct 10, 2025
Corrected grammar: "We provide example configurations below for enabling IMDv3 streaming in popular MD engines."
Suggested change:
- We provide below example configurations for enabling IMDv3 streaming in popular MD engines.
+ We provide example configurations below for enabling IMDv3 streaming in popular MD engines.
Connection Management
---------------------

Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.:
Copilot (AI), Oct 10, 2025
The phrase "et al." is typically used for citing multiple authors in academic references, not for listing examples. Consider using "and similar environments" or "and other interactive environments" instead.
Suggested change:
- Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.:
+ Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks and other interactive environments:
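To make the cleanup advice concrete, here is a minimal sketch of the `try`/`finally` pattern. It assumes only that the trajectory is iterable and has a `close()` method (every MDAnalysis reader provides one, so with IMD this would be `u.trajectory`); how the Universe and connection are created is not shown. `FakeStream` and `run_streaming_analysis` are hypothetical names so the sketch runs without a live simulation.

```python
class FakeStream:
    """Hypothetical stand-in for a streaming trajectory (not part of MDAnalysis)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.closed = False

    def __iter__(self):
        for frame in range(self.n_frames):
            if self.closed:
                raise RuntimeError("stream already closed")
            yield frame

    def close(self):
        self.closed = True


def run_streaming_analysis(trajectory, analyze, max_frames=None):
    """Apply ``analyze`` to each frame and always close the stream afterwards."""
    results = []
    try:
        for i, ts in enumerate(trajectory):
            results.append(analyze(ts))
            if max_frames is not None and i + 1 >= max_frames:
                # Stops the analysis only; what the engine does next depends
                # on how it (and the client) were configured.
                break
    finally:
        # Runs on normal exit, on ``break``, on exceptions and on
        # KeyboardInterrupt, so a long-lived Jupyter kernel does not keep
        # the socket open.
        trajectory.close()
    return results


stream = FakeStream(n_frames=10)
print(run_streaming_analysis(stream, lambda frame: frame * 2, max_frames=3))  # [0, 2, 4]
```

The `finally` clause is the important design choice: interrupting a notebook cell raises `KeyboardInterrupt`, which would otherwise skip any `close()` call placed after the loop.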
"### GROMACS Setup\n",
"\n",
"Add these comprehensive IMD settings to your `.mdp` file:\n",
"```code\n",
Copilot (AI), Oct 10, 2025
The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'ini') instead of 'code' for proper syntax highlighting in the documentation.
"### LAMMPS Setup\n",
"\n",
"Use the comprehensive IMD fix in your input script:\n",
"```code\n",
Copilot (AI), Oct 10, 2025
The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'lammps') instead of 'code' for proper syntax highlighting in the documentation.
Suggested change:
- "```code\n",
+ "```lammps\n",
"### NAMD Setup\n",
"\n",
"Add comprehensive IMD configuration to your NAMD configuration file:\n",
"```code\n",
Copilot (AI), Oct 10, 2025
The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'tcl') instead of 'code' for proper syntax highlighting in the documentation.
@ljwoods2 could you also have a quick look please?
Line #2. # !pip install imdclient>=0.2.2
Don't we want people to install with Conda for consistency?
Also, will people be running this notebook, or reading it as a guide? I think this cell can potentially be removed, seems like a bit of overkill
Why is IMD-coords essential for GROMACS? Will the stream fail without it? Or do you just mean it's essential for this example?
Maybe it's worth specifying exactly what's necessary for this example: you need to have coordinates turned on regardless of simulation engine, you need to have freq set to 1, and you need to turn off coordinate unwrapping (so that there are consistent configs across engines)
Also, should this link the docs for each of the simulation engines?
Line #42. # simple_imd_analysis()
Why is this commented out? Will GitHub run it automatically as part of the docstring tests if it's uncommented?
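One common way to keep such a cell executable without it failing in an automated docs build (which has no simulation to connect to) is an explicit opt-in switch. The sketch below gates the call on an environment variable; the variable name `RUN_IMD_EXAMPLES` is hypothetical, `simple_imd_analysis` here is only a placeholder for the notebook's function, and whether the docs build actually executes notebook cells is not confirmed.

```python
import os


def live_imd_examples_enabled(env_var="RUN_IMD_EXAMPLES"):
    """Return True only if the user explicitly opted in via an environment variable."""
    return os.environ.get(env_var, "").strip().lower() in {"1", "true", "yes"}


def simple_imd_analysis():
    """Placeholder for the notebook's live-streaming function (hypothetical body)."""
    print("Connecting to the running simulation...")


if live_imd_examples_enabled():
    simple_imd_analysis()
else:
    print("Skipping the live IMD example (set RUN_IMD_EXAMPLES=1 to enable it)")
```

An opt-in flag is used rather than probing the IMD port, because opening a test connection could itself be accepted by a waiting engine as its client.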
Line #41. potential_energy = -1000 + np.random.normal(0, 50)
Not all simulation engines implement energy packet, so maybe remove this?
Line #51. ax1.clear()
Just double checking, did you test this method of live updating plot locally? Looks a bit different than what we did in the workshop:
https://github.com/MDAnalysis/imd-workshop-2024/blob/main/activity/graph_utils.py
Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.

.. note::
   MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream.
Suggested change:
- MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream.
+ MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and doesn't enforce a consistent number of integration steps between transmitted frames.
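The "consistent number of integration steps" wording can be checked from the client side: when the engine sends timing information, each frame's step number is exposed as `ts.data["step"]`, so the stride between consecutive frames should be constant. The helper below is a sketch, not part of MDAnalysis; it assumes the step numbers have already been collected into a list.

```python
def check_constant_stride(steps):
    """Return the stride between consecutive step numbers, or raise if it varies.

    ``steps`` holds the integration step number of each received frame,
    e.g. collected from ``ts.data["step"]`` while iterating a stream.
    """
    if len(steps) < 2:
        raise ValueError("need at least two frames to determine a stride")
    strides = {later - earlier for earlier, later in zip(steps, steps[1:])}
    if len(strides) != 1:
        raise ValueError(f"inconsistent strides between frames: {sorted(strides)}")
    return strides.pop()


print(check_constant_stride([0, 10, 20, 30]))  # 10

try:
    check_constant_stride([0, 10, 30])  # a skipped frame, as IMDv2 would allow
except ValueError as err:
    print(err)
```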

Streaming involves processing data in real-time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk.

This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol.
Suggested change:
- This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol.
+ In IMDv3, this is achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information.
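Because TCP delivers a byte stream rather than discrete messages, a client has to read a fixed-size header before it knows what follows. The sketch below illustrates that framing with the standard library only. The header layout (two big-endian signed 32-bit integers: message type, then length) is an assumption to check against the IMDv3 protocol documentation; the type code in the demo is arbitrary, and treating `length` as a byte count is a simplification for this toy. In practice `imdclient` handles all of this for the IMDReader.

```python
import socket
import struct

# ASSUMPTION: header = (message type, length) as two signed 32-bit integers
# in network byte order; verify against the IMDv3 specification.
HEADER = struct.Struct("!ii")


def pack_header(msg_type, length):
    """Encode a message header."""
    return HEADER.pack(msg_type, length)


def recv_exactly(sock, n):
    """Read exactly ``n`` bytes; a single recv() may return only part of a message."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_header(sock):
    """Return (msg_type, length) for the next message on ``sock``."""
    return HEADER.unpack(recv_exactly(sock, HEADER.size))


# Demo over a local socket pair. 42 is an arbitrary type code, and here the
# length is simply the payload size in bytes (its meaning in IMD depends on
# the message type).
sender, receiver = socket.socketpair()
with sender, receiver:
    sender.sendall(pack_header(42, 3) + b"abc")
    msg_type, length = read_header(receiver)
    payload = recv_exactly(receiver, length)
print(msg_type, length, payload)  # 42 3 b'abc'
```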

.. include:: classes/IMD.txt

Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.
Suggested change:
- Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.
+ IMDv2 and IMDv3 enable real-time streaming of simulation data between molecular dynamics engines and receiving clients. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol.

.. code-block:: bash

   pip install imdclient
I think we encourage conda/mamba installation
print(f"WARNING: Large displacement detected at {ts.time} ps: {max_displacement:.2f} Å")

# Monitor energies if available
if 'potential' in ts.data:
I think the key is "potential_energy"
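A sketch of the monitoring snippet with the energy key changed as suggested, written in plain Python so it runs without a simulation. Positions are lists of (x, y, z) tuples; with MDAnalysis they would typically come from an AtomGroup's `.positions` on consecutive frames. The 2.0 Å threshold is arbitrary, and `"potential_energy"` is the key proposed in the comment above, read with `.get` because not every engine sends the energy packet.

```python
import math


def max_displacement(prev_positions, positions):
    """Largest per-atom displacement between two frames (input length units)."""
    return max(math.dist(p, q) for p, q in zip(prev_positions, positions))


def monitor_frame(time_ps, data, prev_positions, positions, threshold=2.0):
    """Collect log messages for one streamed frame.

    ``data`` plays the role of ``ts.data``; energies are only present when
    the engine sends them, so the key is read with ``.get``.
    """
    messages = []
    if prev_positions is not None:
        d = max_displacement(prev_positions, positions)
        if d > threshold:
            messages.append(f"WARNING: Large displacement detected at {time_ps} ps: {d:.2f} Å")
    potential = data.get("potential_energy")
    if potential is not None:
        messages.append(f"Potential energy: {potential}")
    return messages


frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 3.0), (1.0, 0.0, 0.0)]
print(monitor_frame(2.0, {"step": 1000}, frame0, frame1))
```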

* ``dt``: Time step size in picoseconds
* ``step``: Current simulation step number
* Energy terms: ``potential``, ``total``, etc. (engine-dependent)
I think these keys are wrong, see:
I think we actually merged them into MDA develop docstring for IMDReader incorrectly:
https://imdclient.readthedocs.io/en/latest/protocol_v3.html#energies
Also, are these engine-dependent? It's engine-dependent whether the energy packet is implemented, but the exact energy keys should be consistent whenever the packet is implemented, right?
if key not in ['dt', 'step']:
    print(f" {key}: {value}")

Multiple Client Connections
There are a ton of edge cases when using multiple clients that we don't test, like when one client tries to change blocking behavior, so I think we should just say it isn't recommended.
Discussion here
Maybe we should update the MDA main codebase docs to say this as well:

Most MDAnalysis analysis classes work with streaming data, but some limitations apply:

**Compatible Analysis**
I think this should start by outlining what kinds of analyses will and won't work before jumping into distance and contact analysis
analysis_data.append(calculate_something(ts))

# This will NOT work - random access
ts = u.trajectory[10]  # TypeError
From reading the docstring of __getitem__ of StreamReaderBase, I think these should both be ValueErrors:
def __getitem__(self, frame):
    """Return an iterator for slicing a streaming trajectory.

    Parameters
    ----------
    frame : slice
        Slice object. Only the step parameter is meaningful for streams.

    Returns
    -------
    FrameIteratorAll or StreamFrameIteratorSliced
        Iterator for the requested slice.

    Raises
    ------
    TypeError
        If frame is not a slice object.
    ValueError
        If slice contains start or stop values.
If it does throw a TypeError, something is wrong...
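Read literally, the quoted docstring distinguishes two failure cases: an index that is not a slice (`TypeError`) and a slice carrying a start or stop value (`ValueError`). The toy class below reproduces only those documented rules so the distinction can be tried out; it is a hypothetical stand-in, not the StreamReaderBase implementation, and the real error messages will differ.

```python
class StreamSliceContract:
    """Toy object mimicking the ``__getitem__`` rules stated in the docstring above."""

    def __getitem__(self, frame):
        if not isinstance(frame, slice):
            raise TypeError("streamed trajectories can only be indexed with a slice")
        if frame.start is not None or frame.stop is not None:
            raise ValueError("streamed trajectories do not support start or stop")
        # Only the step is meaningful for a stream; None means every frame.
        return frame.step or 1


traj = StreamSliceContract()
print(traj[::2])  # 2

for index in (10, slice(0, 10)):
    try:
        traj[index]
    except (TypeError, ValueError) as err:
        print(type(err).__name__, "for", index)
```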
# This will NOT work - restarting iteration
for ts in u.trajectory:
    break
for ts in u.trajectory:  # Won't start from beginning
This suggests it will start from somewhere other than the beginning; won't it just fail?
print(f"Time: {ts.time:.2f} ps, Step: {ts.data.get('step', 'N/A')}")

# Your analysis code here
selected_atoms = u.select_atoms("protein and name CA")
to clarify -- is there something about IMD that mandates repeating the selection of the AtomGroup in the loop, or can this be hoisted up for efficiency as usual?
Nothing unique about IMD here, can be changed
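With the selection hoisted out of the loop, the example could look like the sketch below. It assumes `universe` is an MDAnalysis Universe whose trajectory is the IMD stream; the function name is hypothetical, and the selection string and 1000 ps cut-off simply mirror the quoted snippet.

```python
def stream_center_of_mass(universe, selection="protein and name CA", stop_after_ps=1000.0):
    """Collect (time, centre of mass) for each streamed frame until ``stop_after_ps``.

    ``universe`` is assumed to be an MDAnalysis Universe reading an IMD stream.
    """
    # Hoisted selection: an AtomGroup's positions always refer to the current
    # frame, so selecting once before the loop is enough.
    atoms = universe.select_atoms(selection)
    results = []
    for ts in universe.trajectory:
        results.append((ts.time, atoms.center_of_mass()))
        if ts.time > stop_after_ps:
            # Ends this analysis loop only; it does not stop the simulation itself.
            break
    return results
```

Note that the frame which crosses the cut-off is still recorded before the loop exits, matching the quoted `if ts.time > 1000: break` placement.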
print(f"Protein COM: {center_of_mass}")

# Optional: break on some condition
if ts.time > 1000:  # Stop after 1000 ps
to clarify: this only stops the analysis, not the running simulation/engine the analysis is connected to?
Since this break is the end of the script, it will trigger the IMDClient to clean up via atexit.
That cleanup might put the simulation engine in a waiting state for another client connection, but won't kill it.
Whether it makes the simulation engine wait or not depends on 1. how the simulation engine was configured and 2. whether the IMDClient overrode the simulation engine's waiting behavior
The `continue_after_disconnect` kwarg is the IMDClient switch for changing waiting behavior.
I wonder if the need to look out for this/think about this might be sensible to mention somewhere inline here then, if only to refer to more detailed docs for someone who may be copy/pasting the sample code as a template.

# Contact analysis
selection1 = u.select_atoms("resid 1-10")
selection2 = u.select_atoms("resid 50-60")
hoist selections if IMD allows it?
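As with the centre-of-mass example, both selections can be made once before streaming starts. The sketch below assumes `universe` is an MDAnalysis Universe reading the stream; `count_contacts`, `stream_contacts` and the 4.5 Å cut-off are illustrative choices, and the pure-Python distance loop ignores periodic boundaries.

```python
import math


def count_contacts(positions_a, positions_b, cutoff=4.5):
    """Count (a, b) pairs closer than ``cutoff``; ignores periodic boundaries."""
    return sum(
        1
        for p in positions_a
        for q in positions_b
        if math.dist(p, q) < cutoff
    )


def stream_contacts(universe, sel1="resid 1-10", sel2="resid 50-60", cutoff=4.5):
    """Yield (time, n_contacts) per streamed frame, selecting both groups only once."""
    group1 = universe.select_atoms(sel1)  # hoisted: positions follow the current frame
    group2 = universe.select_atoms(sel2)
    for ts in universe.trajectory:
        yield ts.time, count_contacts(group1.positions, group2.positions, cutoff)


print(count_contacts([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))  # 1
```

For real systems, `MDAnalysis.lib.distances.distance_array(group1.positions, group2.positions, box=ts.dimensions)` would be the usual replacement for the double loop, since it is vectorised and can account for periodic boundaries.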

* **No timeseries methods**: Cannot use ``trajectory.timeseries()``
* **No bulk operations**: Cannot extract all data at once
* **Limited multiprocessing**: Cannot split across processes
Is there a concrete example of something you can do with multiprocessing with "vanilla" MDAnalysis that doesn't work with IMD? I would have figured that if you can get data into a NumPy array you're probably "ok" to split off additional processes if you want to.
By vanilla MDA, does that exclude analysis classes with dask/multiprocessing backend?
You could copy the entire traj into np array as it goes, then create universe with traj from array, but that would defeat the purpose I think
And you can use multiprocessing/parallelism on a single frame / sliding window of frames; I know @HeydenLabASU has tried some windowing and then using parallel algorithms to process e.g. a 200-frame window
I guess maybe this should be changed to say that it only works with the serial analysis backend
> that would defeat the purpose I think
I guess you could "buffer"/store a few frames in an array that way without doing IO, and possibly only for a select subset of particles, which does seem within the remit/spirit of what IMD enables, or possibly even one of the more interesting use cases. I think that's what you're saying Matthias is doing.
I might guess that accessing multiple frames in parallel is a problem for the same reason that random access is not available, and that some analyses that require multiple passes and divide work over random-access frame access are also banned, so yeah just trying to think of cases that block multiprocessing usage that aren't already covered by the elementary limitations mentioned. Maybe the multiple clients in different processes thing, but I think you also already ruled them out in comments above, whether in different processes or not.
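The buffering idea discussed above (keep a small window of recent frames for a subset of atoms in memory, without any file I/O) can be sketched with a bounded `deque`. `FrameWindow` is a hypothetical helper, not MDAnalysis API; positions are plain (x, y, z) tuples here, and the mean over the window stands in for whatever per-window analysis would actually be run.

```python
from collections import deque


class FrameWindow:
    """Keep the most recent ``size`` frames of a (small) atom subset in memory."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)  # the oldest frame drops out automatically

    def push(self, positions):
        # Store a copy so the buffered frame is independent of the reader's arrays.
        self.frames.append([tuple(p) for p in positions])
        return len(self.frames) == self.frames.maxlen  # True once the window is full

    def mean_positions(self):
        """Per-atom mean position over the buffered frames."""
        n = len(self.frames)
        return [tuple(sum(c) / n for c in zip(*atom)) for atom in zip(*self.frames)]


window = FrameWindow(size=3)
for step in range(5):
    full = window.push([(float(step), 0.0, 0.0)])
print(full, window.mean_positions())  # True [(3.0, 0.0, 0.0)]
```

Once `push` returns `True`, the window is an ordinary in-memory list that could be handed to a process pool or another parallel routine, which matches the windowing approach described in the thread; the window size (such as the 200 frames mentioned) is a tuning choice.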
This PR addresses Issue #427 and adds UserGuide documentation for the IMDReader via imd.rst and a tutorial/example notebook, streaming_imd.ipynb. The documentation follows the template detailed in the original Issue #427.
📚 Documentation preview 📚: https://mdanalysisuserguide--430.org.readthedocs.build/en/430/