
@amruthesht commented Oct 10, 2025

This PR addresses Issue #427 and adds UserGuide documentation for the IMDReader via imd.rst and a tutorial/example notebook, streaming_imd.ipynb.

The documentation follows the template detailed in the original Issue #427.


📚 Documentation preview 📚: https://mdanalysisuserguide--430.org.readthedocs.build/en/430/


@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive UserGuide documentation for the IMDReader class, enabling users to perform real-time streaming analysis of molecular dynamics simulations using the Interactive Molecular Dynamics (IMD) v3 protocol. The documentation follows the template specified in Issue #427.

  • Adds complete documentation for IMD streaming including setup, usage, and limitations
  • Provides a comprehensive tutorial notebook with practical examples for different MD engines
  • Integrates the new documentation into the existing documentation structure

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

  • doc/source/formats/reference/imd.rst: Complete IMD format documentation with setup instructions, usage examples, and API reference
  • doc/source/examples/other/streaming_imd.ipynb: Tutorial notebook demonstrating real-time streaming analysis with practical examples
  • doc/source/examples/other/README.rst: Updated to include the new streaming_imd tutorial in the documentation index

MD Engine Configuration
-----------------------

We provide below example configurations for enabling IMDv3 streaming in popular MD engines.

Copilot AI Oct 10, 2025

Corrected grammar: "We provide example configurations below for enabling IMDv3 streaming in popular MD engines."

Suggested change
We provide below example configurations for enabling IMDv3 streaming in popular MD engines.
We provide example configurations below for enabling IMDv3 streaming in popular MD engines.

Connection Management
---------------------

Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.:

Copilot AI Oct 10, 2025

The phrase "et al." is typically used for citing multiple authors in academic references, not for listing examples. Consider using "and similar environments" or "and other interactive environments" instead.

Suggested change
Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.:
Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks and other interactive environments:

"### GROMACS Setup\n",
"\n",
"Add these comprehensive IMD settings to your `.mdp` file:\n",
"```code\n",

Copilot AI Oct 10, 2025

The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'ini') instead of 'code' for proper syntax highlighting in the documentation.

"### LAMMPS Setup\n",
"\n",
"Use the comprehensive IMD fix in your input script:\n",
"```code\n",

Copilot AI Oct 10, 2025

The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'lammps') instead of 'code' for proper syntax highlighting in the documentation.

Suggested change
"```code\n",
"```lammps\n",

"### NAMD Setup\n",
"\n",
"Add comprehensive IMD configuration to your NAMD configuration file:\n",
"```code\n",

Copilot AI Oct 10, 2025

The code block language identifier should be a valid language (e.g., 'bash', 'text', or 'tcl') instead of 'code' for proper syntax highlighting in the documentation.

@orbeckst
Member

@ljwoods2 could you also have a quick look please?

@orbeckst orbeckst self-assigned this Oct 10, 2025
@@ -0,0 +1,325 @@
{

@ljwoods2 ljwoods2 Oct 10, 2025

Line #2.    # !pip install imdclient>=0.2.2

Don't we want people to install with Conda for consistency?

Also, will people be running this notebook, or reading it as a guide? I think this cell can potentially be removed; it seems like a bit of overkill.



@ljwoods2 ljwoods2 Oct 10, 2025

Why is IMD-coords essential for GROMACS? Will the stream fail out without it? Or do you just mean it's essential for this example?

Maybe it's worth specifying exactly what's necessary for this example: you need to have coordinates turned on regardless of simulation engine, you need to have freq set to 1, and you need to turn off coordinate unwrapping (so that there are consistent configs across engines).

Also, should this link the docs for each of the simulation engines?



@ljwoods2 ljwoods2 Oct 10, 2025

Line #42.    # simple_imd_analysis()

Why is this commented out? Will GitHub run it automatically as part of the docstring tests if it's uncommented?



@ljwoods2 ljwoods2 Oct 10, 2025

Line #41.                potential_energy = -1000 + np.random.normal(0, 50)

Not all simulation engines implement the energy packet, so maybe remove this?



@ljwoods2 ljwoods2 Oct 10, 2025

Line #51.                    ax1.clear()

Just double-checking: did you test this method of live-updating the plot locally? It looks a bit different from what we did in the workshop:

https://github.com/MDAnalysis/imd-workshop-2024/blob/main/activity/graph_utils.py


Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.

.. note::
   MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream.

Suggested change
   MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream.
   MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and doesn't enforce a consistent number of integration steps between transmitted frames.
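The difference this suggestion describes can be checked on the client side: if the engine transmits a frame every fixed number of integration steps, the `step` values received (the doc lists `step` among the `ts.data` entries) increase by a constant stride. A minimal sketch, assuming the step numbers have already been collected into a list; `check_step_stride` is a hypothetical helper, not part of MDAnalysis or imdclient.

```python
from typing import Sequence


def check_step_stride(steps: Sequence[int]) -> int:
    """Return the constant stride between consecutive step numbers.

    Raises ValueError if fewer than two steps are given or if the spacing
    is not constant (i.e. frames were skipped or repeated).
    """
    if len(steps) < 2:
        raise ValueError("need at least two frames to determine a stride")
    stride = steps[1] - steps[0]
    for prev, curr in zip(steps, steps[1:]):
        if curr - prev != stride:
            raise ValueError(
                f"inconsistent stride: step {prev} -> {curr}, expected +{stride}"
            )
    return stride


# Steps as they might arrive when a frame is sent every 10 integration steps
print(check_step_stride([0, 10, 20, 30]))  # 10
```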


Streaming involves processing data in real-time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk.

This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol.

Suggested change
This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol.
In IMDv3, this is achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information.
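To make the socket-level framing concrete: as we understand the IMD specification, each message starts with an 8-byte header made of two 32-bit signed integers (a message type and a length field) in network (big-endian) byte order. The sketch below only packs and unpacks such a header with Python's `struct` module; the type code is a made-up placeholder, not a value from the IMDv3 specification, and in practice imdclient handles this framing for the IMDReader.

```python
import struct
from typing import Tuple

# Assumed IMD header layout: two big-endian ("network order") int32 values
HEADER_FORMAT = "!ii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


def pack_header(msg_type: int, length: int) -> bytes:
    """Encode a message header as it would be written to the TCP socket."""
    return struct.pack(HEADER_FORMAT, msg_type, length)


def unpack_header(raw: bytes) -> Tuple[int, int]:
    """Decode the first HEADER_SIZE received bytes into (type, length)."""
    if len(raw) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} bytes, got {len(raw)}")
    return struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])


PLACEHOLDER_TYPE = 99  # hypothetical type code, for illustration only
raw = pack_header(PLACEHOLDER_TYPE, 1000)
print(len(raw), unpack_header(raw))  # 8 (99, 1000)
```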


.. include:: classes/IMD.txt

Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.

Suggested change
Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data.
IMDv2 and IMDv3 enable real-time streaming of simulation data between molecular dynamics engines and receiving clients. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol.


.. code-block:: bash

   pip install imdclient

I think we encourage conda/mamba installation

print(f"WARNING: Large displacement detected at {ts.time} ps: {max_displacement:.2f} Å")

# Monitor energies if available
if 'potential' in ts.data:

@ljwoods2 ljwoods2 Oct 10, 2025


* ``dt``: Time step size in picoseconds
* ``step``: Current simulation step number
* Energy terms: ``potential``, ``total``, etc. (engine-dependent)

I think these keys are wrong, see:

https://github.com/Becksteinlab/imdclient/blob/07069e1ff08d1488936c61e52f922116ecddf210/imdclient/IMDClient.py#L956

I think we actually merged them into the MDA develop docstring for IMDReader incorrectly:

https://imdclient.readthedocs.io/en/latest/protocol_v3.html#energies

Also, are these engine-dependent? It's engine-dependent whether the energy packet is implemented, but the exact energy keys should be consistent whenever the packet is implemented, right?
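Because the energy packet is optional (not every engine implements it) and the exact key names are still being checked in this thread, analysis code can avoid hard-coding energy keys altogether. A minimal sketch, assuming `ts.data` behaves like a plain dictionary whose only non-energy entries are `dt` and `step`, as listed in this PR; `energy_terms` is a hypothetical helper and the sample key below is a placeholder, not a verified imdclient name.

```python
from typing import Any, Dict, Iterable, Mapping


def energy_terms(
    data: Mapping[str, Any], exclude: Iterable[str] = ("dt", "step")
) -> Dict[str, Any]:
    """Return every entry of a ts.data-like mapping except the excluded keys.

    An empty result means no energy packet arrived for this frame.
    """
    excluded = set(exclude)
    return {key: value for key, value in data.items() if key not in excluded}


# Frame from an engine that does not send energies
print(energy_terms({"dt": 0.002, "step": 100}))  # {}

# Frame with energies; the key name is a placeholder
terms = energy_terms({"dt": 0.002, "step": 100, "energy_a": -1.0e3})
for key, value in terms.items():
    print(f" {key}: {value}")  # prints " energy_a: -1000.0"
```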

if key not in ['dt', 'step']:
    print(f" {key}: {value}")

Multiple Client Connections

@ljwoods2 ljwoods2 Oct 10, 2025

There are a ton of edge cases when using multiple clients that we don't test, like when one client tries to change blocking behavior, so I think we should just say it isn't recommended.

Discussion here:

Becksteinlab/imdclient#114

Maybe we should update the MDA main codebase docs to say this as well:

https://github.com/amruthesht/mdanalysis/blob/ea85bb510f684ef1f6048ad0a50219665298cbb3/package/MDAnalysis/coordinates/IMD.py#L108


Most MDAnalysis analysis classes work with streaming data, but some limitations apply:

**Compatible Analysis**

I think this should start by outlining what kinds of analyses will and won't work before jumping into distance and contact analysis.
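One way to frame that outline: an analysis works on a stream if it can update its result one frame at a time, in order, without revisiting earlier frames. Running statistics are the classic case. The sketch below uses Welford's online algorithm for the mean and sample variance of a per-frame scalar (for instance a radius of gyration); the values are synthetic stand-ins for what a loop over `u.trajectory` would produce. Analyses that need random access or several passes over the same frames do not fit this pattern.

```python
import math


class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two values
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # one value per frame
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```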

analysis_data.append(calculate_something(ts))

# This will NOT work - random access
ts = u.trajectory[10] # TypeError

@ljwoods2 ljwoods2 Oct 10, 2025

From reading the docstring of __getitem__ in StreamReaderBase, I think these should both be ValueErrors:

def __getitem__(self, frame):
        """Return an iterator for slicing a streaming trajectory.

        Parameters
        ----------
        frame : slice
            Slice object. Only the step parameter is meaningful for streams.

        Returns
        -------
        FrameIteratorAll or StreamFrameIteratorSliced
            Iterator for the requested slice.

        Raises
        ------
        TypeError
            If frame is not a slice object.
        ValueError
            If slice contains start or stop values.

If it does throw a TypeError, something is wrong...
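For comparison, here is a hypothetical stand-in that follows the contract in the quoted docstring literally (it is not the StreamReaderBase source): a non-slice raises `TypeError`, a slice with start or stop raises `ValueError`, and only the step is honoured, via `itertools.islice`. Under this literal reading an integer index such as `[10]` falls into the `TypeError` branch, so the docstring and the ValueError expectation above may need reconciling.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def stride_stream(frames: Iterable[T], sl: slice) -> Iterator[T]:
    """Hypothetical mimic of step-only slicing on a forward-only stream."""
    if not isinstance(sl, slice):
        raise TypeError("streams only support slice objects")
    if sl.start is not None or sl.stop is not None:
        # A live stream has no fixed beginning or end to index from
        raise ValueError("only the step of a slice is meaningful for streams")
    return islice(frames, 0, None, sl.step or 1)


live = iter(range(100))  # stand-in for frames arriving one by one
print(list(stride_stream(live, slice(None, None, 25))))  # [0, 25, 50, 75]
```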

# This will NOT work - restarting iteration
for ts in u.trajectory:
    break
for ts in u.trajectory: # Won't start from beginning

This suggests it will start from somewhere other than the beginning; won't it just fail?
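A plain Python generator gives one concrete reference point for this question. After a `break`, a second loop over the same generator resumes with the next unconsumed item rather than restarting, and once exhausted it yields nothing without raising. Whether IMDReader behaves like this or raises on a second loop is exactly what is being asked here, and this sketch does not settle it. Either way the earlier frames are gone, so all per-frame work belongs in one pass. `fake_stream` is a hypothetical stand-in for a live trajectory.

```python
def fake_stream(n_frames: int):
    """Stand-in for a live trajectory: each frame can be consumed only once."""
    for step in range(n_frames):
        yield {"step": step * 10}


stream = fake_stream(5)

# First pass: do all per-frame work here, in a single loop
first_steps = []
for frame in stream:
    first_steps.append(frame["step"])
    if len(first_steps) == 3:
        break

# Second loop over the same generator: resumes after the consumed frames
remaining = [frame["step"] for frame in stream]
print(first_steps, remaining)  # [0, 10, 20] [30, 40]
```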

print(f"Time: {ts.time:.2f} ps, Step: {ts.data.get('step', 'N/A')}")

# Your analysis code here
selected_atoms = u.select_atoms("protein and name CA")
Member

to clarify -- is there something about IMD that mandates repeating the selection of the AtomGroup in the loop, or can this be hoisted up for efficiency as usual?

Nothing unique about IMD here, can be changed
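Since nothing about IMD requires re-selecting inside the loop, the usual pattern applies: select once before the loop (for example `ca = u.select_atoms("protein and name CA")`) and only read positions per frame, e.g. with `ca.center_of_mass()`. The sketch below reproduces that pattern with NumPy so it runs without a live simulation: a hypothetical index array plays the role of the AtomGroup, and random coordinates and masses stand in for incoming frames.

```python
import numpy as np


def center_of_mass(positions: np.ndarray, masses: np.ndarray) -> np.ndarray:
    """Mass-weighted mean position of the given atoms."""
    return np.average(positions, axis=0, weights=masses)


rng = np.random.default_rng(0)
n_atoms = 50
masses = rng.uniform(1.0, 16.0, size=n_atoms)

# "Selection" done once, before the loop (hypothetical CA atom indices)
ca_indices = np.arange(0, n_atoms, 5)
ca_masses = masses[ca_indices]

for frame in range(3):  # stand-in for `for ts in u.trajectory:`
    positions = rng.normal(size=(n_atoms, 3))  # stand-in for ts.positions
    com = center_of_mass(positions[ca_indices], ca_masses)
    print(f"frame {frame}: COM = {np.round(com, 3)}")
```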

print(f"Protein COM: {center_of_mass}")

# Optional: break on some condition
if ts.time > 1000: # Stop after 1000 ps
Member

to clarify: this only stops the analysis, not the running simulation/engine the analysis is connected to?

Since this break is the end of the script, it will trigger the IMDClient to clean up via atexit:

https://github.com/Becksteinlab/imdclient/blob/07069e1ff08d1488936c61e52f922116ecddf210/imdclient/IMDClient.py#L146

That cleanup might put the simulation engine in a waiting state for another client connection, but won't kill it.

Whether it makes the simulation engine wait or not depends on 1. how the simulation engine was configured and 2. whether the IMDClient overrode the simulation engine's waiting behavior.

The continue_after_disconnect kwarg is the IMDClient switch for changing waiting behavior:

https://imdclient.readthedocs.io/en/latest/api.html

Member

I wonder if the need to look out for this/think about this might be sensible to mention somewhere inline here then, if only to refer to more detailed docs for someone who may be copy/pasting the sample code as a template.
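For readers who copy the loop as a template, a sketch of the stop-and-clean-up pattern: breaking out ends only the analysis, and a `finally` block releases the connection explicitly instead of relying on interpreter exit. With MDAnalysis the call there would typically be `u.trajectory.close()`; here a hypothetical `FakeStream` class stands in so the code runs without an engine. What the engine does after the client disconnects (wait for another client or keep running) depends on the engine configuration and imdclient's `continue_after_disconnect` option, as described above.

```python
class FakeStream:
    """Hypothetical stand-in for a streaming trajectory with a close() method."""

    def __init__(self, dt: float = 2.0) -> None:
        self.dt = dt
        self.closed = False

    def __iter__(self):
        step = 0
        while not self.closed:  # a live stream has no predefined end
            yield {"time": step * self.dt}
            step += 1

    def close(self) -> None:
        self.closed = True


def analyze_until(stream, t_max: float) -> list:
    """Process frames until time exceeds t_max (ps), then always close."""
    times = []
    try:
        for ts in stream:
            times.append(ts["time"])
            if ts["time"] > t_max:  # stops the analysis, not the simulation
                break
    finally:
        stream.close()  # with MDAnalysis: u.trajectory.close()
    return times


stream = FakeStream(dt=2.0)
times = analyze_until(stream, t_max=5.0)
print(times, stream.closed)  # [0.0, 2.0, 4.0, 6.0] True
```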


# Contact analysis
selection1 = u.select_atoms("resid 1-10")
selection2 = u.select_atoms("resid 50-60")
Member

hoist selections if IMD allows it?
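Hoisting works the same way for two-group analyses. A minimal contact-count sketch with both groups selected once before the loop. In MDAnalysis the per-frame distances would usually come from `MDAnalysis.lib.distances.distance_array`, which also accepts a `box` for periodic boundaries; this NumPy version ignores periodic boundaries, and the index ranges and 4.5 Å cutoff are illustrative assumptions.

```python
import numpy as np


def count_contacts(positions, idx1, idx2, cutoff=4.5):
    """Number of atom pairs (i in idx1, j in idx2) closer than cutoff (no PBC)."""
    diff = positions[idx1][:, None, :] - positions[idx2][None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return int((dist < cutoff).sum())


# Groups selected once, before the loop (stand-ins for two residue ranges)
group1 = np.arange(0, 10)
group2 = np.arange(40, 50)

rng = np.random.default_rng(1)
for frame in range(3):  # stand-in for `for ts in u.trajectory:`
    positions = rng.uniform(0.0, 20.0, size=(60, 3))  # stand-in for ts.positions
    print(frame, count_contacts(positions, group1, group2))
```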


* **No timeseries methods**: Cannot use ``trajectory.timeseries()``
* **No bulk operations**: Cannot extract all data at once
* **Limited multiprocessing**: Cannot split across processes
Member

Is there a concrete example of something you can do with multiprocessing with "vanilla" MDAnalysis that doesn't work with IMD? I would have figured that if you can get data into a NumPy array you're probably "ok" to split off additional processes if you want to.

@ljwoods2 ljwoods2 Oct 10, 2025

By vanilla MDA, does that exclude analysis classes with dask/multiprocessing backend?

You could copy the entire trajectory into a NumPy array as it goes, then create a universe with the trajectory from that array, but that would defeat the purpose I think.

And you can use multiprocessing/parallelism on a single frame or a sliding window of frames. I know @HeydenLabASU has tried windowing and then using parallel algorithms to process, e.g., a 200-frame window.

I guess maybe this should be changed to say that it only works with the serial analysis backend.

Member

that would defeat the purpose I think

I guess you could "buffer"/store a few frames in an array that way without doing IO, and possibly only for a select subset of particles, which does seem within the remit/spirit of what IMD enables, or possibly even one of the more interesting use cases. I think that's what you're saying Matthias is doing.

I might guess that accessing multiple frames in parallel is a problem for the same reason that random access is not available, and that some analyses that require multiple passes and divide work over random-access frame access are also banned, so yeah just trying to think of cases that block multiprocessing usage that aren't already covered by the elementary limitations mentioned. Maybe the multiple clients in different processes thing, but I think you also already ruled them out in comments above, whether in different processes or not.
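The buffering idea from this thread can be sketched without any disk I/O: keep only the last N frames for a chosen subset of atoms in a `collections.deque`, and run a window-level calculation (here a per-atom RMSF, assuming frames are already aligned) whenever the window is full. The stacked window is an ordinary NumPy array, so it could then be handed to whatever parallel code one prefers. Window size, atom subset and the RMSF choice are illustrative assumptions, and `FrameWindow` is a hypothetical helper.

```python
from collections import deque

import numpy as np


class FrameWindow:
    """Fixed-size buffer of the most recent frames for a subset of atoms."""

    def __init__(self, indices, size: int) -> None:
        self.indices = np.asarray(indices)
        self.frames = deque(maxlen=size)  # oldest frame drops out automatically

    def push(self, positions: np.ndarray) -> None:
        # Fancy indexing returns a copy, so in-place updates of the reader's
        # positions array on the next frame do not alter the buffer
        self.frames.append(positions[self.indices])

    @property
    def full(self) -> bool:
        return len(self.frames) == self.frames.maxlen

    def rmsf(self) -> np.ndarray:
        """Per-atom RMSF over the buffered frames (no alignment performed)."""
        window = np.stack(self.frames)  # shape (n_frames, n_atoms, 3)
        mean = window.mean(axis=0)
        return np.sqrt(((window - mean) ** 2).sum(axis=-1).mean(axis=0))


rng = np.random.default_rng(2)
win = FrameWindow(indices=[0, 3, 7], size=200)
for _ in range(250):  # stand-in for `for ts in u.trajectory:`
    win.push(rng.normal(size=(10, 3)))  # stand-in for ts.positions
    if win.full:
        latest = win.rmsf()
print(latest.shape)  # (3,)
```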

Development

Successfully merging this pull request may close these issues.

Add UserGuide documentation for IMDReader - reading streaming data with IMDv3 protocol

4 participants