

dwinston
Collaborator

@dwinston dwinston commented Sep 22, 2025

On this branch, I

  • configure a logging system using logging.config.dictConfig, with third-party middleware injecting correlation IDs into log entries so that async operations belonging to the same request can be correlated, both for load testing and for debugging deployments (96080a9)
  • ...
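A minimal, standard-library-only sketch of the logging idea: a contextvars.ContextVar stands in for the request-scoped ID that the third-party middleware would set, and a logging.Filter copies it onto every record so the dictConfig-defined formatter can print it. The names correlation_id, CorrelationIdFilter and configure_logging are hypothetical and are not the code in 96080a9. Because asyncio tasks copy the current context when they are created, an ID set at the start of a request stays visible across its awaits.

```python
import contextvars
import logging
import logging.config
from typing import Any, Optional, TextIO

# Hypothetical request-scoped ID; in a web app, correlation-ID middleware
# would set this at the start of each request and reset it afterwards.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True  # never drop records; only annotate them


def configure_logging(stream: Optional[TextIO] = None) -> None:
    """Configure root logging via dictConfig with a correlation-ID-aware format."""
    console_handler: dict[str, Any] = {
        "class": "logging.StreamHandler",
        "filters": ["correlation_id"],
        "formatter": "default",
    }
    if stream is not None:
        console_handler["stream"] = stream  # otherwise StreamHandler uses sys.stderr
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            # "()" tells dictConfig to build the filter with this factory.
            "filters": {"correlation_id": {"()": CorrelationIdFilter}},
            "formatters": {
                "default": {
                    "format": "%(levelname)s [%(correlation_id)s] %(name)s: %(message)s"
                }
            },
            "handlers": {"console": console_handler},
            "root": {"level": "INFO", "handlers": ["console"]},
        }
    )
```

In a running service the middleware would call set() and reset() around each request; in a quick check you can do that by hand and confirm the ID appears only in the lines logged while it was set.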

Details

...

Related issue(s)

Closes #787

Related subsystem(s)

  • Runtime API (except the Minter)
  • Minter
  • Dagster
  • Project documentation (in the docs directory)
  • Translators (metadata ingest pipelines)
  • MongoDB migrations
  • Other

Testing

  • I tested these changes (explain below)
  • I did not test these changes

I tested these changes by...

Documentation

  • I have not checked for relevant documentation yet (e.g. in the docs directory)
  • I have updated all relevant documentation so it will remain accurate
  • Other (explain below)

Maintainability

  • Every Python function I defined includes a docstring (test functions are exempt from this)
  • Every Python function parameter I introduced includes a type hint (e.g. study_id: str)
  • All "to do" or "fix me" Python comments I added begin with either # TODO or # FIXME
  • I used black to format all the Python files I created/modified
  • The PR title is in the imperative mood (e.g. "Do X") and not the declarative mood (e.g. "Does X" or "Did X")

supply correlation ID to logging formatter via third-party package
@dwinston
Collaborator Author

Basing my approach here on https://www.mongodb.com/docs/languages/python/pymongo-driver/current/reference/migration/#migrate-from-pymongo:

Migrate from PyMongo

The PyMongo Async API behaves similarly to PyMongo, but all methods that perform network operations are coroutines and must be awaited. To migrate from PyMongo to PyMongo Async, you must update your code in the following ways:

  • Replace all uses of MongoClient with AsyncMongoClient.

  • Add the await keyword to all asynchronous method calls.

  • If you call an asynchronous method inside a function, mark the function as async.
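Applied to a small function, the three steps look roughly like this. The sketch runs without a MongoDB server by using hypothetical in-memory stand-ins (FakeAsyncCollection, FakeAsyncCursor) that only mimic the shape of the async API: count_documents() is a coroutine, while find() is a plain method returning a cursor consumed with `async for`. With the real driver, the collection would come from an AsyncMongoClient instead; the "type" and "id" fields are placeholders.

```python
import asyncio
from typing import Any, AsyncIterator


# Hypothetical in-memory stand-ins so the sketch runs without a server.
class FakeAsyncCursor:
    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self._docs = docs

    def __aiter__(self) -> AsyncIterator[dict[str, Any]]:
        return self._gen()

    async def _gen(self) -> AsyncIterator[dict[str, Any]]:
        for doc in self._docs:
            await asyncio.sleep(0)  # simulate fetching the next batch
            yield doc

    async def to_list(self) -> list[dict[str, Any]]:
        return [doc async for doc in self]


class FakeAsyncCollection:
    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self._docs = docs

    def find(self, query: dict[str, Any]) -> FakeAsyncCursor:
        # Like AsyncCollection.find(): not a coroutine, returns a cursor.
        matches = [
            d for d in self._docs if all(d.get(k) == v for k, v in query.items())
        ]
        return FakeAsyncCursor(matches)

    async def count_documents(self, query: dict[str, Any]) -> int:
        return len(await self.find(query).to_list())


# Before (synchronous PyMongo):
#     def count_studies(collection) -> int:
#         return collection.count_documents({"type": "study"})
#
# After: the function is marked `async` and the network call is awaited.
async def count_studies(collection: Any) -> int:
    return await collection.count_documents({"type": "study"})


async def study_ids(collection: Any) -> list[str]:
    # find() is not awaited; the cursor it returns is iterated with `async for`.
    return [doc["id"] async for doc in collection.find({"type": "study"})]
```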

Keep the following points in mind when migrating from synchronous PyMongo to the PyMongo Async API:

  • To convert an AsyncCursor to a list, you must use the asynchronous cursor.to_list() method.

  • The AsyncCollection.find() method in the PyMongo Async API is synchronous, but returns an AsyncCursor. To iterate through the cursor, you must use an async for loop.

  • The AsyncMongoClient object does not support the connect keyword argument.

  • You cannot share AsyncMongoClient objects across threads or event loops.

  • To access a property or method of a result returned by an asynchronous call, you must properly wrap the call in parentheses, as shown in the following example:

    id = (await posts.insert_one(doc)).inserted_id
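The parentheses are needed because `await` binds more loosely than attribute access: without them, `await posts.insert_one(doc).inserted_id` looks up `.inserted_id` on the un-awaited coroutine object. A self-contained sketch using hypothetical in-memory stand-ins (FakeAsyncCollection, InsertOneResult) rather than a real AsyncCollection:

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class InsertOneResult:
    """Hypothetical stand-in for the driver's insert result."""

    inserted_id: int


class FakeAsyncCollection:
    """Hypothetical in-memory stand-in for an AsyncCollection."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    async def insert_one(self, doc: dict[str, Any]) -> InsertOneResult:
        await asyncio.sleep(0)  # pretend to do network I/O
        self._docs.append(doc)
        return InsertOneResult(inserted_id=len(self._docs))


async def main() -> tuple[int, str]:
    posts = FakeAsyncCollection()

    # Correct: await the coroutine first, then read the attribute.
    new_id = (await posts.insert_one({"title": "hello"})).inserted_id

    # Without parentheses, attribute lookup happens on the coroutine object
    # before `await` runs, which raises AttributeError.
    pending = posts.insert_one({"title": "oops"})
    try:
        await pending.inserted_id
        error = ""
    except AttributeError as exc:
        error = str(exc)
    finally:
        pending.close()  # avoid a "coroutine was never awaited" warning
    return new_id, error
```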

@dwinston
Collaborator Author

Re: using asyncio with dagster (reference):

When defining an asset or op, you can write an async def, e.g.:

from dagster import asset

@asset
async def asset1():
    ...

This allows using asyncio for concurrency within the asset / op.

We don't support asyncio for concurrency across asset / ops. Here's an issue where we're tracking the request to add this: dagster-io/dagster#4041. Using aiohttp within an asset/op is a great fit if you need to make N concurrent web fetches for your calculations.

If you are using an async def with the default multi-process executor, you will end up with process-level concurrency across ops/assets and the ability to use asyncio for concurrency within the assets/ops.
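Concurrency within a single asset/op, as described above, amounts to fanning out N awaitable fetches and gathering the results. A minimal sketch under stated assumptions: fetch() is hypothetical and simulates a web request with asyncio.sleep where a real asset might use an aiohttp.ClientSession, and an asyncio.Semaphore caps how many requests are in flight at once. In Dagster, the body of fetch_all would sit inside the `@asset`-decorated async def.

```python
import asyncio


class InFlightTracker:
    """Counts concurrent requests so the sketch can report peak concurrency."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0


async def fetch(url: str, sem: asyncio.Semaphore, tracker: InFlightTracker) -> str:
    """Hypothetical fetch; stands in for an aiohttp GET request."""
    async with sem:  # wait for a free slot before "sending" the request
        tracker.current += 1
        tracker.peak = max(tracker.peak, tracker.current)
        try:
            await asyncio.sleep(0.01)  # simulated network latency
            return f"body of {url}"
        finally:
            tracker.current -= 1


async def fetch_all(urls: list[str], max_concurrency: int = 4) -> tuple[list[str], int]:
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)
    tracker = InFlightTracker()
    # gather() returns results in the same order as the input coroutines.
    bodies = await asyncio.gather(*(fetch(u, sem, tracker) for u in urls))
    return list(bodies), tracker.peak
```

With ten URLs and the default limit, all ten requests complete, but never more than four are outstanding at the same time.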

@dwinston
Collaborator Author

I pushed a WIP commit. Not expecting tests to pass.

/metadata and /nmdcschema endpoints appear to work via manual inspection.

I've done a lot of work in nmdc_runtime.mongo_util, creating a RuntimeAsyncMongoDatabase class that I'm using to migrate code to async/await. For code that is too big a lift to migrate to async right now, I've created an AwaitableSyncMongoDatabase class, obtainable via RuntimeAsyncMongoDatabase.from_synchronous_database, so that such code can already use async/await syntax against a synchronous database, leaving less to change later.
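The AwaitableSyncMongoDatabase implementation isn't shown here, so the following is only a sketch of one way such a bridge could work: a wrapper whose methods are coroutines that run the underlying synchronous call in a worker thread via asyncio.to_thread (Python 3.9+). Callers already write `await coll.method(...)`, so only the wrapper has to change when the real async driver is adopted. AwaitableSyncCollection and InMemorySyncCollection are hypothetical names; the in-memory class stands in for a synchronous pymongo Collection. Offloading to threads is reasonable with PyMongo because its synchronous MongoClient is thread-safe.

```python
import asyncio
from typing import Any


class InMemorySyncCollection:
    """Hypothetical synchronous collection standing in for a pymongo Collection."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def insert_one(self, doc: dict[str, Any]) -> None:
        self._docs.append(doc)

    def count_documents(self, query: dict[str, Any]) -> int:
        return sum(
            all(doc.get(k) == v for k, v in query.items()) for doc in self._docs
        )


class AwaitableSyncCollection:
    """Hypothetical adapter exposing a sync collection's methods as coroutines.

    Each call runs in a worker thread via asyncio.to_thread, so a blocking
    driver call does not stall the event loop.
    """

    def __init__(self, sync_collection: Any) -> None:
        self._sync = sync_collection

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the adapter itself.
        attr = getattr(self._sync, name)
        if not callable(attr):
            return attr

        async def call(*args: Any, **kwargs: Any) -> Any:
            return await asyncio.to_thread(attr, *args, **kwargs)

        return call


async def count_studies(collection: Any) -> int:
    # Already in async/await style; works unchanged once `collection` is a
    # real async collection instead of the adapter.
    return await collection.count_documents({"type": "study"})
```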

@dwinston
Collaborator Author

I've also done work in the nmdc_runtime.api.core.idgen module and its callers to better align runtime-internal ID minting with that of the nmdc_runtime.minter package. This wasn't planned; it was prompted by many duplicate-ID errors during development, which made me realize that ID namespacing wasn't implemented properly to prevent duplicate-ID collisions, and that namespaces could function as typecodes, etc.
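A small illustration of the collision argument, under assumptions: NamespacedIdMinter, the "ex" prefix, and the `<prefix>:<namespace>-<n>` layout are hypothetical and are not the format used by nmdc_runtime.minter. With one counter per namespace, the same counter value recurs across namespaces, so IDs stay unique only if the namespace is part of the ID. Once it is, the namespace can also be parsed back out and used as a typecode.

```python
import itertools
from collections import defaultdict
from typing import DefaultDict, Iterator


class NamespacedIdMinter:
    """Hypothetical minter with one counter per namespace.

    The namespace is embedded in every ID, so equal counter values in
    different namespaces cannot collide, and the namespace can later be
    read back as a typecode.
    """

    def __init__(self, prefix: str = "ex") -> None:
        self.prefix = prefix
        self._counters: DefaultDict[str, Iterator[int]] = defaultdict(
            lambda: itertools.count(1)
        )

    def mint(self, namespace: str) -> str:
        n = next(self._counters[namespace])
        return f"{self.prefix}:{namespace}-{n}"

    @staticmethod
    def typecode_of(minted_id: str) -> str:
        """Recover the namespace (used as a typecode) from a minted ID."""
        local_part = minted_id.split(":", 1)[1]
        return local_part.rsplit("-", 1)[0]
```

Dropping `{namespace}-` from the f-string would make the first "study" ID and the first "sample" ID both `ex:1`, which is exactly the kind of duplicate this namespacing prevents.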



Development

Successfully merging this pull request may close these issues.

Explore migrating from pymongo MongoClient to AsyncMongoClient

1 participant