
function to apply #9


Open
wants to merge 1 commit into base: master


lucasdicioccio

Correct me if I'm wrong :).

The current API lets you use `some_cursor >>= rest` to get the full list of documents.
If we want to run some action on every document, we can then chain `mapM_ f`, i.e. `some_cursor >>= rest >>= mapM_ f`.
A drawback of this approach is that `rest` must read all documents before returning. Indeed, `rest` must handle the case where an error occurs before all documents have been read from the cursor. My understanding is that the types prevent `rest` from returning both an error and the list of documents read before the error occurred.
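To make this drawback concrete, here is a minimal sketch that models the cursor as an in-memory list. `MockCursor`, `nextE` and `restE` are hypothetical names, not part of the mongoDB package, and `IO` stands in for `Action m`. Because a collector returning `Either String [Doc]` can hold only one of the two, an error on the third element discards the two documents already read.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical in-memory stand-in for a server cursor (NOT the mongoDB
-- Cursor type): each element is either a document or a fetch error.
type Doc = Int
newtype MockCursor = MockCursor (IORef [Either String Doc])

newMockCursor :: [Either String Doc] -> IO MockCursor
newMockCursor xs = MockCursor <$> newIORef xs

-- Read one element: Nothing when exhausted, Just (Left e) on a fetch error.
nextE :: MockCursor -> IO (Maybe (Either String Doc))
nextE (MockCursor ref) = do
  xs <- readIORef ref
  case xs of
    [] -> return Nothing
    (x : xs') -> writeIORef ref xs' >> return (Just x)

-- A rest-like collector: it returns *either* an error *or* the complete
-- list, so documents read before an error are thrown away.
restE :: MockCursor -> IO (Either String [Doc])
restE c = go []
  where
    go acc = do
      m <- nextE c
      case m of
        Nothing -> return (Right (reverse acc))
        Just (Left e) -> return (Left e)
        Just (Right d) -> go (d : acc)
```

Any processing chained after `restE` (such as `mapM_ f`) can only start once the whole result set has been read, and it never sees the partial results of a failed read.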

This behavior leaves no way to apply a function as soon as documents are read from the cursor (e.g., when the query returns a large number of documents and we apply a streaming algorithm to them). I think the mongoDB package should provide an `iterateCursor` function for this use case, and this pull request adds one. The function takes a step function in the `Action` monad (which takes a document and a state and returns an updated state), an initial state, and a cursor.
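A minimal sketch of the intended semantics, assuming a hypothetical in-memory cursor (`MockCursor`, `nextDoc` and `iterateMock` are illustrative names, not the mongoDB API) and plain `IO` in place of `Action m`. `iterateMock` has the same shape as the proposed `iterateCursor` (closing the cursor is left out here), and the shared log shows each document being processed right after it is fetched rather than after the whole result set has been read.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Hypothetical in-memory cursor (NOT mongoDB's Cursor). Every fetch is
-- recorded in a shared log so we can observe when documents are read.
type Doc = Int
data MockCursor = MockCursor (IORef [Doc]) (IORef [String])

nextDoc :: MockCursor -> IO (Maybe Doc)
nextDoc (MockCursor docs logRef) = do
  xs <- readIORef docs
  case xs of
    [] -> return Nothing
    (d : ds) -> do
      writeIORef docs ds
      modifyIORef logRef (("fetch " ++ show d) :)
      return (Just d)

-- Same shape as the proposed iterateCursor, specialised to IO:
-- the step function f runs on each document as soon as it is fetched.
iterateMock :: (Doc -> a -> IO a) -> a -> MockCursor -> IO a
iterateMock f st0 c = nextDoc c >>= maybe (return st0) go
  where
    go doc = f doc st0 >>= \st1 -> iterateMock f st1 c
```

Summing documents with a step that also logs `process d` yields the log order `fetch 1, process 1, fetch 2, process 2, ...`, which is the streaming behaviour that `rest >>= mapM_ f` cannot provide.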

My intuition is that there may be a way to write a `Traversable` instance for `Cursor` (I'm not yet strong enough in Haskell to tell whether that is useful or even possible).

Cheers <3

```haskell
iterateCursor :: (MonadIO m, MonadBaseControl IO m, Functor m) => (Document -> a -> Action m a) -> a -> Cursor -> Action m a
-- ^ iteratively runs an action while consuming documents
iterateCursor f st0 c = next c >>= maybe (closeCursor c >> return st0) go
  where go doc = f doc st0 >>= (\st1 -> iterateCursor f st1 c)
```


This looks like a potential stack overflow to me; `st1` should be strict. Also, this solution seems to have non-deterministic behaviour, inherent to lazy IO. For instance, if for some reason the consumer doesn't read until the end, when will the cursor be closed?
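The strictness point can be sketched as follows, again over a hypothetical in-memory cursor (`MockCursor`, `nextDoc` and `iterateStrict` are illustrative names, and `IO` stands in for `Action m`). Forcing each new state with `seq` before recursing keeps an accumulator such as a running sum evaluated at every step instead of growing a chain of `(+)` thunks.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical in-memory cursor (NOT the mongoDB Cursor type).
type Doc = Int
newtype MockCursor = MockCursor (IORef [Doc])

nextDoc :: MockCursor -> IO (Maybe Doc)
nextDoc (MockCursor ref) = do
  xs <- readIORef ref
  case xs of
    [] -> return Nothing
    (d : ds) -> writeIORef ref ds >> return (Just d)

-- Strict variant of the proposed loop: the new state is forced to weak
-- head normal form with `seq` before recursing, so a running sum is
-- evaluated at each step instead of piling up unevaluated thunks.
iterateStrict :: (Doc -> a -> IO a) -> a -> MockCursor -> IO a
iterateStrict f st0 c = nextDoc c >>= maybe (return st0) go
  where
    go doc = do
      st1 <- f doc st0
      st1 `seq` iterateStrict f st1 c
```

With the `BangPatterns` extension, writing `!st1 <- f doc st0` in `go` expresses the same thing. Note that `seq` only forces weak head normal form: a state such as a tuple or record may still need strict fields to avoid thunks inside it.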

Author


"Use 'next' to iterate or 'rest' to get all results. A cursor is closed when it is explicitly closed, all results have been read from it, garbage collected, or not used for over 10 minutes". It is already possible to shoot yourself in the foot with the current API by using `next`. But calling `next` by hand is a pain in the neck when processing a large stream of documents. That's why I think the API should provide a higher-level function like this one. My attempt may be broken, but I'm willing to improve on it :). Making `st1` strict is definitely a good idea, because one may expect "imperative" behaviour from this function. I'll look into gracefully handling errors that happen in the `Action`.
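One possible way to handle errors raised by the step action, sketched over a hypothetical in-memory cursor with a "closed" flag (`MockCursor`, `closeMock` and `iterateSafe` are illustrative names, and `IO` stands in for `Action m`). Wrapping the loop in `Control.Exception.finally` closes the cursor whether the fold finishes normally or the step action throws, and the exception still propagates to the caller. In the real library the loop runs in `Action m`, so this would presumably need a lifted `finally` (for example `Control.Exception.Lifted.finally` from lifted-base, which fits the `MonadBaseControl IO m` constraint already in the signature); that adaptation is an assumption, not something tested here.

```haskell
import Control.Exception (IOException, finally, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical in-memory cursor (NOT the mongoDB Cursor type):
-- the remaining documents plus a flag recording whether it was closed.
type Doc = Int
data MockCursor = MockCursor (IORef [Doc]) (IORef Bool)

nextDoc :: MockCursor -> IO (Maybe Doc)
nextDoc (MockCursor ref _) = do
  xs <- readIORef ref
  case xs of
    [] -> return Nothing
    (d : ds) -> writeIORef ref ds >> return (Just d)

closeMock :: MockCursor -> IO ()
closeMock (MockCursor _ closed) = writeIORef closed True

-- Strict fold that always closes the cursor: `finally` runs closeMock
-- after the last document and also when the step action throws, in
-- which case the exception is re-raised to the caller.
iterateSafe :: (Doc -> a -> IO a) -> a -> MockCursor -> IO a
iterateSafe f st0 c = loop st0 `finally` closeMock c
  where
    loop st = nextDoc c >>= maybe (return st) (step st)
    step st doc = do
      st' <- f doc st
      st' `seq` loop st'
```

A caller that wants to recover rather than propagate can wrap the call in `try`, e.g. `try (iterateSafe f 0 cursor) :: IO (Either IOException Int)`; the cursor is closed in both outcomes.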


2 participants