Request to ensure event processing order in parallel RDF execution #18014

Open
@acampove

Check duplicate issues.

  • Checked for duplicates

Description

It seems that AsNumpy (and possibly other functions) returns data in a different, inconsistent order from run to run when multithreading is enabled.

Reproducer

from ROOT import EnableImplicitMT, RDataFrame

EnableImplicitMT(10)

rdf=RDataFrame(100_000)
rdf=rdf.Define('a', 'rdfentry_')
arr_dat = rdf.AsNumpy(['a'])['a']

print(arr_dat[:10])
print(arr_dat[10:])

The code above shows me:

(rx_data) [acampove@thinkbook enableMT]$ python test.py
[65000 65001 65002 65003 65004 65005 65006 65007 65008 65009]
[65010 65011 65012 ... 99997 99998 99999]
(rx_data) [acampove@thinkbook enableMT]$
(rx_data) [acampove@thinkbook enableMT]$
(rx_data) [acampove@thinkbook enableMT]$ python test.py
[90000 90001 90002 90003 90004 90005 90006 90007 90008 90009]
[90010 90011 90012 ... 24997 24998 24999]

I expect to see the first 10 and the last 10 values of the 0 to 99,999 sequence every time.
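As a possible stopgap until ordering is addressed, the arrays can be re-sorted after the fact, assuming a column holding a unique per-entry index (like `a` in the reproducer) is exported alongside the data. The sketch below uses only NumPy and simulates the out-of-order chunks rather than calling ROOT; `restore_entry_order` and the extra column `b` are hypothetical names introduced for illustration, not part of any ROOT API.

```python
import numpy as np


def restore_entry_order(columns, index_key):
    """Return a copy of `columns` with every array reordered so that
    columns[index_key] is ascending. Hypothetical helper, not a ROOT API."""
    order = np.argsort(columns[index_key], kind="stable")
    return {name: values[order] for name, values in columns.items()}


# Simulate what the reproducer shows: contiguous chunks of entries
# concatenated in an arbitrary order (assumption: 10 equal chunks).
rng = np.random.default_rng(0)
chunks = np.array_split(np.arange(100_000), 10)
entry = np.concatenate([chunks[i] for i in rng.permutation(len(chunks))])

# Stand-in for the dictionary returned by AsNumpy: the index column 'a'
# plus a second column 'b' that must stay aligned with it.
columns = {"a": entry, "b": entry * 2.0}

ordered = restore_entry_order(columns, "a")
print(ordered["a"][:10])
print(ordered["a"][-10:])
```

Applying the same permutation to every column keeps the rows aligned. This only works if the index column really is unique per entry for the data source in use, which I have only checked for the empty-source case in the reproducer.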

ROOT version


| Welcome to ROOT 6.32.10 https://root.cern |
| (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
| Built for linuxx8664gcc on Feb 12 2025, 01:47:45 |
| From tags/6-32-10@6-32-10 |
| With |
| Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q' |

Installation method

micromamba

Operating system

Almalinux 9

Additional context

This is a very dangerous bug and needs to be fixed ASAP.
