Skip to content

Unexpected high overhead in threading.run when profiling with Scalene (even for simple scripts) #924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MotingLiuu opened this issue Jun 2, 2025 · 0 comments

Comments

@MotingLiuu
Copy link

While analyzing a package I wrote myself using --profile-all, I noticed that the threading module had significant amount of overhead. (I’m still not very familiar with threads, so please forgive me if my question is naive.)

I wrote a simple loop script and found that as along as I enabled --profile-all, it showed heavy usage under threading.run. That portion of overhead are likely attributed to Python part without using --profile-all.

for i in range(100000000):
    pass
PYTHONPATH=. scalene --cli --profile-all --reduced-profile --outfile "$REPORT_PATH" "$SCRIPT_PATH"
                                                     /home/bigorange/miniconda3/envs/huggingface/lib/python3.11/threading.py: % of time =  66.67% (3.023s) out of 4.534s.                                                      
       ╷       ╷       ╷      ╷       ╷       ╷       ╷               ╷       ╷                                                                                                                                                
       │Time   │–––––– │––––… │–––––– │Memory │–––––– │–––––––––––    │Copy   │                                                                                                                                                
  Line │Python │native │syst… │GPU    │Python │peak   │timeline/%     │(MB/s) │/home/bigorange/miniconda3/envs/huggingface/lib/python3.11/threading.py                                                                         
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │      │       │       │       │               │       │                                                                                                                                                
   982 │       │   64% │   2% │       │       │       │               │       │                self._target(*self._args, **self._kwargs)                                                                                       
   ... │       │       │      │       │       │       │               │       │                                                                                                                                                
       │       │       │      │       │       │       │               │       │                                                                                                                                                
╶──────┼───────┼───────┼──────┼───────┼───────┼───────┼───────────────┼───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╴
       │       │       │      │       │       │       │               │       │function summary for /home/bigorange/miniconda3/envs/huggingface/lib/python3.11/threading.py                                                    
   971 │       │   64% │   2% │       │       │       │               │       │Thread.run                                                                                                                                      
       ╵       ╵       ╵      ╵       ╵       ╵       ╵               ╵       ╵                                                                                                                                                
                                                          /home/bigorange/projects/CS336/experiments/scripts/scalene1.py: % of time =  33.33% (1.511s) out of 4.534s.                                                          
       ╷       ╷       ╷      ╷       ╷       ╷       ╷               ╷       ╷                                                                                                                                                
       │Time   │–––––– │––––… │–––––– │Memory │–––––– │–––––––––––    │Copy   │                                                                                                                                                
  Line │Python │native │syst… │GPU    │Python │peak   │timeline/%     │(MB/s) │/home/bigorange/projects/CS336/experiments/scripts/scalene1.py                                                                                  
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
   ... │       │       │      │       │       │       │               │       │                                                                                                                                                
     3 │   31% │    1% │   1% │       │       │       │               │       │    pass                                                                                                                                        
   ... │       │       │      │       │       │       │               │       │                                                                                                                                                
       ╵       ╵       ╵      ╵       ╵       ╵       ╵               ╵       ╵                                                                                                                                                

I'm really confused—does this overhead come from Scalene's profile itself, or is it from my own program (since I use multiprocessing in my package)?

The total runtime is about 4.699 seconds using --profile-all. According to the report, the Python part takes up about 30% of the time (around 1.55 seconds). However, when I run the script by itself (without profiling), it takes about 2 seconds to complete. I am confused about this difference in execution time.

OS: Ubuntu 22.04.5 LTS
Python: Python 3.11.11
Scalene: Scalene version 1.5.51

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant