Migrate from Jekyll to Hugo #21

Open: wants to merge 6 commits into base: master
34 changes: 34 additions & 0 deletions content/_index.md
@@ -0,0 +1,34 @@
---
title: "About me"
---

I am a Research Software Engineer and the Founder of [CatalystNeuro](http://catalystneuro.com), where I work to transform how neuroscience labs collaborate and share data.

## Vision & Work

At CatalystNeuro, we're revolutionizing neuroscience collaboration through better data standardization and tool sharing. Our work focuses on:

- Developing standardized data formats for neuroscience
- Creating tools for seamless data sharing between labs
- Building bridges between different analysis platforms
- Consulting with labs to optimize their data workflows

We believe the future of neuroscience lies in open collaboration, and we're actively shaping how data and tools are shared across the international neuroscience community.

## Background

I received my Ph.D. in Bioengineering from the [UC Berkeley - UCSF Joint Program in Bioengineering](http://bioegrad.berkeley.edu/), working in [Dr. Edward Chang's lab](http://changlab.ucsf.edu/). My research focused on using electrocorticography (ECoG) to understand speech control in humans, particularly the neural mechanisms of voice pitch control in speaking and singing.

During my undergraduate years at the University of Pittsburgh's [SMILE lab](https://smile.pitt.edu/) under [Dr. Aaron Batista](https://www.engineering.pitt.edu/AaronBatista/), I developed probabilistic models of neural activity. This early work shaped my approach to neural data analysis and eventually led to my interest in standardizing data practices across the field.

## Alternative Paths in Science

I'm passionate about exploring non-traditional careers in science. Through CatalystNeuro, I've found a way to contribute to neuroscience beyond the conventional academic path. I work with a talented team of neuroscientists and software developers who share this vision.

If you're interested in exploring alternative careers in science or want to learn about different paths, feel free to reach out. I'm always happy to share experiences and discuss possibilities.

## Beyond the Lab

When not working on neuroscience data, I enjoy:
- Dancing West Coast Swing
- Traveling and exploring new cultures
5 changes: 5 additions & 0 deletions content/cv.md
@@ -0,0 +1,5 @@
---
title: "CV"
---

<embed src="../files/BenDichterCV.pdf" width="800px" height="2100px" />
6 changes: 6 additions & 0 deletions content/portfolio/_index.md
@@ -0,0 +1,6 @@
---
title: "Portfolio"
layout: list
---

Project portfolio and featured work.
4 changes: 4 additions & 0 deletions content/portfolio/portfolio-1.md
@@ -0,0 +1,4 @@
---
title: "Broken axes"
excerpt: "Package for creating broken axis plots in matplotlib<br/><img src='/images/500x300.png'>"
---
6 changes: 6 additions & 0 deletions content/portfolio/portfolio-2.md
@@ -0,0 +1,6 @@
---
title: "Portfolio item number 2"
excerpt: "Short description of portfolio item number 2 <br/><img src='/images/500x300.png'>"
---

This is an item in your portfolio. It can have images or nice text. If you name the file .md, it will be parsed as Markdown. If you name the file .html, it will be parsed as HTML.
41 changes: 41 additions & 0 deletions content/posts/2018-03-29-tenseflow.md
@@ -0,0 +1,41 @@
---
title: 'tenseflow'
date: 2018-03-29
tags:
- python
---
<img width="400" src="https://github.com/bendichter/tenseflow/blob/master/static/screenshot.png?raw=true" title="tenseflow app" alt="tenseflow app"/>


I was frustrated while changing the tense of a document, and decided to go down the deep dark rabbit hole of creating an
automatic tense changer. The basic usage is:

```python
from tenseflow import change_tense

change_tense('I will go to the store.', 'past')
# returns: u'I went to the store.'
```

Little did I know, this is a really tough task, for a few reasons. For anyone who wants to venture down this path,
here are a few of the finer points you'll need to deal with:
1. Identifying verbs is harder than it looks. For instance, take the word "<u>vacuum</u>." This word could be used as a noun
("Please hand me the <u>vacuum</u>.") or as a verb ("Please <u>vacuum</u> the dining room."). Vacuum is not a special word;
in fact, if you think about it, **most** verbs in the English language can be used as nouns and **most** nouns can be used as verbs.
If you blindly convert any word that could be a verb, you'll get nonsense like "Please hand me the <u>vacuumed</u>."
Therefore, in order to properly tense-alter a passage, you first need to parse the sentence to determine which words
are being used as verbs. You also need to parse each verb's role in the sentence. For instance, infinitives do not change with
tense (we don't want, e.g., "You asked me to <u>vacuumed</u>").
2. Once you have identified which word you want to change, there are so many irregular verbs and special rules that you
really need an entire dictionary to do this properly.
3. There are more tenses in English than you might realize. Common wisdom is that we have 3: past, present, and future.
In fact, there are 12, and each of them has three modes: affirmative, negative, and interrogative.

<img width="400" src="https://lessonsforenglish.com/wp-content/uploads/2019/12/12-Tenses-Formula-With-Examples.png" title="table of tenses" alt="table of tenses"/>

4. There are all sorts of cases where you would want to have multiple tenses in the same sentence, and there isn't really
a good way to infer this automatically.
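To give a flavor of point 2, here is a deliberately naive past-tense sketch (toy names, not tenseflow's actual implementation): a tiny irregular-verb table with an "-ed" fallback.

```python
# Toy sketch only: real coverage needs a full verb lexicon.
IRREGULAR_PAST = {"go": "went", "be": "was", "eat": "ate", "take": "took"}

def to_past(verb: str) -> str:
    """Convert a base-form verb to simple past (naively)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):       # bake -> baked
        return verb + "d"
    return verb + "ed"           # walk -> walked
```

Even this fallback is wrong for consonant doubling ("stop" should become "stopped," not "stoped"), which is exactly the point: the special cases pile up fast.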


Despite these obstacles, I managed to make a tool that works... OK. It comes with a web-app.
Check it out on GitHub [here](https://github.com/bendichter/tenseflow).
33 changes: 33 additions & 0 deletions content/posts/2018-04-23-osx-jupyter-launcher.md
@@ -0,0 +1,33 @@
---
title: 'OSX Jupyter Launcher'
date: 2018-04-23
tags:
- Jupyter
- OSX
- python
---

If you use Jupyter on a regular basis, the steps to launch a notebook are probably second nature, but if you take a step back, they involve a lot of prior knowledge. A few times I've tried to bring brand-new, eager programmers into the glorious land of Python and Jupyter, but each time I found that the whole flow was really bogged down by this pretty technical preamble. I'll give them an .ipynb file and then show them how to open it:

1. Open Terminal (What's Terminal? It looks scary.)
2. Use `cd` to navigate to where you want.
3. Now run this special command...
and **finally** you are in the user-friendly land of Jupyter.

Now of course all of these skills are useful, and eventually necessary, but they really bog down the first lesson in minutiae and inevitably leave the student feeling a bit overwhelmed. There must be a better way! One solution is to set your student up with JupyterHub: they just need to click a link and they'll be up and running in no time! This is a great solution for a lot of cases, but it requires the instructor to set up a server and the student to have internet access, so it doesn't fit every situation. "Why can't I just double-click the notebook?" the student will ask (or be too embarrassed to ask). Well... um... why can't you? Now you can. Here's how.

[Download me!](../../files/run_jupyter_notebook.zip) Double-click to unpack, then drag it to Applications or wherever you want to keep it.

Navigate to a notebook in Finder, right-click it, and choose "Get Info." Expand "Open with:", choose "Other..." from the dropdown menu, then navigate to and select run_jupyter_notebook. Finally, click "Change All..."

<img width="200" src="../../images/run_jupyter_notebook.png" title="change jupyter notebook settings" alt="change notebook settings"/>

Now you can double-click your notebooks to start them!
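For the curious, the core logic such a launcher needs is tiny: start a Jupyter server from the notebook's own directory. A minimal sketch (hypothetical helper names, not the actual contents of run_jupyter_notebook):

```python
import os
import subprocess

def build_launch_command(notebook_path: str):
    """Return the command and working directory needed to open a notebook."""
    abs_path = os.path.abspath(notebook_path)
    # Run the server from the notebook's folder, so the student never
    # has to `cd` in a terminal themselves.
    return ["jupyter", "notebook", os.path.basename(abs_path)], os.path.dirname(abs_path)

def launch(notebook_path: str) -> None:
    cmd, cwd = build_launch_command(notebook_path)
    subprocess.run(cmd, cwd=cwd)
```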

## Caveats

* This only works on Macs right now (sorry Windows. Linux users, y'all chose this life.)
* Every time you double-click, it opens a new Terminal window.
* This won't run a notebook in a virtual or conda environment.

You can still open notebooks the normal way if you need to have more control over how the notebook is launched.
17 changes: 17 additions & 0 deletions content/posts/2020-07-12-brokenaxes.md
@@ -0,0 +1,17 @@
---
title: 'brokenaxes'
date: 2020-07-12
tags:
- matplotlib
- python
---

<img width="200" src="https://raw.githubusercontent.com/bendichter/brokenaxes/master/broken_python_snake.png" title="broken python snake" alt="broken python snake"/>

I created a Python package for creating broken axes plots like this one:

<img width="400" src="https://raw.githubusercontent.com/bendichter/brokenaxes/master/example2.png" title="brokenaxes example" alt="brokenaxes example"/>

You can create discontinuities along the x and/or y axis.
It is also compatible with a number of other useful matplotlib features, such as subplots and non-standard axes like log and datetime.
Check out the documentation with plenty of examples on the [GitHub repo](https://github.com/bendichter/brokenaxes).
118 changes: 118 additions & 0 deletions content/posts/2022-07-17-spiral-plot.md
@@ -0,0 +1,118 @@
---
title: 'Spiral Plot'
date: 2022-06-17
tags:
- matplotlib
- python
---

As an example, let's use the Google Trends results for the search term "gifts."
Google offers this plot:

![gifts-google-trends-plot](../../images/google_trends_gifts.png)

It should be no surprise that these results show a cyclical trend. It looks
like this might be an annual cycle with the maximum around
Christmas time. It can be hard to create visualizations that bring out
this cyclic pattern. Stacking years on top of each other requires
you to break the year at some point, splitting continuous data and
potentially creating the impression of two different spikes when there
is really just one.

I have created a way to plot cyclic data that I call a "spiral plot."
The data starts at the center of a circle and proceeds outward in a spiral.
Each year of time forms a ring of the spiral, so a given angle
of the circle shows data from the same time of year on every loop. Here
is the Google Trends data for "gifts" shown as a spiral plot:

![gifts spiral plot with no donut](../../images/spiral_plot_no_donut.png)

This plot is more compact than the line version and may highlight some trends
more clearly. The drawback of this approach is that earlier years are smaller
than more recent years. You can make this less dramatic by giving the circle an
empty center (setting `origin=-2`).

![gifts spiral plot](../../images/spiral_plot_w_donut.png)

Code:

```python
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

def spiral_plot(
    data,
    num_cycles: int,
    num_points_per_seg: int = 100,
    angle: float = 0.,
    origin: float = 0.,
    cmap=None,
    show_legend: bool = True,
    ax: Optional[plt.Axes] = None,
):

    if ax is None:
        _, ax = plt.subplots(subplot_kw={'projection': 'polar'})

    n_segments = len(data)
    num_points = num_points_per_seg * n_segments

    # The radius grows linearly so each cycle forms one ring of thickness 1,
    # while the angle wraps around once per cycle.
    inner_rs = np.linspace(0, num_cycles, num_points)
    outer_rs = inner_rs + 1
    thetas = np.linspace(0, 2 * np.pi * num_cycles, num_points) + angle

    # Build one patch per data point, bounded by the inner and outer spiral arcs.
    patches = []
    for i in range(n_segments):
        tt = np.hstack(
            (
                thetas[i * num_points_per_seg:(i + 1) * num_points_per_seg],
                thetas[i * num_points_per_seg:(i + 1) * num_points_per_seg][::-1],
            )
        )
        rr = np.hstack(
            (
                inner_rs[i * num_points_per_seg:(i + 1) * num_points_per_seg],
                outer_rs[i * num_points_per_seg:(i + 1) * num_points_per_seg][::-1],
            )
        )
        patches.append(Polygon(np.c_[tt, rr]))

    patches = PatchCollection(patches, cmap=cmap)
    patches.set_array(data)  # color each patch by its data value
    ax.add_collection(patches)

    ax.set_rlim((None, num_cycles + 1))
    ax.grid(False)

    # A negative origin leaves an empty "donut hole" in the center.
    ax.set_rorigin(origin)

    if show_legend:
        ax.figure.colorbar(patches, shrink=0.6)

    ax.spines.polar.set_visible(False)
    ax.spines.inner.set_visible(False)

    return ax, patches

# Example usage
import pandas as pd

# data from any Google Trends CSV export
fpath = "multiTimeline.csv"
trend = "gifts"

data = pd.read_csv(fpath, header=1)[f"{trend}: (United States)"].values

ax, patches = spiral_plot(data, 5, angle=2 * np.pi * 7 / 12)

# make it prettier (set the ticks before the labels)
ax.set_xticks([2 * np.pi * i / 12 for i in range(12)])
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.tick_params(axis='x', which='major', pad=-5)
ax.tick_params(axis='y', colors='white')
```
116 changes: 116 additions & 0 deletions content/posts/2022-07-18-git-timesheet.md
@@ -0,0 +1,116 @@
---
title: 'Git Timesheet'
date: 2022-07-18
tags:
- matplotlib
- python
---

I recently faced a situation where I needed to assess the amount of work
done by each member of a team on a project that has spanned over a year.
That project has a git repo, and I could see when each person made a commit.
I decided to break it down by weeks. Whenever a person submitted any commit
to the repo on any branch, I counted them as working on the project for that week.
Of course this is imperfect: someone could work a lot and make no commits
that week, and someone could have submitted a commit but worked very
little. Still, this seemed like the fairest way to assess work that I could think of.
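
The core bookkeeping behind this metric can be sketched independently of git: bucket each commit date into a week index and record which authors appear in each bucket (hypothetical helper, for illustration only):

```python
import datetime
from collections import defaultdict

def weekly_presence(commits, start):
    """commits: iterable of (author, datetime) pairs.
    Returns {author: set of week indices with at least one commit}."""
    presence = defaultdict(set)
    for author, when in commits:
        presence[author].add((when - start).days // 7)
    return dict(presence)
```

Each author's set of weeks then becomes one row of a binary "timesheet" matrix.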


The code will work on any locally cloned git repo. `skip` allows you to remove
contributors, and is ideal for handling bots. `author_map` allows you to transform
handles. This is ideal if members of your team make some contributions through PRs
from a clone of the repo and some of their PRs through GitHub directly, or if
they have multiple usernames.

```python
import datetime
import os

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tqdm

def git_timesheet(git_dir, skip=None, author_map=None):

    if skip is None:
        skip = [
            "dependabot[bot]",
            "!git for-each-ref --format='%(refname:short)' `git symbolic-ref HEAD`",
        ]

    if author_map is None:
        author_map = dict()

    # dump the full commit log (all branches) with per-file stats
    os.system(f"git --git-dir {git_dir}/.git log --all --numstat --pretty=format:'--%h--%ad--%aN' --no-renames > git.log")

    # use an unlikely separator so each log line lands in a single column
    commits = pd.read_csv("git.log", sep="\u0012", header=None, names=['raw'])

    # commit header lines start with "--"; everything else is file stats
    commit_marker = commits[commits['raw'].str.startswith("--", na=False)]
    commit_info = commit_marker['raw'].str.extract(r"^--(?P<sha>.*?)--(?P<date>.*?)--(?P<author>.*?)$", expand=True)
    commit_info['date'] = pd.to_datetime(commit_info['date'])

    file_stats_marker = commits[~commits.index.isin(commit_info.index)]
    file_stats = file_stats_marker['raw'].str.split("\t", expand=True)
    file_stats = file_stats.rename(columns={0: "insertions", 1: "deletions", 2: "filename"})
    file_stats['insertions'] = pd.to_numeric(file_stats['insertions'], errors='coerce')
    file_stats['deletions'] = pd.to_numeric(file_stats['deletions'], errors='coerce')

    # forward-fill commit metadata onto the file-stat rows below each header
    commit_data = commit_info.reindex(commits.index).ffill()
    commit_data = commit_data[~commit_data.index.isin(commit_info.index)]
    commit_data = commit_data.join(file_stats)

    # get total authors and weeks
    all_authors = commit_data["author"].unique()
    all_authors = list(np.unique([author_map.get(x, x) for x in all_authors if x not in skip]))

    dates = commit_data["date"]
    start = dates.min()
    stop = dates.max()

    n_weeks = (stop - start).days // 7 + 1  # + 1 covers the final partial week

    timesheet = np.zeros((len(all_authors), n_weeks))

    # mark each author as present for every week in which they committed
    for week_n in tqdm.trange(n_weeks):
        week_start = start + datetime.timedelta(7 * week_n)
        week_stop = start + datetime.timedelta(7 * (week_n + 1))
        commit_data_for_week = commit_data[(week_start <= commit_data["date"]) & (commit_data["date"] < week_stop)]
        authors_for_week = commit_data_for_week["author"].unique()
        # handle different usernames
        authors_for_week = list(np.unique([author_map.get(x, x) for x in authors_for_week]))
        for i, author in enumerate(all_authors):
            if author in authors_for_week:
                timesheet[i, week_n] = 1

    fig, ax = plt.subplots(figsize=(30, 10))
    ax.imshow(timesheet, cmap="Greys")
    ax.set_yticks(range(len(all_authors)))
    _ = ax.set_yticklabels(all_authors)
    ax.set_xlabel("weeks")

    plt.minorticks_on()
    ax.xaxis.set_minor_locator(matplotlib.ticker.MultipleLocator(1))
    ax.yaxis.set_minor_locator(matplotlib.ticker.MultipleLocator(1))
    ax.grid(which="both", linewidth=0.25, color="k")
```
I developed a function for parsing the git log and creating a visualization
of the weeks worked by each member. The repo I used this for is private, so I will
demonstrate it on a public repo from CatalystNeuro.
```python
git_timesheet(
    "path/to/nwb-conversion-tools",
    author_map={
        "bendichter": "Ben Dichter",
        "luiz": "Luiz Tauffer",
        "luiztauffer": "Luiz Tauffer",
        "CodyCBakerPhD": "Cody Baker",
        "h-mayorquin": "Heberto Mayorquin",
        "sbuergers": "Steffen Bürgers",
        "weiglszonja": "Szonja Weigl",
    }
)
```

![git-timesheet](../../images/git-timesheet.png)