Migrate from Jekyll to Hugo #21

Open: wants to merge 6 commits into base: master
34 changes: 34 additions & 0 deletions content/_index.md
@@ -0,0 +1,34 @@
---
title: "About me"
---

I am a Research Software Engineer and the Founder of [CatalystNeuro](http://catalystneuro.com), where I work to transform how neuroscience labs collaborate and share data.

## Vision & Work

At CatalystNeuro, we're revolutionizing neuroscience collaboration through better data standardization and tool sharing. Our work focuses on:

- Developing standardized data formats for neuroscience
- Creating tools for seamless data sharing between labs
- Building bridges between different analysis platforms
- Consulting with labs to optimize their data workflows

We believe the future of neuroscience lies in open collaboration, and we're actively shaping how data and tools are shared across the international neuroscience community.

## Background

I received my Ph.D. in Bioengineering from the [UC Berkeley - UCSF Joint Program in Bioengineering](http://bioegrad.berkeley.edu/), working in [Dr. Edward Chang's lab](http://changlab.ucsf.edu/). My research focused on using electrocorticography (ECoG) to understand speech control in humans, particularly the neural mechanisms of voice pitch control in speaking and singing.

During my undergraduate years at the University of Pittsburgh's [SMILE lab](https://smile.pitt.edu/) under [Dr. Aaron Batista](https://www.engineering.pitt.edu/AaronBatista/), I developed probabilistic models of neural activity. This early work shaped my approach to neural data analysis and eventually led to my interest in standardizing data practices across the field.

## Alternative Paths in Science

I'm passionate about exploring non-traditional careers in science. Through CatalystNeuro, I've found a way to contribute to neuroscience beyond the conventional academic path. I work with a talented team of neuroscientists and software developers who share this vision.

If you're interested in exploring alternative careers in science or want to learn about different paths, feel free to reach out. I'm always happy to share experiences and discuss possibilities.

## Beyond the Lab

When not working on neuroscience data, I enjoy:
- Dancing West Coast Swing
- Traveling and exploring new cultures
5 changes: 5 additions & 0 deletions content/cv.md
@@ -0,0 +1,5 @@
---
title: "CV"
---

<embed src="../files/BenDichterCV.pdf" width="800px" height="2100px" />
6 changes: 6 additions & 0 deletions content/portfolio/_index.md
@@ -0,0 +1,6 @@
---
title: "Portfolio"
layout: list
---

Project portfolio and featured work.
4 changes: 4 additions & 0 deletions content/portfolio/portfolio-1.md
@@ -0,0 +1,4 @@
---
title: "Broken axes"
excerpt: "Package for creating broken axis plots in matplotlib<br/><img src='/images/500x300.png'>"
---
6 changes: 6 additions & 0 deletions content/portfolio/portfolio-2.md
@@ -0,0 +1,6 @@
---
title: "Portfolio item number 2"
excerpt: "Short description of portfolio item number 2 <br/><img src='/images/500x300.png'>"
---

This is an item in your portfolio. It can have images or nice text. If you name the file .md, it will be parsed as Markdown. If you name the file .html, it will be parsed as HTML.
41 changes: 41 additions & 0 deletions content/posts/2018-03-29-tenseflow.md
@@ -0,0 +1,41 @@
---
title: 'tenseflow'
date: 2018-03-29
tags:
- python
---
<img width="400" src="https://github.com/bendichter/tenseflow/blob/master/static/screenshot.png?raw=true" title="tenseflow app" alt="tenseflow app"/>


I was frustrated while changing the tense of a document, and decided to go down the deep dark rabbit hole of creating an
automatic tense changer. The basic usage is:

```python
from tenseflow import change_tense

change_tense('I will go to the store.', 'past')
# returns: u'I went to the store.'
```

Little did I know, this is a really tough task, for a few reasons. For anyone who wants to venture down this path,
here are a few of the finer points you'll need to deal with:
1. Identifying verbs is harder than it looks. For instance, take the word "<u>vacuum</u>." This word could be used as a noun
("Please hand me the <u>vacuum</u>.") or as a verb ("Please <u>vacuum</u> the dining room."). Vacuum is not a special word;
in fact, if you think about it, **most** verbs in the English language can be used as nouns and **most** nouns can be used as verbs.
If you blindly convert any word that could be a verb, you'll get nonsense like "Please hand me the <u>vacuumed</u>."
Therefore, in order to properly tense-alter a passage, you first need to parse the sentence to determine which words
are being used as verbs. You also need to parse each verb's role in the sentence. For instance, infinitives do not change with
tense (we don't want, e.g., "You asked me to <u>vacuumed</u>").
2. Once you have identified which word you want to change, there are so many irregular verbs and special rules that you
really need an entire dictionary to do this properly.
3. There are more tenses in English than you might realize. Common wisdom is that we have 3: past, present, and future.
In fact, there are 12, and each of them has three modes: affirmative, negative, and interrogative.

<img width="400" src="https://lessonsforenglish.com/wp-content/uploads/2019/12/12-Tenses-Formula-With-Examples.png" title="table of tenses" alt="table of tenses"/>

4. There are all sorts of cases where you would want to have multiple tenses in the same sentence, and there isn't really
a good way to infer this automatically.
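To give a flavor of point 2, here is a deliberately naive past-tense sketch (toy names, not tenseflow's actual implementation): a tiny irregular-verb table with an "-ed" fallback.

```python
# Toy sketch only: real coverage needs a full verb lexicon.
IRREGULAR_PAST = {"go": "went", "be": "was", "eat": "ate", "take": "took"}

def to_past(verb: str) -> str:
    """Convert a base-form verb to simple past (naively)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):       # bake -> baked
        return verb + "d"
    return verb + "ed"           # walk -> walked
```

Even this fallback is wrong for consonant doubling ("stop" should become "stopped," not "stoped"), which is exactly the point: the special cases pile up fast.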


Despite these obstacles, I managed to make a tool that works... OK. It comes with a web-app.
Check it out on GitHub [here](https://github.com/bendichter/tenseflow).
33 changes: 33 additions & 0 deletions content/posts/2018-04-23-osx-jupyter-launcher.md
@@ -0,0 +1,33 @@
---
title: 'OSX Jupyter Launcher'
date: 2018-04-23
tags:
- Jupyter
- OSX
- python
---

If you use Jupyter on a regular basis, the steps to launch a notebook are probably second nature, but if you take a step back, they involve a lot of prior knowledge. A few times I've tried to bring brand-new, eager programmers into the glorious land of Python and Jupyter, but each time I found that the whole flow was really bogged down by this pretty technical preamble. I'll give them an .ipynb file and then show them how to open it:

1. Open Terminal (What's Terminal? It looks scary.)
2. Use `cd` to navigate to where you want.
3. Now run this special command...
and **finally** you are in the user-friendly land of Jupyter.

Now of course all of these skills are useful, and eventually necessary, but they really bog down the first lesson in minutiae and inevitably leave the student feeling a bit overwhelmed. There must be a better way! One solution is to set your student up with JupyterHub: they just need to click a link and they'll be up and running in no time! This is a great solution for a lot of cases, but it requires the instructor to set up a server and the student to have internet access, so it doesn't fit every situation. "Why can't I just double-click the notebook?" the student will ask (or be too embarrassed to ask). Well... um... why can't you? Now you can. Here's how.

[Download me!](../../files/run_jupyter_notebook.zip) Double-click to unpack, then drag it to Applications or wherever you want to keep it.

Navigate to a notebook in Finder, right-click it, and choose "Get Info." Expand "Open with:", choose "Other..." from the dropdown menu, then navigate to and select run_jupyter_notebook. Finally, click "Change All..."

<img width="200" src="../../images/run_jupyter_notebook.png" title="change jupyter notebook settings" alt="change notebook settings"/>

Now you can double-click your notebooks to start them!
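For the curious, the core logic such a launcher needs is tiny: start a Jupyter server from the notebook's own directory. A minimal sketch (hypothetical helper names, not the actual contents of run_jupyter_notebook):

```python
import os
import subprocess

def build_launch_command(notebook_path: str):
    """Return the command and working directory needed to open a notebook."""
    abs_path = os.path.abspath(notebook_path)
    # Run the server from the notebook's folder, so the student never
    # has to `cd` in a terminal themselves.
    return ["jupyter", "notebook", os.path.basename(abs_path)], os.path.dirname(abs_path)

def launch(notebook_path: str) -> None:
    cmd, cwd = build_launch_command(notebook_path)
    subprocess.run(cmd, cwd=cwd)
```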

## Caveats

* This only works on Macs right now (sorry Windows. Linux users, y'all chose this life.)
* Every time you double-click, it opens a new Terminal window.
* This won't run a notebook in a virtual or conda environment.

You can still open notebooks the normal way if you need to have more control over how the notebook is launched.
17 changes: 17 additions & 0 deletions content/posts/2020-07-12-brokenaxes.md
@@ -0,0 +1,17 @@
---
title: 'brokenaxes'
date: 2020-07-12
tags:
- matplotlib
- python
---

<img width="200" src="https://raw.githubusercontent.com/bendichter/brokenaxes/master/broken_python_snake.png" title="broken python snake" alt="broken python snake"/>

I created a Python package for creating broken axes plots like this one:

<img width="400" src="https://raw.githubusercontent.com/bendichter/brokenaxes/master/example2.png" title="brokenaxes example" alt="brokenaxes example"/>

You can create discontinuities along the x and/or y axis.
It is also compatible with a number of other useful matplotlib features, such as subplots and non-standard axes like log and datetime.
Check out the documentation with plenty of examples on the [GitHub repo](https://github.com/bendichter/brokenaxes).
118 changes: 118 additions & 0 deletions content/posts/2022-07-17-spiral-plot.md
@@ -0,0 +1,118 @@
---
title: 'Spiral Plot'
date: 2022-06-17
tags:
- matplotlib
- python
---

As an example, let's use the Google Trends results for the search term "gifts."
Google offers this plot:

![gifts-google-trends-plot](../../images/google_trends_gifts.png)

It should be no surprise that these results show a cyclical trend. It looks
like this might be an annual cycle with the maximum around
Christmas time. It can be hard to create visualizations that bring out
this cyclic pattern. Stacking years on top of each other requires
you to break the year at some point, splitting continuous data and
potentially creating the impression of two different spikes when there
is really just one.

I have created a way to plot cyclic data that I call a "spiral plot."
The data starts at the center of a circle and proceeds outward in a spiral.
Each year of time forms a ring of the spiral, so a given angle
of the circle shows data from the same time of year on every loop. Here
is the Google Trends data for "gifts" shown as a spiral plot:

![gifts spiral plot with no donut](../../images/spiral_plot_no_donut.png)

This plot is more compact than the line version and may highlight some trends
more clearly. The drawback of this approach is that earlier years are smaller
than more recent years. You can make this less dramatic by giving the circle an
empty center (setting `origin=-2`).

![gifts spiral plot](../../images/spiral_plot_w_donut.png)

Code:

```python
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

def spiral_plot(
    data,
    num_cycles: int,
    num_points_per_seg: int = 100,
    angle: float = 0.,
    origin: float = 0.,
    cmap=None,
    show_legend: bool = True,
    ax: Optional[plt.Axes] = None,
):

    if ax is None:
        _, ax = plt.subplots(subplot_kw={'projection': 'polar'})

    n_segments = len(data)
    num_points = num_points_per_seg * n_segments

    # The radius grows linearly so each cycle forms one ring of thickness 1,
    # while the angle wraps around once per cycle.
    inner_rs = np.linspace(0, num_cycles, num_points)
    outer_rs = inner_rs + 1
    thetas = np.linspace(0, 2 * np.pi * num_cycles, num_points) + angle

    # Build one patch per data point, bounded by the inner and outer spiral arcs.
    patches = []
    for i in range(n_segments):
        tt = np.hstack(
            (
                thetas[i * num_points_per_seg:(i + 1) * num_points_per_seg],
                thetas[i * num_points_per_seg:(i + 1) * num_points_per_seg][::-1],
            )
        )
        rr = np.hstack(
            (
                inner_rs[i * num_points_per_seg:(i + 1) * num_points_per_seg],
                outer_rs[i * num_points_per_seg:(i + 1) * num_points_per_seg][::-1],
            )
        )
        patches.append(Polygon(np.c_[tt, rr]))

    patches = PatchCollection(patches, cmap=cmap)
    patches.set_array(data)  # color each patch by its data value
    ax.add_collection(patches)

    ax.set_rlim((None, num_cycles + 1))
    ax.grid(False)

    # A negative origin leaves an empty "donut hole" in the center.
    ax.set_rorigin(origin)

    if show_legend:
        ax.figure.colorbar(patches, shrink=0.6)

    ax.spines.polar.set_visible(False)
    ax.spines.inner.set_visible(False)

    return ax, patches

# Example usage
import pandas as pd

# data from any Google Trends CSV export
fpath = "multiTimeline.csv"
trend = "gifts"

data = pd.read_csv(fpath, header=1)[f"{trend}: (United States)"].values

ax, patches = spiral_plot(data, 5, angle=2 * np.pi * 7 / 12)

# make it prettier (set the ticks before the labels)
ax.set_xticks([2 * np.pi * i / 12 for i in range(12)])
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.tick_params(axis='x', which='major', pad=-5)
ax.tick_params(axis='y', colors='white')
```
116 changes: 116 additions & 0 deletions content/posts/2022-07-18-git-timesheet.md
@@ -0,0 +1,116 @@
---
title: 'Git Timesheet'
date: 2022-07-18
tags:
- matplotlib
- python
---

I recently faced a situation where I needed to assess the amount of work
done by each member of a team on a project that has spanned over a year.
That project has a git repo, and I could see when each person made a commit.
I decided to break it down by weeks. Whenever a person submitted any commit
to the repo on any branch, I counted them as working on the project for that week.
Of course this is imperfect: someone could work a lot and make no commits
that week, and someone could have submitted a commit but worked very
little. Still, this seemed like the fairest way to assess work that I could think of.
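
The core bookkeeping behind this metric can be sketched independently of git: bucket each commit date into a week index and record which authors appear in each bucket (hypothetical helper, for illustration only):

```python
import datetime
from collections import defaultdict

def weekly_presence(commits, start):
    """commits: iterable of (author, datetime) pairs.
    Returns {author: set of week indices with at least one commit}."""
    presence = defaultdict(set)
    for author, when in commits:
        presence[author].add((when - start).days // 7)
    return dict(presence)
```

Each author's set of weeks then becomes one row of a binary "timesheet" matrix.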


The code will work on any locally cloned git repo. `skip` allows you to remove
contributors, and is ideal for handling bots. `author_map` allows you to transform
handles. This is ideal if members of your team make some contributions through PRs
from a clone of the repo and some of their PRs through GitHub directly, or if
they have multiple usernames.

```python
import datetime
import os

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tqdm

def git_timesheet(git_dir, skip=None, author_map=None):

    if skip is None:
        skip = [
            "dependabot[bot]",
            "!git for-each-ref --format='%(refname:short)' `git symbolic-ref HEAD`",
        ]

    if author_map is None:
        author_map = dict()

    # dump the full commit log (all branches) with per-file stats
    os.system(f"git --git-dir {git_dir}/.git log --all --numstat --pretty=format:'--%h--%ad--%aN' --no-renames > git.log")

    # use an unlikely separator so each log line lands in a single column
    commits = pd.read_csv("git.log", sep="\u0012", header=None, names=['raw'])

    # commit header lines start with "--"; everything else is file stats
    commit_marker = commits[commits['raw'].str.startswith("--", na=False)]
    commit_info = commit_marker['raw'].str.extract(r"^--(?P<sha>.*?)--(?P<date>.*?)--(?P<author>.*?)$", expand=True)
    commit_info['date'] = pd.to_datetime(commit_info['date'])

    file_stats_marker = commits[~commits.index.isin(commit_info.index)]
    file_stats = file_stats_marker['raw'].str.split("\t", expand=True)
    file_stats = file_stats.rename(columns={0: "insertions", 1: "deletions", 2: "filename"})
    file_stats['insertions'] = pd.to_numeric(file_stats['insertions'], errors='coerce')
    file_stats['deletions'] = pd.to_numeric(file_stats['deletions'], errors='coerce')

    # forward-fill commit metadata onto the file-stat rows below each header
    commit_data = commit_info.reindex(commits.index).ffill()
    commit_data = commit_data[~commit_data.index.isin(commit_info.index)]
    commit_data = commit_data.join(file_stats)

    # get total authors and weeks
    all_authors = commit_data["author"].unique()
    all_authors = list(np.unique([author_map.get(x, x) for x in all_authors if x not in skip]))

    dates = commit_data["date"]
    start = dates.min()
    stop = dates.max()

    n_weeks = (stop - start).days // 7 + 1  # + 1 covers the final partial week

    timesheet = np.zeros((len(all_authors), n_weeks))

    # mark each author as present for every week in which they committed
    for week_n in tqdm.trange(n_weeks):
        week_start = start + datetime.timedelta(7 * week_n)
        week_stop = start + datetime.timedelta(7 * (week_n + 1))
        commit_data_for_week = commit_data[(week_start <= commit_data["date"]) & (commit_data["date"] < week_stop)]
        authors_for_week = commit_data_for_week["author"].unique()
        # handle different usernames
        authors_for_week = list(np.unique([author_map.get(x, x) for x in authors_for_week]))
        for i, author in enumerate(all_authors):
            if author in authors_for_week:
                timesheet[i, week_n] = 1

    fig, ax = plt.subplots(figsize=(30, 10))
    ax.imshow(timesheet, cmap="Greys")
    ax.set_yticks(range(len(all_authors)))
    _ = ax.set_yticklabels(all_authors)
    ax.set_xlabel("weeks")

    plt.minorticks_on()
    ax.xaxis.set_minor_locator(matplotlib.ticker.MultipleLocator(1))
    ax.yaxis.set_minor_locator(matplotlib.ticker.MultipleLocator(1))
    ax.grid(which="both", linewidth=0.25, color="k")
```
I developed a function for parsing the git log and creating a visualization
of the weeks worked by each member. The repo I used this for is private, so I will
demonstrate it on a public repo from CatalystNeuro.
```python
git_timesheet(
    "path/to/nwb-conversion-tools",
    author_map={
        "bendichter": "Ben Dichter",
        "luiz": "Luiz Tauffer",
        "luiztauffer": "Luiz Tauffer",
        "CodyCBakerPhD": "Cody Baker",
        "h-mayorquin": "Heberto Mayorquin",
        "sbuergers": "Steffen Bürgers",
        "weiglszonja": "Szonja Weigl",
    }
)
```

![git-timesheet](../../images/git-timesheet.png)