Please clone this repo, create a branch for yourself, and then explore the scRNA-seq data from this paper. The data is provided in the data directory as an RDS file containing a monocle3 object that holds the unprocessed matrices and metadata. Note: to keep the RDS within GitHub size limits, we've filtered out some non-expressed and lowly expressed genes and downsampled the data to 65% of the original cell count per sample.
Please perform the following tasks and present your work as a knitted R Markdown (Rmd) document, knit to HTML.
- Preprocess and cluster the data. Annotate your decisions in the Rmd; we'd like to see your process.
- Roughly label cell types. This should include at least granulosa, soma, and germ cells. Can you get any more specific? Are there additional cell types? Show plots supporting your cell type labels.
- How do cell types compare across ages or between in vivo and in vitro samples? (open-ended)
Submit by pushing your code and HTML files to your branch.
We suggest spending no more than ~3 hours total on this. Feel free to explore beyond the starter questions here. Find anything particularly interesting? Let's discuss!