MichaBiriuchinskii/README.md

👋 Welcome

I'm Mikhail Biriuchinskii, an R&D Data Scientist and NLP specialist in Paris, with expertise in:

  • 🧾 Text & documents: OCR/HTR, multilingual, historical archives
  • 🧠 LLMs: Fine-tuning, prompt design, retrieval-augmented generation
  • 🗣 Speech: Low-resource languages, Whisper pipelines
  • 🧰 Annotation & evaluation: FAIR data, human-in-the-loop workflows
  • 🛠 Deployment & tooling: Docker, FastAPI, open-access apps

I build tools at the intersection of language, AI, and data, with a focus on open-source, multimodal processing (text, speech, image), and large language models (LLMs).

I value clarity, rigor, and collaboration, especially across tech and the humanities.


On This GitHub

You'll find:

  • NLP demos (classification, NER, RAG, etc.)
  • 📚 Tools for linguists and researchers
  • 📦 Open-source contributions (TAL, OCR/HTR, LLMs)
  • 🧪 Experiments with Transformers, embeddings, and vector DBs

🧭 Open to new opportunities from September 2025.

Pinned

  1. LAAC-LSCP/ChildProject (Public)

    Python package for the management of day-long recordings of children.

    Python · 15 stars · 4 forks

  2. htr_llm_evaluation_pipeline (Public)

    HTR-LLM Evaluation Pipeline: A tool for comparing LLM-generated JSON documents against gold standard references, providing detailed accuracy metrics and visual analysis of extraction quality.

    Python

  3. obtic-sorbonne/Toolbox-site (Public)

    Pandore offers a set of tools that facilitate the most common corpus processing tasks for digital humanities research. Automatic pipelines for a set of tasks are also available.

    HTML · 6 stars · 1 fork

  4. AlinaMV/interface_web (Public)

    HTML