Wingman brings GitHub Copilot-style inline completions to Emacs, offering two modes: fast, automatic completions via a local (or remote) llama.cpp server, and manual completions through any backend supported by `gptel`.
This package is primarily a direct port of the excellent llama.vim plugin (see the technical design details for more background on how this works).
- Inline "Ghost Text" Completions: Suggestions appear directly in your buffer as you type.
- Dual Completion Modes:
  - Native FIM: Fast, efficient completions via llama.cpp's specialized `/infill` endpoint for "fill-in-the-middle" (FIM) models. By default, these completions appear automatically as you type.
  - Emulated FIM: Access a broader range of models via `gptel`. These completions must always be manually triggered.
- Local First: No code leaves your machine when using a self-hosted llama.cpp server with the Native FIM completion mode.
- Asynchronous by Default: Never blocks your editing while waiting for a completion.
- Response Caching: Repeated requests for the same context are answered instantly from an in-memory cache.
- Project-Aware Context: Uses a ring buffer of text chunks from recently used files (scoped to the current project) to provide more relevant suggestions.
This package requires a running llama.cpp server instance, accessible at the URL specified by `wingman-llama-endpoint`.
Additionally, to use the Emulated FIM completion mode, you must configure `gptel` accordingly for each LLM provider you intend to use.
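For example, an OpenAI-compatible provider might be registered with `gptel` along the following lines (a sketch only; the backend name, host, environment variable, and model name below are placeholders, not Wingman defaults — see the gptel documentation for the options your provider needs):

```emacs-lisp
;; Sketch: register an OpenAI-compatible gptel backend.
;; "my-provider", the :host, the env var, and the model name
;; are all placeholders for illustration.
(require 'gptel)
(gptel-make-openai "my-provider"
  :host "api.example.com"
  :key (lambda () (getenv "MY_PROVIDER_API_KEY"))
  :stream t
  :models '(some-model-name))
```

Once a backend is registered, it will appear as a choice when you trigger an Emulated FIM completion.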
```emacs-lisp
(use-package wingman
  :straight (:type git :host github :repo "mjrusso/wingman"))
```

Note that this package depends on `transient` and `gptel`, which will be installed automatically if necessary.
Clone this repository:

```shell
git clone https://github.com/mjrusso/wingman.git ~/.emacs.d/lisp/wingman
```

Then add the directory to your Emacs `load-path`:

```emacs-lisp
(add-to-list 'load-path "~/.emacs.d/lisp/wingman")
(require 'wingman)
```

Note that the dependencies (`transient` and `gptel`) must also be manually installed in this case.
The behaviour of Wingman can be customized through its customization group (M-x customize-group RET wingman RET), or by setting variables in your init.el.
A minimal configuration:
```emacs-lisp
(use-package wingman
  :straight (:type git :host github :repo "mjrusso/wingman")
  ;; Enable wingman-mode in all programming modes
  :hook (prog-mode . wingman-mode))
```

An example of a more advanced configuration:
```emacs-lisp
(use-package wingman
  :straight (:type git :host github :repo "mjrusso/wingman")
  :ensure t
  :defer t
  :init
  (setq wingman-prefix-key (kbd "C-c w"))
  :hook (prog-mode . wingman-mode)
  :config
  (setq wingman-log-level 4)
  (setq wingman-ring-n-chunks 16)
  ;; default llama.cpp server port is 8012; this example assumes 8080
  (setq wingman-llama-endpoint "http://127.0.0.1:8080/infill")
  ;; assumes use of Modus Themes; substitute with preferred color scheme
  (set-face-attribute 'wingman-overlay-face nil
                      :foreground (modus-themes-get-color-value 'red-warmer)
                      :background (modus-themes-get-color-value 'bg-inactive))
  ;; don't provide completions in files that typically contain secrets
  (add-to-list 'wingman-disable-predicates
               (lambda ()
                 (or (derived-mode-p 'envrc-file-mode)
                     (derived-mode-p 'direnv-envrc-mode)
                     (when buffer-file-name
                       (let ((fname (file-name-nondirectory buffer-file-name)))
                         (or (string-equal ".env" fname)
                             (string-equal ".envrc" fname)
                             (string-prefix-p ".localrc" fname)))))))
  :bind
  (:map wingman-mode-prefix-map
        ("TAB" . wingman-fim)            ; Request Native FIM
        ("S-TAB" . wingman-fim-emulated) ; Request Emulated FIM
        ("d" . wingman-fim-debug)
        :map wingman-mode-completion-transient-map
        ("TAB" . wingman-accept-full)
        ("S-TAB" . wingman-accept-line)
        ("M-S-TAB" . wingman-accept-word)))
```
- Enable the mode:
  `wingman-mode` is a buffer-local minor mode. You can enable it with `M-x wingman-mode`, or globally in all applicable buffers with `global-wingman-mode`. (Alternatively, you may enable `wingman-mode` automatically via the hook as shown in the configuration examples above.)

- Get Completions:
  - Native FIM (`wingman-fim`):
    - If `wingman-auto-fim` is `t` (the default), completions will appear automatically as you type, using the configured llama.cpp server.
    - To manually request a completion from the configured llama.cpp server, use `M-x wingman-fim` (bound to `C-c w TAB` in the example config).
  - Emulated FIM (`wingman-fim-emulated`):
    - This is always triggered manually. Use `M-x wingman-fim-emulated` (bound to `C-c w S-TAB` in the example config).
    - Once triggered, a `transient` menu will appear, letting you choose from your configured `gptel` backends and models. Select one to send the request.

- Accept a Completion:
  - Full: Press the "accept full" key (default: `<tab>`) to insert the entire suggestion.
  - Line: Press the "accept line" key (default: `S-TAB`) to insert only the first line of the suggestion.
  - Word: Press the "accept word" key (default: `M-S-TAB`) to insert only the first word.

- Dismiss a Completion:
  - Keep typing, or
  - Move the cursor, or
  - Press the manual trigger key again.
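If you prefer fully manual operation, automatic Native FIM triggering can be turned off (a configuration sketch; it relies on the `wingman-auto-fim` variable described above):

```emacs-lisp
;; Sketch: disable automatic completions; request them
;; on demand with M-x wingman-fim (or its bound key) instead.
(setq wingman-auto-fim nil)
```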
On macOS:

```shell
brew install llama.cpp
```

On Windows:

```shell
winget install llama.cpp
```

Alternatively, install from your preferred package manager, build from source, or use the latest binaries.
Recommended settings, depending on the amount of available VRAM:

- More than 16GB VRAM:

  ```shell
  llama-server --fim-qwen-7b-default
  ```

- Less than 16GB VRAM:

  ```shell
  llama-server --fim-qwen-3b-default
  ```

- Less than 8GB VRAM:

  ```shell
  llama-server --fim-qwen-1.5b-default
  ```
Note that a FIM ("fill-in-the-middle")-compatible model is required.
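To check that the server is up and serving FIM completions, you can exercise the `/infill` endpoint directly (a sketch; it assumes a server listening on llama.cpp's default port 8012, and uses the `input_prefix`/`input_suffix` fields from llama.cpp's `/infill` API):

```shell
# Sketch: manually query llama.cpp's /infill endpoint.
# Assumes a FIM-capable server is running on localhost:8012.
curl -s http://127.0.0.1:8012/infill \
  -H "Content-Type: application/json" \
  -d '{
        "input_prefix": "def add(a, b):\n    return ",
        "input_suffix": "\n",
        "prompt": ""
      }'
```

A JSON response containing generated text indicates the server and model are working; adjust the port to match your `wingman-llama-endpoint` setting.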
To use Emulated FIM mode, configure gptel backends as per the setup instructions.
This project is a direct port of llama.vim to Emacs.
Also see: copilot.el and emacs-copilot for alternative approaches and inspiration.
This project is licensed under the GPL-3.0-or-later License. See the LICENSE file for details.
