EnterpriseDB · iakinsey · Jun 13, 2025 · Jun 16, 2025 · timwaizenegger · Jun 16, 2025
@@ -8,11 +8,13 @@ navigation:
 - openai-completions
 - bert
 - clip
+- llama
 ---
 
 This section provides details of the supported models in EDB Postgres AI - AI Accelerator Pipelines and their capabilities.
 
 * [T5](t5).
+* [Llama](llama).
 * [Embeddings](embeddings), including `openai-embeddings` and `nim-embeddings`.
 * [Completions](completions), including `openai-completions` and `nim-completions`.
 * [BERT](bert).

@@ -0,0 +1,67 @@
+---
+title: Llama
+navTitle: Llama
+description: "Llama is a series of large language models developed by Meta."
+---
+
+Model name: `llama_instruct_local`
+
+## About Llama
+
+Llama models are decoder-only transformers designed for text generation. They use the standard Transformer architecture without an encoder and generate text autoregressively, predicting each next token from the tokens that precede it. Llama models are pre-trained on a large corpus of publicly available text and handle a range of natural language tasks, including chat, code generation, summarization, and question answering. The `llama_instruct_local` model runs locally and supports instruction-tuned models built on the Llama architecture, including TinyLlama and SmolLM2.
+
+Read more about [Llama on Wikipedia](https://en.wikipedia.org/wiki/Llama_(language_model)).
+
+## Supported aidb operations
+
+* decode_text
+* decode_text_batch
+
+## Supported models
+
+* [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
+* [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)
+* [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
+* [HuggingFaceTB/SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)
+
+## Creating the default model
+
+```sql
+SELECT aidb.create_model('my_llama_model', 'llama_instruct_local');
+```
+
+## Creating a specific model
+
+```sql
+SELECT aidb.create_model(
+  'another_llama_model',
+  'llama_instruct_local',
+  '{"model": "HuggingFaceTB/SmolLM2-135M-Instruct", "revision": "main"}'::JSONB
+);
+```
+
+## Running the model
+
+```sql
+SELECT aidb.decode_text('my_llama_model', 'Why is the sky blue?');
+```
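+
+To generate text for many inputs, you can call `decode_text` once per row of a table, or pass several prompts to `decode_text_batch` in one call. The following is a sketch that reuses the `my_llama_model` model from the default-model example and a hypothetical `questions` table. It assumes that `decode_text_batch` takes the model name followed by a `TEXT[]` of prompts, so check the aidb function reference for the exact signature and return type.
+
+```sql
+-- Hypothetical table holding the prompts to answer
+CREATE TABLE questions (
+  id SERIAL PRIMARY KEY,
+  question TEXT NOT NULL
+);
+
+INSERT INTO questions (question) VALUES
+  ('Why is the sky blue?'),
+  ('What is a decoder-only transformer?');
+
+-- One decode_text call per row
+SELECT id, question, aidb.decode_text('my_llama_model', question) AS answer
+FROM questions;
+
+-- All prompts in a single call (assumed signature: model name, TEXT[])
+SELECT aidb.decode_text_batch(
+  'my_llama_model',
+  (SELECT array_agg(question ORDER BY id) FROM questions)
+);
+```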
+
+## Model configuration settings
+
+The following configuration settings are available for Llama models:
+
+* `model` &mdash; The Llama model to use. The default is `TinyLlama/TinyLlama-1.1B-Chat-v1.0`.
+* `revision` &mdash; The revision of the model to use. The default is `main`.
+* `system_prompt` &mdash; Optional. A system prompt containing general instructions that guide the model's responses.
+* `use_flash_attention` &mdash; Whether to use flash attention. The default is `false`.
+* `seed` &mdash; The random seed to use for sampling. The default is `1599222198345926291`.
+* `temperature` &mdash; The temperature to use for sampling. The default is `0.2`.
+* `sample_len` &mdash; The maximum number of tokens to generate. The default is `64`.
+* `repeat_last_n` &mdash; The number of most recent tokens that the repetition penalty considers. The default is `64`.
+* `repeat_penalty` &mdash; The repetition penalty to use. The default is `1.1`.
+* `top_p` &mdash; Cumulative probability threshold for filtering the token distribution. The default is `0.9`.
+* `use_kv_cache` &mdash; Enables reuse of attention key/value pairs during generation for faster decoding. The default is `true`.
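+
+The settings above are passed in the JSONB configuration argument of `aidb.create_model`. As a sketch, the following creates a hypothetical `my_tuned_llama_model` with a system prompt, a longer output limit, and a fixed seed. The prompt text and numeric values are illustrative assumptions, not recommended values.
+
+```sql
+-- Hypothetical model name; illustrative sampling values.
+-- Settings left out of the JSONB keep the defaults listed above.
+SELECT aidb.create_model(
+  'my_tuned_llama_model',
+  'llama_instruct_local',
+  '{
+    "model": "HuggingFaceTB/SmolLM2-360M-Instruct",
+    "revision": "main",
+    "system_prompt": "You are a concise assistant for PostgreSQL users.",
+    "temperature": 0.7,
+    "top_p": 0.95,
+    "sample_len": 256,
+    "repeat_last_n": 64,
+    "repeat_penalty": 1.2,
+    "seed": 42,
+    "use_kv_cache": true
+  }'::JSONB
+);
+
+-- Generate a response with the tuned settings
+SELECT aidb.decode_text('my_tuned_llama_model', 'What is a B-tree index?');
+```
+
+Lower `temperature` and `top_p` values make sampling more conservative. A higher `sample_len` allows longer responses.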
+
+## Model credentials
+
+No credentials are required for local Llama models.