Hi, thanks for your great work.
Are there any plans to support a generate() function like the one in vLLM or Transformers? That would let users run generation from a Python script without Docker, like this:
from vllm import LLM
llm = LLM("facebook/opt-13b", tensor_parallel_size=4)
output = llm.generate("San Francisco is a")
If the framework already supports this, could you please share an example? Thank you very much.