We host the LLM in an on-premises environment. For the related setup, contact @허상호.

Ollama basics

https://www.youtube.com/watch?v=VkcaigvTrug&list=LL&index=17

https://wikidocs.net/book/14314

https://colab.research.google.com/drive/1DOzV_a5at8Llz8mFbHcthvy3dNb0UI3V?usp=sharing - shared Colab notebook with the full tutorial

Installing Ollama

https://ollama.com/download

pip install huggingface-hub

Downloading the GGUF file

huggingface-cli download heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF ggml-model-Q5_K_M.gguf --local-dir C:\Users\Playdata\github\ --local-dir-use-symlinks False
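
The same file can also be fetched from Python using the huggingface-hub package installed above. A minimal sketch; repo_id and filename mirror the command above, while local_dir="./github" is an assumed example path:

from huggingface_hub import hf_hub_download

# Download a single GGUF file from the Hugging Face Hub.
# repo_id and filename match the huggingface-cli command above;
# local_dir is an example path, adjust to your environment.
gguf_path = hf_hub_download(
    repo_id="heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF",
    filename="ggml-model-Q5_K_M.gguf",
    local_dir="./github",
)
print(gguf_path)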

Creating the Modelfile

FROM ggml-model-Q5_K_M.gguf

TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""

SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

PARAMETER stop <s>
PARAMETER stop </s>

The template explicitly marks the start and end of each turn (Ollama template syntax), which keeps the model from generating off-script text. Templates differ from model to model; if a model ships without one, refer to its base model's template.
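
For example, with the Modelfile above, a user prompt such as "안녕하세요" is rendered into roughly the following text before it reaches the model (illustration only):

<s>A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.</s>
<s>Human:
안녕하세요</s>
<s>Assistant: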

ollama create EEVE-Korean-10.8B -f github/Modelfile
ollama list

Running the model

ollama run EEVE-Korean-10.8B:latest
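
Once the model is served (Ollama listens on localhost:11434 by default), it can also be called over Ollama's HTTP API. A minimal sketch in Python; the example prompt is arbitrary:

import requests

# Call Ollama's generate endpoint on the default local port.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "EEVE-Korean-10.8B:latest",
        "prompt": "대한민국의 수도는 어디인가요?",
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])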

Fine-tuning (LoRA, QLoRA)

https://www.youtube.com/watch?v=oZY0D8N6bC8&list=LL&index=1

https://huggingface.co/unsloth
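
The video above uses Unsloth. As a generic illustration of the same idea, here is a minimal QLoRA sketch using transformers + peft + bitsandbytes; the base-model repo (yanolja/EEVE-Korean-Instruct-10.8B-v1.0), rank, and target modules are assumptions, not values taken from the tutorial:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA = 4-bit quantized base model + trainable LoRA adapters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "yanolja/EEVE-Korean-Instruct-10.8B-v1.0",  # assumed base-model repo
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumption)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumption)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable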