Host an LLM in an on-premises environment. For setup questions, contact @허상호.
https://www.youtube.com/watch?v=VkcaigvTrug&list=LL&index=17
https://wikidocs.net/book/14314
https://colab.research.google.com/drive/1DOzV_a5at8Llz8mFbHcthvy3dNb0UI3V?usp=sharing - full tutorial Colab (shared)
Install ollama
pip install huggingface-hub
Download the GGUF file
huggingface-cli download heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF ggml-model-Q5_K_M.gguf --local-dir C:\Users\Playdata\github\ --local-dir-use-symlinks False
Create a Modelfile
FROM ggml-model-Q5_K_M.gguf
TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""
SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""
PARAMETER stop <s>
PARAMETER stop </s>
The TEMPLATE explicitly marks the start and end of each turn, which keeps the model from rambling (this is Ollama's template syntax). Templates differ from model to model; if a model does not ship with one, refer to its base model's template.
ollama create EEVE-Korean-10.8B -f github/Modelfile
ollama list
Run
ollama run EEVE-Korean-10.8B:latest
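Once the model is running (via `ollama run` or `ollama serve`), it is also reachable over Ollama's local HTTP API, which listens on port 11434 by default. A minimal stdlib-only sketch, assuming the model name from the `ollama create` step above and a running server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt: str) -> dict:
    """Build a non-streaming /api/generate request for the model created above."""
    return {
        "model": "EEVE-Korean-10.8B:latest",
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
    }

def ask(prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask("대한민국의 수도는 어디인가요?")  # requires the Ollama server to be running
```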