# Local Private Deployment / Large Model Interface Access
Built on open-source LLMs (Large Language Models) and embedding models, this project supports fully offline private deployment.
In addition, the project supports calling the OpenAI API.
## Local Private Model Access
<br>Example of model path configuration; modify `model_config.py` as follows:
```bash
# Recommendation: use Hugging Face models, preferably chat models; avoid base models, which may not produce correct outputs.
# Note: when both `llm_model_dict` and `VLLM_MODEL_DICT` are present, the model configuration in `VLLM_MODEL_DICT` takes precedence.
# Example `llm_model_dict` configuration:

# 1. If the model is placed under the ~/codefuse-chatbot/llm_models path
# Suppose the model path is as follows
model_dir: ~/codefuse-chatbot/llm_models/THUDM/chatglm-6b

# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "THUDM/chatglm-6b",
}

# Or, if the model path is as follows
model_dir: ~/codefuse-chatbot/llm_models/chatglm-6b

llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "chatglm-6b",
}

# 2. If you do not wish to move the model to ~/codefuse-chatbot/llm_models,
# also delete the related code below `Model Path Reset`; see model_config.py for details.
# Suppose the model path is as follows
model_dir: ~/THUDM/chatglm-6b

# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "your personal dir/THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "your personal dir/THUDM/chatglm-6b",
}
```
```bash
# 3. Specify the model service to launch; keep the two settings consistent
LLM_MODEL = "chatglm-6b"
LLM_MODELs = ["chatglm-6b"]
```
```bash
# Modify the server_config.py configuration; if LLM_MODELs does not contain multiple models, no additional settings are needed.
# Modify the configuration of server_config.py#FSCHAT_MODEL_WORKERS
"model_name": {'host': DEFAULT_BIND_HOST, 'port': 20057}
```
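
For orientation, here is a minimal sketch of how such an entry sits inside `FSCHAT_MODEL_WORKERS`, assuming the langchain-chatchat-style layout this project builds on; the `default` entry and both port numbers are illustrative, not taken from the source:

```python
# server_config.py -- a sketch, assuming a langchain-chatchat-style layout
FSCHAT_MODEL_WORKERS = {
    # settings shared by all model workers (illustrative)
    "default": {
        "host": DEFAULT_BIND_HOST,
        "port": 20002,
    },
    # one entry per extra model in LLM_MODELs, each on its own port
    "chatglm-6b": {
        "host": DEFAULT_BIND_HOST,
        "port": 20057,
    },
}
```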
<br>Quantized Model Access
```bash
# To support the codellama-34b-int4 model, fastchat needs to be patched
cp examples/gptq.py ~/site-packages/fastchat/modules/gptq.py
# To support the qwen-72b-int4 model, fastchat needs the same patch
cp examples/gptq.py ~/site-packages/fastchat/modules/gptq.py

# Quantization requires a modification to the llm_api.py configuration:
# uncomment `kwargs["gptq_wbits"] = 4` in examples/llm_api.py#559
```
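
After uncommenting, the line in question would read as follows (`kwargs` is assumed to be the argument dict handed to the fastchat model worker; the surrounding code is not shown in this document):

```python
# examples/llm_api.py, around line 559 -- after uncommenting
kwargs["gptq_wbits"] = 4  # load *-int4 models with 4-bit GPTQ weights
```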
## Public Large Model Interface Access
```bash
# Modify the model_config.py configuration
# ONLINE_LLM_MODEL
# The other interfaces are adapted from the langchain-chatchat project; they are untested due to a lack of relevant accounts.

# Specify the model service to launch; keep the two settings consistent
LLM_MODEL = "gpt-3.5-turbo"
LLM_MODELs = ["gpt-3.5-turbo"]
```
<br>Example of external large model interface access
```python
# 1. Implement a new model access class
# Refer to ~/examples/model_workers/openai.py#ExampleWorker
# Implementing the do_chat function enables the use of LLM capabilities

from typing import Dict, List

# ApiModelWorker / ApiChatParams come from the model_workers package; the exact
# import path below is an assumption -- see examples/model_workers for the real one.
from .base import ApiModelWorker, ApiChatParams


class XXWorker(ApiModelWorker):
    def __init__(
        self,
        *,
        controller_addr: str = None,
        worker_addr: str = None,
        model_names: List[str] = ["gpt-3.5-turbo"],
        version: str = "gpt-3.5",
        **kwargs,
    ):
        kwargs.update(model_names=model_names, controller_addr=controller_addr, worker_addr=worker_addr)
        kwargs.setdefault("context_len", 16384)  # TODO: for 16K models this needs to be 16384
        super().__init__(**kwargs)
        self.version = version

    def do_chat(self, params: ApiChatParams) -> Dict:
        '''
        Executes chat; by default uses the chat function of the module.
        :params.messages : [
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hello"}
        ]
        :params.xx: see ApiChatParams for details
        Required return format: {"error_code": int, "text": str}
        '''
        return {"error_code": 500, "text": f"{self.model_names[0]} does not implement the chat feature"}


# Finally, complete the registration in ~/examples/model_workers/__init__.py
# from .xx import XXWorker

# 2. Complete access through an existing model access class
# Or directly use an existing large model worker class (untested due to a lack of
# relevant accounts; community contributions after testing are welcome)
```
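
To make `do_chat` concrete, here is a minimal sketch that forwards `params.messages` to an OpenAI-compatible endpoint. The `/chat/completions` path, the payload shape, and the `params.api_base_url` / `params.api_key` fields are assumptions for illustration, not the project's confirmed API; only the `{"error_code": int, "text": str}` return format comes from the docstring above.

```python
import requests

# A hedged sketch of a working do_chat (drop-in method body for XXWorker).
# Assumes `params` carries api_base_url / api_key and the provider exposes an
# OpenAI-style /chat/completions endpoint -- adjust to your provider's API.
def do_chat(self, params: ApiChatParams) -> Dict:
    try:
        resp = requests.post(
            f"{params.api_base_url}/chat/completions",
            headers={"Authorization": f"Bearer {params.api_key}"},
            json={"model": self.version, "messages": params.messages},
            timeout=60,
        )
        resp.raise_for_status()
        text = resp.json()["choices"][0]["message"]["content"]
        return {"error_code": 0, "text": text}  # required return format
    except Exception as e:
        return {"error_code": 500, "text": f"{self.model_names[0]} request failed: {e}"}
```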
```python
# Modify the model_config.py#ONLINE_LLM_MODEL configuration
# Fill in the model-specific details: version, api_base_url, api_key, provider (consistent with the class name above)
ONLINE_LLM_MODEL = {
    # Online models. Please set a different port for each online API in server_config.
    "openai-api": {
        "model_name": "gpt-3.5-turbo",
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "openai_proxy": "",
    },
    "example": {
        "version": "gpt-3.5",  # using the openai interface as an example
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "provider": "ExampleWorker",
    },
}
```
## Launching Large Model Services
```bash
# start llm-service (optional): launch the large model service separately
python examples/llm_api.py
```
```python
# Test
import openai  # requires openai<1.0, which still provides openai.api_base / openai.ChatCompletion

# openai.api_key = "EMPTY"  # not supported yet
openai.api_base = "http://127.0.0.1:8888/v1"

# Select the model you launched
model = "example"

# Create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
    max_tokens=100,
)

# Print the completion
print(completion.choices[0].message.content)

# Once correct output is confirmed, the LLM can be accessed normally.
```
or
```bash
# model_config.py#USE_FASTCHAT determines whether local models are served via fastchat
USE_FASTCHAT = "gpt" not in LLM_MODEL

python start.py  # start.py#221 automatically executes `python llm_api.py`
```