
# Local Privatization/Large Model Interface Access
Leveraging open-source LLMs (Large Language Models) and Embedding models, this project supports offline private deployment based on open-source models.
In addition, the project supports invoking the OpenAI API.
## Local Privatization Model Access
<br>Example of model path configuration; modify the model_config.py configuration as follows:
```bash
# Recommendation: Use Hugging Face chat models; avoid base models, which may not produce correct outputs.
# Note: When both `llm_model_dict` and `VLLM_MODEL_DICT` are present, the model configuration in `VLLM_MODEL_DICT` takes precedence.
# Example of `llm_model_dict` configuration:

# 1. If the model is placed under the ~/codefuse-chatbot/llm_models path
# Suppose the model address is as follows
model_dir: ~/codefuse-chatbot/llm_models/THUDM/chatglm-6b
# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}
VLLM_MODEL_DICT = {
    'chatglm-6b': "THUDM/chatglm-6b",
}

# Or, if the model address is as follows
model_dir: ~/codefuse-chatbot/llm_models/chatglm-6b
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}
VLLM_MODEL_DICT = {
    'chatglm-6b': "chatglm-6b",
}

# 2. If you do not wish to move the model to ~/codefuse-chatbot/llm_models,
# also delete the related code below `Model Path Reset`; see model_config.py for details.
# Suppose the model address is as follows
model_dir: ~/THUDM/chatglm-6b
# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "your personal dir/THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}
VLLM_MODEL_DICT = {
    'chatglm-6b': "your personal dir/THUDM/chatglm-6b",
}
```
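If you still need to fetch a model into the expected directory, here is a minimal sketch, assuming `git-lfs` is installed and using the same `THUDM/chatglm-6b` layout as the example above:
```bash
# Hedged sketch: clone a chat model from Hugging Face into the llm_models directory
cd ~/codefuse-chatbot/llm_models
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b THUDM/chatglm-6b
```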
```bash
# 3. Specify the model service to be launched; keep LLM_MODEL and LLM_MODELs consistent
LLM_MODEL = "chatglm-6b"
LLM_MODELs = ["chatglm-6b"]
```
```bash
# Modify the server_config.py configuration; if LLM_MODELs does not list multiple models, no additional settings are needed.
# Modify server_config.py#FSCHAT_MODEL_WORKERS, keyed by the model name
"model_name": {'host': DEFAULT_BIND_HOST, 'port': 20057}
```
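If LLM_MODELs does list several models, each one needs its own worker entry bound to its own port. A hypothetical sketch (the model names and port numbers are illustrative, not prescriptive):
```bash
# Hypothetical FSCHAT_MODEL_WORKERS entries for two local models, one port each
FSCHAT_MODEL_WORKERS = {
    "chatglm-6b": {'host': DEFAULT_BIND_HOST, 'port': 20057},
    "baichuan-7b-chat": {'host': DEFAULT_BIND_HOST, 'port': 20058},
}
```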
<br>Quantized Model Access
```bash
# If you need to support the codellama-34b-int4 or qwen-72b-int4 models, you need to patch fastchat
cp examples/gptq.py ~/site-packages/fastchat/modules/gptq.py
# Quantization requires modification of the llm_api.py configuration
# Uncomment `kwargs["gptq_wbits"] = 4` in examples/llm_api.py#559
```
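For reference, after uncommenting, the line in question would look like the sketch below (the surrounding context at examples/llm_api.py#559 is abbreviated here):
```bash
# after uncommenting in examples/llm_api.py#559:
kwargs["gptq_wbits"] = 4  # load GPTQ weights with 4-bit quantization for the int4 models above
```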
## Public Large Model Interface Access
```bash
# Modification of model_config.py configuration
# ONLINE_LLM_MODEL
# Other interface integrations come from the langchain-chatchat project; they are untested because we lack the relevant accounts.
# Specify the model service to be launched; keep LLM_MODEL and LLM_MODELs consistent
LLM_MODEL = "gpt-3.5-turbo"
LLM_MODELs = ["gpt-3.5-turbo"]
```
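Before wiring a key into model_config.py, it can help to sanity-check it directly. A hedged one-liner using the same pre-1.0 `openai` client as the test script further below (the key value is a placeholder):
```bash
# Hedged check that the OpenAI key is valid: list available models
python -c "import openai; openai.api_key = 'sk-...'; print(openai.Model.list())"
```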
Example of external large model interface access
```bash
# 1. Implement a new model access class
# Refer to ~/examples/model_workers/openai.py#ExampleWorker
# Implementing the do_chat function will enable the use of LLM capabilities
from typing import Dict, List
# ApiModelWorker and ApiChatParams come from the project's model_workers base module

class XXWorker(ApiModelWorker):
    def __init__(
        self,
        *,
        controller_addr: str = None,
        worker_addr: str = None,
        model_names: List[str] = ["gpt-3.5-turbo"],
        version: str = "gpt-3.5",
        **kwargs,
    ):
        kwargs.update(model_names=model_names, controller_addr=controller_addr, worker_addr=worker_addr)
        kwargs.setdefault("context_len", 16384)  # TODO: for a 16K model this needs to be 16384
        super().__init__(**kwargs)
        self.version = version

    def do_chat(self, params: ApiChatParams) -> Dict:
        '''
        The method that performs chat; by default it uses the chat function in this module.
        :params.messages : [
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hello"}
        ]
        :params.xx: see ApiChatParams for details
        Required return format: {"error_code": int, "text": str}
        '''
        return {"error_code": 500, "text": f"{self.model_names[0]} has not implemented the chat capability"}

# Finally, complete the registration in ~/examples/model_workers/__init__.py
# from .xx import XXWorker

# 2. Complete access through an existing model access class
# Or directly use an existing large model class (untested for lack of relevant accounts; community contributions after testing are welcome)
```
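As an illustration only, a minimal `do_chat` body for an OpenAI-compatible endpoint might look like the sketch below. It assumes `ApiChatParams` exposes `api_base_url`, `api_key`, and `messages`; verify these against the project's actual definition before relying on them:
```bash
import requests  # hedged sketch; not the project's actual implementation

def do_chat(self, params: ApiChatParams) -> Dict:
    # POST the chat messages to an OpenAI-compatible /chat/completions endpoint
    resp = requests.post(
        f"{params.api_base_url}/chat/completions",  # assumed field on ApiChatParams
        headers={"Authorization": f"Bearer {params.api_key}"},  # assumed field
        json={"model": self.version, "messages": params.messages},
        timeout=60,
    )
    if resp.status_code != 200:
        return {"error_code": resp.status_code, "text": resp.text}
    data = resp.json()
    # map the response into the required {"error_code": int, "text": str} shape
    return {"error_code": 0, "text": data["choices"][0]["message"]["content"]}
```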
```bash
# Modify the model_config.py#ONLINE_LLM_MODEL configuration
# Fill in the model-specific details: version, api_base_url, api_key, provider (consistent with the class name above)
ONLINE_LLM_MODEL = {
    # Online models. Please set a different port for each online API in server_config.
    "openai-api": {
        "model_name": "gpt-3.5-turbo",
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "openai_proxy": "",
    },
    "example": {
        "version": "gpt-3.5",  # using the openai interface as an example
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "provider": "ExampleWorker",
    },
}
```
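The comment above asks for a distinct port per online API. A hypothetical companion snippet for server_config.py#FSCHAT_MODEL_WORKERS, mirroring the local-model sketch earlier (port numbers are illustrative):
```bash
# Hypothetical: one port per online API worker
FSCHAT_MODEL_WORKERS = {
    "openai-api": {'host': DEFAULT_BIND_HOST, 'port': 21001},
    "example": {'host': DEFAULT_BIND_HOST, 'port': 21002},
}
```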
## Launching Large Model Services
```bash
# start llm-service (optional) - Launch the large model service separately
python examples/llm_api.py
```
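A quick way to confirm the service is up is to list the models it exposes through its OpenAI-compatible API (the host and port follow the `api_base_url` configured above):
```bash
# Sanity check: the launched models should appear in the response
curl http://127.0.0.1:8888/v1/models
```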
```bash
# Test
import openai
# openai.api_key = "EMPTY"  # not supported yet
openai.api_base = "http://127.0.0.1:8888/v1"

# Select the model you launched
model = "example"

# create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name? "}],
    max_tokens=100,
)

# print the completion
print(completion.choices[0].message.content)

# Once the correct output is confirmed, the LLM can be accessed normally.
```
or
```bash
# model_config.py#USE_FASTCHAT - determines whether local models are integrated via fastchat
USE_FASTCHAT = "gpt" not in LLM_MODEL

python start.py  # start.py#221 automatically executes python llm_api.py
```