Local Privatization/Large Model Interface Access
Leveraging open-source LLMs (Large Language Models) and Embedding models, this project supports offline private deployment based on open-source models. It also supports calling the OpenAI API.
Local Privatization Model Access
Example of model path configuration; modify model_config.py as follows:
# Recommendation: Use Hugging Face models, preferably the chat models, and avoid using base models, which may not produce correct outputs.
# Note: When both `llm_model_dict` and `VLLM_MODEL_DICT` are present, the model configuration in `VLLM_MODEL_DICT` takes precedence.
# Example of `llm_model_dict` configuration:
# 1. If the model is placed under the ~/codefuse-chatbot/llm_models path
# Suppose the model path is as follows
# model_dir: ~/codefuse-chatbot/llm_models/THUDM/chatglm-6b
# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "THUDM/chatglm-6b",
}
# Or, if the model path is as follows
# model_dir: ~/codefuse-chatbot/llm_models/chatglm-6b
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "chatglm-6b",
}
# 2. If you do not wish to move the model to ~/codefuse-chatbot/llm_models,
# also delete the related code below `Model Path Reset`; see model_config.py for details.
# Suppose the model path is as follows
# model_dir: ~/THUDM/chatglm-6b
# The reference configuration is as follows
llm_model_dict = {
    "chatglm-6b": {
        "local_model_path": "your personal dir/THUDM/chatglm-6b",
        "api_base_url": "http://localhost:8888/v1",  # set to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    }
}

VLLM_MODEL_DICT = {
    'chatglm-6b': "your personal dir/THUDM/chatglm-6b",
}
# 3. Specify the model service to be launched, keeping both consistent
LLM_MODEL = "chatglm-6b"
LLM_MODELs = ["chatglm-6b"]
# Modify server_config.py. If LLM_MODELs does not contain multiple models, no additional settings are needed.
# Modify the server_config.py#FSCHAT_MODEL_WORKERS configuration, adding one entry per model
"model_name": {'host': DEFAULT_BIND_HOST, 'port': 20057}
Quantized Model Access
# If you need to support the codellama-34b-int4 or qwen-72b-int4 models, you need to patch fastchat
cp examples/gptq.py ~/site-packages/fastchat/modules/gptq.py
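The actual site-packages location varies by environment. A quick way to locate the installed fastchat package directory before copying (a small helper, not part of the project):

import os
import fastchat

# prints the directory that should contain modules/gptq.py
print(os.path.join(os.path.dirname(fastchat.__file__), "modules"))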
# Quantization also requires modifying the llm_api.py configuration:
# uncomment `kwargs["gptq_wbits"] = 4` at examples/llm_api.py#559
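After uncommenting, the worker arguments around examples/llm_api.py#559 would look roughly as follows (a sketch of the surrounding context; the groupsize line is a hypothetical option, not project code):

# examples/llm_api.py (around line 559), after uncommenting
kwargs["gptq_wbits"] = 4  # load the int4 GPTQ checkpoint with 4-bit weights
# kwargs["gptq_groupsize"] = 128  # hypothetical: only if your checkpoint was quantized with a group size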
Public Large Model Interface Access
# Modify the model_config.py configuration
# ONLINE_LLM_MODEL
# The other interfaces are adapted from the langchain-chatchat project; they are untested due to the lack of relevant accounts.
# Specify the model service to be launched, keeping both consistent
LLM_MODEL = "gpt-3.5-turbo"
LLM_MODELs = ["gpt-3.5-turbo"]
Example of External Large Model Interface Access
# 1. Implement a new model access class
# Refer to ~/examples/model_workers/openai.py#ExampleWorker
# Implementing the do_chat function will enable the use of LLM capabilities
from typing import Dict, List

# assuming ApiModelWorker and ApiChatParams live in the package's base module
from .base import ApiModelWorker, ApiChatParams


class XXWorker(ApiModelWorker):
    def __init__(
        self,
        *,
        controller_addr: str = None,
        worker_addr: str = None,
        model_names: List[str] = ["gpt-3.5-turbo"],
        version: str = "gpt-3.5",
        **kwargs,
    ):
        kwargs.update(model_names=model_names, controller_addr=controller_addr, worker_addr=worker_addr)
        kwargs.setdefault("context_len", 16384)  # TODO: for 16K models this needs to be 16384
        super().__init__(**kwargs)
        self.version = version

    def do_chat(self, params: ApiChatParams) -> Dict:
        '''
        Method that performs the chat; by default it uses the chat function in the module.
        :params.messages : [
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hello"}
        ]
        :params.xx: see ApiChatParams for details
        Required return format: {"error_code": int, "text": str}
        '''
        return {"error_code": 500, "text": f"{self.model_names[0]} has not implemented the chat function"}
# Finally, complete the registration in ~/examples/model_workers/__init__.py
# from .xx import XXWorker
# 2. Complete access through an existing model access class
# Alternatively, directly use one of the existing large model classes (untested due to the lack of relevant accounts; community contributions after testing are welcome)
# Modify the model_config.py#ONLINE_LLM_MODEL configuration
# Fill in the model-specific details: version, api_base_url, api_key, provider (consistent with the class name above)
ONLINE_LLM_MODEL = {
    # Online models. Please set a different port for each online API in server_config.
    "openai-api": {
        "model_name": "gpt-3.5-turbo",
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "openai_proxy": "",
    },
    "example": {
        "version": "gpt-3.5",  # using the openai interface as an example
        "api_base_url": "https://api.openai.com/v1",
        "api_key": "",
        "provider": "ExampleWorker",
    },
}
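As noted in the comment above, each online API also needs its own port in server_config.py#FSCHAT_MODEL_WORKERS, following the same pattern as local models. A sketch (the port numbers are illustrative, not project defaults):

FSCHAT_MODEL_WORKERS = {
    # ... local model entries ...
    "openai-api": {'host': DEFAULT_BIND_HOST, 'port': 21001},  # illustrative port
    "example": {'host': DEFAULT_BIND_HOST, 'port': 21002},     # illustrative port
}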
Launching Large Model Services
# start llm-service (optional): launch the large model service separately
python examples/llm_api.py

# Test
import openai
# openai.api_key = "EMPTY"  # Not supported yet
openai.api_base = "http://127.0.0.1:8888/v1"

# Select the model you launched
model = "example"

# create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name? "}],
    max_tokens=100,
)

# print the completion
print(completion.choices[0].message.content)
# Once correct output is confirmed, the LLM can be accessed normally.
Or:

# model_config.py#USE_FASTCHAT - determines whether local models are integrated via fastchat
USE_FASTCHAT = "gpt" not in LLM_MODEL

python start.py  # start.py#221 automatically executes python llm_api.py