LangChain: loading embedding models with HuggingFaceEmbeddings, splitting text with RecursiveCharacterTextSplitter, and using FAISS as the vector database
References:
https://github.com/TommyTang930/LangChain_LLM_ChatBot
https://python.langchain.com/docs/integrations/vectorstores/faiss
1. Text splitting with RecursiveCharacterTextSplitter
The class is rewritten here so that it splits Chinese text more naturally.
import re
from typing import List, Optional, Any
from langchain.text_splitter import RecursiveCharacterTextSplitter
import logging

logger = logging.getLogger(__name__)


def _split_text_with_regex_from_end(
    text: str, separator: str, keep_separator: bool
) -> List[str]:
    # Now that we have the separator, split the text
    if separator:
        if keep_separator:
            # The parentheses in the pattern keep the delimiters in the result.
            _splits = re.split(f"({separator})", text)
            splits = ["".join(i) for i in zip(_splits[0::2], _splits[1::2])]
            if len(_splits) % 2 == 1:
                splits += _splits[-
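The listing above breaks off mid-statement. As a rough, self-contained sketch of the idea it appears to implement (splitting on a regex while keeping each delimiter attached to the end of the preceding piece), something like the following could work. The function name `split_keep_separator_at_end` and the sample sentence are illustrative and not taken from the original, and the handling of the trailing piece is an assumption about what the truncated line did. The sketch also assumes the separator pattern contains no capturing groups of its own.

```python
import re
from typing import List


def split_keep_separator_at_end(text: str, separator: str) -> List[str]:
    """Split `text` on the regex `separator`, keeping each matched
    delimiter attached to the end of the piece that precedes it."""
    if not separator:
        # No separator given: fall back to individual characters.
        return list(text)
    # The capturing group makes re.split return the delimiters as well:
    # [piece, delim, piece, delim, ..., piece]
    parts = re.split(f"({separator})", text)
    # Glue every piece to the delimiter that follows it.
    chunks = ["".join(pair) for pair in zip(parts[0::2], parts[1::2])]
    # Assumption: an odd-length result means one trailing piece has no
    # delimiter after it, so it is appended unchanged.
    if len(parts) % 2 == 1:
        chunks += parts[-1:]
    # Drop empty strings produced by delimiters at the very end.
    return [c for c in chunks if c != ""]


sentences = split_keep_separator_at_end("今天天气很好。我们去哪里?公园", "[。?]")
print(sentences)
```

Keeping the punctuation at the end of each piece means sentence-final marks such as 。 and ? stay with their sentence instead of starting the next chunk, which is generally what you want for Chinese text.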
