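Day 58 - Fine Tuning - LLM Evaluation Metrics & Prompt Engineering

Day 58 retrospective.

Today we started LangChain, and I was busy for the entire class. As a result I did not get through much of what I needed to work on today, which worries me. I only have until tomorrow to study, so I will have to do as much as I can. Also, for the LLM project I need to download YouTube caption data, but I was asked to prove I was not a bot, so I moved the work from Colab to Runpod.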

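1. LLM Evaluation Metrics

1-1. Automatic Evaluation Metrics

Automatic Evaluation Metrics

Scores are computed automatically by an algorithm, with no human involvement.

BLEU(BiLingual Evaluation Understudy)

Evaluates machine translation quality.
Scores a generated sentence by its n-gram overlap with the reference sentence.
- n-gram Precision = (number of n-grams in the generated sentence that also appear in the reference) / (total number of n-grams in the generated sentence)
Considers only surface-level word matches.
Does not reflect context or semantic similarity.
Cannot properly credit synonyms or near-synonyms.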
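
The n-gram precision above can be sketched in a few lines of Python. This is a minimal illustration, not a full BLEU implementation: it assumes whitespace tokenization and a single reference, and it clips each n-gram count by its count in the reference (as BLEU does, although the notes above do not spell that out). Full BLEU additionally combines several n-gram orders and applies a brevity penalty. The function names are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate sentence against one reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by how often it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


candidate = 'the cat sat on the mat'
reference = 'the cat is on the mat'
p1 = ngram_precision(candidate, reference, 1)  # 5 of 6 unigrams match
p2 = ngram_precision(candidate, reference, 2)  # 3 of 5 bigrams match
print(p1, p2)
```

Because only exact token matches count, replacing "sat" with a synonym of "is" would not raise the score, which is the limitation listed above.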

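ROUGE(Recall-Oriented Understudy for Gisting Evaluation)

Evaluates text summarization performance.
Measures how similar a generated summary is to the reference summary.
ROUGE variants
- ROUGE-N
  - Based on n-gram overlap
- ROUGE-L
  - Based on the longest common subsequence (LCS) within a sentence
- ROUGE-S
  - Based on skip-bigrams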
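
As a rough illustration of the first two variants, the sketch below computes the recall side of ROUGE-N and ROUGE-L. It assumes whitespace tokenization and one reference summary; real ROUGE tooling also reports precision and F-measure and handles stemming, none of which is shown here. The function names are hypothetical.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = candidate.split()
    ref = reference.split()
    cand_counts = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the number of reference tokens."""
    ref = reference.split()
    if not ref:
        return 0.0
    return lcs_length(candidate.split(), ref) / len(ref)


candidate = 'the cat sat on the mat'
reference = 'the cat is on the mat'
r1 = rouge_n_recall(candidate, reference, n=1)
rl = rouge_l_recall(candidate, reference)
print(r1, rl)
```

The denominators here come from the reference, whereas the BLEU precision divides by the length of the generated sentence. That difference is what makes ROUGE recall-oriented.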

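METEOR(Metric for Evaluation of Translation with Explicit ORdering)

Compensates for BLEU's weaknesses.
Goes beyond plain n-gram matching to reflect some semantic similarity.
Uses unigram (single-word) matching.
Uses the harmonic mean (F-score) of Precision and Recall, weighted toward Recall.
Supports synonym matching.
Applies a Chunk Penalty.
- Matched words are grouped into chunks of contiguous words; the more chunks the matches are fragmented into, the larger the penalty.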
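
To make the F-score and Chunk Penalty concrete, here is a small sketch using the constants from the original METEOR formulation (recall weighted 9:1, penalty 0.5 * (chunks / matches)^3); later METEOR versions tune these values. It assumes the word alignment has already been done, so the number of matched unigrams and the number of chunks are passed in directly. The function name and arguments are hypothetical.

```python
def meteor_score(matches, candidate_len, reference_len, chunks):
    """Original METEOR formula computed from precomputed alignment statistics."""
    if matches == 0:
        return 0.0
    precision = matches / candidate_len
    recall = matches / reference_len
    # Harmonic mean weighted 9:1 toward recall.
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # Fragmentation penalty: more chunks per matched word means a larger penalty.
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)


# 'the cat sat on the mat' vs 'the cat is on the mat':
# 5 matched words in 2 contiguous chunks ('the cat' and 'on the mat').
score = meteor_score(matches=5, candidate_len=6, reference_len=6, chunks=2)
print(score)
```

With the same five matches scattered into five separate chunks, the penalty rises to its maximum of 0.5 and the score is halved, which is how word order enters the metric.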

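BERT Score

Evaluates sentence similarity using a pre-trained BERT model.
Compares contextual embeddings rather than surface word forms, so the meaning of the whole sentence is taken into account.
How it is computed
- A pre-trained BERT model produces a vector representation for each word (token).
- The similarity between each word vector in the reference sentence and each word vector in the generated sentence is measured.
- Each word is matched to its most similar counterpart, and the score is computed from those matches.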
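
The matching step can be sketched without a real model. The example below assumes the token vectors have already been produced (the 2-dimensional vectors are invented toy values, not BERT outputs) and uses cosine similarity with greedy best-match, which is how the steps above are usually described. It leaves out refinements such as IDF weighting and baseline rescaling.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate_vecs, reference_vecs):
    """Precision, recall and F1 from greedy best-match cosine similarity."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy stand-ins for per-token embeddings (hypothetical values).
reference_vecs = [[1.0, 0.0], [0.0, 1.0]]
candidate_vecs = [[1.0, 0.0], [1.0, 1.0]]
precision, recall, f1 = greedy_match_score(candidate_vecs, reference_vecs)
print(precision, recall, f1)
```

A candidate token that is close in meaning to a reference token but spelled differently still earns a high similarity here, which BLEU and ROUGE cannot capture.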

Sem Score

Uses an embedding model to evaluate semantic similarity.

1-2. Benchmarks

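2. LangChain

2-1. LangChain

LangChain

An open-source framework for building LLM-based applications.
Lets a model work with new user data beyond what was in its training data.
Lets you choose among various LLMs and extend them by integrating other features.
Chain-based design
- Multiple functional modules are connected into a chain.
- The output of one module is used as the input of the next.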
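
The chain idea can be shown in plain Python. The sketch below is not LangChain's implementation; it is a toy `Step` class (a name made up here) that overloads `|` so that the output of one step becomes the input of the next. The three steps are hypothetical stand-ins for a prompt template, a model, and an output parser.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins; no real LLM is called.
fill_prompt = Step(lambda data: 'Translate to {language}: {text}'.format(**data))
fake_model = Step(lambda prompt: {'content': prompt.upper()})
parse_output = Step(lambda message: message['content'])

chain = fill_prompt | fake_model | parse_output
result = chain.invoke({'language': 'French', 'text': 'hello'})
print(result)
```

Each step only needs to agree with its neighbors on input and output types, so a step can be swapped (for example, a different model) without touching the rest of the chain.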

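LangChain Core Modules

Model I/O
- Interface for calling LLMs
Data Connection
- Connects each application to its data sources.
- Makes it possible to implement RAG.
Chains
- Combines multiple functional modules and links them like a chain.
Agents
- Can interact with external resources.
Memory
- Stores and retrieves data for the LLM.
- Uses short-term/long-term memory to carry on a continuous conversation.
Callbacks
- Records the intermediate steps of a chain and handles streaming.
- Monitors what happens inside an LLM call.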

2-2. Prompt

!pip install -U langchain langchain-core langchain_openai

Prompt Template

A string template for composing prompts that take a single sentence or simple instruction as input and produce a single sentence or simple response.

input_text = '안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.'
input_text
# 안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.

input_text.format(name='홍길동', age=20)
# 안녕하세요. 제 이름은 홍길동이고, 나이는 20살 입니다.

from langchain_core.prompts import PromptTemplate

# Create a PromptTemplate
prompt_template = PromptTemplate.from_template(input_text)
prompt_template
'''
PromptTemplate(input_variables=['age', 'name'], input_types={}, partial_variables={},
               template='안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.')
'''

# Inspect the PromptTemplate's input variables
prompt_template.input_variables
# ['age', 'name']

# Inspect the PromptTemplate's template string
prompt_template.template
# '안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.'

# Fill in the values
filled_prompt = prompt_template.invoke({
    'name': '홍길동',
    'age': 20
})
filled_prompt
# StringPromptValue(text='안녕하세요. 제 이름은 홍길동이고, 나이는 20살 입니다.')

filled_prompt.text
# '안녕하세요. 제 이름은 홍길동이고, 나이는 20살 입니다.'

# Combine PromptTemplates
new_prompt_template = prompt_template + ' ' + '주소는 {address} 입니다.'
new_prompt_template
'''
PromptTemplate(input_variables=['address', 'age', 'name'], input_types={},
               partial_variables={}, template='안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.
                                               주소는 {address} 입니다.')
'''

combined_prompt_template = (
    prompt_template + ' ' +
    PromptTemplate.from_template(
        '아버지를 아버지라 부를 수 없습니다.'
    ) + ' ' +
    '{language}로 번역해주세요.'
)
combined_prompt_template
'''
PromptTemplate(input_variables=['age', 'language', 'name'], input_types={},
               partial_variables={}, template='안녕하세요. 제 이름은 {name}이고, 나이는 {age}살 입니다.
                                               아버지를 아버지라 부를 수 없습니다. {language}로 번역해주세요.')
'''

ChatPromptTemplate

Used in conversational settings to generate a single message response from multiple input messages.
The input is a list of messages, and each message consists of a role and content.

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate

# SystemMessage
system_message = SystemMessage(
    content='You are a helpful assistant.'
)
system_message
# SystemMessage(content='You are a helpful assistant.', additional_kwargs={}, response_metadata={})

# HumanMessage
human_message = HumanMessage(
    content='안녕하세요. 저의 이름은 홍길동입니다.'
)
human_message
# HumanMessage(content='안녕하세요. 저의 이름은 홍길동입니다.', additional_kwargs={}, response_metadata={})

# SystemMessage & HumanMessage
# Pass the message objects themselves; wrapping each one in its own list is not a valid message format
chat_prompt = ChatPromptTemplate(
    messages=[
        system_message,
        human_message
    ]
)
chat_prompt
'''
ChatPromptTemplate(input_variables=[], input_types={}, partial_variables={},
                   messages=[SystemMessage(content='You are a helpful assistant.',
                                           additional_kwargs={}, response_metadata={}),
                             HumanMessage(content='안녕하세요. 저의 이름은 홍길동입니다.',
                                          additional_kwargs={}, response_metadata={})])
'''

from langchain_core.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate(
    messages=[
        ('system', 'You are a helpful assistant.'),
        ('human', '{user_input}')
    ]
)
chat_prompt
'''
ChatPromptTemplate(input_variables=['user_input'], input_types={}, partial_variables={},
                   messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={},
                                                                               template='You are a helpful assistant.'), additional_kwargs={}),
                             HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['user_input'], input_types={}, partial_variables={},
                                                                              template='{user_input}'), additional_kwargs={})])
'''

chat_prompt.invoke({
    'user_input': '대한민국의 수도는?'
})
'''
ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant.', additional_kwargs={}, response_metadata={}),
                          HumanMessage(content='대한민국의 수도는?', additional_kwargs={}, response_metadata={})])
'''

2-3. Model

ChatGPT

!pip install langchain langchain_core langchain_openai langchain-community

import os

os.environ['OPENAI_API_KEY'] = ''

# Model
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model='gpt-4o-mini')
result = llm.invoke(
    input=''
)

# Chat Model
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model='gpt-4o-mini')
chat_prompt = ChatPromptTemplate([
    ('system', 'You are a helpful assistant.'),
    ('user', '{user_input}')
])
chat_prompt.input_variables
# ['user_input']

chain = chat_prompt | chat
response = chain.invoke({
    'user_input': ''
})

# LLM parameters
from langchain_openai import ChatOpenAI

params = {
    'temperature': 0.7,
    'max_tokens': 500,
    'frequency_penalty': 0.5,
    'presence_penalty': 0.5
}
model = ChatOpenAI(model='gpt-4o-mini', **params)
# A chat model takes the prompt itself (a string or a list of messages), not a dict of template variables
response = model.invoke('')

# Check token usage
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name='gpt-4o-mini')

with get_openai_callback() as cb:
    result = llm.invoke('대한민국의 수도는?')
    print(cb)
'''
Tokens Used: 22
	Prompt Tokens: 13
		Prompt Tokens Cached: 0
	Completion Tokens: 9
		Reasoning Tokens: 0
Successful Requests: 1
Total Cost (USD): $
'''

with get_openai_callback() as cb:
    result = llm.invoke("대한민국의 수도는?")
    print(f"Total tokens used: \t\t{cb.total_tokens}")
    print(f"Prompt tokens: \t\t\t{cb.prompt_tokens}")
    print(f"Completion tokens: \t\t{cb.completion_tokens}")
    print(f"Cost of the call (USD): \t${cb.total_cost}")
'''
Total tokens used: 		22
Prompt tokens: 			13
Completion tokens: 		9
Cost of the call (USD): 	$
'''

# Model Serialization - Model Save
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.load import dumpd
import pickle

chat = ChatOpenAI(model='gpt-4o-mini')
chat_prompt = ChatPromptTemplate([
    ('system', 'You are a helpful assistant.'),
    ('user', '{user_input}')
])
chain = chat_prompt | chat
# dumpd() converts the chain into a JSON-serializable dict
dumped_chain = dumpd(chain)

with open('chatbot.pkl', 'wb') as f:
    pickle.dump(dumped_chain, f)

# What comes back is the serialized dict, not a runnable chain
with open('chatbot.pkl', 'rb') as f:
    load_chain = pickle.load(f)

Hugging Face

Setup

!pip install typing_extensions==4.12.2 --upgrade
!pip install langchain langchain_core langchain-community huggingface_hub transformers

import os

os.environ['HF_TOKEN'] = ''
os.environ['TRANSFORMERS_CACHE'] = './cache/'
os.environ['HF_HOME'] = './cache/'

Hugging Face Pipeline

# Hugging Face Pipeline
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id='beomi/KoAlpaca-Polyglot-5.8B',
    task='text-generation',
    pipeline_kwargs={'max_new_tokens': 512},
    device=0
)

# Prompt
from langchain_core.prompts import PromptTemplate

template = '''
Answer the following question in Korean.

# Question:
{question}

# Answer: '''
prompt = PromptTemplate.from_template(template)
result = prompt.invoke({
    'question': ''
})

# Chain
chain = prompt | hf
response = chain.invoke({
    'question': ''
})

Hugging Face Model

# Flush Memory
import torch, gc

del chain, prompt, hf

gc.collect()
torch.cuda.empty_cache()

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# LLM
model_id = 'beomi/KoAlpaca-Polyglot-5.8B'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Pipeline
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512
)

# Hugging Face Pipeline
hf = HuggingFacePipeline(pipeline=pipe)

# Prompt Template
from langchain_core.prompts import PromptTemplate

template = '''
Answer the following question in Korean.

# Question:
{question}

# Answer: '''
prompt = PromptTemplate.from_template(template)

# Chain
chain = prompt | hf
response = chain.invoke({
    'question': ''
})

3. LLM Project

3-1. Development Process

Downloading video caption data as a csv file

Sending too many requests can get you blocked.
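
One common way to reduce the chance of being blocked is to retry with an increasing wait between attempts. The sketch below is a generic exponential backoff helper, not what was actually used in the project. `fetch_with_backoff` and its parameters are hypothetical, and `fetch` stands for whatever function downloads one video's captions.

```python
import random
import time


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus a little random jitter before retrying.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

The `sleep` argument exists only so the waiting can be replaced in tests. In normal use it can be left at its default, for example `fetch_with_backoff(lambda: download_captions(video_id))`, where `download_captions` is a placeholder for the real download function.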
