Day 59 Retrospective.
The final SQLD results came out today, so I now have two certifications earned during this training period. It feels like I may have collected more certificates than I really need, but it would still be good to build on this knowledge and try some data analysis. Also, the Big Data Analysis Engineer exam is tomorrow, so I need to keep studying hard until the very end. I solved past exams and mock tests and cleared the passing score, but I'm still anxious, so I'll practice as much as possible and cut down on wrong answers.
1. LangChain
1-1. Output Parser
Install
!pip install -U langchain langchain-community langchain-core langchain-openai
OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = ''
Output Parser
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model='gpt-4o-mini')
result = model.invoke('한국의 수도는?')
type(result)
# langchain_core.messages.ai.AIMessage
- StrOutputParser
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()
chain = model | parser
result = chain.invoke('한국의 수도는?')
type(result)
# str
- StructuredOutputParser
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
response_schema = [
ResponseSchema(name='answer', description="Answer to the user's question."),
ResponseSchema(name='source', description="Source used to answer the user's question. Should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schema)
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template='''Answer the user\'s question as best as possible.
{format_instructions}
{question}''',
input_variables=['question'],
partial_variables={'format_instructions': format_instructions}
)
chain = prompt | model | output_parser
result = chain.invoke({
'question': 'What\'s the capital of Korea?'
})
result
'''
{'answer': 'The capital of Korea is Seoul.',
'source': 'https://en.wikipedia.org/wiki/Seoul'}
'''
- CommaSeparatedListOutputParser
- A list of data in CSV format
from langchain.output_parsers import CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template='''Answer the user\'s question as best as possible.
{user_input}
{format_instructions}''',
input_variables=['user_input'],
partial_variables={
'format_instructions': format_instructions
}
)
chain = prompt | model | output_parser
result = chain.invoke({
'user_input': 'Please recommend 3 Korean food.'
})
result
# ['Kimchi', 'Bibimbap', 'Bulgogi']
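Under the hood this parser simply splits the model's comma-separated reply. A plain-Python sketch of that parse step (no LangChain required; the function name is illustrative):

```python
# Plain-Python sketch of what CommaSeparatedListOutputParser does with
# the model's reply: split on commas and strip surrounding whitespace.
def parse_comma_separated_list(text: str) -> list:
    return [item.strip() for item in text.split(',')]

parse_comma_separated_list('Kimchi, Bibimbap, Bulgogi')
# ['Kimchi', 'Bibimbap', 'Bulgogi']
```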
- PydanticOutputParser
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
class MyOutput(BaseModel):
    name: str = Field(description='name of a cuisine')
    recipe: str = Field(description='recipe to cook the cuisine')
output_parser = PydanticOutputParser(pydantic_object=MyOutput)
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template='''Answer the user\'s question as best as possible.
{user_input}
{format_instructions}''',
input_variables=['user_input'],
partial_variables={
'format_instructions': format_instructions
}
)
chain = prompt | model | output_parser
result = chain.invoke({
'user_input': 'Let me know how to cook \'Bibimbap\'.'
})
print(result)
'''
name='Bibimbap'
recipe='1. Cook 1 cup of rice according to package instructions. \n
2. While the rice is cooking, prepare the vegetables: julienne or slice carrots, zucchini, and cucumber. Sauté them separately in a bit of oil until tender. \n
3. Blanch spinach and bean sprouts in boiling water. Drain and season with sesame oil, salt, and garlic. \n
4. In a small pan, fry an egg sunny side up. \n
5. Once the rice is cooked, divide it into bowls. \n
6. Arrange the sautéed vegetables, spinach, bean sprouts, and the fried egg on top of the rice. \n
7. Drizzle with gochujang (Korean red pepper paste) and a dash of sesame oil. \n
8. Mix everything together before eating.'
'''
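Conceptually, the parser instructs the model to emit JSON matching the schema, then loads and validates the reply. A LangChain-free sketch of that validation step (the field names mirror the `MyOutput` model above; the function name is illustrative):

```python
import json

# Sketch of the validation step PydanticOutputParser performs on the
# model's JSON reply: parse it and check the expected fields exist.
def parse_my_output(text: str) -> dict:
    data = json.loads(text)
    for field in ('name', 'recipe'):
        if field not in data:
            raise ValueError(f'missing field: {field}')
    return data

parse_my_output('{"name": "Bibimbap", "recipe": "Mix rice, vegetables, and gochujang."}')
```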
- PandasDataFrameOutputParser
import pprint
import pandas as pd
from typing import Any, Dict
from langchain.output_parsers import PandasDataFrameOutputParser
def format_parser_output(parser_output: Dict[str, Any]) -> None:
    for key in parser_output.keys():
        parser_output[key] = parser_output[key].to_dict()
    return pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)
df = pd.read_csv('titanic.csv')
output_parser = PandasDataFrameOutputParser(dataframe=df)
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template='''Answer the user\'s question as best as possible.
{user_input}
{format_instructions}''',
input_variables=['user_input'],
partial_variables={
'format_instructions': format_instructions
}
)
chain = prompt | model | output_parser
result = chain.invoke({
'user_input': 'Retrieve the mean of Age.'
})
result
# {'mean': np.float64(28.0)}
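The model's reply here is a short operation string that the parser resolves against the DataFrame. A rough sketch of that resolution, with a tiny constructed frame instead of the Titanic data (the query format and function name are illustrative):

```python
import pandas as pd

# Sketch of the kind of query PandasDataFrameOutputParser resolves:
# the model emits a string like 'mean:Age' and the parser maps it to a
# DataFrame operation on the named column.
df = pd.DataFrame({'Age': [22.0, 28.0, 34.0]})

def run_query(query: str, frame: pd.DataFrame) -> dict:
    operation, column = query.split(':')
    return {operation: getattr(frame[column], operation)()}

run_query('mean:Age', df)
# {'mean': 28.0}
```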
1-2. LCEL
Install
!pip install -U langchain langchain-community langchain-core langchain-openai
OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = ''
Chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
prompt = PromptTemplate.from_template(
'{topic}에 대해서 3문장으로 설명해줘.'
)
model = ChatOpenAI(
model='gpt-4o-mini'
)
chain = prompt | model | StrOutputParser()
Synchronous
# 1
result = chain.invoke({
'topic': 'multi-modal'
})
result
# 2
import time
for token in chain.stream({'topic': 'multi-modal'}):
    print(token, end='', flush=True)
    time.sleep(0.1)
# 3
chain.batch(
[
{'topic': 'Python'}, {'topic': 'Multi-Modal'},
{'topic': 'Machine Learning'}, {'topic': 'Deep Learning'}
],
config={
'max_concurrency': 2
}
)
Asynchronous
# 1
result = chain.ainvoke({
'topic': 'multi-modal'
})
await result
# 2
import time
async for token in chain.astream({'topic': 'multi-modal'}):
    print(token, end='', flush=True)
    time.sleep(0.1)
Parallel
from langchain_core.runnables import RunnableParallel
chain1 = (
PromptTemplate.from_template('{country}의 수도는?')
| model
| StrOutputParser()
)
chain2 = (
PromptTemplate.from_template('{country}의 면적은?')
| model
| StrOutputParser()
)
combined_chain = RunnableParallel(
capital=chain1,
area=chain2
)
combined_chain.invoke({
'country': '대한민국'
})
combined_chain.batch([
{'country': '대한민국'},
{'country': '일본'}
])
- RunnablePassthrough
- Passes the input value through unchanged.
from langchain_core.runnables import RunnablePassthrough
run = RunnablePassthrough()
run.invoke({
'num': 10
})
# {'num': 10}
prompt = PromptTemplate.from_template('{number}의 2배는?')
chain = {
'number': RunnablePassthrough()
} | prompt | model | StrOutputParser()
chain.invoke(10)
# '10의 2배는 20입니다.'
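The dict in front of the prompt works because LCEL coerces a plain dict into a parallel runnable: each value is applied to the same input to build the prompt variables. A LangChain-free sketch of that coercion (`run_mapping` and `passthrough` are illustrative stand-ins, not LangChain APIs):

```python
# LangChain-free sketch of how a dict in a chain maps one input to
# named prompt variables, like RunnableParallel does.
def run_mapping(mapping: dict, value):
    # Apply every "runnable" in the dict to the same input.
    return {key: fn(value) for key, fn in mapping.items()}

def passthrough(x):
    # Stands in for RunnablePassthrough(): returns its input unchanged.
    return x

variables = run_mapping({'number': passthrough}, 10)
prompt_text = '{number}의 2배는?'.format(**variables)
# '10의 2배는?'
```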
run = RunnablePassthrough.assign(num=lambda x: x['num'] * 2)
run.invoke({
'num': 10
})
# {'num': 20}
prompt = PromptTemplate.from_template('{num1} * {num2} = ')
chain = (
RunnablePassthrough.assign(num2=lambda x: x['num1'] * 2)
| prompt | model | StrOutputParser()
)
chain.invoke({
'num1': 2
})
# '2 * 4 = 8'
- RunnableLambda
from langchain_core.runnables import RunnableLambda
from datetime import datetime
def get_today(_):
    return datetime.today().strftime('%b-%d')
prompt = PromptTemplate.from_template(
'{today}가 생일인 유명인 {number}명을 나열하세요. 또한, 생년월일을 표기해주세요.'
)
chain = (
{
'today': RunnableLambda(get_today),
'number': RunnablePassthrough()
}
| prompt | model | StrOutputParser()
)
chain.invoke({
'number': 3
})
2. LLM Project
2-1. Development Process
Video CSV file
- Data Cleaning
- RAGAs
- Persona
- Specify the output format
- Generate and save a Q&A dataset
- Persona
- Train the model on the Q&A dataset
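The dataset-generation step above could be sketched roughly as follows. Everything here is hypothetical: `PERSONA`, the sample question, and `generate_answer` (a stand-in for the real LLM call) are illustrative, not taken from the project.

```python
import csv

PERSONA = 'You are a friendly data analysis instructor.'  # hypothetical persona

def generate_answer(question: str) -> str:
    # Hypothetical stand-in for the actual LLM call with a persona prompt.
    return f'({PERSONA}) Answer to: {question}'

def build_qa_dataset(questions, path='qa_dataset.csv'):
    # Build Q&A pairs and save them as a CSV for later model training.
    rows = [{'question': q, 'answer': generate_answer(q)} for q in questions]
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['question', 'answer'])
        writer.writeheader()
        writer.writerows(rows)
    return rows

rows = build_qa_dataset(['What does the video say about data cleaning?'])
```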