
SK Networks Family AI Camp Cohort 10 / Daily Retrospective

Day 59. Prompt Engineering - LangChain (Output Parser & LCEL) & LLM Project


 

Day 59 Retrospective.

 

 The final SQLD results came out today and I passed, so I now have two certifications earned during this training program. I may have collected more certificates than I strictly needed, but it will be good to put this knowledge to use in actual data analysis. The Big Data Analysis Engineer (빅데이터분석기사) exam is tomorrow, so I need to keep studying hard until the very end. I scored above the passing mark on past exams and mock tests, but I'm still nervous, so I'll solve as many problems as I can and cut down on my mistakes.

 

 

 

 

1. LangChain

 

 

1-1. Output Parser

 

Install

!pip install -U langchain langchain-community langchain-core langchain-openai

 

OpenAI Key

import os

os.environ['OPENAI_API_KEY'] = ''

 

Output Parser

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model='gpt-4o-mini')

result = model.invoke('한국의 수도는?')
type(result)
# langchain_core.messages.ai.AIMessage
  • StrOutputParser
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
chain = model | parser

result = chain.invoke('한국의 수도는?')
type(result)
# str
  • StructuredOutputParser
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

response_schema = [
    ResponseSchema(name='answer', description="Answer to the user's question."),
    ResponseSchema(name='source', description="Source used to answer the user's question. Should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schema)
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template='''Answer the user's question as best as possible.
    
    {format_instructions}
    
    {question}''',
    input_variables=['question'],
    partial_variables={'format_instructions': format_instructions}
)

chain = prompt | model | output_parser

result = chain.invoke({
    'question': 'What\'s the capital of Korea?'
})
result
'''
{'answer': 'The capital of Korea is Seoul.',
 'source': 'https://en.wikipedia.org/wiki/Seoul'}
'''
  • CommaSeparatedListOutputParser
    • A list of comma-separated (CSV-style) values
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template='''Answer the user\'s question as best as possible.
    
    {user_input}
    
    {format_instructions}''',
    input_variables=['user_input'],
    partial_variables={
        'format_instructions': format_instructions
    }
)

chain = prompt | model | output_parser

result = chain.invoke({
    'user_input': 'Please recommend 3 Korean food.'
})
result
# ['Kimchi', 'Bibimbap', 'Bulgogi']
  • PydanticOutputParser
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class MyOutput(BaseModel):
    name: str = Field(description='name of a cuisine')
    recipe: str = Field(description='recipe to cook the cuisine')

output_parser = PydanticOutputParser(pydantic_object=MyOutput)
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template='''Answer the user\'s question as best as possible.

    {user_input}

    {format_instructions}''',
    input_variables=['user_input'],
    partial_variables={
        'format_instructions': format_instructions
    }
)

chain = prompt | model | output_parser

result = chain.invoke({
    'user_input': "Let me know how to cook 'Bibimbap'."
})
print(result)
'''
name='Bibimbap'
recipe='1. Cook 1 cup of rice according to package instructions. \n
        2. While the rice is cooking, prepare the vegetables: julienne or slice carrots, zucchini, and cucumber. Sauté them separately in a bit of oil until tender. \n
        3. Blanch spinach and bean sprouts in boiling water. Drain and season with sesame oil, salt, and garlic. \n
        4. In a small pan, fry an egg sunny side up. \n
        5. Once the rice is cooked, divide it into bowls. \n
        6. Arrange the sautéed vegetables, spinach, bean sprouts, and the fried egg on top of the rice. \n
        7. Drizzle with gochujang (Korean red pepper paste) and a dash of sesame oil. \n
        8. Mix everything together before eating.'
'''
  • PandasDataFrameOutputParser
import pprint
import pandas as pd
from typing import Any, Dict
from langchain.output_parsers import PandasDataFrameOutputParser

def format_parser_output(parser_output:Dict[str,Any]) -> None:
    for key in parser_output.keys():
        parser_output[key] = parser_output[key].to_dict()
    
    return pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)

df = pd.read_csv('titanic.csv')
output_parser = PandasDataFrameOutputParser(dataframe=df)
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template='''Answer the user\'s question as best as possible.

    {user_input}

    {format_instructions}''',
    input_variables=['user_input'],
    partial_variables={
        'format_instructions': format_instructions
    }
)

chain = prompt | model | output_parser

result = chain.invoke({
    'user_input': 'Retrieve the mean of Age.'
})
result
# {'mean': np.float64(28.0)}

 

 

1-2. LCEL

 

Install

!pip install -U langchain langchain-community langchain-core langchain-openai

 

OpenAI Key

import os

os.environ['OPENAI_API_KEY'] = ''

 

Chain

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

prompt = PromptTemplate.from_template(
    '{topic}에 대해서 3문장으로 설명해줘.'
)

model = ChatOpenAI(
    model='gpt-4o-mini'
)

chain = prompt | model | StrOutputParser()

 

Synchronous

# 1
result = chain.invoke({
    'topic': 'multi-modal'
})

result

# 2
import time

for token in chain.stream({'topic': 'multi-modal'}):
    print(token, end='', flush=True)
    time.sleep(0.1)

# 3
chain.batch(
    [
        {'topic': 'Python'}, {'topic': 'Multi-Modal'},
        {'topic': 'Machine Learning'}, {'topic': 'Deep Learning'}
    ],
    config={
        'max_concurrency': 2
    }
)

 

Asynchronous

# 1
result = await chain.ainvoke({
    'topic': 'multi-modal'
})

result

# 2
import asyncio

async for token in chain.astream({'topic': 'multi-modal'}):
    print(token, end='', flush=True)
    await asyncio.sleep(0.1)

 

Parallel

from langchain_core.runnables import RunnableParallel

chain1 = (
    PromptTemplate.from_template('{country}의 수도는?')
    | model
    | StrOutputParser()
)
chain2 = (
    PromptTemplate.from_template('{country}의 면적은?')
    | model
    | StrOutputParser()
)
combined_chain = RunnableParallel(
    capital=chain1,
    area=chain2
)

combined_chain.invoke({
    'country': '대한민국'
})

combined_chain.batch([
    {
        'country': '대한민국'
    },
    {
        'country': '일본'
    }
])

 

Runnable

  • RunnablePassthrough
    • Passes the input through unchanged.
from langchain_core.runnables import RunnablePassthrough

run = RunnablePassthrough()
run.invoke({
    'num': 10
})
# {'num': 10}

prompt = PromptTemplate.from_template('{number}의 2배는?')
chain = {
    'number': RunnablePassthrough()
} | prompt | model | StrOutputParser()
chain.invoke(10)
# '10의 2배는 20입니다.'

run = RunnablePassthrough.assign(num=lambda x: x['num'] * 2)
run.invoke({
    'num': 10
})
# {'num': 20}

prompt = PromptTemplate.from_template('{num1} * {num2} = ')
chain = (
    RunnablePassthrough.assign(num2=lambda x: x['num1'] * 2)
    | prompt | model | StrOutputParser()
)
chain.invoke({
    'num1': 2
})
# '2 * 4 =  8'
  • RunnableLambda
from langchain_core.runnables import RunnableLambda
from datetime import datetime

def get_today(_):
    return datetime.today().strftime('%b-%d')

prompt = PromptTemplate.from_template(
    '{today}가 생일인 유명인 {number}명을 나열하세요. 또한, 생년월일을 표기해주세요.'
)

chain = (
    {
        'today': RunnableLambda(get_today),
        'number': RunnablePassthrough()
    }
    | prompt | model | StrOutputParser()
)
chain.invoke({
    'number': 3
})

 

 

 

2. LLM Project

 

 

2-1. Development Process

 

Video CSV file

  • Data Cleaning
  • RAGAs
    • Persona
      • Specify the output format
    • Generate and save a Q&A dataset
  • Train the model with the Q&A dataset
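The cleaning and Q&A-generation steps above can be sketched in plain Python. Everything below (the column names, the cleaning rules, the prompt wording) is a hypothetical illustration of the pipeline shape, not the project's actual code:

```python
import csv
import io

def clean_transcript(text: str) -> str:
    # Hypothetical cleaning step: collapse repeated whitespace and drop empty lines.
    lines = (' '.join(line.split()) for line in text.splitlines())
    return '\n'.join(line for line in lines if line)

def build_qa_prompt(title: str, transcript: str, persona: str) -> str:
    # Assemble a Q&A-generation prompt for one video row (column names assumed).
    return (
        f'You are {persona}.\n'
        f'Based on the transcript of "{title}" below, '
        'write 3 question-answer pairs as JSON.\n\n'
        f'{clean_transcript(transcript)}'
    )

# Tiny in-memory stand-in for the video CSV file.
sample = io.StringIO('title,transcript\nLCEL intro,"  LCEL  chains \n\n use  the | operator  "\n')
rows = list(csv.DictReader(sample))
prompt = build_qa_prompt(rows[0]['title'], rows[0]['transcript'], 'a LangChain instructor')
```

Each generated prompt would then be sent through a chain like the ones in section 1, and the parsed Q&A pairs saved for model training.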