Day 48. Vision - Generative Model & Natural Language Deep Learning

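Day 48 retrospective.

My seat was changed, and it felt oddly uncomfortable, so I think I did nothing but study. With only 16 days left until the 빅데이터분석기사 (Big Data Analysis Engineer) exam, I will need to study hard every day. I should also try to avoid stress as much as I can, but that is hard, which worries me.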

1. Generative Model

1-1. Vanilla GAN

GAN(Generative Adversarial Networks)

A generative model that produces fake data resembling real data.
Two models, a Generator and a Discriminator, are trained by competing against each other.

GAN Training Process

Train the Discriminator to classify real data as real.
Train the Discriminator to classify data produced by the Generator as fake.
Train the Generator in the direction that fools the trained Discriminator.

Loss Function

$ \min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)} [\log (1 - D(G(z)))] $
- Training the Discriminator
  - $ \min_{G} \max_{D} V(D, G) $
    - Find the $ D $ that maximizes $ V(D, G) $.
  - $ \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] $
    - Maximal at $ \log 1 = 0 $ when $ D(x) = 1 $.
  - $ \mathbb{E}_{z \sim p_{z}(z)} [\log (1 - D(G(z)))] $
    - Maximal at $ \log 1 = 0 $ when $ D(G(z)) = 0 $.
- Training the Generator
  - $ \min_{G} \max_{D} V(D, G) $
    - Find the $ G $ that minimizes $ V(D, G) $.
  - $ \mathbb{E}_{z \sim p_{z}(z)} [\log (1 - D(G(z)))] $
    - Minimal as $ D(G(z)) \to 1 $, where the term tends to $ \log 0 = -\infty $.
    - The first term of $ V(D, G) $ does not depend on $ G $, so only this term matters for the Generator.
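To make the two extremes above concrete, here is a minimal sketch that estimates $ V(D, G) $ from a handful of Discriminator outputs. The function name `value_function` and the sample probabilities are made up for illustration; they are not part of the course code.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well: V is close to its maximum of 0.
v_strong_d = value_function(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])

# A generator that fools the discriminator: D(G(z)) near 1 drives V strongly negative.
v_fooled_d = value_function(d_real=[0.99, 0.98], d_fake=[0.95, 0.97])

# A discriminator that outputs 0.5 everywhere cannot tell the two apart: V = -2 log 2.
v_chance = value_function(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(v_strong_d, v_fooled_d, v_chance)
```

The ordering `v_strong_d > v_chance > v_fooled_d` reflects the tug of war: the Discriminator pushes the value up toward 0, and the Generator pulls it down.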

Model

Discriminator

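import torch
import torch.nn as nn

# n_features (= color * height * width) is assumed to be defined earlier in the notebook.
class Discriminator(nn.Module):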
    def __init__(self):
        super(Discriminator, self).__init__()
        
        self.model = nn.Sequential(
            nn.Linear(in_features=n_features, out_features=512),
            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            
            nn.Linear(in_features=512, out_features=256),
            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            
            nn.Linear(in_features=256, out_features=1),
            nn.Sigmoid()
        )
    
    def forward(self, img):                           # [batch_size, color, height, width]
        flat_img = img.view(img.size(0), -1)          # [batch_size, n_features]
        prob = self.model(flat_img)                   # [batch_size, 1]
        return prob

Generator

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        
        def ganlayer(n_input, n_output, dropout=True):
            pipeline = [
                nn.Linear(in_features=n_input, out_features=n_output),
                nn.LeakyReLU(negative_slope=0.2, inplace=True)
            ]
            
            if dropout:
                pipeline.append(nn.Dropout(p=0.25))
            
            return pipeline
        
        self.model = nn.Sequential(
            *ganlayer(n_input=opt.latent_dim, n_output=128, dropout=False),
            *ganlayer(n_input=128, n_output=256),
            *ganlayer(n_input=256, n_output=512),
            *ganlayer(n_input=512, n_output=1024),
            nn.Linear(in_features=1024, out_features=n_features),
            nn.Tanh()
        )
    
    def forward(self, z):                             # [batch_size, latent_dim]
        img = self.model(z)                           # [batch_size, n_features]
        img = img.view(img.size(0), *img_dims)        # [batch_size, color, height, width]
        return img

Train

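import torch.optim as optim
from tqdm import tqdm

# opt (hyperparameters), device, and batch_iterator (a DataLoader) are assumed to be defined earlier.
gan_loss = nn.BCELoss()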

discriminator = Discriminator().to(device)
generator = Generator().to(device)

optimizer_D = optim.Adam(
    discriminator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)
optimizer_G = optim.Adam(
    generator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)

for epoch in tqdm(range(opt.n_epochs), desc='epoch'):
    for batch_index, (images, _) in enumerate(batch_iterator):
        # Build targets: 1 for real, 0 for fake
        real = torch.ones(images.size(0), 1, requires_grad=False).to(device)
        fake = torch.zeros(images.size(0), 1, requires_grad=False).to(device)
        
        # Discriminator Model
        imgs_real = images.to(device)
        
        noise = torch.randn(images.size(0), opt.latent_dim).normal_(0, 1).to(device)
        # detach() keeps the Discriminator update from backpropagating into the Generator
        imgs_fake = generator(noise).detach()
        
        d_real_pred = discriminator(imgs_real)
        d_fake_pred = discriminator(imgs_fake)
        
        d_loss = gan_loss(d_real_pred, real) + gan_loss(d_fake_pred, fake)
        
        optimizer_D.zero_grad()
        d_loss.backward()
        optimizer_D.step()
        
        # Generator Model
        noise = torch.randn(images.size(0), opt.latent_dim).normal_(0, 1).to(device)
        
        imgs_fake = generator(noise)
        
        d_fake_pred = discriminator(imgs_fake)
        g_loss = gan_loss(d_fake_pred, real)
        
        optimizer_G.zero_grad()
        g_loss.backward()
        optimizer_G.step()
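
The training loop above optimizes through `nn.BCELoss` rather than through $ V(D, G) $ directly. As a worked check, using the standard definition of binary cross-entropy for a prediction $ p $ and target $ y $ (the batch average is omitted for brevity), the two losses expand as follows:

```latex
\begin{aligned}
\mathrm{BCE}(p, y) &= -\left[ y \log p + (1 - y) \log (1 - p) \right] \\
\texttt{d\_loss} &= \mathrm{BCE}(D(x), 1) + \mathrm{BCE}(D(G(z)), 0)
                 = -\left[ \log D(x) + \log (1 - D(G(z))) \right] \\
\texttt{g\_loss} &= \mathrm{BCE}(D(G(z)), 1) = -\log D(G(z))
\end{aligned}
```

Minimizing `d_loss` is therefore the same as maximizing $ V(D, G) $ over $ D $. `g_loss` is not literally $ \log (1 - D(G(z))) $; it is the commonly used non-saturating variant, which also pushes $ D(G(z)) $ toward 1 but gives larger gradients while the Discriminator still confidently rejects generated samples.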

1-2. CGAN

Conditional GAN

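An extension of the basic GAN that adds condition (label) information so that data of a specific class can be generated.
Both the Generator and the Discriminator take the label as an additional input during training.
At generation time, the model produces data of the class the user specifies.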
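
As a rough illustration of what "taking the label as input" means, the sketch below pairs a noise vector with a one-hot label using only the standard library. The helper names `one_hot` and `generator_input`, and the sizes 10 and 100, are assumptions chosen for the example; the models in this section map the noise and the label through separate linear layers before concatenating them.

```python
import random

def one_hot(label, n_classes):
    """Return a list of length n_classes with a 1.0 at index `label`."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} is outside [0, {n_classes})")
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec

def generator_input(label, n_classes, latent_dim, rng):
    """Pair a standard-normal noise vector z with the one-hot condition y."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    y = one_hot(label, n_classes)
    return z, y

rng = random.Random(0)
z, y = generator_input(label=3, n_classes=10, latent_dim=100, rng=rng)
print(len(z), y)
```

Fixing `label` while resampling `z` is what lets the user ask for many different samples of the same class.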

Model

Discriminator

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.model = nn.Sequential(
            nn.Linear(in_features=240, out_features=1),
            nn.Sigmoid()
        )
        
        self.x_map = nn.Sequential(
            nn.Linear(in_features=n_features, out_features=240*5)
        )
        self.y_map = nn.Sequential(
            nn.Linear(in_features=opt.n_classes, out_features=50*5)
        )
        self.j_map = nn.Sequential(
            nn.Linear(in_features=240+50, out_features=240*4)
        )
    
    def forward(self, x, y):                          # x: [batch_size, color, height, width] / y: [batch_size, n_classes]
        x = x.view(-1, n_features)			# [batch_size, n_features]
        x = self.x_map(x)				# [batch_size, 240*5]
        x, _ = x.view(-1, 240, 5).max(dim=2)		# [batch_size, 240]
        
        y = y.view(-1, opt.n_classes)			# [batch_size, n_classes]
        y = self.y_map(y)				# [batch_size, 50*5]
        y, _ = y.view(-1, 50, 5).max(dim=2)		# [batch_size, 50]
        
        jmx = torch.cat((x, y), dim=1)			# [batch_size, 240+50]
        jmx = self.j_map(jmx)				# [batch_size, 240*4]
        jmx, _ = jmx.view(-1, 240, 4).max(dim=2)	# [batch_size, 240]
        
        prob = self.model(jmx)				# [batch_size, 1]
        return prob
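
The `view(...).max(dim=2)` pattern in this Discriminator corresponds to a maxout layer: a linear layer produces several "pieces" per unit, and only the largest piece of each unit is kept. A minimal single-sample sketch in plain Python, with a hypothetical helper name `maxout` and made-up numbers:

```python
def maxout(values, n_units, n_pieces):
    """Group `values` into n_units consecutive chunks of n_pieces and keep each chunk's max.

    Mirrors `x.view(-1, n_units, n_pieces).max(dim=2)` for a single sample.
    """
    if len(values) != n_units * n_pieces:
        raise ValueError("expected n_units * n_pieces values")
    return [
        max(values[u * n_pieces:(u + 1) * n_pieces])
        for u in range(n_units)
    ]

# Six linear outputs reduced to three maxout units with two pieces each.
linear_out = [0.2, -1.0, 3.5, 3.4, -0.7, -0.1]
print(maxout(linear_out, n_units=3, n_pieces=2))  # [0.2, 3.5, -0.1]
```

In the code above the same idea is applied with 240 units of 5 pieces for the image, 50 units of 5 pieces for the label, and 240 units of 4 pieces for the joint representation.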

Generator

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.z_map = nn.Sequential(
            nn.Linear(in_features=opt.latent_dim, out_features=200),
            nn.BatchNorm1d(num_features=200),
            nn.ReLU(inplace=True)
        )
        self.y_map = nn.Sequential(
            nn.Linear(in_features=opt.n_classes, out_features=1000),
            nn.BatchNorm1d(num_features=1000),
            nn.ReLU(inplace=True)
        )
        self.zy_map = nn.Sequential(
            nn.Linear(in_features=200+1000, out_features=1200),
            nn.BatchNorm1d(num_features=1200),
            nn.ReLU(inplace=True)
        )
        self.model = nn.Sequential(
            nn.Linear(in_features=1200, out_features=n_features),
            nn.Tanh()
        )
    
    def forward(self, z, y):				# z: [batch_size, latent_dim] / y: [batch_size, n_classes]
        zh = self.z_map(z)                            # [batch_size, 200]
        yh = self.y_map(y)                            # [batch_size, 1000]
        zy = torch.cat((zh, yh), dim=1)               # [batch_size, 1200]
        zyh = self.zy_map(zy)                         # [batch_size, 1200]
        
        x = self.model(zyh)                           # [batch_size, n_features]
        x = x.view(x.size(0), *img_dims)		# [batch_size, color, height, width]
        return x

Training

gan_loss = nn.BCELoss()

discriminator = Discriminator().to(device)
generator = Generator().to(device)

optimizer_D = optim.Adam(
    discriminator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)
optimizer_G = optim.Adam(
    generator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)

for epoch in tqdm(range(opt.n_epochs)):
    for batch_index, (batch, labels) in enumerate(batch_iterator):
        # Build targets: 1 for real, 0 for fake
        real = torch.ones(batch.size(0), 1, requires_grad=False).to(device)
        fake = torch.zeros(batch.size(0), 1, requires_grad=False).to(device)
        
        # Build inputs: one-hot encode the class labels
        labels_onehot = torch.zeros(batch.size(0), opt.n_classes).to(device)
        labels_ = labels.long().to(device)
        labels_ = labels_.view(batch.size(0), 1)
        labels_onehot = labels_onehot.scatter_(1, labels_, 1)
        
        imgs_real = batch.to(device)
        
        noise = torch.randn(batch.size(0), opt.latent_dim).normal_(0, 1).to(device)
        imgs_fake = generator(noise, labels_onehot)
        
        # Discriminator
        d_real_pred = discriminator(imgs_real, labels_onehot)
        # detach() keeps the Discriminator update from backpropagating into the Generator
        d_fake_pred = discriminator(imgs_fake.detach(), labels_onehot)
        
        d_loss = gan_loss(d_real_pred, real) + gan_loss(d_fake_pred, fake)
        
        optimizer_D.zero_grad()
        d_loss.backward()
        optimizer_D.step()
        
        # Generator
        noise = torch.randn(batch.size(0), opt.latent_dim).normal_(0, 1).to(device)
        imgs_fake = generator(noise, labels_onehot)
        g_loss = gan_loss(discriminator(imgs_fake, labels_onehot), real)
        
        optimizer_G.zero_grad()
        g_loss.backward()
        optimizer_G.step()

1-3. DCGAN

Deep Convolutional GAN

A GAN that adopts a CNN architecture.
Convolutional layers and transposed convolution (deconvolution) layers replace the fully connected layers of the basic GAN.
The Discriminator is a CNN that takes an image as input and performs binary classification.
The Generator is a deconvolutional network that takes a random noise vector as input and generates an image.
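
It can help to trace how the spatial size changes through such a network. The sketch below uses the standard output-size formulas for convolution and transposed convolution (dilation 1). The kernel/stride/padding triples are the ones used by the Generator in this section; the Discriminator path is assumed to be its mirror image, and the helper names are made up for the example.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Generator path: a 1x1 latent map upsampled by four transposed convolutions.
g_sizes = [1]
for kernel, stride, padding in [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]:
    g_sizes.append(conv_transpose2d_out(g_sizes[-1], kernel, stride, padding))

# Discriminator path (assumed mirror image): downsample back to a single value.
d_sizes = [g_sizes[-1]]
for kernel, stride, padding in [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]:
    d_sizes.append(conv2d_out(d_sizes[-1], kernel, stride, padding))

print(g_sizes)  # [1, 4, 8, 16, 32]
print(d_sizes)  # [32, 16, 8, 4, 1]
```

Under these settings the Generator produces 32x32 images, and a Discriminator needs three stride-2 convolutions before the final 4x4 convolution to arrive at a `[batch_size, 1, 1, 1]` output.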

Model

Discriminator

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        
        self.model = nn.Sequential(
            nn.Conv2d(
                in_channels=opt.channels,
                out_channels=64,
                kernel_size=4,
                stride=2,
                padding=1,
                bias=False
            ),
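            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            nn.BatchNorm2d(num_features=128),
            
            # Reconstructed block (assumption): the original notes jump from 128 to 256 channels here.
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(negative_slope=0.2, inplace=True),
            nn.BatchNorm2d(num_features=256),
            
            nn.Conv2d(in_channels=256, out_channels=1, kernel_size=4, stride=1, padding=0, bias=False),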
            nn.Sigmoid()
        )
    
    def forward(self, img):				# img: [batch_size, color, height, width]
        prob = self.model(img)				# [batch_size, 1, 1, 1]
        return prob

Generator

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        
        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0):
            block = [
                nn.ConvTranspose2d(
                    in_channels=n_input,
                    out_channels=n_output,
                    kernel_size=k_size,
                    stride=stride,
                    padding=padding,
                    bias=False
                ),
                nn.ReLU(inplace=True),
                nn.BatchNorm2d(num_features=n_output),
            ]
            return block
        
        self.model = nn.Sequential(
            *convlayer(n_input=opt.latent_dim, n_output=256, k_size=4, stride=1, padding=0),
            *convlayer(n_input=256, n_output=128, k_size=4, stride=2, padding=1),
            *convlayer(n_input=128, n_output=64, k_size=4, stride=2, padding=1),
            nn.ConvTranspose2d(
                in_channels=64,
                out_channels=opt.channels,
                kernel_size=4,
                stride=2,
                padding=1
            ),
            nn.Tanh()
        )
    
    def forward(self, z):				# z: [batch_size, latent_dim, 1, 1]
        img = self.model(z)				# [batch_size, channels, height, width]
        img = img.view(z.size(0), *img_dims)		# [batch_size, channels, height, width]
        return img

Training

gan_loss = nn.BCELoss()

discriminator = Discriminator().to(device)
generator = Generator().to(device)

optimizer_D = optim.Adam(
    discriminator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)
optimizer_G = optim.Adam(
    generator.parameters(),
    lr=opt.lr,
    betas=(opt.b1, opt.b2)
)

for epoch in tqdm(range(opt.n_epochs)):
    for batch_index, (batch, _) in enumerate(batch_iterator):
        # Build targets: 1 for real, 0 for fake
        real = torch.ones(batch.size(0), 1, 1, 1, requires_grad=False).to(device)
        fake = torch.zeros(batch.size(0), 1, 1, 1, requires_grad=False).to(device)
        
        # Build inputs
        imgs_real = batch.to(device)
        
        noise = torch.zeros(batch.size(0), opt.latent_dim, 1, 1, requires_grad=False).normal_(0, 1).to(device)
        imgs_fake = generator(noise)
        
        # Discriminator
        d_real_pred = discriminator(imgs_real)
        # detach() keeps the Discriminator update from backpropagating into the Generator
        d_fake_pred = discriminator(imgs_fake.detach())
        
        d_loss = gan_loss(d_real_pred, real) + gan_loss(d_fake_pred, fake)
        
        optimizer_D.zero_grad()
        d_loss.backward()
        optimizer_D.step()
        
        # Generator
        noise = torch.zeros(batch.size(0), opt.latent_dim, 1, 1, requires_grad=False).normal_(0, 1).to(device)
        
        imgs_fake = generator(noise)
        
        d_fake_pred = discriminator(imgs_fake)
        g_loss = gan_loss(d_fake_pred, real)
        
        optimizer_G.zero_grad()
        g_loss.backward()
        optimizer_G.step()

1-4. InfoGAN

Information Maximizing GAN

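Adds conditional information in the form of latent codes (latent variables) to the standard GAN and maximizes the mutual information between those codes and the generated image, which improves the diversity and interpretability of the results.
Specific attributes of the generated image can be controlled through these codes.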
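
One way to picture the Generator input is as three parts laid side by side: unstructured noise, a one-hot categorical code, and a few continuous codes. The following sketch builds a single such vector with the standard library. The function name and the total size of 74 are assumptions for illustration; the 10 categorical and 2 continuous dimensions follow the `opt.latent_dim-10-2` split used in the training code of this section.

```python
import random

def sample_infogan_latent(latent_dim, n_classes, n_continuous, rng):
    """Build one generator input: [noise | one-hot categorical code | continuous codes]."""
    noise_dim = latent_dim - n_classes - n_continuous
    if noise_dim <= 0:
        raise ValueError("latent_dim must exceed n_classes + n_continuous")

    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]

    digit = rng.randrange(n_classes)  # uniform over 0 .. n_classes - 1
    categorical = [1.0 if i == digit else 0.0 for i in range(n_classes)]

    continuous = [rng.uniform(-1.0, 1.0) for _ in range(n_continuous)]

    return noise + categorical + continuous, digit, continuous

rng = random.Random(0)
z, digit, codes = sample_infogan_latent(latent_dim=74, n_classes=10, n_continuous=2, rng=rng)
print(len(z), digit, codes)
```

After training, sweeping one continuous code while holding the rest of `z` fixed is the usual way to inspect which attribute that code has learned to control.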

Model

Discriminator

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        
        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0, normalize=True):
            block = [
                nn.Conv2d(
                    in_channels=n_input,
                    out_channels=n_output,
                    kernel_size=k_size,
                    stride=stride,
                    padding=padding,
                    bias=False
                )
            ]
            
            if normalize:
                block.append(nn.BatchNorm2d(num_features=n_output))
            block.append(nn.LeakyReLU(negative_slope=0.1, inplace=True))
            return block
        
        self.model = nn.Sequential(
            *convlayer(n_input=opt.channels, n_output=64, k_size=4, stride=2, padding=1, normalize=False),
            *convlayer(n_input=64, n_output=128, k_size=4, stride=2, padding=1),
            *convlayer(n_input=128, n_output=1024, k_size=7, stride=1, padding=0),
        )
        
        self.d_head = nn.Sequential(
            nn.Linear(in_features=1024, out_features=1),
            nn.Sigmoid()
        )
        self.q_head_C = nn.Sequential(
            nn.Linear(in_features=1024, out_features=128),
            nn.BatchNorm1d(num_features=128),
            nn.LeakyReLU(negative_slope=0.1, inplace=True),
            nn.Linear(in_features=128, out_features=2)
        )
        self.q_head_D = nn.Sequential(
            nn.Linear(in_features=1024, out_features=128),
            nn.BatchNorm1d(num_features=128),
            nn.LeakyReLU(negative_slope=0.1, inplace=True),
            nn.Linear(in_features=128, out_features=10),
            nn.Softmax(dim=1)
        )
    
    def forward(self, img):					# [batch_size, channels, height, width]
        conv_out = self.model(img)				# [batch_size, 1024, 1, 1]
        conv_out = conv_out.squeeze(dim=3).squeeze(dim=2)	# [batch_size, 1024]
        prob = self.d_head(conv_out)				# [batch_size, 1]
        
        q = self.q_head_C(conv_out)				# [batch_size, 2]
        
        digit_probs = self.q_head_D(conv_out)			# [batch_size, 10]
        
        return prob, digit_probs, q

Generator

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        
        def convlayer(n_input, n_output, k_size=5, stride=2, padding=0, output_padding=0):
            block = [
                nn.ConvTranspose2d(
                    in_channels=n_input,
                    out_channels=n_output,
                    kernel_size=k_size,
                    stride=stride,
                    padding=padding,
                    output_padding=output_padding,
                    bias=False
                ),
                nn.BatchNorm2d(num_features=n_output),
                nn.ReLU(inplace=True)
            ]
            return block
        
        self.conv_block = nn.Sequential(
            *convlayer(n_input=opt.latent_dim, n_output=1024, k_size=1, stride=1, padding=0, output_padding=0),
            *convlayer(n_input=1024, n_output=128, k_size=7, stride=1, padding=0, output_padding=0),
            *convlayer(n_input=128, n_output=64, k_size=4, stride=2, padding=1, output_padding=0),
            nn.ConvTranspose2d(
                in_channels=64,
                out_channels=opt.channels,
                kernel_size=4,
                stride=2,
                padding=1,
                bias=False
            ),
            nn.Tanh()
        )
    
    def forward(self, z):				# z: [batch_size, latent_dim]
        z = z.view(-1, opt.latent_dim, 1, 1)		# [batch_size, latent_dim, 1, 1]
        img = self.conv_block(z)			# [batch_size, channels, 28, 28]
        return img

Training

gan_loss = nn.BCELoss()

discriminator = Discriminator()
generator = Generator()

optimizer_D = optim.Adam(
    discriminator.parameters(),
    lr=opt.d_lr,
    betas=(opt.b1, opt.b2)
)
optimizer_G = optim.Adam(
    generator.parameters(),
    lr=opt.g_lr,
    betas=(opt.b1, opt.b2)
)

for epoch in tqdm(range(opt.n_epochs)):
    for batch_index, (batch, _) in enumerate(batch_iterator):
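        # Build targets: 1 for real, 0 for fake
        # Tensor, LongTensor, to_categorical, discrete_loss and continuous_loss are helpers
        # assumed to be defined earlier in the notebook.
        real = Variable(Tensor(batch.size(0), 1).fill_(1.0), requires_grad=False)
        fake = Variable(Tensor(batch.size(0), 1).fill_(0.0), requires_grad=False)
        
        # Discriminator
        optimizer_D.zero_grad()
        
        # Assumption: imgs_real was never assigned in the original notes; use the current batch.
        imgs_real = batch.type(Tensor)
        
        noise = Variable(Tensor(batch.size(0), opt.latent_dim-10-2).normal_(0, 1))
        # randint's upper bound is exclusive, so 10 is needed to sample all digits 0-9
        digits = to_categorical(list(torch.randint(0, 10, (batch.size(0), ))))
        cis = Variable(Tensor(batch.size(0), 2).uniform_(-1, 1))
        
        z = torch.cat((noise, digits, cis), dim=1)
        
        imgs_fake = generator(z)
        
        prob_real, _, _ = discriminator(imgs_real)
        # detach() keeps the Discriminator update from backpropagating into the Generator
        prob_fake, _, _ = discriminator(imgs_fake.detach())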
        
        d_loss = 0.5 * gan_loss(prob_real, real) + 0.5 * gan_loss(prob_fake, fake)
        
        d_loss.backward()
        optimizer_D.step()
        
        # Generator
        optimizer_G.zero_grad()
        
        # randint's upper bound is exclusive, so 10 is needed to sample all digits 0-9
        d_labels = torch.randint(0, 10, (batch.size(0), )).data.numpy()
        c_labels = Variable(Tensor(batch.size(0), 2).uniform_(-1, 1), requires_grad=False)
        d_targets = Variable(LongTensor(d_labels), requires_grad=False)
        
        noise = Variable(Tensor(batch.size(0), opt.latent_dim-10-2).normal_(0, 1))
        digits = Variable(to_categorical(d_labels))
        
        z = torch.cat((noise, digits, c_labels), dim=1)
        
        imgs_fake = generator(z)
        prob_fake, logits, q = discriminator(imgs_fake)
        
        g_vanilla_loss = gan_loss(prob_fake, real)
        g_discrete_loss = discrete_loss(logits, d_targets)
        g_continuous_loss = continuous_loss(q, c_labels)
        g_loss = g_vanilla_loss + g_discrete_loss + opt.lambda_C * g_continuous_loss
        
        g_loss.backward()
        optimizer_G.step()

2. Image2Text

2-1. Image Captioning

Model

Encoder
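- Extracts the important features from the image.
- Typically a CNN extracts image characteristics (object types, textures, relationships, etc.) and converts them into a feature vector.
Decoder
- A sequential model (RNN, LSTM, GRU, etc.) takes the image feature vector and the embeddings of the words generated so far as input and produces the sentence.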

Attention

Assigns a weight to each region of the image so that the important visual information is emphasized.
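
The weighting can be sketched as a softmax over per-region scores followed by a weighted sum of the region features. In a real captioning model the scores would come from a learned function of the Decoder state and each region; here they are hard-coded, and all names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, scores):
    """Weight each region's feature vector by its attention weight and sum them."""
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context

# Three image regions with 2-dimensional features; the second region scores highest.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.3]
weights, context = attend(features, scores)
print(weights, context)
```

The resulting `context` vector is dominated by the highest-scoring region, which is the sense in which attention "emphasizes" part of the image.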

Inference

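Uses the trained model to generate a caption for an actual input image.
- The Encoder processes the input image to obtain a feature vector.
- The Decoder generates a sentence based on this feature vector.
- Words are chosen sequentially with a method such as Greedy Search or Beam Search.
  - Greedy Search
    - Selects the highest-probability word at every step.
    - Because it always commits to the locally best word, it may fail to account for the context of the whole sentence.
  - Beam Search
    - Evaluates combinations of several words, considering multiple candidates to find a better sentence.
    - Explores multiple candidates in parallel, which can yield a more accurate caption.
    - Keeps the best k candidates at each step while generating the sentence.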
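
The difference between the two strategies shows up even on a tiny example. The sketch below uses a hand-written table of next-word probabilities (`TOY_MODEL`, entirely made up) in place of a real Decoder, and scores sequences by summed log-probability.

```python
import math

# Hypothetical next-word probabilities keyed by the words generated so far.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
    ("a", "dog"): {"<end>": 1.0},
    ("a", "cat"): {"<end>": 1.0},
    ("the", "dog"): {"<end>": 1.0},
    ("the", "cat"): {"<end>": 1.0},
}

def greedy_search(model, max_len=10):
    """Pick the single most probable next word at every step."""
    seq, log_prob = (), 0.0
    for _ in range(max_len):
        word, p = max(model[seq].items(), key=lambda kv: kv[1])
        seq, log_prob = seq + (word,), log_prob + math.log(p)
        if word == "<end>":
            break
    return seq, log_prob

def beam_search(model, beam_width, max_len=10):
    """Keep the beam_width best partial sentences at every step."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, log_prob in beams:
            if seq and seq[-1] == "<end>":
                candidates.append((seq, log_prob))  # finished hypotheses carry over
                continue
            for word, p in model[seq].items():
                candidates.append((seq + (word,), log_prob + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == "<end>" for seq, _ in beams):
            break
    return beams[0]

greedy_seq, greedy_lp = greedy_search(TOY_MODEL)
beam_seq, beam_lp = beam_search(TOY_MODEL, beam_width=2)
print(greedy_seq, math.exp(greedy_lp))
print(beam_seq, math.exp(beam_lp))
```

Greedy search commits to "a" because it is the best first word and ends with probability 0.6 x 0.55 = 0.33, while a beam of width 2 keeps "the" alive and finds "the dog" with probability 0.4 x 0.9 = 0.36. With `beam_width=1` the two procedures coincide.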

2-2. Model

Load Model

Encoder
- Takes an image as input, converts it into vector form, and passes it to the Decoder.
Decoder
- Takes the vectors as input and generates natural-language text.

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, GPT2TokenizerFast

encoder_model = 'microsoft/swin-base-patch4-window7-224-in22k'
decoder_model = 'gpt2'
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_model,
    decoder_model
).to(device)

Load Image Processor and Tokenizer

image_processor
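- Converts an image into a multi-dimensional tensor of pixel values.
  - Preprocesses the image (for example resizing and normalizing) into the input format the Encoder expects.
tokenizer
- Converts text into token IDs (a list of integers).
  - Also decodes the token IDs produced by the Decoder back into natural-language text.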

image_processor = ViTImageProcessor.from_pretrained(encoder_model)
tokenizer = GPT2TokenizerFast.from_pretrained(decoder_model)

Ine's Dev Notes (이네의 개발 노트)
