Table of Contents

1 INTRODUCTION

2 RELATED WORK

2.1 REPRESENTATION LEARNING FROM UNLABELED DATA

2.2 GENERATING NATURAL IMAGES

3 APPROACH AND MODEL ARCHITECTURE

4 DETAILS OF ADVERSARIAL TRAINING

5 EMPIRICAL VALIDATION OF DCGANS CAPABILITIES

6 INVESTIGATING AND VISUALIZING THE INTERNALS OF THE NETWORKS

6.1 WALKING IN THE LATENT SPACE

6.3.2 VECTOR ARITHMETIC ON FACE SAMPLES

ACKNOWLEDGMENTS

REFERENCES


UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

1 INTRODUCTION

Learning reusable feature representations from large unlabeled datasets has been an area of active research.
  1. We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs),
  2. and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks.
  3. GANs provide an attractive alternative to maximum likelihood techniques.

Drawbacks:

  1. GANs have been known to be unstable to train,
  2. often resulting in generators that produce nonsensical outputs.

In this paper, the authors make the following contributions:

  1. We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
  2. We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
  3. We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
  4. We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.

2 RELATED WORK

2.1 REPRESENTATION LEARNING FROM UNLABELED DATA


This section briefly surveys prior methods of unsupervised representation learning.

2.2 GENERATING NATURAL IMAGES

Generative image models are well studied and fall into two categories: parametric and non-parametric.
 
Applications of both non-parametric and parametric models are then surveyed.
 

3 APPROACH AND MODEL ARCHITECTURE

Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.
 
Architecture guidelines for stable Deep Convolutional GANs
  1.  Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  2.  Use batchnorm in both the generator and the discriminator.
  3.  Remove fully connected hidden layers for deeper architectures.
  4.  Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  5.  Use LeakyReLU activation in the discriminator for all layers.
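As a quick sanity check of guideline 1, the spatial sizes of a 64×64 DCGAN can be traced with the standard convolution size formulas. The kernel-4 / stride-2 / padding-1 configuration below is the common DCGAN setup, used here for illustration rather than quoted from the paper:

```python
# Spatial-size arithmetic for the two layer types in guideline 1.
# Kernel 4, stride 2, padding 1 is the usual DCGAN configuration; the
# exact sizes here are illustrative, not quoted from the paper.

def conv_out(size, kernel=4, stride=2, pad=1):
    """Strided convolution (discriminator): learns its own downsampling."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Fractionally-strided (transposed) convolution (generator): upsamples."""
    return (size - 1) * stride - 2 * pad + kernel

# Generator: project the latent vector to a 4x4 map, then upsample to 64x64.
size = 4
for _ in range(4):
    size = deconv_out(size)
print(size)  # 64

# Discriminator: mirror path, 64x64 back down to 4x4 without any pooling.
size = 64
for _ in range(4):
    size = conv_out(size)
print(size)  # 4
```

Four stride-2 layers in each direction suffice, which is why pooling layers can be removed entirely.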

4 DETAILS OF ADVERSARIAL TRAINING

  • No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1].
  • All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128.
  • All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02.
  • In the LeakyReLU, the slope of the leak was set to 0.2 in all models.
  • While previous GAN work has used momentum to accelerate training, we used the Adam optimizer with tuned hyperparameters.
  • We found the suggested learning rate of 0.001 to be too high, using 0.0002 instead.
  • Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training.
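A minimal NumPy sketch of the numeric choices listed above (tanh-range scaling, the 0.02-std weight init, and the 0.2 LeakyReLU slope); the array sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight init: zero-centered Normal with standard deviation 0.02.
w = rng.normal(loc=0.0, scale=0.02, size=(128, 128))

# Pre-processing: map 8-bit pixel values into tanh's [-1, 1] range.
pixels = np.arange(256, dtype=np.float64)
scaled = pixels / 127.5 - 1.0          # 0 -> -1.0, 255 -> 1.0

# LeakyReLU with leak slope 0.2.
def leaky_relu(x, slope=0.2):
    return np.where(x >= 0, x, slope * x)

# The optimizer settings above, as they would appear in most frameworks:
# Adam(lr=0.0002, betas=(0.5, 0.999)).
print(scaled.min(), scaled.max())  # -1.0 1.0
```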

5 EMPIRICAL VALIDATION OF DCGANS CAPABILITIES

 
5.1 CLASSIFYING CIFAR-10 USING GANS AS A FEATURE EXTRACTOR

5.2 CLASSIFYING SVHN DIGITS USING GANS AS A FEATURE EXTRACTOR
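For context, the paper builds these feature vectors by max-pooling every discriminator layer's convolutional features down to a 4×4 spatial grid, then flattening and concatenating them for a regularized linear L2-SVM. A NumPy sketch of that pooling step, with illustrative channel counts rather than the paper's:

```python
import numpy as np

def pool_to_grid(fmap, grid=4):
    """Max-pool a (C, H, W) feature map to a (C, grid, grid) spatial grid
    and flatten it, as done to each discriminator layer's features."""
    c, h, w = fmap.shape
    hs, ws = h // grid, w // grid
    blocks = fmap[:, :hs * grid, :ws * grid].reshape(c, grid, hs, grid, ws)
    return blocks.max(axis=(2, 4)).reshape(-1)

rng = np.random.default_rng(1)
# Two hypothetical discriminator layers with made-up channel counts.
layers = [rng.random((64, 32, 32)), rng.random((128, 16, 16))]
feature = np.concatenate([pool_to_grid(f) for f in layers])
print(feature.shape)  # (3072,)
# `feature` would then be fed to a regularized linear L2-SVM classifier.
```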
 

6 INVESTIGATING AND VISUALIZING THE INTERNALS OF THE NETWORKS

6.1 WALKING IN THE LATENT SPACE

 
The first experiment we did was to understand the landscape of the latent space. If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations. 

In other words, if moving through the latent space makes the generator smoothly add or remove objects (rather than producing abrupt, memorized transitions), we take that as evidence the model has learned good representations.
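The walk itself is just interpolation between points in Z; a minimal NumPy sketch (latent dimension 100 and linear interpolation are illustrative choices, not prescribed by the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def interpolate(z0, z1, steps=9):
    """Linear walk between two latent points; each row is one step."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - alphas) * z0 + alphas * z1

z0 = rng.uniform(-1, 1, size=100)
z1 = rng.uniform(-1, 1, size=100)
path = interpolate(z0, z1)
print(path.shape)  # (9, 100)
# Decoding each row with a trained generator would produce one row of
# the smooth image transitions shown in Figure 4.
```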


Figure 4: Top rows: Interpolation between a series of 9 random points in Z shows that the space learned has smooth transitions, with every image in the space plausibly looking like a bedroom.
In the 6th row, you see a room without a window slowly transforming into a room with a giant window.
In the 10th row, you see what appears to be a TV slowly being transformed into a window.
 

6.3.2 VECTOR ARITHMETIC ON FACE SAMPLES

Figure 7: Vector arithmetic for visual concepts. For each column, the Z vectors of samples are averaged. Arithmetic was then performed on the mean vectors, creating a new vector Y. The center sample on the right-hand side is produced by feeding Y as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale ±0.25 was added to Y to produce the 8 other samples. Applying arithmetic in pixel space (bottom two examples) results in noisy overlap due to misalignment.
Figure 8: A "turn" vector was created from four averaged samples of faces looking left vs. looking right. By adding interpolations along this axis to random samples, we were able to reliably transform their pose.
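The Figure 7 recipe in code: average a few Z vectors per concept, combine the means, and perturb the result. The random vectors below are stand-ins for the Z vectors of actual "smiling woman" / "neutral woman" / "neutral man" samples:

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n = 100, 3  # latent size and samples averaged per concept (illustrative)

# Average the Z vectors of a few samples for each visual concept.
z_smiling_woman = rng.uniform(-1, 1, (n, dim)).mean(axis=0)
z_neutral_woman = rng.uniform(-1, 1, (n, dim)).mean(axis=0)
z_neutral_man = rng.uniform(-1, 1, (n, dim)).mean(axis=0)

# smiling woman - neutral woman + neutral man -> "smiling man" vector Y.
y = z_smiling_woman - z_neutral_woman + z_neutral_man

# Uniform noise with scale 0.25 around Y yields the 8 neighboring samples.
samples = y + rng.uniform(-0.25, 0.25, (8, dim))
# Feeding y and each row of `samples` to the generator renders the 3x3
# grid on the right-hand side of Figure 7.
```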
 

ACKNOWLEDGMENTS

REFERENCES
