

AI & Deep Learning Solution: More Wonderful, More Comfortable


Deep Learning YOLO

YOLO (You Only Look Once) is a deep learning algorithm that detects objects at high speed using a single neural network.
We use YOLO to provide solutions for high-speed object detection.

Object Detection

  • Detect objects in an image and mark each one with a bounding box
  • Report the probability of the object inside each box
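
As a minimal sketch of the two outputs listed above, a detection can be modeled as a bounding box plus a probability. The `Detection` class, its field names, the sample values, and the 0.5 threshold are illustrative assumptions and do not come from any particular YOLO implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""
    label: str
    x_min: float        # left edge of the bounding box, in pixels
    y_min: float        # top edge
    x_max: float        # right edge
    y_max: float        # bottom edge
    probability: float  # confidence that the box contains `label`, in [0, 1]


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose probability meets the threshold."""
    return [d for d in detections if d.probability >= threshold]


# Made-up sample detections, not real model output.
detections = [
    Detection("dog", 48.0, 120.0, 210.0, 330.0, 0.91),
    Detection("bicycle", 90.0, 60.0, 400.0, 300.0, 0.34),
]
confident = keep_confident(detections)
print([d.label for d in confident])  # ['dog']
```

Lowering `threshold` keeps more boxes at the cost of more false positives, so the value is usually tuned per application.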

Yolo-v1 Overview (CVPR 2016)

  • mAP: 63.4%, speed: 45 fps on VOC2007 (#class: 20)
  • GPU: Titan X, training data: VOC2007+2012, operations: 8.52 billion
  • Features
    • Extremely fast: the network looks at an image only once (You Only Look Once) to predict which objects are present and where they are.
    • Global reasoning: YOLO sees the entire image when making predictions, so it can use the wider context around each object.
    • Highly generalizable: YOLO detects well even when applied to new domains or unexpected inputs.
  • 24 convolutional layers (feature extraction layers)
    • 20 layers based on GoogLeNet, pre-trained on the ImageNet 2012 dataset
  • 2 fully connected layers (prediction layers)
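
To make the role of the prediction layers concrete, the sketch below computes the size of the tensor they emit. The grid size S = 7 and B = 2 boxes per cell are the VOC settings from the original YOLO paper and are not stated on this page, so treat them as assumptions; the function name is hypothetical.

```python
def yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the Yolo-v1 prediction tensor: S x S x (B * 5 + C).

    Each box carries 5 values (x, y, w, h, confidence). The class
    probabilities are shared by all boxes in a grid cell.
    """
    depth = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, depth)


# Assumed VOC configuration: S = 7, B = 2, C = 20.
shape = yolo_v1_output_shape()
print(shape)  # (7, 7, 30)
```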

Yolo-v2 Overview (CVPR 2017)

  • Better, Faster, Stronger
    • Improving recall and localization while maintaining classification accuracy
  • mAP: 76.8%, speed: 67 fps on VOC2007 (#class: 20), operations: 5.58 billion
  • New features
    • Simplified network: Darknet-19 (416x416, VGG16-like, 19 layers, fps: +20)
    • Batch normalization (mAP: +2%, mini-batch)
    • High-resolution classifier (pre-trained classifier input: 224x224 to 448x448, mAP: +4%)
    • Convolutional with anchor boxes (mAP: -0.3%, Recall: +7%)
    • Dimension clusters (mAP: +5%)
    • Fine-grained features (mAP: +1%)
    • Multi-scale training (mAP: +1.5%)
  • Convolutional with anchor boxes (Recall: +7%, mAP: -0.3%)
    • Remove the 2 fully connected layers (previously responsible for predicting bounding boxes, B.B)
    • Increase the number of bounding boxes per grid cell to 5 and introduce B.B templates (anchor boxes)
    • 5 B.B templates per grid cell: each template predicts 5 parameters (t_x, t_y, t_w, t_h, t_o) + 20 class probabilities P(class|obj), so 5 x (5 + 20) = 125 parameters per grid cell
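
The per-cell parameter count and the way the five raw values (t_x, t_y, t_w, t_h, t_o) become a box can be sketched as follows. The decoding formulas follow the YOLO v2 paper rather than anything stated on this page, and the function names, the cell offset, and the template size used in the example are illustrative assumptions.

```python
import math


def params_per_cell(num_templates=5, num_classes=20):
    """Each B.B template predicts 5 box values plus one score per class."""
    return num_templates * (5 + num_classes)


def sigmoid(v):
    """Logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t_x, t_y, t_w, t_h, t_o, cell_x, cell_y, prior_w, prior_h):
    """Turn raw network outputs into a box (in grid-cell units) and a score.

    cell_x, cell_y: top-left offset of the grid cell that made the prediction
    prior_w, prior_h: width and height of the B.B template (anchor box)
    """
    b_x = cell_x + sigmoid(t_x)    # the center is kept inside its own cell
    b_y = cell_y + sigmoid(t_y)
    b_w = prior_w * math.exp(t_w)  # the template is rescaled, never replaced
    b_h = prior_h * math.exp(t_h)
    objectness = sigmoid(t_o)      # confidence that the box holds an object
    return b_x, b_y, b_w, b_h, objectness


print(params_per_cell())  # 125

# Hypothetical cell (6, 6) and a 3 x 2 template; all raw outputs are zero.
box = decode_box(0.0, 0.0, 0.0, 0.0, 0.0,
                 cell_x=6, cell_y=6, prior_w=3.0, prior_h=2.0)
print(box)  # (6.5, 6.5, 3.0, 2.0, 0.5)
```

With all raw outputs at zero, the box sits at the center of its cell and keeps the template's size, which is why well-chosen templates give the network a good starting point.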

Yolo-v3 Overview (Technical Report 2018)

  • Network: Darknet-53 + 53 more layers (total: 106 conv. layers)
    • Detection at three scales: the detection kernel has shape 1x1x(Bx(5+C)), where B is the number of boxes predicted per grid cell and C is the number of classes
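
As a worked example of the kernel shape above, the sketch below computes the output depth and the feature-map size at each of the three scales. The 416x416 input, the strides 32/16/8, B = 3, and C = 80 (COCO) are commonly used settings assumed here for illustration; the function names are hypothetical.

```python
def detection_kernel_depth(boxes_per_scale, num_classes):
    """Depth of the 1 x 1 detection kernel: B x (5 + C)."""
    return boxes_per_scale * (5 + num_classes)


def output_shapes(input_size, strides, boxes_per_scale, num_classes):
    """Feature-map shape (height, width, depth) at each detection scale."""
    depth = detection_kernel_depth(boxes_per_scale, num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]


# Assumed values: 416 x 416 input, strides 32/16/8, B = 3, C = 80.
shapes = output_shapes(416, strides=(32, 16, 8),
                       boxes_per_scale=3, num_classes=80)
for shape in shapes:
    print(shape)
# (13, 13, 255)
# (26, 26, 255)
# (52, 52, 255)
```

The coarsest grid (13x13) is suited to large objects, while the finest (52x52) gives small objects a better chance of being detected.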