YOLO v2 Net

[Figure: YOLO v2 network graph, running from conv32 through the maxpool, route, reorg, and concat nodes to conv425 and Detection]
data · Data
transform param
crop size: 416
mirror: true
data param
source: PASCAL VOC 2007 train + 2012 train & test
batch size: 32
blob shapes
data: [ 1, 3, 416, 416 ]
label: [ 1 ]
activation: leakyrelu
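The leakyrelu activation named above can be sketched in NumPy as follows. The negative slope of 0.1 is an assumption (it is the value commonly used by Darknet's leaky activation; the source does not state it), and `leaky_relu` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    """Keep positive values unchanged; scale negative values by negative_slope.

    negative_slope=0.1 is an assumed default, not a value given in the source.
    """
    return np.where(x > 0, x, negative_slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = leaky_relu(x)
```

Unlike a plain ReLU, negative inputs are scaled down rather than set to zero, so they still carry a small gradient.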
Input image size: 416x416
Input image amount: 3
network model: Convolution
layer: 0
Output amount: 32
Stride: 1
weights size: 3x3
Output image size: 416x416
Input image size: 416x416
Input image amount: 32
network model: MaxPool
layer: 1
Output amount: 32
Stride: 2
weights size: 2x2
Output image size: 208x208
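The output sizes of layers 0 and 1 follow from the standard sliding-window size formula. The sketch below assumes a padding of 1 for the 3x3 convolution and no padding for the 2x2 max pool; the source does not list padding, so both values are inferred from the reported output sizes. `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(in_size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling window."""
    return (in_size + 2 * pad - kernel) // stride + 1

# Layer 0: 3x3 convolution, stride 1, assumed padding 1 (size is preserved).
layer0 = conv_output_size(416, kernel=3, stride=1, pad=1)

# Layer 1: 2x2 max pool, stride 2, assumed padding 0 (size is halved).
layer1 = conv_output_size(layer0, kernel=2, stride=2, pad=0)
```

The same two cases repeat through the rest of the network: stride-1 convolutions keep the spatial size, and each stride-2 max pool halves it (416, 208, 104, 52, 26, 13).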
Input image size: 208x208
Input image amount: 32
network model: Convolution
layer: 2
Output amount: 64
Stride: 1
weights size: 3x3
Output image size: 208x208
Input image size: 208x208
Input image amount: 64
network model: MaxPool
layer: 3
Output amount: 64
Stride: 2
weights size: 2x2
Output image size: 104x104
Input image size: 104x104
Input image amount: 64
network model: Convolution
layer: 4
Output amount: 128
Stride: 1
weights size: 3x3
Output image size: 104x104
Input image size: 104x104
Input image amount: 128
network model: Convolution
layer: 5
Output amount: 64
Stride: 1
weights size: 1x1
Output image size: 104x104
Input image size: 104x104
Input image amount: 64
network model: Convolution
layer: 6
Output amount: 128
Stride: 1
weights size: 3x3
Output image size: 104x104
Input image size: 104x104
Input image amount: 128
network model: MaxPool
layer: 7
Output amount: 128
Stride: 2
weights size: 2x2
Output image size: 52x52
Input image size: 52x52
Input image amount: 128
network model: Convolution
layer: 8
Output amount: 256
Stride: 1
weights size: 3x3
Output image size: 52x52
Input image size: 52x52
Input image amount: 256
network model: Convolution
layer: 9
Output amount: 128
Stride: 1
weights size: 1x1
Output image size: 52x52
Input image size: 52x52
Input image amount: 128
network model: Convolution
layer: 10
Output amount: 256
Stride: 1
weights size: 3x3
Output image size: 52x52
Input image size: 52x52
Input image amount: 256
network model: MaxPool
layer: 11
Output amount: 256
Stride: 2
weights size: 2x2
Output image size: 26x26
Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 12
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26
Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 13
Output amount: 256
Stride: 1
weights size: 1x1
Output image size: 26x26
Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 14
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26
Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 15
Output amount: 256
Stride: 1
weights size: 1x1
Output image size: 26x26
Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 16
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26
Input image size: 26x26
Input image amount: 512
network model: MaxPool
layer: 17
Output amount: 512
Stride: 2
weights size: 2x2
Output image size: 13x13
Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 18
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 19
Output amount: 512
Stride: 1
weights size: 1x1
Output image size: 13x13
Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 20
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 21
Output amount: 512
Stride: 1
weights size: 1x1
Output image size: 13x13
Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 22
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 23
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 24
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
network model: Route
layer: 25
Routed layer: 16 (takes the output of layer 16)
Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 26
Output amount: 64
Stride: 1
weights size: 1x1
Output image size: 26x26
Input image size: 26x26
Input image amount: 64
network model: Reorganization
layer: 27
Output amount: 256
Output image size: 13x13
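The reorganization step trades spatial resolution for channels: each 2x2 spatial block is moved into the channel dimension, so 64 channels at 26x26 become 256 channels at 13x13. The sketch below is a generic space-to-depth rearrangement in NumPy, assuming a channel-first (C, H, W) layout and a stride of 2. It is not claimed to reproduce Darknet's exact channel ordering, and `space_to_depth` is a hypothetical helper name.

```python
import numpy as np

def space_to_depth(x, stride=2):
    """Rearrange (C, H, W) into (C * stride^2, H / stride, W / stride)."""
    c, h, w = x.shape
    assert h % stride == 0 and w % stride == 0
    # Split each spatial axis into (block index, offset inside the block).
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    # Move the two in-block offsets in front of the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * stride * stride, h // stride, w // stride)

features = np.arange(64 * 26 * 26, dtype=np.float32).reshape(64, 26, 26)
reorganized = space_to_depth(features)
```

No values are discarded: the output holds exactly the same 64 x 26 x 26 elements, only regrouped so that it can be merged with a 13x13 feature map.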
network model: Concat (merge)
Merged layers: 27, 24
layer: 28
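Merging layers 27 and 24 stacks their feature maps along the channel axis, which is where the 1280 input channels of the next layer come from (256 + 1024). A minimal NumPy sketch, assuming a channel-first (C, H, W) layout and using zero-filled placeholder arrays in place of real activations:

```python
import numpy as np

# Placeholder feature maps with the shapes given in the listing.
reorg_out = np.zeros((256, 13, 13), dtype=np.float32)    # layer 27 output
conv24_out = np.zeros((1024, 13, 13), dtype=np.float32)  # layer 24 output

# Concatenate along the channel axis; spatial sizes must match (13x13).
merged = np.concatenate([reorg_out, conv24_out], axis=0)
```

The merge only works because the reorganization step has already brought the 26x26 branch down to 13x13.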
Input image size: 13x13
Input image amount: 1280
network model: Convolution
layer: 29
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13
Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 30
Output amount: 425
Stride: 1
weights size: 1x1
Output image size: 13x13
Region · Region
Detection
Layer  Type           Filters  Size/Stride  Output size
0      Convolutional  32       3x3/1        416x416
1      Maxpool        -        2x2/2        208x208
2      Convolutional  64       3x3/1        208x208
3      Maxpool        -        2x2/2        104x104
4      Convolutional  128      3x3/1        104x104
5      Convolutional  64       1x1/1        104x104
6      Convolutional  128      3x3/1        104x104
7      Maxpool        -        2x2/2        52x52
8      Convolutional  256      3x3/1        52x52
9      Convolutional  128      1x1/1        52x52
10     Convolutional  256      3x3/1        52x52
11     Maxpool        -        2x2/2        26x26
12     Convolutional  512      3x3/1        26x26
13     Convolutional  256      1x1/1        26x26
14     Convolutional  512      3x3/1        26x26
15     Convolutional  256      1x1/1        26x26
16     Convolutional  512      3x3/1        26x26
17     Maxpool        -        2x2/2        13x13
18     Convolutional  1024     3x3/1        13x13
19     Convolutional  512      1x1/1        13x13
20     Convolutional  1024     3x3/1        13x13
21     Convolutional  512      1x1/1        13x13
22     Convolutional  1024     3x3/1        13x13
23     Convolutional  1024     3x3/1        13x13
24     Convolutional  1024     3x3/1        13x13
25     Route 16       -        -            -
26     Convolutional  64       1x1/1        26x26
27     Reorg          -        /2           13x13
28     Route 27 24    -        -            -
29     Convolutional  1024     3x3/1        13x13
30     Convolutional  425      1x1/1        13x13
31     Detection      -        -            -
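The shapes in the table can be checked by walking the layers in order and tracking only channel count and spatial size. The sketch below rests on three assumptions: every convolution preserves spatial size (stride 1 with matching padding), every max pool halves it, and a route either forwards one earlier layer or concatenates several along the channel axis. `LAYERS` and `trace_shapes` are hypothetical names, and the final Detection layer is omitted because it does not change the tensor shape.

```python
# (layer type, argument): filters for conv, stride for reorg,
# source layer indices for route, None for maxpool.
LAYERS = [
    ("conv", 32), ("maxpool", None), ("conv", 64), ("maxpool", None),
    ("conv", 128), ("conv", 64), ("conv", 128), ("maxpool", None),
    ("conv", 256), ("conv", 128), ("conv", 256), ("maxpool", None),
    ("conv", 512), ("conv", 256), ("conv", 512), ("conv", 256),
    ("conv", 512), ("maxpool", None),
    ("conv", 1024), ("conv", 512), ("conv", 1024), ("conv", 512),
    ("conv", 1024), ("conv", 1024), ("conv", 1024),
    ("route", [16]), ("conv", 64), ("reorg", 2), ("route", [27, 24]),
    ("conv", 1024), ("conv", 425),
]

def trace_shapes(layers, in_channels=3, in_size=416):
    """Return a list of (channels, spatial size) after each layer."""
    shapes = []
    channels, size = in_channels, in_size
    for kind, arg in layers:
        if kind == "conv":
            channels = arg                  # assumed size-preserving
        elif kind == "maxpool":
            size //= 2                      # 2x2 window, stride 2
        elif kind == "reorg":
            channels *= arg * arg           # space-to-depth
            size //= arg
        elif kind == "route":
            size = shapes[arg[0]][1]
            channels = sum(shapes[i][0] for i in arg)
        shapes.append((channels, size))
    return shapes

shapes = trace_shapes(LAYERS)
for index, (channels, size) in enumerate(shapes):
    print(f"{index:2d}  {channels:5d}  {size}x{size}")
```

Under these assumptions the walk ends with 425 channels at 13x13 after layer 30, and layer 28 carries 1280 channels, matching the per-layer listing.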

Original

Output

Layer 0 convolution result