YOLO v2 Net

data layer (type: Data)
transform param
crop size: 416
mirror: true
data param
source: PASCAL VOC 2007 train + 2012 train & test
batch size: 32
blob shapes
data: [ 1, 3, 416, 416 ]
label: [ 1 ]
activation: leakyrelu
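
The leaky ReLU activation named above can be sketched as follows. The negative slope of 0.1 is an assumption (it is the value Darknet uses for its "leaky" activation); the listing itself does not state it, and `leaky_relu` is an illustrative helper name.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """Leaky ReLU: keep positive values, scale the rest by `slope`.

    slope=0.1 is an assumed value, not taken from the listing.
    """
    x = np.asarray(x, dtype=np.float32)
    return np.where(x > 0, x, slope * x)

print(leaky_relu([-2.0, 0.0, 3.0]))
```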

Input image size: 416x416
Input image amount: 3
network model: Convolution
layer: 0
Output amount: 32
Stride: 1
weights size: 3x3
Output image size: 416x416
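
The unchanged 416x416 output of layer 0 follows from the usual convolution size formula. A minimal sketch, assuming zero padding of `kernel // 2` pixels per side (the listing does not state the padding; `conv_output_size` is a hypothetical helper):

```python
def conv_output_size(in_size, kernel, stride=1, pad=None):
    """Spatial output size of a convolution along one dimension."""
    if pad is None:
        pad = kernel // 2  # assumption: "same"-style padding
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_output_size(416, kernel=3, stride=1))  # 416
```

With a 1x1 kernel the assumed padding is 0, so the size is preserved in the same way.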

Input image size: 416x416
Input image amount: 32
network model: MaxPool
layer: 1
Output amount: 32
Stride: 2
weights size: 2x2
Output image size: 208x208
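
A 2x2 max pool with stride 2 keeps the channel count and halves each spatial dimension, as in layer 1. A minimal NumPy sketch, assuming a (channels, height, width) layout with even height and width; `maxpool_2x2` is an illustrative name:

```python
import numpy as np

def maxpool_2x2(x):
    """2x2 max pooling with stride 2 on a (channels, height, width) array."""
    c, h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    blocks = x.reshape(c, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(2, 4))

feature_map = np.random.rand(32, 416, 416).astype(np.float32)
pooled = maxpool_2x2(feature_map)
print(pooled.shape)  # (32, 208, 208)
```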

Input image size: 208x208
Input image amount: 32
network model: Convolution
layer: 2
Output amount: 64
Stride: 1
weights size: 3x3
Output image size: 208x208

Input image size: 208x208
Input image amount: 64
network model: MaxPool
layer: 3
Output amount: 64
Stride: 2
weights size: 2x2
Output image size: 104x104

Input image size: 104x104
Input image amount: 64
network model: Convolution
layer: 4
Output amount: 128
Stride: 1
weights size: 3x3
Output image size: 104x104

Input image size: 104x104
Input image amount: 128
network model: Convolution
layer: 5
Output amount: 64
Stride: 1
weights size: 1x1
Output image size: 104x104

Input image size: 104x104
Input image amount: 64
network model: Convolution
layer: 6
Output amount: 128
Stride: 1
weights size: 3x3
Output image size: 104x104

Input image size: 104x104
Input image amount: 128
network model: MaxPool
layer: 7
Output amount: 128
Stride: 2
weights size: 2x2
Output image size: 52x52

Input image size: 52x52
Input image amount: 128
network model: Convolution
layer: 8
Output amount: 256
Stride: 1
weights size: 3x3
Output image size: 52x52

Input image size: 52x52
Input image amount: 256
network model: Convolution
layer: 9
Output amount: 128
Stride: 1
weights size: 1x1
Output image size: 52x52

Input image size: 52x52
Input image amount: 128
network model: Convolution
layer: 10
Output amount: 256
Stride: 1
weights size: 3x3
Output image size: 52x52
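
Layers 8 to 10 form a 256 -> 128 -> 256 bottleneck, and the summary table gives layer 9 a 1x1 kernel. The sketch below counts the weights of that layer for a 1x1 and a 3x3 kernel to show what the smaller kernel saves. Biases and batch-norm parameters are left out, and `conv_weight_count` is a hypothetical helper.

```python
def conv_weight_count(in_channels, out_channels, kernel):
    """Number of convolution weights, excluding biases."""
    return kernel * kernel * in_channels * out_channels

w_1x1 = conv_weight_count(256, 128, kernel=1)
w_3x3 = conv_weight_count(256, 128, kernel=3)
print(w_1x1, w_3x3, w_3x3 // w_1x1)  # 32768 294912 9
```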

Input image size: 52x52
Input image amount: 256
network model: MaxPool
layer: 11
Output amount: 256
Stride: 2
weights size: 2x2
Output image size: 26x26

Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 12
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26

Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 13
Output amount: 256
Stride: 1
weights size: 1x1
Output image size: 26x26

Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 14
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26

Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 15
Output amount: 256
Stride: 1
weights size: 1x1
Output image size: 26x26

Input image size: 26x26
Input image amount: 256
network model: Convolution
layer: 16
Output amount: 512
Stride: 1
weights size: 3x3
Output image size: 26x26

Input image size: 26x26
Input image amount: 512
network model: MaxPool
layer: 17
Output amount: 512
Stride: 2
weights size: 2x2
Output image size: 13x13
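
Layers 1, 3, 7, 11 and 17 are the five stride-2 max pools, so by this point the 416x416 input has been reduced by a factor of 32. The arithmetic, as a short check:

```python
input_size = 416
num_maxpools = 5  # layers 1, 3, 7, 11, 17 in the listing

total_stride = 2 ** num_maxpools
grid_size = input_size // total_stride
print(total_stride, grid_size)  # 32 13
```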

Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 18
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 19
Output amount: 512
Stride: 1
weights size: 1x1
Output image size: 13x13

Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 20
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 21
Output amount: 512
Stride: 1
weights size: 1x1
Output image size: 13x13

Input image size: 13x13
Input image amount: 512
network model: Convolution
layer: 22
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 23
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 24
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

network model: Route
layer: 25
Source layer: 16
Output amount: 512
Output image size: 26x26

Input image size: 26x26
Input image amount: 512
network model: Convolution
layer: 26
Output amount: 64
Stride: 1
weights size: 1x1
Output image size: 26x26

Input image size: 26x26
Input image amount: 64
network model: Reorganization
layer: 27
Output amount: 256
Stride: 2
Output image size: 13x13
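
The reorganization step trades spatial resolution for channels: 64 channels at 26x26 become 256 channels at 13x13. Below is a generic space-to-depth sketch with a block size of 2. It reproduces the shapes in the listing, but the channel ordering is an assumption and may differ from the ordering Darknet's reorg layer produces; `space_to_depth` is an illustrative name.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel axis.

    x has shape (channels, height, width); the result has shape
    (channels * block**2, height // block, width // block).
    """
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(2, 4, 0, 1, 3)  # (row offset, col offset, c, h/b, w/b)
    return x.reshape(c * block * block, h // block, w // block)

features = np.random.rand(64, 26, 26).astype(np.float32)
print(space_to_depth(features).shape)  # (256, 13, 13)
```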

network model: Route (concatenation)
layer: 28
Merged layers: 27 24
Output amount: 1280
Output image size: 13x13
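
Layer 28 joins the outputs of layers 27 and 24 along the channel axis, which is why layer 29 sees 256 + 1024 = 1280 input channels. A minimal sketch with placeholder arrays, assuming a (channels, height, width) layout:

```python
import numpy as np

reorg_out = np.zeros((256, 13, 13), dtype=np.float32)    # layer 27 output
conv24_out = np.zeros((1024, 13, 13), dtype=np.float32)  # layer 24 output

# Concatenate in the order given by the listing: 27 first, then 24.
route_out = np.concatenate([reorg_out, conv24_out], axis=0)
print(route_out.shape)  # (1280, 13, 13)
```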

Input image size: 13x13
Input image amount: 1280
network model: Convolution
layer: 29
Output amount: 1024
Stride: 1
weights size: 3x3
Output image size: 13x13

Input image size: 13x13
Input image amount: 1024
network model: Convolution
layer: 30
Output amount: 425
Stride: 1
weights size: 1x1
Output image size: 13x13
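
The summary table lists 425 filters for layer 30. That number is consistent with each grid cell predicting, per anchor box, 4 box coordinates, 1 objectness score and one score per class. The sketch below assumes 5 anchor boxes and 80 classes; neither value appears in the listing, and `detection_filters` is a hypothetical helper.

```python
def detection_filters(num_anchors, num_classes):
    """Output channels of the final convolution before the region layer."""
    # Per anchor: 4 box coordinates + 1 objectness score + class scores.
    return num_anchors * (num_classes + 5)

print(detection_filters(num_anchors=5, num_classes=80))  # 425
print(detection_filters(num_anchors=5, num_classes=20))  # 125
```

Under the same assumption of 5 anchors, a 20-class model such as one trained on PASCAL VOC would need 125 filters instead.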

region layer (type: Region)

Detection

Layer	Type	Filters	Size/Stride	Output size
0	Convolutional	32	3x3/1	416x416
1	Maxpool		2x2/2	208x208
2	Convolutional	64	3x3/1	208x208
3	Maxpool		2x2/2	104x104
4	Convolutional	128	3x3/1	104x104
5	Convolutional	64	1x1/1	104x104
6	Convolutional	128	3x3/1	104x104
7	Maxpool		2x2/2	52x52
8	Convolutional	256	3x3/1	52x52
9	Convolutional	128	1x1/1	52x52
10	Convolutional	256	3x3/1	52x52
11	Maxpool		2x2/2	26x26
12	Convolutional	512	3x3/1	26x26
13	Convolutional	256	1x1/1	26x26
14	Convolutional	512	3x3/1	26x26
15	Convolutional	256	1x1/1	26x26
16	Convolutional	512	3x3/1	26x26
17	Maxpool		2x2/2	13x13
18	Convolutional	1024	3x3/1	13x13
19	Convolutional	512	1x1/1	13x13
20	Convolutional	1024	3x3/1	13x13
21	Convolutional	512	1x1/1	13x13
22	Convolutional	1024	3x3/1	13x13
23	Convolutional	1024	3x3/1	13x13
24	Convolutional	1024	3x3/1	13x13
25	Route	16
26	Convolutional	64	1x1/1	26x26
27	Reorg		/2	13x13
28	Route	27 24
29	Convolutional	1024	3x3/1	13x13
30	Convolutional	425	1x1/1	13x13
31	Detection
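
The output sizes in the table can be checked by tracing the shape through each layer. The sketch below encodes only layer types and filter counts from the table, and assumes that every convolution preserves spatial size (stride 1 with matching padding). The tuple layout and `trace_shapes` are illustrative choices, not part of any framework.

```python
# Entries: ("conv", filters), ("max",), ("reorg", stride), ("route", [layers]).
LAYERS = [
    ("conv", 32), ("max",), ("conv", 64), ("max",),
    ("conv", 128), ("conv", 64), ("conv", 128), ("max",),
    ("conv", 256), ("conv", 128), ("conv", 256), ("max",),
    ("conv", 512), ("conv", 256), ("conv", 512),
    ("conv", 256), ("conv", 512), ("max",),
    ("conv", 1024), ("conv", 512), ("conv", 1024),
    ("conv", 512), ("conv", 1024), ("conv", 1024), ("conv", 1024),
    ("route", [16]), ("conv", 64), ("reorg", 2), ("route", [27, 24]),
    ("conv", 1024), ("conv", 425),
]

def trace_shapes(layers, channels=3, size=416):
    """Return a (channels, spatial size) pair for the output of each layer."""
    shapes = []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            channels = layer[1]  # assumed to keep the spatial size
        elif kind == "max":
            size //= 2  # 2x2 pool, stride 2
        elif kind == "reorg":
            stride = layer[1]
            channels, size = channels * stride * stride, size // stride
        elif kind == "route":
            sources = [shapes[i] for i in layer[1]]
            channels = sum(c for c, _ in sources)
            size = sources[0][1]
        shapes.append((channels, size))
    return shapes

shapes = trace_shapes(LAYERS)
for index, (c, s) in enumerate(shapes):
    print(f"{index:2d}  {c:4d} channels  {s}x{s}")
```

The trace ends at 425 channels on a 13x13 grid, and the input to layer 29 comes out as 1280 channels.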

[Figures: original image; output; layer 0 convolution result]