kernel - Running script was killed - Ask Ubuntu
I am running a script on 30k images and it gets killed. What could have caused this?
mona@pascal:~/computer_vision/deep_learning/darknet$ ./darknet coco test cfg/yolo-coco.cfg yolo-coco.weights images
0: Convolutional Layer: 448 x 448 x 3 image, 64 filters -> 224 x 224 x 64 image
1: Maxpool Layer: 224 x 224 x 64 image, 2 size, 2 stride
2: Convolutional Layer: 112 x 112 x 64 image, 192 filters -> 112 x 112 x 192 image
3: Maxpool Layer: 112 x 112 x 192 image, 2 size, 2 stride
4: Convolutional Layer: 56 x 56 x 192 image, 128 filters -> 56 x 56 x 128 image
5: Convolutional Layer: 56 x 56 x 128 image, 256 filters -> 56 x 56 x 256 image
6: Convolutional Layer: 56 x 56 x 256 image, 256 filters -> 56 x 56 x 256 image
7: Convolutional Layer: 56 x 56 x 256 image, 512 filters -> 56 x 56 x 512 image
8: Maxpool Layer: 56 x 56 x 512 image, 2 size, 2 stride
9: Convolutional Layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
10: Convolutional Layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
11: Convolutional Layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
12: Convolutional Layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
13: Convolutional Layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
14: Convolutional Layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
15: Convolutional Layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
16: Convolutional Layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
17: Convolutional Layer: 28 x 28 x 512 image, 512 filters -> 28 x 28 x 512 image
18: Convolutional Layer: 28 x 28 x 512 image, 1024 filters -> 28 x 28 x 1024 image
19: Maxpool Layer: 28 x 28 x 1024 image, 2 size, 2 stride
20: Convolutional Layer: 14 x 14 x 1024 image, 512 filters -> 14 x 14 x 512 image
21: Convolutional Layer: 14 x 14 x 512 image, 1024 filters -> 14 x 14 x 1024 image
22: Convolutional Layer: 14 x 14 x 1024 image, 512 filters -> 14 x 14 x 512 image
23: Convolutional Layer: 14 x 14 x 512 image, 1024 filters -> 14 x 14 x 1024 image
24: Convolutional Layer: 14 x 14 x 1024 image, 1024 filters -> 14 x 14 x 1024 image
25: Convolutional Layer: 14 x 14 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
26: Convolutional Layer: 7 x 7 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
27: Convolutional Layer: 7 x 7 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
28: Local Layer: 7 x 7 x 1024 image, 256 filters -> 7 x 7 x 256 image
29: Connected Layer: 12544 inputs, 4655 outputs
30: Detection Layer
forced: Using default '0'
Loading weights from yolo-coco.weights...Done!
Killed
mona@pascal:~/computer_vision/deep_learning/darknet/src$ dmesg | tail -5
[2265064.961124] [28256]  1007 28256    27449       11      55      271         0 sshd
[2265064.961126] [28257]  1007 28257     6906       11      19      888         0 bash
[2265064.961128] [32519]  1007 32519 57295584 16122050   62725 15112867         0 darknet
[2265064.961130] Out of memory: Kill process 32519 (darknet) score 941 or sacrifice child
[2265064.961385] Killed process 32519 (darknet) total-vm:229182336kB, anon-rss:64415788kB, file-rss:72412kB
and
[2265064.961128] [32519]  1007 32519 57295584 16122050   62725 15112867         0 darknet
[2265064.961130] Out of memory: Kill process 32519 (darknet) score 941 or sacrifice child
[2265064.961385] Killed process 32519 (darknet) total-vm:229182336kB, anon-rss:64415788kB, file-rss:72412kB
After the process was killed I have:
$ top | grep -i mem
KiB Mem:  65942576 total,  8932112 used, 57010464 free,    50440 buffers
KiB Swap: 67071996 total,  6666296 used, 60405700 free.  7794708 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
KiB Mem:  65942576 total,  8932484 used, 57010092 free,    50440 buffers
KiB Swap: 67071996 total,  6666296 used, 60405700 free.  7794736 cached Mem
KiB Mem:  65942576 total,  8932608 used, 57009968 free,    50448 buffers
KiB Mem:  65942576 total,  8932480 used, 57010096 free,    50448 buffers
My vmstat output is:
$ vmstat -s -SM
        64397 M total memory
         8722 M used memory
          305 M active memory
         7566 M inactive memory
        55674 M free memory
           49 M buffer memory
         7612 M swap cache
        65499 M total swap
         6510 M used swap
        58989 M free swap
    930702519 non-nice user cpu ticks
        33069 nice user cpu ticks
    121205290 system cpu ticks
   4327558564 idle cpu ticks
      4518820 IO-wait cpu ticks
          148 IRQ cpu ticks
       260645 softirq cpu ticks
            0 stolen cpu ticks
    315976129 pages paged in
    829418865 pages paged out
     38599842 pages swapped in
     46593418 pages swapped out
   2984775555 interrupts
   3388511507 CPU context switches
   1475266463 boot time
       162071 forks
Another time, when I ran the script on 3000 images instead of 30k, I got this error:
28: Local Layer: 7 x 7 x 1024 image, 256 filters -> 7 x 7 x 256 image
29: Connected Layer: 12544 inputs, 4655 outputs
30: Detection Layer
forced: Using default '0'
Loading weights from yolo-coco.weights...Done!
OpenCV Error: Insufficient memory (Failed to allocate 23970816 bytes) in OutOfMemoryError, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/alloc.cpp, line 52
terminate called after throwing an instance of 'cv::Exception'
  what():  /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/alloc.cpp:52: error: (-4) Failed to allocate 23970816 bytes in function OutOfMemoryError
Aborted (core dumped)
It used 61 GB of the 64 GB of RES memory shown in htop.
It is the OOM (out of memory) killer of the Linux kernel that is killing your process.
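You can always confirm this from the kernel log; for example (the grep pattern is just an illustration, and journalctl is only available on systemd-based releases):

    $ dmesg | grep -iE 'out of memory|killed process'
    $ journalctl -k | grep -i 'killed process'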
The Linux kernel allows processes to overcommit memory, i.e. a process can map (e.g. with mmap(2)) more memory than is actually available. This is controlled by the value of the file /proc/sys/vm/overcommit_memory. The possible values are:
- 0: heuristic based overcommitting (the default)
- 1: always overcommit
- 2: never overcommit
Overcommitting is enabled by default because it is assumed that a process will not actually use all the memory it maps, or at least not all of it at the same time.
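As a minimal sketch (assuming you have root and have decided which policy you want), you can inspect and temporarily change the overcommit policy like this:

    # show the current policy (0, 1 or 2)
    $ cat /proc/sys/vm/overcommit_memory
    # switch to "never overcommit" until the next reboot
    $ sudo sysctl vm.overcommit_memory=2
    # to make it permanent, add "vm.overcommit_memory = 2" to /etc/sysctl.conf

Note that disabling overcommit does not give you more memory; it only makes oversized allocations fail up front instead of the process being killed later.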
The problem begins when a process asks to allocate memory (e.g. with malloc(2)) but there is not enough memory available. The kernel then triggers the OOM killer, which kills process(es) based on their OOM scores, defined in the file /proc/<pid>/oom_score, with values ranging from 0 to 1000; the higher the value, the bigger the chance the OOM killer will kill that process in an OOM situation.
The OOM score is calculated by a complex algorithm that considers factors such as who owns the process, how long it has been running, how many children it has, how much memory it is using, and so on. Note that a root-owned process gets 30 deducted (when the score is >= 30) from its real OOM score.
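For example (using pgrep -f darknet purely as one hypothetical way to find the PID), you can read the score the kernel currently assigns to the process:

    # read the current OOM score of the darknet process
    # (dmesg above reported a score of 941 for it, hence it was the victim)
    $ cat /proc/$(pgrep -f darknet)/oom_score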
You can influence the OOM score by providing an adjustment score in the /proc/<pid>/oom_score_adj file; the allowed values range from -1000 to +1000, where negative values help keep the process alive and positive values make it more likely to be killed. You can check the oom_score of the process in question and make the necessary adjustments so that the OOM killer does not have it high on its priority list when it starts killing. Note, though, that this is not recommended when the process really is hogging all the memory (as in your case).
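A minimal sketch of such an adjustment (again using pgrep -f darknet only as an example of finding the PID; writing the file needs root):

    # make the process less attractive to the OOM killer
    $ echo -500 | sudo tee /proc/$(pgrep -f darknet)/oom_score_adj
    # verify the adjustment and the resulting score
    $ cat /proc/$(pgrep -f darknet)/oom_score_adj
    $ cat /proc/$(pgrep -f darknet)/oom_score

A value of -1000 makes the process effectively unkillable by the OOM killer, which is exactly what you do not want for a process that is eating all the memory.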
Alternative solutions include installing more memory (obviously), or, better, checking whether this can be handled within the program itself, for example by changing its algorithm to process the images in smaller batches, or imposing a resource limit via e.g. cgroups, which will, I am afraid, result in the same situation.
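Purely as a sketch of the cgroups route (the 32G cap is an arbitrary example, and this assumes a systemd-based release; older systemd versions use MemoryLimit= instead of MemoryMax=):

    # run darknet inside a transient cgroup capped at 32 GB of RAM
    $ systemd-run --scope -p MemoryMax=32G \
          ./darknet coco test cfg/yolo-coco.cfg yolo-coco.weights images

When the process hits the cap it will again fail or be killed, which is the "same situation" mentioned above; the real fix is in the program itself.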