kernel - Running script was killed - Ask Ubuntu


I am running a script on 30k images, and it was killed. What could have caused this?

mona@pascal:~/computer_vision/deep_learning/darknet$ ./darknet coco test cfg/yolo-coco.cfg yolo-coco.weights images
0: convolutional layer: 448 x 448 x 3 image, 64 filters -> 224 x 224 x 64 image
1: maxpool layer: 224 x 224 x 64 image, 2 size, 2 stride
2: convolutional layer: 112 x 112 x 64 image, 192 filters -> 112 x 112 x 192 image
3: maxpool layer: 112 x 112 x 192 image, 2 size, 2 stride
4: convolutional layer: 56 x 56 x 192 image, 128 filters -> 56 x 56 x 128 image
5: convolutional layer: 56 x 56 x 128 image, 256 filters -> 56 x 56 x 256 image
6: convolutional layer: 56 x 56 x 256 image, 256 filters -> 56 x 56 x 256 image
7: convolutional layer: 56 x 56 x 256 image, 512 filters -> 56 x 56 x 512 image
8: maxpool layer: 56 x 56 x 512 image, 2 size, 2 stride
9: convolutional layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
10: convolutional layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
11: convolutional layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
12: convolutional layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
13: convolutional layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
14: convolutional layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
15: convolutional layer: 28 x 28 x 512 image, 256 filters -> 28 x 28 x 256 image
16: convolutional layer: 28 x 28 x 256 image, 512 filters -> 28 x 28 x 512 image
17: convolutional layer: 28 x 28 x 512 image, 512 filters -> 28 x 28 x 512 image
18: convolutional layer: 28 x 28 x 512 image, 1024 filters -> 28 x 28 x 1024 image
19: maxpool layer: 28 x 28 x 1024 image, 2 size, 2 stride
20: convolutional layer: 14 x 14 x 1024 image, 512 filters -> 14 x 14 x 512 image
21: convolutional layer: 14 x 14 x 512 image, 1024 filters -> 14 x 14 x 1024 image
22: convolutional layer: 14 x 14 x 1024 image, 512 filters -> 14 x 14 x 512 image
23: convolutional layer: 14 x 14 x 512 image, 1024 filters -> 14 x 14 x 1024 image
24: convolutional layer: 14 x 14 x 1024 image, 1024 filters -> 14 x 14 x 1024 image
25: convolutional layer: 14 x 14 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
26: convolutional layer: 7 x 7 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
27: convolutional layer: 7 x 7 x 1024 image, 1024 filters -> 7 x 7 x 1024 image
28: local layer: 7 x 7 x 1024 image, 256 filters -> 7 x 7 x 256 image
29: connected layer: 12544 inputs, 4655 outputs
30: detection layer
forced: using default '0'
loading weights yolo-coco.weights...done!

killed

mona@pascal:~/computer_vision/deep_learning/darknet/src$ dmesg | tail -5
[2265064.961124] [28256]  1007 28256    27449       11      55      271             0 sshd
[2265064.961126] [28257]  1007 28257     6906       11      19      888             0 bash
[2265064.961128] [32519]  1007 32519 57295584 16122050   62725 15112867             0 darknet
[2265064.961130] out of memory: kill process 32519 (darknet) score 941 or sacrifice child
[2265064.961385] killed process 32519 (darknet) total-vm:229182336kb, anon-rss:64415788kb, file-rss:72412kb

After the process was killed, I have:

$ top | grep -i mem
kib mem:  65942576 total,  8932112 used, 57010464 free,    50440 buffers
kib swap: 67071996 total,  6666296 used, 60405700 free.  7794708 cached mem
  pid user      pr  ni    virt    res    shr s  %cpu %mem     time+ command
kib mem:  65942576 total,  8932484 used, 57010092 free,    50440 buffers
kib swap: 67071996 total,  6666296 used, 60405700 free.  7794736 cached mem
kib mem:  65942576 total,  8932608 used, 57009968 free,    50448 buffers
kib mem:  65942576 total,  8932480 used, 57010096 free,    50448 buffers

My vmstat output is:

$ vmstat -s -sm
        64397 m total memory
         8722 m used memory
          305 m active memory
         7566 m inactive memory
        55674 m free memory
           49 m buffer memory
         7612 m swap cache
        65499 m total swap
         6510 m used swap
        58989 m free swap
    930702519 non-nice user cpu ticks
        33069 nice user cpu ticks
    121205290 system cpu ticks
   4327558564 idle cpu ticks
      4518820 io-wait cpu ticks
          148 irq cpu ticks
       260645 softirq cpu ticks
            0 stolen cpu ticks
    315976129 pages paged in
    829418865 pages paged out
     38599842 pages swapped in
     46593418 pages swapped out
   2984775555 interrupts
   3388511507 cpu context switches
   1475266463 boot time
       162071 forks

Another time, when I ran the script on 3000 images instead of 30k, I got this error:

28: local layer: 7 x 7 x 1024 image, 256 filters -> 7 x 7 x 256 image
29: connected layer: 12544 inputs, 4655 outputs
30: detection layer
forced: using default '0'
loading weights yolo-coco.weights...done!
opencv error: insufficient memory (failed allocate 23970816 bytes) in outofmemoryerror, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/alloc.cpp, line 52
terminate called after throwing instance of 'cv::exception'
  what():  /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/alloc.cpp:52: error: (-4) failed allocate 23970816 bytes in function outofmemoryerror
aborted (core dumped)

It used 61G of the 64G of memory (RES), as shown in htop.

It's the OOM (out of memory) killer of the Linux kernel killing the process.

The Linux kernel allows processes to overcommit memory, i.e. a process can map (e.g. with mmap(2)) more memory than is actually available. This behaviour is defined by the value of the file /proc/sys/vm/overcommit_memory. The possible values are:

  • 0 : heuristic overcommit (default)
  • 1 : always overcommit
  • 2 : never overcommit
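To check which of these modes a machine is using, the file can simply be read. The following is a minimal Python sketch; the helper names `describe_overcommit` and `current_overcommit` are made up for illustration, and the descriptions just restate the list above.

```python
from pathlib import Path

# Descriptions restate the three modes listed above.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "never overcommit",
}


def describe_overcommit(raw):
    """Turn the raw file content (e.g. "0\n") into (mode, description)."""
    mode = int(raw.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read the overcommit mode of the running kernel (Linux only)."""
    return describe_overcommit(Path(path).read_text())


if __name__ == "__main__":
    try:
        print(current_overcommit())
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print(f"could not read overcommit setting: {exc}")
```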

Overcommitting is enabled by default because it is assumed that a process will not use all of the memory it maps, well, at least not at the same time.

The problem begins when a process asks to allocate memory (e.g. with malloc(3)) but there is not enough memory available. The kernel then triggers the OOM killer, which kills process(es) based on their OOM score(s), defined in the file /proc/PID/oom_score with values ranging from 0 to 1000. The higher the value, the bigger the chance that the OOM killer will kill the process in an OOM situation.
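To see which processes the OOM killer would currently favour, the scores can be read for every PID. This is only a sketch: `top_oom_candidates` is a hypothetical helper, and the `proc_root` parameter exists purely so the function can be pointed at a test directory instead of the real /proc.

```python
from pathlib import Path


def top_oom_candidates(proc_root="/proc", limit=5):
    """Return up to `limit` (oom_score, pid, name) tuples, highest score first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            score = int((entry / "oom_score").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or is not readable
        rows.append((score, int(entry.name), name))
    # Highest score first; ties are ordered by PID for a stable result.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:limit]


if __name__ == "__main__":
    for score, pid, name in top_oom_candidates():
        print(f"{score:5d} {pid:7d} {name}")
```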

The OOM score is calculated by a complex algorithm that considers factors such as who owns the process, how long it has been running, how many children it has, how much memory it is using, and so on. Note that for a root-owned process, 30 is deducted (when >=30) from the real OOM score.
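As a rough illustration, the "score 941" in the dmesg output from the question can be approximated from the numbers shown there. The sketch below is a simplification, not the kernel's exact code: it assumes the score is essentially the share of RAM plus swap that the process occupies, scaled to 0..1000. It also assumes 4 KiB pages and that the three large columns on the darknet line are resident pages, page-table pages and swap entries (the header row of the process dump is not part of the excerpt). The function name `approx_oom_score` is made up.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages


def approx_oom_score(rss_pages, pte_pages, swap_pages,
                     ram_kib, swap_kib, oom_score_adj=0):
    """Simplified estimate: memory footprint as a share of RAM + swap, 0..1000."""
    total_pages = (ram_kib + swap_kib) // PAGE_KIB
    used_pages = rss_pages + pte_pages + swap_pages
    score = used_pages * 1000 // total_pages + oom_score_adj
    return max(0, min(1000, score))


# Per-process numbers from the darknet line in dmesg,
# totals from the "mem" and "swap" lines of top (in KiB).
score = approx_oom_score(
    rss_pages=16122050,
    pte_pages=62725,
    swap_pages=15112867,
    ram_kib=65942576,
    swap_kib=67071996,
)
print(score)  # 941
```

The process ran as UID 1007, so the deduction for root-owned processes does not apply here.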

You can influence the OOM score by writing an adjustment to the /proc/PID/oom_score_adj file. Allowed values range from -1000 to +1000: negative values help keep the process alive, and positive values make it more likely to be killed. You can check the oom_score of the process in question and make the necessary adjustments so that the OOM killer does not have it high on its priority list when it starts killing. Note, however, that this is not recommended when the process keeps hogging memory (as in this case).
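A minimal sketch of such an adjustment in Python follows. The helper `set_oom_score_adj` and its `proc_root` parameter are hypothetical, added so the function can be exercised against a scratch directory. On a real system, lowering the value normally requires root privileges.

```python
from pathlib import Path


def set_oom_score_adj(pid, adj, proc_root="/proc"):
    """Write an adjustment in the allowed range and return the value read back."""
    if not -1000 <= adj <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{adj}\n")
    return int(path.read_text().strip())
```

For example, `set_oom_score_adj(os.getpid(), 500)` (after `import os`) would mark the calling process as a more likely victim, which is usually safe to try because it only raises the score.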

The alternative solutions include installing more memory, obviously. Better still, check whether something can be done within the program, for example by changing its algorithm. You could also impose a resource limit via e.g. cgroups, although that would result in the same situation, I'm afraid.
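As an illustration of the resource-limit idea, here is a sketch that uses a per-process address-space limit (RLIMIT_AS, set through Python's `resource` module) rather than cgroups. That is a different and simpler mechanism, chosen only because it needs no system configuration. The names `limit_address_space` and `run_limited` are made up. With such a limit, allocations beyond the cap fail inside the program instead of the machine being driven into swap and the OOM killer, but the program still does not finish.

```python
import resource
import subprocess
import sys


def limit_address_space(max_bytes):
    """Build a function that caps virtual memory; it runs in the child before exec."""
    def apply_limit():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply_limit


def run_limited(cmd, max_bytes):
    """Run `cmd` with its address space capped at `max_bytes` (Unix only)."""
    return subprocess.run(
        cmd,
        preexec_fn=limit_address_space(max_bytes),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Show the limit as seen from inside a child process (1 GiB here).
    child = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"]
    print(run_limited(child, 1024 ** 3).stdout.strip())
```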

