
Teach You to Do Object Detection with Deep Learning (4): Using the Model

The previous article, "Teach You to Do Object Detection with Deep Learning (3): Model Training," described how to use yolov3 to train our own object detection model. This article focuses on how to use the trained model to detect objects in images or video.

If you have read the previous articles, you know that we are using the AlexeyAB/darknet project. Although it provides object detection code written in both C++ and python, that code has several problems:

    It cannot display Chinese labels.

    It does not show confidence scores.

    The detection box style is not very attractive.

    The object detection code written in python keeps raising type-related errors, presumably caused by the underlying C++ code.

Among these, the garbled-Chinese problem is related to opencv. Many articles online discuss it, but the workarounds are complicated, so I rewrote the object detection program in python, borrowing from the code of the qqwweee/keras-yolo3 project. The main idea is to draw the detection information on the image with the PIL library instead of opencv (see the sketch just below). There are other detail changes as well, which I won't describe one by one.
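To make the idea concrete, here is a minimal sketch (not part of the project code) of drawing a Chinese label with PIL on a frame read by opencv. cv2.putText would render the Chinese characters as "???" because its built-in Hershey fonts have no CJK glyphs; the font path assumes the simfang.ttf file introduced later in this article:

# a minimal sketch, assuming font/simfang.ttf exists (download link below)
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

frame = np.zeros((200, 400, 3), dtype=np.uint8)  # stand-in for a video frame (BGR)
image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # opencv BGR -> PIL RGB
draw = ImageDraw.Draw(image)
font = ImageFont.truetype('font/simfang.ttf', 24)  # a font that contains CJK glyphs
draw.text((10, 80), '牛排 0.92', fill=(255, 0, 0), font=font)  # Chinese label + confidence
result = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)  # back to BGR for opencv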
        Now for the actual changes. The detect_image method in darknet.py was revised as follows:

def detect_image(class_names, net, meta, im, thresh=.5, hier_thresh=.5, nms=.45, debug=False):
    num = c_int(0)
    if debug: print("Assigned num")
    pnum = pointer(num)
    if debug: print("Assigned pnum")
    predict_image(net, im)
    if debug: print("did prediction")

    dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, None, 0, pnum, 0)
    if debug: print("Got dets")
    num = pnum[0]
    if debug: print("got zeroth index of pnum")
    if nms:
        do_nms_sort(dets, num, meta.classes, nms)
    if debug: print("did sort")
    res = []
    if debug: print("about to range")
    for j in range(num):
        if debug: print("Ranging on " + str(j) + " of " + str(num))
        if debug: print("Classes: " + str(meta), meta.classes, meta.names)
        for i in range(meta.classes):
            if debug: print("Class-ranging on " + str(i) + " of " + str(meta.classes) + "= " + str(dets[j].prob[i]))
            if dets[j].prob[i] > 0.0:
                b = dets[j].bbox
                if altNames is None:
                    # nameTag = meta.names[i]
                    # The line above triggers the type error (probably a
                    # problem in the underlying c++ code), so we pass the
                    # class list in as a parameter to bypass the issue:
                    nameTag = class_names[i]
                    print(nameTag)
                else:
                    nameTag = altNames[i]
                    print(nameTag)
                if debug:
                    print("Got bbox", b)
                    print(nameTag)
                    print(dets[j].prob[i])
                    print((b.x, b.y, b.w, b.h))
                res.append((nameTag, dets[j].prob[i], (b.x, b.y, b.w, b.h)))
    if debug: print("did range")
    res = sorted(res, key=lambda x: -x[1])
    if debug: print("did sort")
    free_detections(dets, num)
    if debug: print("freed detections")
    return res
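For reference, here is a minimal sketch of calling the revised function on its own. It assumes the usual helpers from the project's darknet.py (load_net_custom, load_meta, load_image) and uses example paths; adjust them to your own project:

# a minimal usage sketch; the paths are examples only
from darknet import load_net_custom, load_meta, load_image, detect_image

net = load_net_custom(b"names-data/yolo-obj.cfg",
                      b"names-data/backup/yolo-obj_3000.weights", 0, 1)
meta = load_meta(b"names-data/voc.data")
im = load_image(b"names-data/images/food.JPG", 0, 0)
class_names = [c.strip() for c in open("names-data/voc.names", encoding="utf-8")]

# the result is a list of (name, confidence, (center_x, center_y, width, height)),
# sorted by confidence from high to low
for name, prob, box in detect_image(class_names, net, meta, im, thresh=0.25):
    print(name, round(prob, 2), box)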

Next, add a new file darknet_video_custom.py with the following content:

# -*- coding: utf-8 -*-
"""
This module uses a yolov3 model to detect where objects are in images or videos.
"""
__author__ = '程序员一一涤生'

import colorsys
import os
from timeit import default_timer as timer

import cv2
import numpy as np
from PIL import ImageDraw, ImageFont, Image

import darknet


def _convertBack(x, y, w, h):
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax


def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image


class YOLO(object):
    _defaults = {
        "configPath": "names-data/yolo-obj.cfg",
        "weightPath": "names-data/backup/yolo-obj_3000.weights",
        "metaPath": "names-data/voc.data",
        "classes_path": "names-data/voc.names",
        "thresh": 0.3,
        "iou_thresh": 0.5,
        # "model_image_size": (416, 416),
        # "model_image_size": (608, 608),
        "model_image_size": (800, 800),
        "gpu_num": 1,
    }

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)  # set up default values
        self.__dict__.update(kwargs)  # and update with user overrides
        self.class_names = self._get_class()
        self.colors = self._get_colors()
        self.netMain = darknet.load_net_custom(self.configPath.encode("ascii"),
                                               self.weightPath.encode("ascii"), 0, 1)  # batch size = 1
        self.metaMain = darknet.load_meta(self.metaPath.encode("ascii"))
        self.altNames = self._get_alt_names()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path, encoding="utf-8") as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_colors(self):
        class_names = self._get_class()
        # Generate colors for drawing bounding boxes.
        hsv_tuples = [(x / len(class_names), 1., 1.)
                      for x in range(len(class_names))]
        colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        colors = list(
            map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
        np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        np.random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
        np.random.seed(None)  # Reset seed to default.
        return colors

    def _get_alt_names(self):
        altNames = None  # stays None if the names file cannot be found
        try:
            with open(self.metaPath) as metaFH:
                metaContents = metaFH.read()
                import re
                match = re.search("names *= *(.*)$", metaContents,
                                  re.IGNORECASE | re.MULTILINE)
                if match:
                    result = match.group(1)
                else:
                    result = None
                try:
                    if os.path.exists(result):
                        with open(result) as namesFH:
                            namesList = namesFH.read().strip().split("\n")
                            altNames = [x.strip() for x in namesList]
                except TypeError:
                    pass
        except Exception:
            pass
        return altNames

    def cvDrawBoxes(self, detections, image):
        # Font settings, including the font file path and the font size
        font = ImageFont.truetype(font='font/simfang.ttf',
                                  size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        # Detection box border thickness; this formula adjusts the thickness
        # automatically according to the image size
        thickness = (image.size[0] + image.size[1]) // 300
        # Iterate over the detected targets: (classname, probability, (x, y, w, h))
        for c, detection in enumerate(detections):
            # Get the class name and confidence score of the current target
            classname = detection[0]
            # score = round(detection[1] * 100, 2)
            score = round(detection[1], 2)
            label = '{} {:.2f}'.format(classname, score)
            # Compute the top-left (xmin, ymin) and bottom-right (xmax, ymax)
            # corners of the detection box
            x, y, w, h = detection[2][0], \
                detection[2][1], \
                detection[2][2], \
                detection[2][3]
            xmin, ymin, xmax, ymax = _convertBack(
                float(x), float(y), float(w), float(h))
            # Get a drawing handle
            draw = ImageDraw.Draw(image)
            # Get the size of the text to be displayed
            label_size = draw.textsize(label, font)
            # The coordinates correspond to top, left, bottom, right;
            # be careful not to mix them up
            top, left, bottom, right = (ymin, xmin, ymax, xmax)
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])
            if c > len(self.class_names) - 1:
                c = 1
            # Draw the border, one rectangle per unit of thickness
            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[c])
            # Draw the background box for the label text
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            # Draw the label text
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw
        return image

    def detect_video(self, video_path, output_path="", show=True):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        vid = cv2.VideoCapture(video_path)
        if not vid.isOpened():
            raise IOError("Couldn't open webcam or video")
        video_FourCC = cv2.VideoWriter_fourcc(*"mp4v")
        video_fps = vid.get(cv2.CAP_PROP_FPS)
        video_size = (nw, nh)
        isOutput = True if output_path != "" else False
        if isOutput:
            print("!!! TYPE:", type(output_path), type(video_FourCC),
                  type(video_fps), type(video_size))
            out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size)
        accum_time = 0
        curr_fps = 0
        fps = "FPS: ??"
        prev_time = timer()
        # Create an image we reuse for each detection
        darknet_image = darknet.make_image(nw, nh, 3)
        while True:
            return_value, frame = vid.read()
            if return_value:
                # Convert to RGB, because opencv reads images in BGR format
                # by default, while PIL uses RGB
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image = Image.fromarray(frame_rgb)
                image_resized = image.resize(video_size, Image.LINEAR)
                darknet.copy_image_from_bytes(darknet_image,
                                              np.asarray(image_resized).tobytes())
                detections = darknet.detect_image(self.class_names, self.netMain,
                                                  self.metaMain, darknet_image,
                                                  thresh=self.thresh, debug=True)
                image_resized = self.cvDrawBoxes(detections, image_resized)
                result = np.asarray(image_resized)
                # Convert back to BGR for opencv to process
                result = cv2.cvtColor(result, cv2.COLOR_RGB2BGR)
                curr_time = timer()
                exec_time = curr_time - prev_time
                prev_time = curr_time
                accum_time = accum_time + exec_time
                curr_fps = curr_fps + 1
                if accum_time > 1:
                    accum_time = accum_time - 1
                    fps = "FPS: " + str(curr_fps)
                    curr_fps = 0
                cv2.putText(result, text=fps, org=(3, 15),
                            fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                            fontScale=0.50, color=(255, 0, 0), thickness=2)
                if show:
                    cv2.imshow("Object Detect", result)
                if isOutput:
                    print("start write...==========================================")
                    out.write(result)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            else:
                break
        if isOutput:  # only release the writer if it was created
            out.release()
        vid.release()
        cv2.destroyAllWindows()

    def detect_image(self, image_path, save_path):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        try:
            image = Image.open(image_path)
        except:
            print('Open Error! Try again!')
        else:
            image_resized = image.resize((nw, nh), Image.LINEAR)
            darknet_image = darknet.make_image(nw, nh, 3)
            darknet.copy_image_from_bytes(darknet_image,
                                          np.asarray(image_resized).tobytes())
            # Run detection: returns the class, the confidence score, and the
            # center coordinates, width, and height of each detection box
            detections = darknet.detect_image(self.class_names, self.netMain,
                                              self.metaMain, darknet_image,
                                              thresh=0.25, debug=True)
            # Draw the detection information on the image
            image_resized = self.cvDrawBoxes(detections, image_resized)
            # Show and save the annotated image
            image_resized.show()
            image_resized.save(save_path)


if __name__ == "__main__":
    _yolo = YOLO()
    _yolo.detect_image("names-data/images/food.JPG",
                       "names-data/images/food_detect.JPG")
    # _yolo.detect_video("names-data/videos/food.mp4",
    #                    "names-data/videos/food_detect.mp4", show=False)

The key parts of the code above are annotated with comments, so I won't interpret it line by line. One thing to note: the code uses a Chinese font file, which you can put in a font directory under the project.

Download Link: https://github.com/Halfish/lstm-ctc-ocr/blob/master/fonts/simfang.ttf

Here are some other fonts I have collected; pick whichever one you like.

Link: https://pan.baidu.com/s/1PWS7Hw1z3dkDyq7feZxqEQ (extraction code: xu8q)
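One more note on the python side before we move on: because YOLO.__init__ merges _defaults with any keyword arguments you pass in, you can switch weights, thresholds, or input size per instance without editing the class. A small sketch (the yolo-obj_6000.weights checkpoint name is hypothetical; substitute whatever exists in your backup directory):

# a sketch of overriding the defaults via keyword arguments;
# yolo-obj_6000.weights is a hypothetical checkpoint name
_yolo = YOLO(weightPath="names-data/backup/yolo-obj_6000.weights",
             thresh=0.5,
             model_image_size=(608, 608))
_yolo.detect_image("names-data/images/food.JPG",
                   "names-data/images/food_detect.JPG")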

Now let's look at how to display the confidence score. Open the src/image.c file and replace the draw_detections_cv_v3 function with the following code; after replacing it, run make again to rebuild the project:

void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output){
    int i, j;
    if (!show_img) return;
    static int frame_id = 0;
    frame_id++;
    for (i = 0; i < num; ++i) {
        char labelstr[4096] = { 0 };
        int class_id = -1;
        for (j = 0; j < classes; ++j) {
            int show = strncmp(names[j], "dont_show", 9);
            if (dets[i].prob[j] > thresh && show) {
                float score = dets[i].prob[j];  // append the confidence score to the label text
                if (class_id < 0) {
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                    class_id = j;
                }
                else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                }
                printf("%s: %.0f%% ", names[j], score * 100);
            }
        }
        if (class_id >= 0) {
            int width = show_img->height * .006;
            int offset = class_id * 123457 % classes;
            float red = get_color(2, offset, classes);
            float green = get_color(1, offset, classes);
            float blue = get_color(0, offset, classes);
            float rgb[3];
            rgb[0] = red; rgb[1] = green; rgb[2] = blue;
            box b = dets[i].bbox;
            b.w = (b.w < 1) ? b.w : 1;
            b.h = (b.h < 1) ? b.h : 1;
            b.x = (b.x < 1) ? b.x : 1;
            b.y = (b.y < 1) ? b.y : 1;
            int left = (b.x - b.w / 2.) * show_img->width;
            int right = (b.x + b.w / 2.) * show_img->width;
            int top = (b.y - b.h / 2.) * show_img->height;
            int bot = (b.y + b.h / 2.) * show_img->height;
            if (left < 0) left = 0;
            if (right > show_img->width - 1) right = show_img->width - 1;
            if (top < 0) top = 0;
            if (bot > show_img->height - 1) bot = show_img->height - 1;
            float const font_size = show_img->height / 1000.F;
            CvPoint pt1, pt2, pt_text, pt_text_bg1, pt_text_bg2;
            pt1.x = left;
            pt1.y = top;
            pt2.x = right;
            pt2.y = bot;
            pt_text.x = left;
            pt_text.y = top - 12;
            pt_text_bg1.x = left;
            pt_text_bg1.y = top - (10 + 25 * font_size);
            pt_text_bg2.x = right;
            pt_text_bg2.y = top;
            CvScalar color;
            color.val[0] = red * 256;
            color.val[1] = green * 256;
            color.val[2] = blue * 256;
            cvRectangle(show_img, pt1, pt2, color, width, 8, 0);
            if (ext_output)
                printf("\t(left_x: %4.0f top_y: %4.0f width: %4.0f height: %4.0f)\n",
                    (float)left, (float)top, b.w * show_img->width, b.h * show_img->height);
            else
                printf("\n");
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, width, 8, 0);
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, CV_FILLED, 8, 0);    // filled
            CvScalar black_color;
            black_color.val[0] = 0;
            CvFont font;
            cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, font_size, font_size, 0, font_size * 3, 8);
            cvPutText(show_img, labelstr, pt_text, &font, black_color);
        }
    }
    if (ext_output) {
        fflush(stdout);
    }
}

Once all of the above is in place, run python darknet_video_custom.py to detect objects in your images or videos and see the results.

Pretty cool, right? O(∩_∩)O~ With this fourth installment, the series so far covers "a quick taste of object detection," "data labeling," "model training," and "model usage," so we have now been through the whole object detection workflow and have a feel for the process. The next article, "Teach You to Do Object Detection with Deep Learning (5): YOLO," will introduce the YOLO algorithm and explain how the object detection behind all this actually works.

OK, that's all for this article. Thanks for reading, O(∩_∩)O. Bye~

 

A Quote to Share


 

One does not have to go far away to be traveling; sitting quietly and thinking is also travel. All exploring, seeking, and touching of unknown territory, whether of land or of the soul, is a kind of travel.

– Lin Qingxuan

  

Recommended for you


How to rent a GPU server on Alibaba Cloud for deep learning?

Teach You to Do Object Detection with Deep Learning (3): Model Training

Teach You to Do Object Detection with Deep Learning (2): Data Labeling

Teach You to Do Object Detection with Deep Learning (1): A Quick Taste of How Cool Object Detection Is

Installing Anaconda3 on ubuntu16.04

Continuously monitoring Nvidia GPU status on Ubuntu

 
My blog will soon be synced to the Tencent Cloud+ Community; you are all invited to join: https://cloud.tencent.com/developer/support-plan?invite_code=1kvpuxzlylh68
