CN118570889A - Sequence image target recognition method, device and electronic device based on image quality optimization - Google Patents

Info

Publication number
CN118570889A
Authority
CN
China
Prior art keywords
target object
image
object image
target
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411062099.8A
Other languages
Chinese (zh)
Other versions
CN118570889B (en)
Inventor
宋鸿飞
王麒
陈帅斌
蒋泽飞
夏虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Denghong Technology Co Ltd
Original Assignee
Hangzhou Denghong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Denghong Technology Co Ltd
Priority to CN202411062099.8A
Publication of CN118570889A
Application granted
Publication of CN118570889B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467 Encoded features or binary features, e.g. local binary patterns [LBP]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Quality & Reliability (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of image data processing, and in particular to a sequential image target recognition method and device based on image quality optimization, and to an electronic device. First, a queue of object images to be identified, captured by a camera, is acquired. Next, the target IOU relationship of each object image to be identified is determined based on a face target detection network and a human body target detection network, yielding a queue of target IOU relationships. A queue of target object images is then extracted from the queue of object images to be identified based on the queue of target IOU relationships, and the target object image with the best face quality is selected from this queue as the target object image to be detected. Face recognition is performed on the target object image to be detected to obtain a recognition result, namely a personnel identity tag. Finally, the personnel identity tag in the recognition result is designated as the personnel identity tag of the whole queue of target object images.

Description

Sequential image target recognition method and device based on image quality optimization, and electronic device
Technical Field
The application relates to the field of image data processing, and in particular to a sequential image target recognition method and device based on image quality optimization, and to an electronic device.
Background
In modern video surveillance and intelligent security systems, sequential image target recognition plays a vital role. As the technology develops, the requirements on the accuracy and efficiency of image recognition keep rising. However, the prior art has limitations when processing large-scale image data, particularly in cross-domain person identification and trajectory tracking. For example, existing cross-domain person recognition and trajectory tracking algorithms usually send many captured face images to recognition in order to construct a person's trajectory, so every captured face image must pass through the face recognition module to be matched to a person identity ID. This consumes excessive computing resources, wastes time and labor, and is inefficient. In addition, because camera viewing angle, illumination, distance and other factors differ, the quality of captured face images is uneven, which makes the credibility of face recognition results questionable. Moreover, not every captured face image is suitable for face recognition, yet if some image frames are filtered out, gaps and breakpoints appear when the person's trajectory is drawn, which harms both the integrity of the trajectory and the accuracy of the analysis.
Accordingly, a sequential image target recognition scheme based on image quality optimization is desired.
Disclosure of Invention
The present application has been made in view of the above problems. An object of the present application is to provide a sequential image target recognition method and apparatus based on image quality optimization, and an electronic device.
An embodiment of the application provides a sequential image target recognition method based on image quality optimization, which comprises the following steps:
acquiring a queue of object images to be identified, captured by a camera;
determining, based on a face target detection network and a human body target detection network, the target IOU relationship of each object image in the queue of object images to be identified, to obtain a queue of target IOU relationships;
extracting a queue of target object images from the queue of object images to be identified based on the queue of target IOU relationships;
selecting the target object image with the best face quality from the queue of target object images as the target object image to be detected;
performing face recognition on the target object image to be detected to obtain a recognition result, wherein the recognition result is a personnel identity tag; and
designating the personnel identity tag in the recognition result as the personnel identity tag of the queue of target object images.
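The six steps above can be sketched in Python as a small orchestration function. This is an illustration under stated assumptions, not the applicant's implementation: the callables `iou_of`, `quality_of` and `identify` are hypothetical stand-ins for the IOU computation, the quality scorer and the face recognizer, and the name `recognize_sequence` is likewise invented here.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def recognize_sequence(
    images: Sequence[Any],
    iou_of: Callable[[Any], float],      # hypothetical: face/body IOU of one image
    quality_of: Callable[[Any], float],  # hypothetical: face-quality score of one image
    identify: Callable[[Any], str],      # hypothetical: face recognizer returning an identity tag
    iou_threshold: float,
) -> Tuple[List[Tuple[Any, str]], Optional[str]]:
    """Filter by IOU, pick the best-quality image, recognize it once, propagate the tag."""
    # Keep only images whose face/body IOU exceeds the preset threshold
    targets = [img for img in images if iou_of(img) > iou_threshold]
    if not targets:
        return [], None
    # Select the single image with the highest face-quality score
    best = max(targets, key=quality_of)
    # Run face recognition only once, on the selected image
    label = identify(best)
    # Assign that identity tag to every image in the target queue
    return [(img, label) for img in targets], label
```

The design point sits in the last two steps: the recognizer runs once per queue rather than once per frame, yet every retained frame still receives an identity tag, which is how the scheme aims to save computation while keeping the trajectory continuous.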
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, determining, based on the face target detection network and the human body target detection network, the target IOU relationship of each object image in the queue of object images to be identified to obtain the queue of target IOU relationships includes:
inputting each object image to be identified into the face target detection network and the human body target detection network, respectively, to obtain a face bounding box and a human body bounding box; and
calculating the target IOU relationship between the human body bounding box and the face bounding box according to the following formula:
$$\mathrm{IOU} = \frac{S_{\text{intersection}}}{S_{\text{union}}}$$
where $S_{\text{intersection}}$ is the area of the intersection of the human body bounding box and the face bounding box, and $S_{\text{union}}$ is the area of their union.
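A minimal Python sketch of the IOU formula, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates (the box format is not specified in the text; `box_iou` and `box_area` are illustrative names):

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format)
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    # Width times height, clamped at zero for degenerate boxes
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_iou(body: Box, face: Box) -> float:
    """IOU = intersection area / union area of a body box and a face box."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(body[0], face[0]), max(body[1], face[1])
    ix2, iy2 = min(body[2], face[2]), min(body[3], face[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(body) + box_area(face) - inter
    return inter / union if union > 0 else 0.0
```

When the face box lies entirely inside the body box, the intersection is the face box and the union is the body box, so the IOU reduces to the ratio of face area to body area; for example, `box_iou((0, 0, 100, 200), (30, 10, 70, 50))` gives 1600 / 20000 = 0.08.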
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, extracting the queue of target object images from the queue of object images to be identified based on the queue of target IOU relationships includes:
in response to the target IOU relationship being less than or equal to a preset threshold, discarding the corresponding object image to be identified; and
in response to the target IOU relationship being greater than the preset threshold, adding the corresponding object image to be identified to the queue of target object images.
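The thresholding rule can be sketched as follows. The threshold value is not given in the text, so it is left as a parameter; `extract_target_queue` is a hypothetical name, and the IOU values are assumed to have been computed beforehand, one per image and in the same order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def extract_target_queue(images: Sequence[T], ious: Sequence[float], threshold: float) -> List[T]:
    """Keep images whose IOU is strictly greater than the threshold, preserving order."""
    if len(images) != len(ious):
        raise ValueError("images and ious must have the same length")
    # IOU <= threshold: discarded; IOU > threshold: added to the target queue
    return [img for img, iou in zip(images, ious) if iou > threshold]
```

Preserving the original order keeps the target queue aligned with the capture sequence, which matters when the queue is later used to draw a trajectory.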
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, selecting the target object image with the best face quality from the queue of target object images as the target object image to be detected includes, for each target object image in the queue of target object images:
processing the target object image with a local binary pattern (LBP) operator to obtain a target object image LBP feature vector;
processing the target object image with a histogram of oriented gradients (HOG) feature descriptor to obtain a target object HOG feature vector;
inputting the target object HOG feature vector and the target object image LBP feature vector into a dynamic interaction module under gating response to obtain a target object multimodal statistical feature vector;
inputting the target object image into an image feature extractor based on a dilated (atrous) convolutional neural network model to obtain a target object image feature map;
inputting the target object image feature map into a feature foreground mask saliency module based on a convolution-gated feed-forward mechanism to obtain a foreground-salient target object image feature map;
inputting the foreground-salient target object image feature map and the target object multimodal statistical feature vector into a cross-domain joint encoder based on a MetaNet model to obtain a multimodal-statistical-feature-assisted target object image fusion feature map; and
inputting the multimodal-statistical-feature-assisted target object image fusion feature map into a decoder-based image quality scorer to obtain a scoring decoding value.
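The first step of the quality pipeline, the LBP operator, is the most self-contained and can be sketched with NumPy. This sketch assumes the basic 3x3, 8-neighbour LBP and summarizes the code map as a normalized 256-bin histogram; the text does not say which LBP variant is used or how the codes become a vector, so both choices are assumptions, and `lbp_feature_vector` is an illustrative name. The HOG, dilated-convolution, MetaNet and decoder stages are not sketched here.

```python
import numpy as np


def lbp_feature_vector(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3, 8-neighbour LBP codes summarized as a normalized 256-bin histogram."""
    if gray.ndim != 2 or min(gray.shape) < 3:
        raise ValueError("expected a 2-D grayscale image of at least 3x3 pixels")
    g = gray.astype(np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets clockwise from top-left; neighbour k contributes bit k
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit where the neighbour is at least as bright as the centre pixel
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Border pixels are skipped rather than padded, so an H x W image yields (H - 2) x (W - 2) codes; a histogram makes the vector length independent of image size, which is convenient when face crops differ in resolution.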
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, inputting the target object HOG feature vector and the target object image LBP feature vector into the dynamic interaction module under gating response to obtain the target object multimodal statistical feature vector includes:
inputting the target object HOG feature vector and the target object image LBP feature vector into a feature combination module for cascade (concatenation) processing to obtain a target object multimodal statistical information joint feature vector;
multiplying the target object multimodal statistical information joint feature vector by a parameter matrix and adding a bias vector position-wise to the result, to obtain a linearly transformed target object multimodal statistical information joint feature vector;
activating the linearly transformed target object multimodal statistical information joint feature vector with an activation function to obtain a target object multimodal statistical information dynamic fusion response gating value;
calculating the position-wise product of the target object HOG feature vector and the target object multimodal statistical information dynamic fusion response gating value to obtain a weight-modulated target object HOG feature vector;
computing a weight value from the target object multimodal statistical information dynamic fusion response gating value and multiplying it position-wise with the target object image LBP feature vector to obtain a weight-modulated target object image LBP feature vector; and
performing position-wise addition of the weight-modulated target object HOG feature vector and the weight-modulated target object image LBP feature vector to obtain the target object multimodal statistical feature vector.
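The gated fusion can be sketched in NumPy as below. Two details are assumptions because the text leaves them unstated: the activation is taken to be a sigmoid, and the LBP branch is weighted by `1 - gate`, a common complementary-gate choice. The function name and the parameter shapes (`W` of shape `(d, 2d)`, `b` of shape `(d,)`, both feature vectors of length `d`) are illustrative.

```python
import numpy as np


def gated_fusion(hog: np.ndarray, lbp: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Gated fusion of equal-length HOG and LBP vectors (sigmoid gate assumed)."""
    d = hog.shape[0]
    if lbp.shape != (d,) or W.shape != (d, 2 * d) or b.shape != (d,):
        raise ValueError("expected hog, lbp, b of shape (d,) and W of shape (d, 2d)")
    # 1) Cascade (concatenate) the two statistical feature vectors
    joint = np.concatenate([hog, lbp])
    # 2) Linear transform: matrix multiplication plus position-wise bias
    z = W @ joint + b
    # 3) Gating value in (0, 1); sigmoid is an assumption
    gate = 1.0 / (1.0 + np.exp(-z))
    # 4) Weight-modulate each branch and add position-wise
    return gate * hog + (1.0 - gate) * lbp
```

With `W` and `b` at zero the gate is 0.5 everywhere and the output is the plain average of the two vectors; large positive or negative biases push the output toward the HOG or LBP vector respectively, which is the "dynamic" selection the module is meant to learn.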
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, inputting the target object image feature map into the feature foreground mask saliency module based on the convolution-gated feed-forward mechanism to obtain the foreground-salient target object image feature map includes:
performing layer normalization on the target object image feature map to obtain a normalized target object image feature map;
performing point-convolution-based channel expansion and dilated-convolution-layer-based depthwise convolution encoding on the normalized target object image feature map to obtain a target object image depthwise convolution copy feature map and a target object image depthwise convolution original feature map;
inputting the target object image depthwise convolution original feature map into a GELU-based foreground gating mask module to obtain a target object image depthwise convolution gating mask weight feature map;
calculating the position-wise product of the target object image depthwise convolution gating mask weight feature map and the target object image depthwise convolution copy feature map to obtain a target object image gating mask foreground salient feature map; and
performing point-convolution-based channel contraction on the target object image gating mask foreground salient feature map to obtain the foreground-salient target object image feature map.
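A NumPy sketch of the convolution-gated feed-forward saliency module for a single feature map of shape `(C, H, W)`. Assumptions not stated in the text: layer normalization is applied across channels at each position without learnable parameters, the depthwise convolution is 3x3 with zero padding and a dilation of 2, the expanded map is split evenly into the "original" (gating) half and the "copy" half, and GELU uses its tanh approximation. All function names and weight shapes are hypothetical.

```python
import numpy as np


def layer_norm_channels(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each spatial position across channels (no learnable affine; assumed)
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # 1x1 convolution: weight has shape (out_channels, in_channels)
    return np.einsum("oc,chw->ohw", weight, x)


def depthwise_dilated_conv3x3(x: np.ndarray, kernels: np.ndarray, dilation: int) -> np.ndarray:
    # One 3x3 kernel per channel (kernels: (C, 3, 3)); zero padding keeps H x W
    _, h, w = x.shape
    p = dilation
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * padded[:, i * p:i * p + h, j * p:j * p + w]
    return out


def gelu(x: np.ndarray) -> np.ndarray:
    # Tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def foreground_mask_saliency(x, w_expand, dw_kernels, w_contract, dilation=2):
    """x: (C, H, W); w_expand: (2E, C); dw_kernels: (2E, 3, 3); w_contract: (C, E)."""
    xn = layer_norm_channels(x)
    # Channel expansion (1x1) followed by dilated depthwise encoding
    y = depthwise_dilated_conv3x3(pointwise_conv(xn, w_expand), dw_kernels, dilation)
    # Split into the "original" half (gating branch) and the "copy" half
    original, copy = np.split(y, 2, axis=0)
    # GELU gating mask applied position-wise to the copy branch
    gated = gelu(original) * copy
    # Channel contraction (1x1) back to C channels
    return pointwise_conv(gated, w_contract)
```

The dilation enlarges the receptive field of each 3x3 kernel without adding parameters, and the GELU branch acts as a soft, learned mask that can suppress background positions in the copy branch.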
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, the target object image corresponding to the largest scoring decoding value is determined as the target object image to be detected.
For example, in the sequential image target recognition method based on image quality optimization according to an embodiment of the present application, performing face recognition on the target object image to be detected to obtain the recognition result includes:
inputting the target object image to be detected into an AlexNet-based face feature extractor to obtain a face feature vector; and
inputting the face feature vector into a classifier-based face recognizer to obtain the recognition result.
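The classifier-based recognizer can be sketched as a linear softmax classifier over enrolled identities. The text specifies only "a classifier", so the linear-softmax form, the weights and the identity labels are assumptions; the AlexNet feature extractor is represented simply by an input feature vector, and `classify_identity` is a hypothetical name.

```python
from typing import Sequence, Tuple

import numpy as np


def classify_identity(feature: np.ndarray, W: np.ndarray, b: np.ndarray,
                      labels: Sequence[str]) -> Tuple[str, float]:
    """Linear softmax classifier: returns (identity tag, probability)."""
    if W.shape != (len(labels), feature.shape[0]) or b.shape != (len(labels),):
        raise ValueError("W must be (num_identities, feature_dim) and b (num_identities,)")
    logits = W @ feature + b
    # Subtract the maximum logit for numerical stability before exponentiating
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    k = int(np.argmax(probs))
    return labels[k], float(probs[k])
```

Returning the probability alongside the tag leaves room for a caller to reject low-confidence matches, although the text does not describe such a rejection step.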
For example, the sequential image target recognition method based on image quality optimization according to an embodiment of the present application further includes a training step for training the dynamic interaction module under gating response, the image feature extractor based on the dilated convolutional neural network model, the feature foreground mask saliency module based on the convolution-gated feed-forward mechanism, the cross-domain joint encoder based on the MetaNet model, and the decoder-based image quality scorer;
wherein the training step comprises:
acquiring training data, wherein the training data comprises a queue of training object images to be identified, captured by a camera;
determining, based on the face target detection network and the human body target detection network, the target IOU relationship of each training object image in the queue of training object images to be identified, to obtain a queue of training target IOU relationships;
extracting a queue of training target object images from the queue of training object images to be identified based on the queue of training target IOU relationships;
processing each training target object image in the queue of training target object images with the LBP operator to obtain a training target object image LBP feature vector;
processing each training target object image with the HOG feature descriptor to obtain a training target object HOG feature vector;
inputting the training target object HOG feature vector and the training target object image LBP feature vector into the dynamic interaction module under gating response to obtain a training target object multimodal statistical feature vector;
inputting each training target object image into the image feature extractor based on the dilated convolutional neural network model to obtain a training target object image feature map;
inputting the training target object image feature map into the feature foreground mask saliency module based on the convolution-gated feed-forward mechanism to obtain a training foreground-salient target object image feature map;
inputting the training foreground-salient target object image feature map and the training target object multimodal statistical feature vector into the cross-domain joint encoder based on the MetaNet model to obtain a training multimodal-statistical-feature-assisted target object image fusion feature map;
inputting the training multimodal-statistical-feature-assisted target object image fusion feature map into the decoder-based image quality scorer to obtain a decoding loss function value;
calculating a preset loss function on the training multimodal-statistical-feature-assisted target object image fusion feature map to obtain a multimodal-statistical-feature-assisted target object image fusion loss function value; and
training the dynamic interaction module under gating response, the image feature extractor based on the dilated convolutional neural network model, the feature foreground mask saliency module based on the convolution-gated feed-forward mechanism, the cross-domain joint encoder based on the MetaNet model, and the decoder-based image quality scorer, using as the loss function value a weighted sum of the decoding loss function value and the multimodal-statistical-feature-assisted target object image fusion loss function value.
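The combined training objective can be written as a small Python sketch. The text specifies neither the decoding loss, nor the "preset loss function", nor the weights, so this sketch assumes a mean-squared-error decoding loss against reference quality scores, treats the fusion loss as a precomputed number, and uses illustrative default weights of 1.0 and 0.1. Both function names are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    # Mean squared error between predicted and reference quality scores (assumed decoding loss)
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_training_loss(decoding_loss: float, fusion_loss: float,
                           w_decode: float = 1.0, w_fusion: float = 0.1) -> float:
    # Weighted sum used as the single loss value for updating all five modules jointly
    return w_decode * decoding_loss + w_fusion * fusion_loss
```

For example, predicted scores `[0.5, 1.0]` against references `[0.0, 1.0]` give a decoding loss of 0.125; with a fusion loss of 2.0 and the default weights, the total is 0.325.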
An embodiment of the application further provides a sequential image target recognition device based on image quality optimization, which comprises:
an image queue acquisition module, configured to acquire a queue of object images to be identified, captured by a camera;
an IOU relationship determination module, configured to determine, based on the face target detection network and the human body target detection network, the target IOU relationship of each object image in the queue of object images to be identified, to obtain a queue of target IOU relationships;
a target object image queue extraction module, configured to extract a queue of target object images from the queue of object images to be identified based on the queue of target IOU relationships;
an optimal selection module, configured to select the target object image with the best face quality from the queue of target object images as the target object image to be detected;
a face recognition module, configured to perform face recognition on the target object image to be detected to obtain a recognition result, wherein the recognition result is a personnel identity tag; and
a personnel identity tag designation module, configured to designate the personnel identity tag in the recognition result as the personnel identity tag of the queue of target object images.
An embodiment of the application further provides an electronic device, which comprises:
a processor; and
a memory storing computer program instructions which, when executed by the processor, cause the processor to perform the sequential image target recognition method based on image quality optimization according to any one of the above embodiments.
The sequential image target recognition method, device and electronic device based on image quality optimization provide a more reliable target recognition scheme while saving computing resources: they address the problems of traditional cross-domain person recognition and trajectory tracking, improve the accuracy and efficiency of face recognition, and are applicable to fields such as security monitoring.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings of the embodiments are briefly described below. The drawings described below relate only to some embodiments of the application and do not limit the application.
Fig. 1 shows a schematic diagram of an application architecture of the sequential image target recognition method based on image quality optimization in an embodiment of the present application;
Fig. 2 shows a flowchart of the sequential image target recognition method based on image quality optimization in an embodiment of the present application;
Fig. 3 shows a flowchart of sub-step S540 of the sequential image target recognition method based on image quality optimization in an embodiment of the present application;
Fig. 4 shows a schematic structural diagram of the sequential image target recognition apparatus based on image quality optimization in an embodiment of the present application;
Fig. 5 shows an application scenario diagram of the sequential image target recognition method based on image quality optimization in an embodiment of the present application; and
Fig. 6 shows a schematic diagram of a queue of four object images to be identified.
Detailed Description
The technical solutions of the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
The terms used in this specification are general terms currently in wide use in the art, chosen in view of the functions of the present application, but they may change according to the intentions of those of ordinary skill in the art, precedent, or the emergence of new technologies. In addition, some terms may have been specifically selected, in which case their detailed meanings are given in the detailed description. Accordingly, the terms used in the specification should not be construed as mere names, but should be understood based on their meanings and the overall description of the present application.
Although the present application makes various references to certain modules of a system according to its embodiments, any number of different modules may be used and run on a user terminal and/or a server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to describe the operations performed by a system according to its embodiments. It should be understood that the preceding or following operations are not necessarily performed exactly in order; rather, the steps may be processed in reverse order or simultaneously as needed. Other operations may also be added to or removed from these processes.
Fig. 1 shows an application architecture diagram of a sequential image target recognition method based on image quality preference in an embodiment of the present application, including a server 100 and a terminal device 200.
The terminal device 200 and the server 100 may be connected to each other through the internet to realize communication therebetween. Optionally, the internet described above uses standard communication techniques and/or protocols. The internet is typically the internet, but may be any network including, but not limited to, a local area network (Local Area Network, LAN), metropolitan area network (Metropolitan Area Network, MAN), wide area network (Wide Area Network, WAN), mobile, wired or wireless network, private network, or any combination of virtual private networks. In some embodiments, the data exchanged over the network is represented using techniques and/or formats including hypertext markup language (Hyper Text Markup Language, HTML), extensible markup language (Extensible Markup Language, XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as secure sockets layer (Secure Socket Layer, SSL), transport layer security (Transport Layer Security, TLS), virtual private network (Virtual Private Network, VPN), internet protocol security (Internet Protocol Security, IPsec), etc. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The server 100 may provide various network services for the terminal device 200. It may be a single server, a server cluster composed of multiple servers, or a cloud computing center. Specifically, the server 100 may include a processor 110 (Central Processing Unit, CPU), a memory 120, an input device 130, an output device 140, and the like. The input device 130 may include a keyboard, a mouse, a touch screen, etc. The output device 140 may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory 120 may include read-only memory (ROM) and random access memory (RAM), and provides the processor 110 with the program instructions and data stored in the memory 120. In the embodiments of the present application, the memory 120 may be used to store a program implementing the sequential image target recognition method based on image quality optimization.
By calling the program instructions stored in the memory 120, the processor 110 performs, according to the obtained program instructions, the steps of any sequential image target recognition method based on image quality optimization in the embodiments of the present application.
In addition, the application architecture diagram in the embodiments of the present application is intended to illustrate the technical solution more clearly and does not limit it. The technical solution provided by the embodiments is equally applicable to similar problems in other application architectures and service scenarios.
The sequential image target recognition method based on image quality optimization provided according to at least one embodiment of the present application is described below, in a non-limiting manner, through several examples or embodiments. As described below, different features of these specific examples or embodiments may be combined with each other, provided they do not conflict, to obtain new examples or embodiments, which also fall within the scope of protection of the present application.
In view of the above technical problems, the technical solution of the present application provides a sequential image target recognition method based on image quality optimization. As shown in Fig. 2, the method includes the following steps:
S510: acquiring a queue of images of objects to be identified captured by a camera.
S520: determining the target IOU relationship of each object image to be identified in the queue, based on a face target detection network and a human body target detection network, to obtain a queue of target IOU relationships.
S530: extracting a queue of target object images from the queue of object images to be identified, based on the queue of target IOU relationships.
S540: selecting the target object image with the best face quality from the queue of target object images as the target object image to be detected.
S550: performing face recognition on the target object image to be detected to obtain a recognition result, which is a person identity label.
S560: assigning the person identity label in the recognition result as the person identity label of the queue of target object images.
In step S520, the target IOU relationship of each object image to be identified is determined from the face target detection network and the human body target detection network, yielding a queue of target IOU relationships. This step includes:
(1) inputting each object image to be identified into the face target detection network and the human body target detection network, respectively, to obtain a human body bounding box and a face bounding box;
(2) calculating the target IOU relationship between the human body bounding box and the face bounding box according to the following formula:

$$\mathrm{IOU} = \frac{S_{\cap}}{S_{\cup}}$$

where $S_{\cap}$ is the area of the intersection between the human body bounding box and the face bounding box, and $S_{\cup}$ is the area of their union.
In step S530, the queue of target object images is extracted from the queue of object images to be identified based on the queue of target IOU relationships. Each object image to be identified is handled as follows:
(1) if its target IOU relationship is less than or equal to a preset threshold, the image is discarded;
(2) if its target IOU relationship is greater than the preset threshold, the image is added to the queue of target object images.
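Steps S520 and S530 can be sketched in Python as shown below. This is a minimal illustration, not the applicant's implementation. It assumes the two detection networks return boxes as `(x1, y1, x2, y2)` pixel coordinates. The helper names `box_area`, `iou` and `extract_target_queue` are hypothetical, and so is any threshold value passed in.

```python
from typing import List, Sequence, Tuple, TypeVar

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2
Img = TypeVar("Img")


def box_area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(body: Box, face: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(body[0], face[0]), max(body[1], face[1])
    ix2, iy2 = min(body[2], face[2]), min(body[3], face[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(body) + box_area(face) - inter
    return inter / union if union > 0 else 0.0


def extract_target_queue(
    images: Sequence[Img],
    body_boxes: Sequence[Box],
    face_boxes: Sequence[Box],
    threshold: float,
) -> List[Img]:
    """S530 sketch: keep an image only if its body/face IOU exceeds the threshold."""
    queue: List[Img] = []
    for image, body, face in zip(images, body_boxes, face_boxes):
        if iou(body, face) > threshold:  # IOU <= threshold: image is discarded
            queue.append(image)
    return queue
```

A face box normally lies inside its body box. For such a matched pair, the IOU equals the face-to-body area ratio. For example, a 6×6 face box inside a 10×20 body box gives 36/200 = 0.18. The preset threshold therefore has to be chosen with that ratio in mind.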
In the technical solution of the present application, the target IOU relationship is used to confirm whether the human body bounding box and the face bounding box enclose the same object. In this specific example, targets are screened by comparing the target IOU relationship with a preset threshold.

The method therefore avoids a drawback of conventional schemes, in which face recognition must be performed on every captured face image. Instead, the method works as follows:
(1) face detection and human body detection are performed on each target that appears;
(2) a queue of target object images is extracted according to the target IOU relationship between the two detections;
(3) a target object image with reliable face quality is selected from the queue for face recognition, to match the person's identity ID.

In this way, an image sequence of the target object is built from the target IOU relationship, and only the image with the best face quality is used for identity recognition. Face identity verification is thus completed with a single face recognition pass, and the identity ID is assigned to the whole image sequence. This saves computing resources and helps address problems in conventional cross-domain person identification and trajectory tracking. It also improves the accuracy and efficiency of face recognition and provides a more reliable target recognition scheme for fields such as security monitoring.
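The following Python skeleton illustrates the saving described above: a single face recognition call per sequence. It is a sketch under stated assumptions. `score_quality` and `recognize_face` are hypothetical callables. They stand in for the quality-scoring branch (S541 to S547) and the face recognition step (S550).

```python
from typing import Callable, List, Sequence, TypeVar

Img = TypeVar("Img")


def label_sequence(
    target_queue: Sequence[Img],
    score_quality: Callable[[Img], float],
    recognize_face: Callable[[Img], str],
) -> List[str]:
    """S540-S560 sketch: one face recognition call labels the whole queue."""
    if not target_queue:
        return []
    best = max(target_queue, key=score_quality)  # S540: best face quality
    identity = recognize_face(best)              # S550: called exactly once
    return [identity] * len(target_queue)        # S560: propagate the label
```

However long the queue is, the comparatively expensive recognizer runs once. Only the quality scorer runs per image.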
Selecting the target object image with the best face quality from the queue of target object images is a key step of the method. It ensures that the facial features of the target object in the image to be detected are clear. This helps the face recognition algorithm extract key information accurately, so face recognition and identity confirmation are more accurate. In cross-domain person identification and trajectory tracking, identifying from the best image reduces recognition failures caused by poor image quality. It also helps preserve the continuity and integrity of person trajectories.
Based on this, the technical idea of the present application is to process each target object image in the queue with image processing and analysis algorithms based on artificial intelligence and deep learning. These algorithms learn and capture the multi-modal statistical features, deep feature information and facial semantics of each image. The multi-modal statistical features then help optimize the expression of the target object features, and each target object image receives a quality score. The target object image with the largest scoring decoding value is determined as the target object image to be detected. It is then used for the subsequent face recognition and person identity detection tasks.
Accordingly, as shown in Fig. 3, the target object image with the best face quality is selected from the queue of target object images as the target object image to be detected. For each target object image in the queue, this includes:
S541: processing the target object image with an LBP (local binary pattern) operator to obtain a target object image LBP feature vector.
S542: processing the target object image with a HOG (histogram of oriented gradients) feature descriptor to obtain a target object HOG feature vector.
S543: inputting the target object HOG feature vector and the target object image LBP feature vector into a dynamic interaction module under gating response to obtain a target object multi-modal statistical feature vector.
S544: inputting the target object image into an image feature extractor based on a dilated (hole) convolutional neural network model to obtain a target object image feature map.
S545: inputting the target object image feature map into a feature foreground mask salizer based on a convolutional gated feed-forward mechanism to obtain a foreground-salient target object image feature map.
S546: inputting the foreground-salient target object image feature map and the target object multi-modal statistical feature vector into a MetaNet model-based cross-domain joint encoder to obtain a multi-modal statistical feature-assisted target object image fusion feature map.
S547: inputting the multi-modal statistical feature-assisted target object image fusion feature map into a decoder-based image quality scorer to obtain a scoring decoding value.
Specifically, each target object image in the queue is first processed with the LBP operator to obtain the target object image LBP feature vector. It is also processed with the HOG feature descriptor to obtain the target object HOG feature vector.

The LBP operator captures texture information in an image. It therefore characterizes targets with pronounced texture, such as faces, well. Moreover, with a rotation-invariant LBP variant, the features remain stable even when the face is rotated in the image.

The HOG feature descriptor effectively captures edge and shape information, and characterizes features such as facial contours well. HOG features also tolerate scale changes to a certain extent, which helps them adapt to target objects and faces at different scales.
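A pure-Python sketch of S541 and S542 follows, for illustration only. It assumes a grayscale image stored as a list of rows. It uses the basic 8-neighbour LBP code pooled into a 256-bin histogram. Full HOG (cells, blocks and block normalization) is replaced with a single-cell, magnitude-weighted histogram of unsigned gradient orientations. The function names are hypothetical. Note that this basic LBP code is not itself rotation-invariant.

```python
import math
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel intensities

# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Gray) -> List[float]:
    """Basic 8-neighbour LBP codes pooled into a normalised 256-bin histogram."""
    hist = [0.0] * 256
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(_OFFSETS):
                if img[y + dy][x + dx] >= centre:  # neighbour >= centre sets the bit
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def hog_histogram(img: Gray, bins: int = 9) -> List[float]:
    """Single-cell HOG: magnitude-weighted unsigned orientation histogram, L2-normalised."""
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned, [0, 180)
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist
```

For example, an image whose left half is dark and right half is bright produces only horizontal gradients. All of its HOG mass therefore falls into the first orientation bin.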
Next, the target object HOG feature vector mainly describes the edge and shape information of the target object's face in the image. The target object image LBP feature vector focuses on its texture information. The two vectors therefore describe different aspects of the target object's facial semantics, and they carry implicit correlations and interaction information.

The technical solution of the present application aims to combine these two kinds of facial features effectively. This lets the target object image be considered from several aspects and improves the diversity and representational capability of the features. To that end, the target object HOG feature vector and the target object image LBP feature vector are input into the dynamic interaction module under gating response to obtain the target object multi-modal statistical feature vector. The module does three things:
(1) it learns the correlations and mutual influence between the two vectors;
(2) it uses their implicit correlated semantics so that each vector complements the other;
(3) it identifies the importance and contribution of each type of image feature to the subsequent image quality assessment task.
In this way, when the two vectors are merged to support the subsequent image quality scoring task, the gating response mechanism fuses the two types of image features with adaptive, dynamic weights. The model can thus learn the key multi-modal statistical features of the target object image. This improves feature expression and provides a more comprehensive and accurate data basis for the subsequent processing and quality scoring of the target object image.
Accordingly, in step S543, the target object HOG feature vector and the target object image LBP feature vector are input into the dynamic interaction module under gating response to obtain the target object multi-modal statistical feature vector. This includes:
(1) inputting the two vectors into a feature concatenation module to obtain a target object multi-modal statistical information joint feature vector;
(2) multiplying this joint feature vector by a parameter matrix and adding a bias vector position-wise, to obtain a linearly transformed target object multi-modal statistical information joint feature vector;
(3) activating the linearly transformed joint feature vector with a Sigmoid function to obtain a target object multi-modal statistical information dynamic fusion response gating value;
(4) computing the position-wise product of the target object HOG feature vector and the gating value, to obtain a weight-modulated target object HOG feature vector;
(5) subtracting the gating value from 1 and multiplying the resulting weight position-wise with the target object image LBP feature vector, to obtain a weight-modulated target object image LBP feature vector;
(6) adding the weight-modulated target object HOG feature vector and the weight-modulated target object image LBP feature vector position-wise, to obtain the target object multi-modal statistical feature vector.
In a specific example, the target object HOG feature vector and the target object image LBP feature vector are input into the dynamic interaction module under gating response. The module processes them according to the following dynamic interaction formula to obtain the target object multi-modal statistical feature vector:

$$g = \sigma\left(W \cdot \left[V_1 \oplus V_2\right] + b\right)$$

$$V = g \odot V_1 + (1 - g) \odot V_2$$

where:
- $V_1$ and $V_2$ are the target object HOG feature vector and the target object image LBP feature vector, respectively;
- $\oplus$ denotes vector concatenation;
- $W$ is a parameter matrix and $b$ is a bias vector;
- $\sigma$ is the sigmoid function;
- $g$ is the target object multi-modal statistical information dynamic fusion response gating value;
- $\odot$ denotes position-wise multiplication;
- $V$ is the target object multi-modal statistical feature vector.
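The dynamic interaction formula can be sketched in Python as follows. It applies the weight g to the HOG vector and the complementary weight 1 − g to the LBP vector. The sketch makes three assumptions:
- The HOG and LBP vectors have already been brought to a common length d. The position-wise gating requires this, whereas raw HOG and LBP histograms generally differ in length.
- `W` and `b` are given here rather than learned.
- `gated_fusion` is a hypothetical name.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(
    v_hog: Sequence[float],
    v_lbp: Sequence[float],
    weight: Sequence[Sequence[float]],  # d rows x 2d columns
    bias: Sequence[float],              # length d
) -> List[float]:
    """g = sigmoid(W [V1 ; V2] + b); V = g * V1 + (1 - g) * V2 (position-wise)."""
    d = len(v_hog)
    if len(v_lbp) != d or len(weight) != d or len(bias) != d:
        raise ValueError("V1, V2, the rows of W and b must all have length d")
    concat = list(v_hog) + list(v_lbp)  # vector concatenation
    gate = [
        sigmoid(sum(w * c for w, c in zip(row, concat)) + b)
        for row, b in zip(weight, bias)
    ]
    return [g * h + (1.0 - g) * l for g, h, l in zip(gate, v_hog, v_lbp)]
```

With all-zero parameters every gate is 0.5, so the output is the plain average of the two vectors. A learned `W` lets each position lean towards HOG or LBP depending on both inputs.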
Further, after the multi-modal statistical features of the target object are extracted, each target object image is input into the image feature extractor based on the dilated (hole) convolutional neural network model to obtain a target object image feature map. This allows the facial semantics of the target object in each image to be understood more deeply and comprehensively. As a result, image quality can be scored better, and the best-quality image can be selected for person identity recognition.
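Dilated (hole) convolution is the building block of the image feature extractor in S544. It spaces the kernel taps apart, so a k×k kernel covers a larger area without additional weights. A minimal single-channel Python sketch follows. The text does not specify the network architecture of the extractor, and `dilated_conv2d` is a hypothetical helper.

```python
from typing import List

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int = 2) -> Grid:
    """'Valid' single-channel 2-D dilated convolution (cross-correlation form, as
    in most CNN libraries). Kernel taps are spaced `dilation` pixels apart, so a
    k x k kernel spans (k - 1) * dilation + 1 pixels per side with k * k weights."""
    kh, kw = len(kernel), len(kernel[0])
    span_h = (kh - 1) * dilation + 1
    span_w = (kw - 1) * dilation + 1
    out_h, out_w = len(x) - span_h + 1, len(x[0]) - span_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("input is smaller than the dilated kernel span")
    return [
        [
            sum(
                kernel[i][j] * x[r + i * dilation][c + j * dilation]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

With `dilation=2`, a 3×3 kernel already spans a 5×5 window. With `dilation=1` the operation reduces to an ordinary convolution.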
The target object image feature map contains rich semantic and feature information. However, background interference and redundant information unrelated to the target object's facial semantics may affect the subsequent image quality assessment. To better capture the important features and structural information in the target object image, the target object image feature map is therefore input into the feature foreground mask salizer based on the convolutional gated feed-forward mechanism, which outputs a foreground-salient target object image feature map. The salizer highlights the target object in the image and enhances its saliency by suppressing background noise and interference. This matters for the subsequent selection of the image with the best face quality and for cross-domain person identity recognition.
Specifically, the feature foreground mask salizer based on the convolutional gated feed-forward mechanism works as follows:
(1) It applies layer normalization to the target object image feature map to eliminate scale differences between layers.
(2) The normalized feature map undergoes channel expansion through point (1×1) convolution, which strengthens its channel expressiveness.
(3) A dilated convolution layer performs depthwise convolutional encoding. The dilation adjusts the area each kernel covers, providing rich spatial and depth information for the subsequent gating mechanism.
(4) In the core step, the target object image depthwise convolution master feature map is sent to a foreground gating mask module. This module dynamically generates foreground mask weights with a specific function, adaptively selecting and reinforcing the foreground features of the target object image. This step is the key to making foreground information salient: through the gating mechanism, the model focuses on the foreground region of the target object image and suppresses background noise.
(5) The position-wise product of the target object image depthwise convolution gating mask weight feature map and the target object image depthwise convolution backup feature map is computed. This performs gating-mask foreground highlighting: the foreground mask weights are combined with the feature map, producing a feature map in which foreground information is emphasized.
(6) Finally, the resulting target object image gating mask foreground salient feature map is contracted along the channel dimension through point convolution. This completes the feature modulation and generates the foreground-salient target object image feature map.
This design improves the expressiveness of the target object image features. It also strengthens the model's ability to recognize and process the key foreground information in the image, namely the face region. In particular, it can significantly improve the performance of the deep learning model when processing data with complex facial foregrounds and backgrounds.
Accordingly, in step S545, the target object image feature map is input into the feature foreground mask salizer based on the convolutional gated feed-forward mechanism to obtain the foreground-salient target object image feature map. This includes:
(1) applying layer normalization to the target object image feature map to obtain a normalized target object image feature map;
(2) applying point-convolution-based channel expansion and dilated-convolution-based depthwise convolutional encoding to the normalized feature map, to obtain a target object image depthwise convolution backup feature map and a target object image depthwise convolution master feature map;
(3) inputting the master feature map into a foreground gating mask module based on the GELU function, to obtain a target object image depthwise convolution gating mask weight feature map;
(4) computing the position-wise product of the gating mask weight feature map and the backup feature map, to obtain a target object image gating mask foreground salient feature map;
(5) applying point-convolution-based channel contraction to the gating mask foreground salient feature map, to obtain the foreground-salient target object image feature map.
In one specific example, the target object image feature map is input into the feature foreground mask salizer based on the convolutional gated feed-forward mechanism. The salizer processes it according to the following foreground mask saliency formula to obtain the foreground-salient target object image feature map:

$$F_{norm} = \mathrm{LN}(F)$$

$$F_b,\ F_m = \mathrm{DConv}_{k \times k}\big(\mathrm{PConv}(F_{norm})\big)$$

$$F_w = \mathrm{Mask}\big(\mathrm{GELU}(F_m)\big)$$

$$F_g = F_w \odot F_b$$

$$F_s = \mathrm{PConv}(F_g)$$

where:
- $F$ is the target object image feature map;
- $\mathrm{LN}(\cdot)$ denotes layer normalization of the feature map, and $F_{norm}$ is the normalized target object image feature map;
- $\mathrm{PConv}(\cdot)$ is the point convolution operation;
- $\mathrm{DConv}_{k \times k}(\cdot)$ is the dilated convolution operation with a $k \times k$ kernel;
- $F_b$ is the target object image depthwise convolution backup feature map;
- $F_m$ is the target object image depthwise convolution master feature map;
- $\mathrm{GELU}(\cdot)$ is the GELU activation function;
- $\mathrm{Mask}(\cdot)$ denotes the masking process, applied to each position feature value $x$ of the feature map with a predetermined hyperparameter;
- $F_w$ is the target object image depthwise convolution gating mask weight feature map;
- $\odot$ denotes position-wise multiplication;
- $F_g$ is the target object image gating mask foreground salient feature map;
- $F_s$ is the foreground-salient target object image feature map.
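The gating path of the salizer can be illustrated with a simplified per-pixel Python sketch. It rests on these assumptions:
- The feature map is given as a list of per-pixel channel vectors.
- Layer normalization is taken over channels.
- Point-convolution expansion yields 2E channels, split into a backup half and a master half.
- The dilated depthwise convolution is omitted for brevity.
- The GELU output is used directly as the mask weight, because the exact masking function with its hyperparameter is not specified.

All names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]  # rows = output channels, columns = input channels


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(v: Vec, eps: float = 1e-5) -> Vec:
    mu = sum(v) / len(v)
    var = sum((a - mu) ** 2 for a in v) / len(v)
    return [(a - mu) / math.sqrt(var + eps) for a in v]


def pointwise(m: Mat, v: Vec) -> Vec:
    """A 1x1 (point) convolution is a per-pixel matrix-vector product over channels."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def foreground_gate(pixels: List[Vec], w_expand: Mat, w_contract: Mat) -> List[Vec]:
    out = []
    for v in pixels:
        h = pointwise(w_expand, layer_norm(v))  # channel expansion to 2E
        if len(h) % 2:
            raise ValueError("expansion must produce an even number of channels")
        e = len(h) // 2
        backup, master = h[:e], h[e:]
        mask = [gelu(m) for m in master]                     # gating mask weights
        gated = [mk * b for mk, b in zip(mask, backup)]      # position-wise product
        out.append(pointwise(w_contract, gated))             # contraction E -> C
    return out
```

Where the master branch is strongly negative, GELU drives the mask weight towards zero and the corresponding backup features are suppressed. This is the mechanism the text relies on to damp background responses.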
The next step aims to understand the semantics of the target object image more deeply and accurately, and to identify which images have better face quality. To this end, the foreground-salient target object image feature map and the target object multi-modal statistical feature vector are input into the MetaNet model-based cross-domain joint encoder. The encoder outputs the multi-modal statistical feature-assisted target object image fusion feature map.

The encoder learns a shared feature representation across image features of different modalities, and this cross-domain representation supports effective fusion of the different features. Specifically, the statistical features of the target object image help optimize the expression of its semantic features. This improves its semantic representation and benefits the subsequent image quality assessment and target identity recognition tasks.
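The text does not describe the internals of the MetaNet model-based cross-domain joint encoder. One hedged reading is a small meta-network that maps the statistical vector to one weight per channel of the feature map. This reading is consistent with the statistical vector constraining the image features along the channel dimension. The Python sketch below shows only that assumed mechanism; it is not the MetaNet architecture itself. The names `meta_channel_weights` and `cross_domain_fuse` are hypothetical.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def meta_channel_weights(
    stat_vec: Sequence[float],
    meta_w: Sequence[Sequence[float]],  # one row per feature-map channel
    meta_b: Sequence[float],
) -> List[float]:
    """Hypothetical meta-network: a linear layer plus sigmoid, one weight per channel."""
    return [
        _sigmoid(sum(w * s for w, s in zip(row, stat_vec)) + b)
        for row, b in zip(meta_w, meta_b)
    ]


def cross_domain_fuse(
    fmap: FeatureMap,
    stat_vec: Sequence[float],
    meta_w: Sequence[Sequence[float]],
    meta_b: Sequence[float],
) -> FeatureMap:
    """Scale every channel of the image feature map by its statistics-driven weight."""
    if len(meta_w) != len(fmap) or len(meta_b) != len(fmap):
        raise ValueError("need one meta-network output per feature-map channel")
    scales = meta_channel_weights(stat_vec, meta_w, meta_b)
    return [[[s * x for x in row] for row in channel] for s, channel in zip(scales, fmap)]
```

In this reading, the statistical branch does not add new spatial content. It re-weights the semantic channels, which is one simple way for one modality to condition another.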
Then, the multi-modal statistical feature-assisted target object image fusion feature map is input into the decoder-based image quality scorer to obtain a scoring decoding value. That is, decoding regression is performed on the optimized representation of the target object image's semantic features, assisted by the statistical features, and image quality is expressed as a scoring decoding value. The target object image with the largest scoring decoding value is then determined as the target object image to be detected, which supports effective subsequent face recognition and person identity detection. Selecting the best image in this way reduces recognition failures caused by poor image quality in cross-domain person identification and trajectory tracking, and helps keep person trajectories continuous and complete.
Finally, the target object image corresponding to the largest of the scoring decoding values obtained in step S547 is determined as the target object image to be detected.
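Step S547 and the final selection can be sketched as below. The text does not give the decoder structure. This sketch therefore assumes a hypothetical decoder made of global average pooling followed by a single linear layer, which regresses one scoring decoding value per image. The image with the largest value is returned.

```python
from typing import List, Sequence, Tuple, TypeVar

FeatureMap = List[List[List[float]]]  # [channel][row][col]
Img = TypeVar("Img")


def decode_score(fmap: FeatureMap, weights: Sequence[float], bias: float) -> float:
    """Hypothetical decoder: global average pooling per channel, then a linear layer."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


def select_best(
    images: Sequence[Img],
    fused_maps: Sequence[FeatureMap],
    weights: Sequence[float],
    bias: float,
) -> Tuple[Img, float]:
    """Return the image with the largest scoring decoding value, plus that value."""
    if not images or len(images) != len(fused_maps):
        raise ValueError("need one fused feature map per image")
    scores = [decode_score(f, weights, bias) for f in fused_maps]
    best = max(range(len(scores)), key=scores.__getitem__)
    return images[best], scores[best]
```

Only the index of the maximum matters for selection. A monotonic change to the decoder output, such as adding a constant bias, therefore does not change which image is chosen.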
Further, in step S550, face recognition is performed on the target object image to be detected to obtain a recognition result. This includes:
(1) inputting the target object image to be detected into an AlexNet-based face feature extractor to obtain a face feature vector;
(2) inputting the face feature vector into a classifier-based face recognizer to obtain the recognition result.
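The classifier stage of S550 might look like the following Python sketch. It assumes the AlexNet-based extractor has already produced the face feature vector. It also assumes the classifier is a linear layer followed by a softmax over enrolled identities. Both the weights and the `recognize` helper are hypothetical.

```python
import math
from typing import Sequence, Tuple


def recognize(
    face_vec: Sequence[float],
    class_weights: Sequence[Sequence[float]],  # one row per enrolled identity
    labels: Sequence[str],
) -> Tuple[str, float]:
    """Linear classifier + softmax; returns the most probable identity label."""
    logits = [sum(w * f for w, f in zip(row, face_vec)) for row in class_weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    k = max(range(len(probs)), key=probs.__getitem__)
    return labels[k], probs[k]
```

The returned probability could serve as a confidence value, but the text does not say whether a rejection threshold is applied.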
Further, in the technical solution of the present application, the sequential image target recognition method based on image quality optimization also includes a training step. This step trains the following components:
- the dynamic interaction module under gating response;
- the image feature extractor based on the dilated convolutional neural network model;
- the feature foreground mask salizer based on the convolutional gated feed-forward mechanism;
- the MetaNet model-based cross-domain joint encoder;
- the decoder-based image quality scorer.
The training step includes:
(1) acquiring training data, comprising a queue of training object images to be identified captured by a camera;
(2) determining the target IOU relationship of each training object image to be identified, based on the face target detection network and the human body target detection network, to obtain a queue of training target IOU relationships;
(3) extracting a queue of training target object images from the queue of training object images to be identified, based on the queue of training target IOU relationships;
(4) processing each training target object image in the queue with the LBP operator to obtain a training target object image LBP feature vector;
(5) processing each training target object image with the HOG feature descriptor to obtain a training target object HOG feature vector;
(6) inputting the training target object HOG feature vector and the training target object image LBP feature vector into the dynamic interaction module under gating response, to obtain a training target object multi-modal statistical feature vector;
(7) inputting each training target object image into the image feature extractor based on the dilated convolutional neural network model, to obtain a training target object image feature map;
(8) inputting the training target object image feature map into the feature foreground mask salizer based on the convolutional gated feed-forward mechanism, to obtain a training foreground-salient target object image feature map;
(9) inputting the training foreground-salient target object image feature map and the training target object multi-modal statistical feature vector into the MetaNet model-based cross-domain joint encoder, to obtain a training multi-modal statistical feature-assisted target object image fusion feature map;
(10) inputting this fusion feature map into the decoder-based image quality scorer to obtain a decoding loss function value;
(11) calculating a predetermined loss function value of this fusion feature map to obtain a multi-modal statistical feature-assisted target object image fusion loss function value;
(12) training the five components listed above, using as the loss function value the weighted sum of the decoding loss function value and the fusion loss function value.
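The weighted-sum objective can be sketched as follows. The text does not define the decoding loss, so this sketch assumes mean squared error between predicted and labelled quality scores. The weights `alpha` and `beta` and their default values are hypothetical.

```python
from typing import Sequence


def decoding_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed decoding loss: mean squared error of the predicted quality scores."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(
    pred: Sequence[float],
    target: Sequence[float],
    fusion_loss: float,
    alpha: float = 1.0,
    beta: float = 0.1,
) -> float:
    """Weighted sum of the decoding loss and the fusion loss term."""
    return alpha * decoding_loss(pred, target) + beta * fusion_loss
```

In a real training loop, this scalar would be back-propagated through all five trainable components at once.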
In a preferred example, the training target object multi-modal statistical feature vector represents a gating-based dynamic interaction of the LBP and HOG features of the training target object image. The training foreground-salient target object image feature map represents foreground-enhanced image semantic features obtained by dilated convolutional encoding of the training target object image.

When these two are input into the MetaNet model-based cross-domain joint encoder, the statistical feature vector acts as an auxiliary modality. It constrains the feature expression of the foreground-salient feature map along the channel dimension. As a result, the training multi-modal statistical feature-assisted target object image fusion feature map has rich multi-modal, channel-mixed feature expression. At the same time, its semantic features are complex, which makes decoding regression harder and reduces decoding training efficiency.
Accordingly, the applicant of the present application introduces a predetermined loss function value during model training. This loss is used in addition to the decoding loss function value, that is, the difference loss between the ground-truth and predicted results. The model is trained by gradient back-propagation based on these loss function values.
Specifically, the predetermined loss function value of the training multi-modal statistical feature-assisted target object image fusion feature map is calculated as follows, yielding the multi-modal statistical feature-assisted target object image fusion loss function value:
(1) Flatten the fusion feature map into a training multi-modal statistical feature-assisted target object image fusion feature vector.
(2) From this feature vector, calculate a first and a second multi-modal statistical feature-assisted target object image fusion weight matrix. The value at position $(i, j)$ of the first matrix is the mean of the $i$-th and $j$-th feature values of the fusion feature vector. The value at position $(i, j)$ of the second matrix is one half of the absolute value of their difference.
(3) Multiply the fusion feature vector by the first and second weight matrices, respectively, to obtain a first and a second multi-modal statistical feature-assisted target object image fusion intermediate vector.
(4) Calculate the inner product of the first and second intermediate vectors to obtain a first multi-modal statistical feature-assisted target object image fusion loss term.
(5) Multiply the first weight matrix by the second weight matrix, and calculate the norm of the resulting matrix to obtain a second multi-modal statistical feature-assisted target object image fusion loss term.
(6) Subtract the product of a predetermined weight hyperparameter and the second loss term from the first loss term to obtain the fusion loss function value.
Then, the model parameters can be optimized by gradient back-propagation based on a weighted sum of the multi-modal statistical feature-assisted target object image fusion loss function value and the decoding loss function value.
The process of obtaining the multi-modal statistical feature-assisted target object image fusion loss function value can be expressed by the following loss calculation formula:

$$
m^{(1)}_{i,j} = \frac{v_i + v_j}{2}, \qquad m^{(2)}_{i,j} = \frac{\lvert v_i - v_j \rvert}{2}
$$

$$
V_1 = M_1 V, \qquad V_2 = M_2 V, \qquad \mathcal{L} = V_1^{\top} V_2 - \alpha \,\lVert M_1 M_2 \rVert
$$

where $V$ is the training multi-modal statistical feature-assisted target object image fusion feature vector; $V_1$ and $V_2$ are the first and second multi-modal statistical feature-assisted target object image fusion intermediate vectors; $m^{(1)}_{i,j}$ and $m^{(2)}_{i,j}$ are the feature values at position $(i, j)$ of the first and second multi-modal statistical feature-assisted target object image fusion weight matrices $M_1$ and $M_2$; $v_i$ and $v_j$ are the $i$-th and $j$-th feature values of $V$; $M_1 M_2$ denotes matrix multiplication; $\lVert \cdot \rVert$ denotes the norm of a matrix; $\alpha$ is the predetermined weight hyperparameter; and $\mathcal{L}$ is the multi-modal statistical feature-assisted target object image fusion loss function value.
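The loss above can be sketched in plain Python to make the matrix bookkeeping concrete. This is an illustrative sketch, not the applicant's implementation: the function names are hypothetical, the Frobenius norm is assumed for the matrix norm (the source does not make the norm type legible), and the weights of the final weighted sum with the decoding loss are assumed hyperparameters.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def weight_matrices(v: List[float]) -> Tuple[Matrix, Matrix]:
    """M1[i][j] = mean of v_i and v_j; M2[i][j] = half of |v_i - v_j|."""
    n = len(v)
    m1 = [[(v[i] + v[j]) / 2.0 for j in range(n)] for i in range(n)]
    m2 = [[abs(v[i] - v[j]) / 2.0 for j in range(n)] for i in range(n)]
    return m1, m2


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product M v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product A B."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def frobenius(m: Matrix) -> float:
    # Assumption: the Frobenius norm stands in for the norm used in the source.
    return math.sqrt(sum(x * x for row in m for x in row))


def fusion_loss(v: List[float], alpha: float) -> float:
    """L = <M1 v, M2 v> - alpha * ||M1 M2||, for a flattened fusion vector v."""
    m1, m2 = weight_matrices(v)
    v1, v2 = matvec(m1, v), matvec(m2, v)
    term1 = sum(a * b for a, b in zip(v1, v2))  # first loss term: inner product
    term2 = frobenius(matmul(m1, m2))           # second loss term: matrix norm
    return term1 - alpha * term2


def total_loss(decode_loss: float, fuse_loss: float,
               w_decode: float = 1.0, w_fuse: float = 0.1) -> float:
    """Weighted sum used for back-propagation (hypothetical default weights)."""
    return w_decode * decode_loss + w_fuse * fuse_loss
```

For example, with v = [1.0, 2.0] the intermediate vectors are [4.0, 5.5] and [1.0, 0.5], so the first loss term is 6.75. In a real training loop the same computation would be written with a tensor library so that gradients flow back through the fusion feature map.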
In other words, in the above preferred example, the multi-modal statistical feature-assisted target object image fusion loss function value exploits the short-range and long-range, cross-scale, detail-linked structural representation of the training multi-modal statistical feature-assisted target object image fusion feature map. It queries the detail inner-product space of this feature map so as to approximate a low-rank, independently observable composition of the linked details produced by the structural detail interactions within the feature map. Training with this loss therefore decomposes the distributed detail groups of the feature map according to their detail complexity, which facilitates the decoding regression, decomposition and recognition of its complex feature structure and improves decoding training efficiency. As a result, image quality can be scored more effectively, so that the target object image with the best face quality is selected for the subsequent face recognition and person identity detection tasks.
Further, based on the above embodiments, Fig. 4 is a schematic structural diagram of a sequential image target recognition device 800 based on image quality optimization according to an embodiment of the present application. The device 800 includes: an image queue acquisition module 810, configured to acquire a queue of images of objects to be identified captured by a camera; an IOU relationship determination module 820, configured to determine, based on a face target detection network and a human body target detection network, the target IOU relationship of each object image to be identified in the queue, so as to obtain a queue of target IOU relationships; a target object image queue extraction module 830, configured to extract a queue of target object images from the queue of object images to be identified based on the queue of target IOU relationships; a selection module 840, configured to select, from the queue of target object images, the target object image with the best face quality as the target object image to be detected; a face recognition module 850, configured to perform face recognition on the target object image to be detected to obtain a recognition result, the recognition result being a person identity tag; and a person identity tag designation module 860, configured to designate the person identity tag in the recognition result as the person identity tag of the queue of target object images.
Those skilled in the art will understand that the specific functions and operations of the modules in the above sequential image target recognition device 800 based on image quality optimization have been described in detail in the description of the corresponding method with reference to Figs. 2 and 3, so repeated descriptions are omitted here.
Fig. 5 is an application scenario diagram of the sequential image target recognition method based on image quality optimization according to an embodiment of the present application. As shown in Fig. 5, in this application scenario, a queue of images of objects to be identified (e.g., D in Fig. 5) is first acquired by a camera and then input into a server (e.g., S in Fig. 5) on which the sequential image target recognition algorithm based on image quality optimization is deployed. The server processes the queue with this algorithm to determine the person identity tag of the queue of images of objects to be identified.
Based on the foregoing embodiments, an embodiment of the present application further provides an electronic device according to another exemplary embodiment, including: a processor; and a memory storing computer program instructions that, when executed by the processor, cause the processor to perform the sequential image target recognition method based on image quality optimization according to any of the embodiments described above.
For example, when the server 100 in Fig. 1 of the present application is taken as the electronic device, the processor of the electronic device is the processor 110 of the server 100, and the memory of the electronic device is the memory 120 of the server 100.
Further, an embodiment of the present application also provides a sequential image target recognition method based on image quality optimization, which comprises the following steps. Step 1, storing the queue to be identified: human body and face information is captured through the camera lens, stored as target sequences according to the IoU relationships between targets, and assigned pseudo tags. Step 2, face quality selection: pictures of higher quality are selected as the face recognition input, which improves the reliability of the face recognition result. Step 3, face recognition: face features are extracted from clear frontal faces and compared with the faces in a personnel database to identify the person's identity information. Step 4, human body ID matching: each identified face is matched with a human body by computing the IoU between the face box and the body box; when the IoU exceeds a threshold, the face and the body are regarded as the same person ID.
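Step 4, like the relationship formula in claim 2, reduces to an intersection-over-union test between boxes. The following is a minimal sketch, assuming boxes given as (x1, y1, x2, y2) pixel corners and a hypothetical threshold; the helper names are illustrative and not taken from the source.

```python
from typing import List, Optional, Tuple

# A box is (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_faces_to_bodies(faces: List[Box], bodies: List[Box],
                          threshold: float) -> List[Optional[int]]:
    """For each face, return the index of the body box with the highest IoU
    strictly above `threshold`, or None when no body qualifies."""
    matches: List[Optional[int]] = []
    for face in faces:
        best_idx: Optional[int] = None
        best_iou = threshold
        for idx, body in enumerate(bodies):
            score = iou(face, body)
            if score > best_iou:
                best_idx, best_iou = idx, score
        matches.append(best_idx)
    return matches
```

Because a face box is much smaller than its body box, the IoU of a correct pair stays far below 1: a 10×10 face lying fully inside a 40×100 body gives 100 / 4000 = 0.025. The threshold therefore has to be chosen with this scale in mind.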
Accordingly, the method has the following beneficial effect: 1. Face identity verification for all targets in a target queue requires only a single face recognition. The pseudo-tag method stores targets with the same identity in the same queue, so identity ID information can be matched to the whole queue with a single face recognition. Even if the face information is lost, the identity can still be confirmed through the queue's pseudo tag.
Specifically, as shown in Fig. 6, four queues Q1 to Q4 are provided, each storing a plurality of images. When the first face in Q3 and the last face in Q4 meet the face recognition requirements, Q4 is assigned an ID directly, and after Q3 has been assigned an ID, the human-body feature vectors of Q3 and Q4 are used for matching. Q2 has a lower score and therefore a higher ranking priority, while Q1 is ranked later because the re-ranking step analyzes body pose orientation and Q1 shows the person's back. Q3 is matched successfully with Q2 and then with Q1, so the face ID of Q3 is assigned to Q2 and Q1. Q4 does not match the upper-body feature information, so its face ID does not need to be assigned to this cross-domain event.
More specifically, in Step 1, multi-view cameras capture human-shape and face events to form cross-domain time-series queues to be identified. Qi denotes all events captured from a single view at time T, including face and human-shape snapshots. Captured face and human-shape events are allocated to time-series queues by checking whether the IoU between the detection boxes of targets in consecutive frames exceeds a threshold, and a pseudo tag ID is assigned to each queue q at the same time. For a face event and a human-shape event at the same moment, the two events are regarded as the same target when the face region lies within the human body region.
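The queue allocation in Step 1 can be sketched as a simple IoU tracker for one camera view. This is a minimal sketch under stated assumptions: greedy matching against the most recent box of each queue and the rule that a queue accepts at most one detection per frame are assumptions, not details given in the source, and all names are hypothetical. The containment helper models the rule that a face lying inside a body region belongs to the same target.

```python
from typing import Dict, List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def face_in_body(face: Box, body: Box) -> bool:
    """True when the face box lies entirely inside the body box."""
    return (body[0] <= face[0] and body[1] <= face[1]
            and face[2] <= body[2] and face[3] <= body[3])


def assign_queues(frames: List[List[Box]], threshold: float) -> List[List[int]]:
    """Link detections across consecutive frames of one view.

    A detection joins the queue whose most recent box overlaps it with IoU
    above `threshold`; otherwise it opens a new queue with a fresh
    pseudo-tag ID. Returns the pseudo-tag ID of every detection, per frame.
    """
    last_box: Dict[int, Box] = {}  # pseudo-tag ID -> latest box of that queue
    next_id = 0
    labels: List[List[int]] = []
    for boxes in frames:
        used: Set[int] = set()  # each queue takes at most one detection per frame
        frame_labels: List[int] = []
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = threshold
            for qid, prev in last_box.items():
                if qid in used:
                    continue
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = qid, score
            if best_id is None:  # no overlap: start a new queue
                best_id = next_id
                next_id += 1
            used.add(best_id)
            last_box[best_id] = box
            frame_labels.append(best_id)
        labels.append(frame_labels)
    return labels
```

For instance, a box at (0, 0, 10, 10) followed by one at (1, 1, 11, 11) gives an IoU of 81 / 119, about 0.68, so with a threshold of 0.5 both detections share pseudo-tag 0, while a later box far away opens queue 1.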
In Step 2, low-quality pictures among the recorded face events are filtered out: pictures that do not meet the face recognition requirements, for example because of blur or severe occlusion, are not sent to the recognition module, which saves computing resources. N frontal, clear and unoccluded faces are selected to await recognition.
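The filtering in Step 2 amounts to a quality gate followed by a top-N selection. A minimal sketch, assuming each image already carries a quality score (for instance from a scorer such as the decoder-based image quality scorer in the claims); the threshold, N and the function name are hypothetical.

```python
from typing import Dict, List


def select_for_recognition(scores: Dict[str, float], threshold: float, n: int) -> List[str]:
    """Drop images whose quality score is at or below `threshold`, then return
    the IDs of the `n` best-scoring images in descending score order."""
    passed = [(img, s) for img, s in scores.items() if s > threshold]
    passed.sort(key=lambda item: item[1], reverse=True)
    return [img for img, _ in passed[:n]]
```

With n = 1 this reduces to picking the single highest-scoring image, which matches the selection rule of claim 7; images that fail the gate never reach the recognition module.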
In Step 3, the pictures that meet the requirements are sent to the recognition module. The face recognition module uses the ArcFace method with ResNet-50 (res50) as the backbone network to extract a unique feature vector for each face. This vector is used as the query and matched against a pre-enrolled personnel information database: the similarity between feature vectors is computed, the maximum score is found, and the matched identity information is obtained if the maximum score exceeds the threshold set for the implementation. To eliminate occasional errors, the identity recognized most often within the face time-series event to be identified is taken as the final result. Once the face events in a face sequence have been identified successfully, the ID can be assigned to all events in that sequence. Because faces and human bodies were paired in advance in Step 1, when a face obtains the person's information, the corresponding human body sequence is also assigned the same person ID.
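The matching logic of Step 3 can be sketched independently of the network. This sketch assumes the face feature vectors have already been extracted (the ArcFace/ResNet-50 model itself is not shown), uses cosine similarity as the similarity measure (an assumption; the source only says "similarity"), and uses hypothetical names and thresholds throughout.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(query: List[float], gallery: Dict[str, List[float]],
             threshold: float) -> Optional[str]:
    """Return the enrolled identity with the maximum similarity, or None if
    that maximum does not exceed `threshold`."""
    if not gallery:
        return None
    best = max(gallery, key=lambda name: cosine(query, gallery[name]))
    return best if cosine(query, gallery[best]) > threshold else None


def identify_sequence(queries: List[List[float]], gallery: Dict[str, List[float]],
                      threshold: float) -> Optional[str]:
    """Majority vote over the per-face results of one face sequence."""
    results = (identify(q, gallery, threshold) for q in queries)
    votes = Counter(r for r in results if r is not None)
    return votes.most_common(1)[0][0] if votes else None


def propagate_identity(identity: str, face_queue: str,
                       pairs: Dict[str, str]) -> Dict[str, str]:
    """Give `identity` to a face queue and to the body queue paired with it."""
    labels = {face_queue: identity}
    if face_queue in pairs:
        labels[pairs[face_queue]] = identity
    return labels
```

The vote makes a single mismatched frame harmless: a sequence recognized as "alice", "alice", "bob" is labeled "alice", and that label is then copied to the body queue paired in Step 1.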
Those skilled in the art will appreciate that various modifications and improvements of the present disclosure may occur. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.
Furthermore, while the present application makes various references to certain elements in a system according to an embodiment of the present application, any number of different elements may be used and run on a client and/or server. The units are merely illustrative and different aspects of the systems and methods may use different units.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a program that instructs associated hardware, and the program may be stored on a computer readable storage medium such as a read-only memory, a magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present application is not limited to any specific form of combination of hardware and software.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof. Although exemplary embodiments of the present application have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this application.

Claims (11)

1. A sequential image target recognition method based on image quality optimization, characterized by comprising: obtaining a queue of images of objects to be identified captured by a camera; determining, based on a face target detection network and a human body target detection network, a target IOU relationship of each object image to be identified in the queue of object images to be identified, to obtain a queue of target IOU relationships; extracting, based on the queue of target IOU relationships, a queue of target object images from the queue of object images to be identified; selecting, from the queue of target object images, the target object image with the best face quality as the target object image to be detected; performing face recognition on the target object image to be detected to obtain a recognition result, wherein the recognition result is a person identity tag; and designating the person identity tag in the recognition result as the person identity tag of the queue of target object images.
2. The sequential image target recognition method based on image quality optimization according to claim 1, characterized in that determining, based on the face target detection network and the human body target detection network, the target IOU relationship of each object image to be identified in the queue of object images to be identified to obtain the queue of target IOU relationships comprises: inputting each object image to be identified into the face target detection network and the human body target detection network, respectively, to obtain a human body bounding box and a face bounding box; and calculating the target IOU relationship between the human body bounding box and the face bounding box with the following relationship calculation formula: $\mathrm{IOU} = \frac{\text{intersection area}}{\text{union area}}$; wherein the intersection area is the area of the intersection of the human body bounding box and the face bounding box, and the union area is the area of the union of the human body bounding box and the face bounding box.
3. The sequential image target recognition method based on image quality optimization according to claim 2, characterized in that extracting, based on the queue of target IOU relationships, the queue of target object images from the queue of object images to be identified comprises: in response to the target IOU relationship being less than or equal to a preset threshold, discarding the corresponding object image to be identified; and in response to the target IOU relationship being greater than the preset threshold, adding the corresponding object image to be identified to the queue of target object images.
4. The sequential image target recognition method based on image quality optimization according to claim 3, characterized in that selecting, from the queue of target object images, the target object image with the best face quality as the target object image to be detected comprises, for each target object image in the queue of target object images: processing the target object image with an LBP (local binary pattern) operator to obtain a target object image LBP feature vector; processing the target object image with a HOG feature descriptor to obtain a target object HOG feature vector; inputting the target object HOG feature vector and the target object image LBP feature vector into a gated-response dynamic interaction module to obtain a target object multi-modal statistical feature vector; inputting the target object image into an image feature extractor based on a dilated convolutional neural network model to obtain a target object image feature map; inputting the target object image feature map into a feature foreground mask saliency enhancer based on a convolutional gated feed-forward mechanism to obtain a foreground-salient target object image feature map; inputting the foreground-salient target object image feature map and the target object multi-modal statistical feature vector into a cross-domain joint encoder based on a MetaNet model to obtain a multi-modal statistical feature-assisted target object image fusion feature map; and inputting the multi-modal statistical feature-assisted target object image fusion feature map into a decoder-based image quality scorer to obtain a decoded score value.
5. The sequential image target recognition method based on image quality optimization according to claim 4, characterized in that inputting the target object HOG feature vector and the target object image LBP feature vector into the gated-response dynamic interaction module to obtain the target object multi-modal statistical feature vector comprises: inputting the target object HOG feature vector and the target object image LBP feature vector into a feature joining module for concatenation to obtain a target object multi-modal statistical information joint feature vector; multiplying the joint feature vector by a parameter matrix and adding a bias vector position-wise to the result to obtain a linearly transformed target object multi-modal statistical information joint feature vector; activating the linearly transformed joint feature vector with an activation function to obtain a target object multi-modal statistical information dynamic fusion response gating value; computing the position-wise product of the target object HOG feature vector and the gating value to obtain a weight-modulated target object HOG feature vector; computing one minus the gating value and multiplying the resulting weight position-wise by the target object image LBP feature vector to obtain a weight-modulated target object image LBP feature vector; and adding the weight-modulated target object HOG feature vector and the weight-modulated target object image LBP feature vector position-wise to obtain the target object multi-modal statistical feature vector.
6. The sequential image target recognition method based on image quality optimization according to claim 5, characterized in that inputting the target object image feature map into the feature foreground mask saliency enhancer based on the convolutional gated feed-forward mechanism to obtain the foreground-salient target object image feature map comprises: performing layer normalization on the target object image feature map to obtain a normalized target object image feature map; performing point-wise-convolution-based channel expansion and depth-wise convolutional encoding based on a dilated convolution layer on the normalized target object image feature map to obtain a target object image depth-wise convolution backup feature map and a target object image depth-wise convolution original feature map; inputting the target object image depth-wise convolution original feature map into a GELU-based foreground gated mask module to obtain a target object image depth-wise convolution gated mask weight feature map; computing the position-wise product of the gated mask weight feature map and the depth-wise convolution backup feature map to obtain a target object image gated-mask foreground-highlighted feature map; and performing point-wise-convolution-based channel reduction on the gated-mask foreground-highlighted feature map to obtain the foreground-salient target object image feature map.
7. The sequential image target recognition method based on image quality optimization according to claim 6, characterized in that the target object image corresponding to the largest decoded score value is determined as the target object image to be detected.
8. The sequential image target recognition method based on image quality optimization according to claim 7, characterized in that performing face recognition on the target object image to be detected to obtain the recognition result comprises: inputting the target object image to be detected into an AlexNet-based face feature extractor to obtain a face feature vector; and inputting the face feature vector into a classifier-based face recognizer to obtain the recognition result.
9. The sequential image target recognition method based on image quality optimization according to claim 8, characterized by further comprising a training step for training the gated-response dynamic interaction module, the image feature extractor based on the dilated convolutional neural network model, the feature foreground mask saliency enhancer based on the convolutional gated feed-forward mechanism, the cross-domain joint encoder based on the MetaNet model, and the decoder-based image quality scorer; wherein the training step comprises: acquiring training data, the training data comprising a queue of training images of objects to be identified captured by a camera; determining, based on the face target detection network and the human body target detection network, the target IOU relationship of each training object image to be identified in the queue of training object images to be identified, to obtain a queue of training target IOU relationships; extracting, based on the queue of training target IOU relationships, a queue of training target object images from the queue of training object images to be identified; processing each training target object image in the queue of training target object images with the LBP operator to obtain a training target object image LBP feature vector; processing each training target object image with the HOG feature descriptor to obtain a training target object HOG feature vector; inputting the training target object HOG feature vector and the training target object image LBP feature vector into the gated-response dynamic interaction module to obtain a training target object multi-modal statistical feature vector; inputting each training target object image into the image feature extractor based on the dilated convolutional neural network model to obtain a training target object image feature map; inputting the training target object image feature map into the feature foreground mask saliency enhancer based on the convolutional gated feed-forward mechanism to obtain a training foreground-salient target object image feature map; inputting the training foreground-salient target object image feature map and the training target object multi-modal statistical feature vector into the cross-domain joint encoder based on the MetaNet model to obtain a training multi-modal statistical feature-assisted target object image fusion feature map; inputting the training multi-modal statistical feature-assisted target object image fusion feature map into the decoder-based image quality scorer to obtain a decoding loss function value; calculating a predetermined loss function value of the training multi-modal statistical feature-assisted target object image fusion feature map to obtain a multi-modal statistical feature-assisted target object image fusion loss function value; and training the gated-response dynamic interaction module, the image feature extractor based on the dilated convolutional neural network model, the feature foreground mask saliency enhancer based on the convolutional gated feed-forward mechanism, the cross-domain joint encoder based on the MetaNet model, and the decoder-based image quality scorer, using as the loss function value a weighted sum of the decoding loss function value and the multi-modal statistical feature-assisted target object image fusion loss function value.
10. A sequential image target recognition device based on image quality optimization, characterized by comprising: an image queue acquisition module, configured to acquire a queue of images of objects to be identified captured by a camera; an IOU relationship determination module, configured to determine, based on a face target detection network and a human body target detection network, the target IOU relationship of each object image to be identified in the queue of object images to be identified, to obtain a queue of target IOU relationships; a target object image queue extraction module, configured to extract a queue of target object images from the queue of object images to be identified based on the queue of target IOU relationships; a selection module, configured to select, from the queue of target object images, the target object image with the best face quality as the target object image to be detected; a face recognition module, configured to perform face recognition on the target object image to be detected to obtain a recognition result, wherein the recognition result is a person identity tag; and a person identity tag designation module, configured to designate the person identity tag in the recognition result as the person identity tag of the queue of target object images.
11. An electronic device, characterized by comprising: a processor; and a memory storing computer program instructions which, when executed by the processor, cause the processor to perform the sequential image target recognition method based on image quality optimization according to any one of claims 1 to 9.
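The gated-response dynamic interaction module of claim 5 can be illustrated with a small pure-Python sketch. It is not the applicant's implementation. The claim's activation function is not named in the legible text, so a sigmoid is assumed here because the complementary weight "one minus the gating value" suits a gate in (0, 1). The sketch also assumes the HOG and LBP vectors have the same length (required for the position-wise products), and the parameter names are hypothetical; in a trained model the weight matrix and bias would be learned.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (assumed gate activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(hog: List[float], lbp: List[float],
                 weight: List[List[float]], bias: List[float]) -> List[float]:
    """g = act(W [hog; lbp] + b); out = g * hog + (1 - g) * lbp, position-wise.

    `weight` must have len(hog) rows of 2 * len(hog) columns, and `bias`
    must have len(hog) entries.
    """
    n = len(hog)
    if len(lbp) != n or len(weight) != n or len(bias) != n:
        raise ValueError("shape mismatch between HOG, LBP, weight and bias")
    if any(len(row) != 2 * n for row in weight):
        raise ValueError("each weight row must have 2 * len(hog) columns")
    joint = hog + lbp  # concatenation of the two statistical descriptors
    z = [sum(w * x for w, x in zip(row, joint)) + b for row, b in zip(weight, bias)]
    gate = [sigmoid(v) for v in z]  # dynamic fusion response gating value
    return [g * h + (1.0 - g) * l for g, h, l in zip(gate, hog, lbp)]
```

With an all-zero weight matrix and bias every gate equals 0.5, so the output is simply the average of the two descriptors; a large positive bias pushes the output toward the HOG vector, and a large negative one toward the LBP vector.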
CN202411062099.8A 2024-08-05 2024-08-05 Sequence image target recognition method, device and electronic device based on image quality optimization Active CN118570889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411062099.8A CN118570889B (en) 2024-08-05 2024-08-05 Sequence image target recognition method, device and electronic device based on image quality optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411062099.8A CN118570889B (en) 2024-08-05 2024-08-05 Sequence image target recognition method, device and electronic device based on image quality optimization

Publications (2)

Publication Number Publication Date
CN118570889A (en) 2024-08-30
CN118570889B (en) 2024-10-29

Family

ID=92467478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411062099.8A Active CN118570889B (en) 2024-08-05 2024-08-05 Sequence image target recognition method, device and electronic device based on image quality optimization

Country Status (1)

Country Link
CN (1) CN118570889B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118762237A (en) * 2024-09-05 2024-10-11 Huzhou Institute of Zhejiang University Wetland species classification method based on air-space remote sensing fusion images
CN118778457A (en) * 2024-09-06 2024-10-15 Hangzhou Daonong Technology Co., Ltd. Adaptive optimization control system of preservation parameters based on fruit status detection
CN118824524A (en) * 2024-09-13 2024-10-22 Jiangsu Provincial Center for Disease Control and Prevention (Jiangsu Academy of Preventive Medicine) Multimodal lung disease diagnosis and screening system based on GAN algorithm

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934115A (en) * 2019-02-18 2019-06-25 Suzhou Keyuan Software Technology Development Co., Ltd. Construction method, face identification method and the electronic equipment of human face recognition model
CN110826370A (en) * 2018-08-09 2020-02-21 Guangzhou Automobile Group Co., Ltd. Method and device for identifying identity of person in vehicle, vehicle and storage medium
CN111382693A (en) * 2020-03-05 2020-07-07 Beijing Megvii Technology Co., Ltd. Image quality determination method and device, electronic equipment and computer readable medium
CN112329679A (en) * 2020-11-12 2021-02-05 Jinan Boguan Intelligent Technology Co., Ltd. Face recognition method, face recognition system, electronic equipment and storage medium
US20210209802A1 (en) * 2020-07-21 2021-07-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Image Detection Method, Apparatus, Electronic Device and Storage Medium
CN113569809A (en) * 2021-08-27 2021-10-29 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Image processing method, device and computer readable storage medium
EP3905120A1 (en) * 2020-04-27 2021-11-03 20Face B.V. Quality check method and system for determining a recognition quality of an input face image
CN114627526A (en) * 2022-02-14 2022-06-14 Xiamen Ruiwei Information Technology Co., Ltd. Fusion duplicate removal method and device based on multi-camera snapshot image and readable medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KIM, HI; LEE, SH AND RO, YM: "FACE IMAGE ASSESSMENT LEARNED WITH OBJECTIVE AND RELATIVE FACE IMAGE QUALITIES FOR IMPROVED FACE RECOGNITION", 《IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》, 1 January 2015 (2015-01-01) *
POOYA FAZELI ARDEKANI: "Face mask recognition using a custom CNN and data augmentation", 《SIGNAL, IMAGE AND VIDEO PROCESSING》, 4 September 2023 (2023-09-04) *
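WANG Hailong; WANG Huaibin; WANG Rongyao; WANG Haitao; LIU Qiang; ZHANG Luyang; JIANG Menghao: "Face recognition method based on video surveillance", Computer Measurement & Control, no. 04, 25 April 2020 (2020-04-25) *
CHEN Zhenghao; WU Yundong; CAI Guorong; CHEN Shuili: "Face image quality assessment algorithm based on texture feature fusion", Journal of Jimei University (Natural Science), no. 04, 28 July 2018 (2018-07-28) *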

Also Published As

Publication number Publication date
CN118570889B (en) 2024-10-29

Similar Documents

Publication Publication Date Title
CN111709409B (en) Face living body detection method, device, equipment and medium
US12190588B2 (en) Occlusion-aware multi-object tracking
US11704817B2 (en) Method, apparatus, terminal, and storage medium for training model
CN118570889B (en) Sequence image target recognition method, device and electronic device based on image quality optimization
US12175684B2 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
Li et al. Visual tracking via incremental log-euclidean riemannian subspace learning
US10346464B2 (en) Cross-modiality image matching method
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN109101602A (en) Image encrypting algorithm training method, image search method, equipment and storage medium
CN111144366A (en) A stranger face clustering method based on joint face quality assessment
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
US11244475B2 (en) Determining a pose of an object in the surroundings of the object by means of multi-task learning
CN114155475A (en) Method, device and medium for recognizing end-to-end personnel actions under view angle of unmanned aerial vehicle
Imran et al. FaceEngine: a tracking-based framework for real-time face recognition in video surveillance system
CN112836682B (en) Method, device, computer equipment and storage medium for identifying objects in video
KR102540290B1 (en) Apparatus and Method for Person Re-Identification based on Heterogeneous Sensor Camera
Rana et al. Real time deep learning based face recognition system using Raspberry PI
CN117058739B (en) Face clustering updating method and device
CN108596068B (en) A method and device for motion recognition
CN115797678B (en) Image processing method, device, equipment, storage medium and computer program product
CN118587758B (en) Cross-domain personnel identification and matching method, device and electronic device
Truong et al. A scalable real-time attendance tracking system based on face recognition
CN120163850B (en) Monitoring video intelligent tracking method based on dynamic feature collaborative optimization
CN118155136B (en) A pedestrian search method based on cascade regression and angle relationship supervision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant