US12536616B2 - Image processing device and image processing method - Google Patents
- Publication number
- US12536616B2 US17/768,853 US202017768853A
- Authority
- US
- United States
- Prior art keywords
- image data
- multispectral
- multispectral image
- image processing
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10036—Multispectral image; Hyperspectral image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure generally pertains to an image processing device and an image processing method.
- neural networks such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN) are known, and they are used in a plurality of technical fields, for example in image processing.
- image processing devices may use DNN and CNN for image reconstruction, multispatial and multispectral image generation, object recognition and the like.
- DNN and CNN typically have an input layer, an output layer and multiple hidden layers between the input layer and the output layer.
- a neural network may be trained to output images having high spectral resolution or high spatial resolution, using as an input to the neural network, a color channel image, such as an RGB image (having red, green and blue color channels).
- the disclosure provides an image processing device comprising circuitry configured to obtain input image data being represented by a number of color channels and to input the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- the disclosure provides an image processing method comprising obtaining input image data being represented by a number of color channels and inputting the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- FIG. 1 illustrates a proposed approach of multispectral image data generation from input image data represented by a number of color channels
- FIG. 2 illustrates an exemplary optimized relationship between spectral resolution and spatial resolution of multispectral image data
- FIG. 3 visualizes the application of a Convolutional Neural Network
- FIG. 4 shows a block diagram of an embodiment of an image processing device
- FIG. 5 illustrates an embodiment of a processing scheme of a learning method of a Convolutional Neural Network
- FIG. 6 shows a block diagram of an embodiment of a learning system
- FIG. 7 shows a block diagram of an embodiment of an image processing system
- FIG. 8 is a flowchart of an embodiment of an image processing method.
- multispectral imaging systems and common Red Green Blue (RGB) imaging systems are used to capture and analyze images having high spectral resolution and high spatial resolution, respectively.
- a multispectral imaging device provides higher resolved spectral information than a common RGB imaging system.
- the analysis of a high resolved spectrum may be used in a variety of applications, such as biometrics, remote sensing, medical and food inspection.
- a multispectral sensing device is usually more expensive than an RGB imaging device.
- the spatial resolution of a mosaic-array multispectral sensor typically is lower than the spatial resolution of a common RGB sensor.
- because the design costs of a common RGB sensor are usually lower than the costs of a multispectral sensor, most imaging systems focus on spatial resolution rather than spectral resolution.
- multispectral imaging systems perform hyper/multispectral image data reconstruction from an RGB image using deep learning techniques, in order to benefit from both spatial and spectral resolution information.
- neural networks such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN) are known, and they have reached state-of-the-art level performance in many domains, such as image processing, image reconstruction, multispatial and multispectral image generation, language processing and the like.
- CNNs are a class of DNN that is usually applied to analyzing visual imagery.
- CNN uses image classification algorithms for image transformation, multispatial and multispectral image generation, image classification, medical image analysis, image and video recognition, natural language processing, material classification applications (e.g. remote sensing, medical diagnosis) and the like.
- a CNN may have an input layer and an output layer, as well as multiple hidden layers.
- the hidden layers of a CNN typically include a number of convolutional layers, pooling layers, fully connected layers and the like.
- Each convolutional layer within a neural network usually has attributes, such as an input having shape (number of images) × (image width) × (image height) × (image depth), and a number of convolutional kernels, acting like filters, whose width and height are hyper-parameters and whose depth typically must equal that of the image.
- the convolutional layers convolve the input and pass their result to the next layer.
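As an illustration of the convolution step described above, the following is a minimal sketch in plain Python, not the patent's implementation. It assumes a single image stored as nested lists in height × width × depth order, a single kernel, no padding and a stride of one; the function name `conv2d_valid` and the sample values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide a kh x kw x D kernel over an H x W x D image without padding.

    As in most CNN libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    h, w, d = len(image), len(image[0]), len(image[0][0])
    kh, kw, kd = len(kernel), len(kernel[0]), len(kernel[0][0])
    if kd != d:
        raise ValueError("kernel depth must equal image depth")
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    for c in range(d):
                        acc += image[y + i][x + j][c] * kernel[i][j][c]
            row.append(acc)
        out.append(row)
    return out


# 3 x 3 RGB image in which every pixel is (1, 2, 3).
image = [[[1.0, 2.0, 3.0] for _ in range(3)] for _ in range(3)]
# 2 x 2 x 3 averaging kernel: width and height are hyper-parameters,
# the depth (3) matches the image depth.
kernel = [[[1.0 / 12.0] * 3 for _ in range(2)] for _ in range(2)]
feature_map = conv2d_valid(image, kernel)  # 2 x 2 map passed to the next layer
```

Each kernel produces one feature map; a layer with several kernels would stack the resulting maps along the depth axis.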
- the conventional CNN is trained so as to reconstruct a hyper/multispectral image from an RGB image.
- the conventional CNN may be trained to output only images with a predefined number of spectral channels, without taking into account the amount of spectral information, which may be needed for different applications, target scenes, systems, user desires or the like.
- such an approach usually requires a high computational effort as well as much memory when calculating a highly resolved multispectral image, which has a large number of spectral channels.
- the conventional approach typically outputs a hyper/multispectral image with a predefined number of spectral channels, and, thus, possibly with an unnecessary amount of spectral information for a target or vice versa.
- some embodiments pertain to an image processing device including circuitry configured to obtain input image data being represented by a number of color channels, and to input the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- the image processing device may be a digital (video) camera, a surveillance camera, a biometric device, a security camera, a medical/healthcare device, a remote sensing device, a food inspection device, an edge computing enabled image sensor, such as a smart sensor associated with a smart speaker, or the like, a motor vehicle device, a smartphone, a personal computer, a laptop computer, a wearable electronic device, electronic glasses, or the like, a circuitry, a processor, multiple processors, logic circuits or a mixture of those parts.
- the circuitry may include one or more processors, logical circuits, memory (read only memory, random access memory, etc., storage memory, e.g. hard disc, compact disc, flash drive, etc.), an interface for communication via a network, such as a wireless network, internet, local area network, or the like, a CMOS (Complementary Metal Oxide Semiconductor) image sensor, a CCD (Charge Coupled Device) image sensor, or the like.
- the input image data may be generated by the image sensor, as mentioned above.
- the input image data may be also obtained from a memory included in the device, from an external memory, etc., from an artificial image generator, created via computer generated graphics, or the like.
- the input image data may be represented by a number of color channels, for example three color channels, such as Red, Green and Blue, or the like.
- the input image data may also be represented for example by a small number of spectral channels.
- the color channel of a specific color for example red, green, or blue, may include information of multiple spectral channels that corresponds to the wavelength range of red, green, or blue, respectively. That is, the color channels may be considered as an integration of the corresponding (multiple) spectral channels located in the wavelength range of the associated color channel.
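The idea that a color channel integrates the spectral channels in its wavelength range can be sketched as follows. This is an illustrative assumption, not the patent's method: the nine-band pixel, the equal-weight summation and the grouping of bands into blue, green and red are all hypothetical.

```python
def integrate_to_color_channels(spectral_pixel, band_groups):
    """Collapse one pixel's spectral channels into color channels by summing
    the spectral channels that fall inside each color's wavelength range."""
    return {color: sum(spectral_pixel[i] for i in indices)
            for color, indices in band_groups.items()}


# Hypothetical 9-band pixel, bands ordered from short to long wavelength.
spectral_pixel = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
# Assumed assignment of band indices to color channels.
band_groups = {"blue": [0, 1, 2], "green": [3, 4, 5], "red": [6, 7, 8]}
rgb_pixel = integrate_to_color_channels(spectral_pixel, band_groups)
```

A real sensor would weight each band by its spectral sensitivity rather than summing with equal weights.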
- In FIG. 1, a proposed approach for generating multispectral image data from input image data represented by a number of color channels is illustrated.
- an image processing device acquires input image data, as input image data 1 , representing an image, for example captured by a digital camera.
- the input image data 1 are represented by a number of color channels.
- the number of channels of the input image data 1 is three, namely Red, Green and Blue, without limiting the present disclosure to these three color channels (in principle, any number and type of color channels can be chosen).
- the input image data 1 are input to a neural network, such as for example a CNN, for generating output multispectral image data, such as multispectral image data 2 .
- the output multispectral image data 2 are generated from the input image data 1 and the number of spectral channels of the output multispectral image data 2 is nine (9), in this embodiment. Therefore, the input image data being represented by a number of color channels have been transformed to output multispectral image data 2 being represented by a number of spectral channels.
- the neural network generates at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- the neural network may also generate a plurality of multispectral image data on the basis of the input image data. That is, each of the plurality of multispectral image data may be followed by another multispectral image data and each of the plurality of multispectral image data may be generated on the basis of the previously generated multispectral image data.
- multiple intermediate multispectral image data may be generated by the neural network.
- the circuitry may be further configured to obtain the first or the second multispectral image data as the output multispectral image data.
- the neural network may generate at least first and second multispectral image data and thus, the circuitry may obtain, as the output multispectral image data, the first multispectral image data or the second multispectral image data, based, for example, on a setting of a user, or a predetermined setup of the image processing device based on a target application.
- the input image data may include spectral image data.
- the spectral image data may be input image data represented by a small number of spectral channels, which may be suitable, for example, for object classification or the like, using a neural network.
- the input image data may also include Red Green Blue (RGB) image data represented by a specific number of color channels, in which multiple spectral channels are integrated, as described above.
- the number of spectral channels of the output multispectral image data is larger than the number of spectral channels of the input image data.
- the number of spectral channels of the output multispectral image data may be nine (9), or the like. Therefore, the output multispectral image data may have, after processing, higher spectral resolution.
- a size of the image data remains the same before and after image processing, even in the case of higher spectral resolution of the output image data after image processing.
- a spatial resolution of the first multispectral image data may be higher than a spatial resolution of the second multispectral image data.
- a conventional imaging device, such as a mosaic-array multispectral imaging device, using a conventional neural network, usually sacrifices its spatial resolution for spectral resolution, although both kinds of information offer benefits for computational sensing applications. Therefore, it may be suitable to generate, from an RGB image or from a multispectral image represented by a small number of spectral channels, a multispectral image which has an optimized trade-off between spatial and spectral resolution for the device.
- the output multispectral image data is generated based on a predetermined relationship between the spatial resolution and the number of spectral channels.
- the predetermined relationship may be an optimized trade-off relationship between spectral resolution and spatial resolution.
- the optimal point of an optimized trade-off relationship between spectral resolution and spatial resolution may depend on a system, an application, a target scene, or the like.
- the predetermined relationship between the spatial resolution and the number of spectral channels may be determined based on a setting of a user, or a predetermined set up of the image processing device according to a target application.
- An exemplary optimized relationship between spectral resolution and spatial resolution of multispectral image data, such as multispectral skin data, is illustrated in FIG. 2.
- the number of spectral channels, e.g. spectral bands, increases from three (3) spectral bands to three hundred (300) spectral bands.
- the classification accuracy is represented on the y-axis and, in this embodiment, increases with the number of spectral bands up to sixteen (16) bands.
- Dashed line 3 represents the spatial resolution of the image.
- the optimized performance is obtained with multispectral data having 16 spectral channels.
- the best relationship between spectral resolution and spatial resolution depends on the target. For example, for other applications, the best trade-off point may be different.
- the optimal relationship between spectral resolution and spatial resolution may change depending on the content in a scene, which may make it difficult to design an optimal multispectral sensor with the best performance.
- the neural network may be a convolutional neural network (CNN), without limiting the present disclosure in that regard.
- the convolutional neural network may include convolutional layers, or may also include local or global pooling layers, such as max-pooling layers, which reduce the dimensions of the image data, as it is generally known.
- the pooling layers may be used for pooling, which is a form of non-linear down-sampling, such as spatial pooling, namely max-pooling, average pooling, sum pooling, or the like.
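The pooling variants named above can be illustrated with a short plain-Python sketch. It assumes a single-channel feature map stored as a list of rows and non-overlapping square windows; the function name `pool2d` and the sample values are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map with non-overlapping size x size windows."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for y in range(0, h - size + 1, size):
        row = []
        for x in range(0, w - size + 1, size):
            window = [feature_map[y + i][x + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))                 # max-pooling
            elif mode == "average":
                row.append(sum(window) / len(window))   # average pooling
            elif mode == "sum":
                row.append(sum(window))                 # sum pooling
            else:
                raise ValueError("unknown pooling mode: " + mode)
        out.append(row)
    return out


feature_map = [[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 10, 13, 14],
               [11, 12, 15, 16]]
pooled_max = pool2d(feature_map, 2, "max")      # [[4, 8], [12, 16]]
pooled_avg = pool2d(feature_map, 2, "average")  # [[2.5, 6.5], [10.5, 14.5]]
```

In both cases a 4 × 4 map is reduced to 2 × 2, which is the dimension reduction the pooling layers provide.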
- the generation of the multispectral image data may be either during a training phase of a neural network, such as a CNN, or may be a generation of the multispectral image data with an already trained neural network, such as a trained CNN, for example, for extracting information from the image data (e.g. object recognition, or recognition of other information in the image data, such as spatial information, spectral information, patterns, colors, etc.).
- the neural network may be an un-trained neural network.
- the neural network may be part of the image processing device, e.g. stored in a storage or memory of the image processing device, or the image processing device may have access to a neural network, e.g. based on inter-processor communication, electronic bus, network (including internet), etc.
- FIG. 3 shows generally in the first line the CNN structure, and in the second line the basic principle of building blocks.
- the principles of a CNN and its application in imaging are generally known and, thus, are only briefly discussed in the following with reference to FIG. 3.
- the input image includes for example three maps or layers (exemplary red, green and blue (RGB) color information) and N times N blocks.
- the CNN has a convolutional layer and a subsequent pooling layer, wherein this structure can be repeated as also shown in FIG. 3 .
- the convolutional layer includes the neurons.
- a kernel filter
- the pooling layer, which in the present embodiment is based on max-pooling (see second line, “Max-Pooling”), takes the information of the most active neurons of the convolution layer and discards the other information. After several repetitions (three in FIG.
- the process ends with a fully-connected layer, which is also referred to as affine layer.
- the last layer includes typically a number of neurons, which corresponds to the number of object classes (output features) which are to be differentiated by the CNN.
- the output is illustrated in FIG. 3 , first line, as an output distribution, wherein the distribution is shown by a row of columns, wherein each column represents a class and the height of the column represents the weight of the object class.
- the different classes correspond to the output or image attribute features, which are output by the CNN.
- the classes are, for example, “people, car, etc.” Typically several hundred or several thousand classes can be used, e.g. also for object recognition of different objects.
- the convolutional neural network may be trained to generate the first multispectral image data from the input image data and the second multispectral image data from the first multispectral image data.
- the convolutional neural network may generate a plurality of multispectral image data, such as a first multispectral image data and a second multispectral image data, which is generated on the basis of the first multispectral image data and which follows the first multispectral image data. That is, each of the plurality of multispectral image data may be generated on the basis of the previously generated multispectral image data and each of the plurality of multispectral image data may be followed by another multispectral image data.
- the convolutional neural network may be trained based on RGB image data and on multispectral image data.
- the training data of multispatial multispectral images may also be generated from high resolution hyperspectral data (the terms multispectral and hyperspectral data are generally known in the art, and they are typically differentiated by the number of spectral channels, wherein the hyperspectral data has more spectral channels than multispectral data).
- a CNN in image processing uses, as a training database, ground truth image data and desired image data, for example RGB image data and multispectral image data.
- multispectral image data represented by C channels are generated from hyperspectral image data by using the following equation:
- $$I_c = \int_{380}^{780} R(\lambda)\, L(\lambda)\, S_c(\lambda)\, d\lambda + n$$
- $I_c$ is the intensity of spectral band c (spectral channel) of a multispectral image
- $\lambda$ is the wavelength over which the integration is performed
- $R$ is the spectral reflectance of a target in a scene
- $L$ is the spectral distribution of the illumination, e.g. white illumination, which has a flat spectral distribution over all wavelengths
- $S_c$ is the sensor's spectral sensitivity of spectral band c
- $n$ is the sensor noise.
- $R$ is hyperspectral data (an HS image) measured by a hyperspectral camera
- $L$ can be set considering the illumination which will be used in the application, and $S_c$ is given by a sensor specification of a camera.
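The band-intensity equation above can be approximated numerically for a single pixel. The following is a minimal sketch under stated assumptions: the spectra are sampled every 10 nm from 380 to 770, the integral is replaced by a Riemann sum, the noise is Gaussian, and the box-shaped sensitivity between 500 and 600 is invented for illustration. The function name `simulate_band_intensity` is hypothetical.

```python
import random


def simulate_band_intensity(reflectance, illumination, sensitivity,
                            step_nm, noise_std=0.0, rng=None):
    """Riemann-sum approximation of the integral of R * L * S_c over
    wavelength, plus an optional Gaussian noise term n."""
    if not (len(reflectance) == len(illumination) == len(sensitivity)):
        raise ValueError("all spectra must be sampled at the same wavelengths")
    integral = step_nm * sum(
        r * l * s for r, l, s in zip(reflectance, illumination, sensitivity))
    # Noise model (zero-mean Gaussian) is an assumption, not from the source.
    noise = (rng or random).gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
    return integral + noise


step_nm = 10
wavelengths = list(range(380, 780, step_nm))   # 380, 390, ..., 770
reflectance = [0.5] * len(wavelengths)          # R: flat 50 % reflectance
illumination = [1.0] * len(wavelengths)         # L: flat white illumination
# S_c: hypothetical box-shaped sensitivity covering 500 to 600.
sensitivity = [1.0 if 500 <= w < 600 else 0.0 for w in wavelengths]
intensity = simulate_band_intensity(reflectance, illumination,
                                    sensitivity, step_nm)
```

Repeating this for every pixel of a hyperspectral image and for each of the C sensitivity curves would yield one C-channel training image.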
- the circuitry is further configured to perform object recognition.
- object recognition may be performed in an autonomous vehicle application, in which a size of a pedestrian in an image may depend on a distance from the vehicle. To detect a pedestrian who is far from the vehicle, an image with higher spatial resolution may be suitable for a pedestrian detector.
- object recognition may be performed, for example, in a hand identification application, in which a hand may make various poses. In such cases, spatial resolution is less useful than spectral resolution. Hence, the relationship between spectral resolution and spatial resolution may include a higher amount of spectral information than spatial information.
- image processing based on multispectral and hyperspectral imaging is widely used in the food industry (e.g. bruise detection of a fruit, freshness detection of a fish), material classification applications (e.g. remote sensing, medical diagnosis) and the like, and, thus, some embodiments pertain to these fields.
- Some embodiments pertain to an image processing method, which may be performed by the image processing device described herein, or any other electronic device, processor, or other computing means or the like.
- the method includes obtaining input image data being represented by a number of color channels and inputting the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- the image processing method may further include obtaining the first or the second multispectral image data as the output multispectral image data.
- the input image data may include spectral image data, wherein the number of spectral channels of the output multispectral image data may be larger than the number of spectral channels of the input image data.
- a spatial resolution of the first multispectral image data may be higher than a spatial resolution of the second multispectral image data.
- the output multispectral image data may be generated based on a predetermined relationship between the spatial resolution and the number of spectral channels.
- the neural network may be a convolutional neural network, which may be trained to generate the first multispectral image data from the input image data and the second multispectral image data from the first multispectral image data.
- the convolutional neural network may also be trained based on RGB image data and on multispectral image data, as discussed herein.
- the image processing method may further include performing object recognition.
- In FIG. 4, a block diagram of an embodiment of an image processing device 11 is illustrated, which inputs image data into a convolutional neural network (CNN) for generating multispectral image data, as mentioned herein.
- the image processing device 11 includes a circuitry 12 with an interface 13 , a Central Processing Unit (CPU) 14 , including multiple processors including Graphics Processing Units (GPUs), a memory 15 that includes a RAM, a ROM and a storage memory and a trained CNN 16 (which is stored in a memory).
- the image processing device 11 acquires, through the interface 13 , image data, such as input image data 1 , being represented by a number of color channels, namely Red, Green and Blue in this embodiment.
- the input image data 1 represent an image of a target scene that has been captured with a digital camera, such as an RGB camera (not shown).
- the input image data 1 being represented by a number of color channels are transmitted to the CPU 14 , which inputs the input image data 1 into the CNN 16 for generating multispectral image data, being represented by a number of spectral channels.
- the CNN 16 has been trained in advance to generate (at least) first multispectral image data and second multispectral image data on the basis of the input image data 1 .
- the image processing device 11 is configured to obtain, as output multispectral image data, such as output multispectral image data 2, any one of the first or the second multispectral image data generated by the CNN 16.
- the image processing device 11 obtains the second multispectral image data as the output multispectral image data 2 .
- the number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data; for example, the number of spectral channels of the second multispectral image data is nine (9).
- the implementation of the above described image processing device 11 may result in a reduction of computational effort and of memory usage.
- the CNN 16 may be a single CNN, being able to generate multispatial multispectral image data from an RGB image.
- FIG. 5 illustrates an embodiment of a processing scheme of a learning method of the CNN 16 for generating a plurality of multispectral image data 20-1 to 20-N, on the basis of the input image data 1, wherein each multispectral image data of the plurality of multispectral image data may be obtained from the image processing device 11, as the output multispectral image data 2 of FIG. 4.
- the image processing device 11 inputs into the CNN 16 the input image data 1 , such as RGB image data, which are represented by a number of color channels, namely Red, Green and Blue.
- an input image in a CNN has a shape, that is, (number of images) × (image width) × (image height) × (image depth).
- the input image data 1 represent an input image whose height and width define a spatial resolution.
- the height of the input image data 1 is Height 0 and the width is Width 0 .
- the number of spectral channels of the input image data 1 is Ch 0 .
- the convolutional layers of the CNN 16 convolve the input image data 1 , perform rectification using Rectified Linear Unit (RELU) and spatial pooling that is carried out by max-pooling layers and then, pass their result to the next layer.
- the result of the next layer is multispectral image data 20-1 (e.g. corresponding to first multispectral image data) being represented by six (6) spectral channels, and the multispectral image data 20-1 represent a multispectral image, which has a height Height 1, a width Width 1 and a number of spectral channels Ch 1, wherein Height 0 > Height 1, Width 0 > Width 1 and Ch 0 < Ch 1.
- the result of the next layer is multispectral image data 20-2 (e.g. corresponding to second multispectral image data) being represented by nine (9) spectral channels, and the multispectral image data 20-2 represent a multispectral image, which has a height Height 2, a width Width 2 and a number of spectral channels Ch 2, wherein Height 0 > Height 1 > Height 2, Width 0 > Width 1 > Width 2 and Ch 0 < Ch 1 < Ch 2.
- the convolution process evolves as described above until the size of the multispectral image data becomes the size of the kernel of the CNN 16.
- the result of the last layer of the CNN 16 is multispectral image data 20-N (e.g. corresponding to N-th multispectral data) being represented by twelve (12) spectral channels, and the multispectral image data 20-N represent a multispectral image, which has a height Height N, a width Width N and a number of spectral channels Ch N, wherein Height 0 > Height 1 > Height 2 > . . . > Height N, Width 0 > Width 1 > Width 2 > . . . > Width N and Ch 0 < Ch 1 < Ch 2 < . . . < Ch N.
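The way height and width shrink while the channel count grows from stage to stage can be sketched with a toy pipeline. This is an illustration only: the weights are random and untrained, the 1 × 1 channel mixing, the 2 × 2 max-pooling and the 8 × 8 input size are assumptions, and the names `stage` and `shape` are hypothetical. Only the channel counts 3, 6, 9 and 12 come from the text.

```python
import random


def stage(image, out_channels, rng):
    """One toy stage: 1x1 convolution (channel mixing), ReLU, 2x2 max-pooling."""
    h, w, c_in = len(image), len(image[0]), len(image[0][0])
    # Random, untrained weights: one row of c_in weights per output channel.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(c_in)]
               for _ in range(out_channels)]
    mixed = [[[max(0.0, sum(wt * v for wt, v in zip(weights[k], image[y][x])))
               for k in range(out_channels)]
              for x in range(w)]
             for y in range(h)]
    # Non-overlapping 2x2 max-pooling halves height and width.
    return [[[max(mixed[y + dy][x + dx][k] for dy in (0, 1) for dx in (0, 1))
              for k in range(out_channels)]
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


def shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))


rng = random.Random(0)
# Hypothetical 8 x 8 RGB input (Height 0 = Width 0 = 8, Ch 0 = 3).
rgb = [[[rng.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]

outputs = []
current = rgb
for channels in (6, 9, 12):          # Ch 1, Ch 2, Ch N
    current = stage(current, channels, rng)
    outputs.append(current)          # every intermediate result is kept
shapes = [shape(o) for o in outputs]  # [(4, 4, 6), (2, 2, 9), (1, 1, 12)]
```

Keeping every intermediate result in `outputs` mirrors the idea that any stage, not only the last one, can serve as the output multispectral image data.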
- the CNN 16 is trained so that any one of the multispectral image data 20-1 to 20-N (e.g. first to N-th multispectral data) could be obtained by the image processing device 11 as output multispectral image data 2. That is, the CNN 16 generates multiple intermediate multispectral image data at several points in the neural network. Moreover, the image processing device 11 obtains any one of the multispectral image data 20-1 to 20-N with a predetermined relationship between spatial resolution and spectral resolution depending on the application or the target scene. The predetermined relationship may be set in advance by a user and thus, the CNN 16 stops calculating once the predetermined relationship, which is an optimized relationship between spatial resolution and spectral resolution, is achieved.
- a suitable multispectral image may be determined by analyzing a degree of spatial frequency of an input RGB image.
- a multispectral image with a small number of spectral channels may be desirable.
- spectral information may be more important for the performed application and a multispectral image with a large number of spectral channels may be desirable in some embodiments.
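One way such a choice could be automated is sketched below. This is a hypothetical heuristic, not the patent's method: it assumes a grayscale version of the input with values in [0, 1], uses the mean absolute difference between neighboring pixels as a crude stand-in for the degree of spatial frequency, and assumes that a detailed image favors fewer spectral channels (and thus higher spatial resolution). The threshold and the channel counts are invented for illustration.

```python
def detail_score(gray):
    """Mean absolute difference between horizontal and vertical neighbors,
    used here as a crude proxy for the degree of spatial frequency."""
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(gray[y][x + 1] - gray[y][x])
                count += 1
            if y + 1 < h:
                total += abs(gray[y + 1][x] - gray[y][x])
                count += 1
    return total / count if count else 0.0


def choose_channel_count(gray, threshold=0.25, few=6, many=12):
    """Assumed rule: fine spatial detail -> few channels, otherwise many."""
    return few if detail_score(gray) > threshold else many


flat = [[0.5] * 4 for _ in range(4)]                                   # no detail
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]   # fine detail
channels_for_flat = choose_channel_count(flat)        # many channels
channels_for_checker = choose_channel_count(checker)  # few channels
```

A frequency-domain measure, such as the share of energy in the high-frequency part of a Fourier spectrum, could replace `detail_score` without changing the selection logic.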
- a target performance may be determined by a result of the application, e.g. reliability of object classification result.
- the application result may not achieve the target performance when inputting a multispectral image with a small number of spectral channels, and thus, the CNN may continue to generate a multispectral image with a larger number of spectral channels, until the application result meets the set criteria.
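The stop-when-good-enough behaviour described above can be sketched as a simple loop. This is an illustrative assumption about the control flow, not the patent's implementation: `stage_fns`, `score_fn` and the toy data are hypothetical stand-ins for the network stages and for the application's performance measure (e.g. classification reliability).

```python
def run_until_target(input_data, stage_fns, score_fn, target):
    """Apply stages one after another and stop as soon as the application
    score of the current multispectral data reaches the target."""
    if not stage_fns:
        raise ValueError("at least one stage is required")
    data = input_data
    for index, stage_fn in enumerate(stage_fns, start=1):
        data = stage_fn(data)          # next multispectral image data
        score = score_fn(data)         # application result for this stage
        if score >= target:
            break                      # later stages are never computed
    return index, data, score


# Toy stand-ins: each stage adds three spectral channels, and the
# hypothetical application score grows with the channel count.
def add_three_channels(data):
    return {"channels": data["channels"] + 3}


def toy_score(data):
    return data["channels"] / 12.0


stage_index, result, score = run_until_target(
    {"channels": 3}, [add_three_channels] * 3, toy_score, target=0.7)
```

Here the loop stops after the second stage (nine channels), so the third stage is skipped, which is where the saving in computation and memory would come from.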
- An embodiment of a learning system 30, which generates a learning model based on which a neural network, such as the CNN 16, is trained, is illustrated as a block diagram in FIG. 6.
- the learning system 30 includes a memory device, such as the memory 15 of the image processing device 11 described with reference to FIG. 4, an RGB image generator 31, a multispectral image generator 32, a learning apparatus, such as the CNN 16, and a learning model 33.
- N is the number of intermediate MS images, e.g. represented by a plurality of multispectral image data, such as the multispectral image data 20-1 to 20-N of FIG. 5.
- the intermediate MS images should meet the following conditions:
- the learning apparatus, such as the CNN 16, is trained to generate multiple MS images from an RGB image by minimizing the following reconstruction loss:
- $\mathrm{MS}_{\mathrm{REC}}^{(i)}$ is the MS image i reconstructed by the CNN
- $\mathrm{MS}_{\mathrm{GT}}^{(i)}$ is the ground truth of MS image i, which is generated by the multispectral image generator 32
- MSE is a Mean Squared Error function, without limiting the present disclosure in that regard.
- the Mean Absolute Error function, or the like, may also be used.
- the learning system 30 generates a learned model, which is stored in the memory 15 of the image processing device 11.
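The loss formula itself is not reproduced in this text. One plausible form, assuming an unweighted sum of the per-image errors over the N intermediate MS images, is sketched below in Python; the summation, the function names and the toy data are assumptions for illustration only.

```python
import numpy as np


def mse(a, b):
    """Mean Squared Error between two arrays of equal shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def mae(a, b):
    """Mean Absolute Error, the alternative mentioned in the text."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def reconstruction_loss(reconstructed, ground_truth, error_fn=mse):
    """Sum the error between each reconstructed MS image i and its ground
    truth over i = 1..N (assumed form: unweighted sum)."""
    if len(reconstructed) != len(ground_truth):
        raise ValueError("need one ground-truth image per reconstructed image")
    return sum(error_fn(r, g) for r, g in zip(reconstructed, ground_truth))


# Two intermediate MS images with different sizes and channel counts.
gt = [np.zeros((4, 4, 6)), np.zeros((2, 2, 12))]
rec = [np.ones((4, 4, 6)), np.full((2, 2, 12), 2.0)]

print(reconstruction_loss(rec, gt))       # 5.0  (1.0 + 4.0)
print(reconstruction_loss(rec, gt, mae))  # 3.0  (1.0 + 2.0)
```

Each term compares images of the same stage, so the spatial size and channel count may differ from one stage to the next without affecting the sum.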
- FIG. 7 shows a block diagram of the image processing system 40 .
- the image processing system 40 includes an image capturing apparatus 41, such as a camera including an RGB image sensor, the image processing device 11, a memory 48 for storing a database, and an information processing apparatus 44, which includes a target area detection unit 45, a feature extraction unit 46 and a recognition unit 47.
- the image processing system 40 is configured to perform object recognition of the image data provided by the image capturing apparatus 41 and processed by the image processing device 11 .
- the image capturing apparatus 41, such as an RGB camera, captures an image, such as an RGB image, of a target scene and transmits RGB image data representing the captured RGB image to the image processing device 11.
- the image processing device 11 outputs multispectral image data 43 generated by a trained convolutional neural network, such as the CNN 16, which is trained based on the learned model 33.
- the multispectral image data 43 are also generated based on input information 42 that is related to a target to be recognized.
- the generated multispectral image data 43 are transmitted to the information processing apparatus 44 , which is configured to perform object recognition.
- the multispectral image data 43 are transmitted to the target area detection unit 45 , the feature extraction unit 46 and then to the recognition unit 47 .
- the recognition unit 47 performs object recognition based on data, included in the database, which is stored in the memory 48 .
- the output 49 of the image processing system 40 depends on the recognition result of the target (e.g. user ID).
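The chain of target area detection, feature extraction and recognition can be sketched in Python as below. The text does not say how the units 45 to 47 work internally, so everything here is an assumption for illustration: detection by an intensity threshold, the mean spectrum of the detected area as the feature, and nearest-neighbour matching against a database of reference spectra. All function names, the threshold and the user IDs are hypothetical.

```python
import numpy as np


def detect_target_area(ms, threshold=0.5):
    """Bounding box (top, bottom, left, right) of the pixels whose mean
    spectral intensity exceeds the threshold, or None if there are none."""
    mask = ms.mean(axis=2) > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def extract_features(ms, box):
    """Mean spectrum (one value per spectral channel) inside the box."""
    top, bottom, left, right = box
    return ms[top:bottom, left:right].mean(axis=(0, 1))


def recognize(features, database):
    """Return the database key whose reference spectrum is closest."""
    return min(database, key=lambda key: np.linalg.norm(database[key] - features))


# Toy 4-channel MS image with one bright target region.
ms = np.zeros((8, 8, 4))
ms[2:5, 3:6] = [0.9, 0.8, 0.7, 0.95]

database = {
    "user_a": np.array([0.9, 0.8, 0.7, 0.95]),
    "user_b": np.array([0.9, 0.9, 0.1, 0.9]),
}

box = detect_target_area(ms)
print(box)                                             # (2, 5, 3, 6)
print(recognize(extract_features(ms, box), database))  # user_a
```

The two reference spectra differ mainly in the third channel, which is the kind of distinction a multispectral image with more channels can expose.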
- an image processing method 50, which is performed by the image processing device 11 and/or the image processing system 40 in some embodiments, is discussed with reference to FIG. 8.
- input image data, such as input image data 1, are obtained by the image processing device 11 and/or the image processing system 40, as discussed above.
- the input image data may be obtained from an image sensor, from a memory included in the device, from an external memory, etc., or from an artificial image generator, e.g. created via computer-generated graphics, or the like.
- the input image data are input, at 52, into a convolutional neural network, such as the CNN 16, for generating, at 53, output multispectral (MS) image data, such as the output multispectral image data 2, as discussed above.
- the input image data may be represented by a number of color channels, such as Red, Green and Blue, or may be represented by a small number of spectral channels, for example three (3), or the like.
- the convolutional neural network generates first and second multispectral image data on the basis of the input image data.
- a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data, as discussed above.
- the first or the second multispectral image data are obtained as output multispectral data.
- the first multispectral image data are generated on the basis of the input image data and the second multispectral image data are generated on the basis of the first multispectral image data, as discussed herein.
- the obtained output multispectral image data are output.
- the method as described herein is also implemented in some embodiments as a computer program causing a computer and/or a processor to perform the method when carried out on the computer and/or processor.
- a non-transitory computer-readable recording medium is provided that stores therein a computer program product, which, when executed by a processor, such as the processor described above, causes the methods described herein to be performed.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
Description
- (1) An image processing device comprising circuitry configured to:
- obtain input image data being represented by a number of color channels;
- input the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- (2) The image processing device of (1), wherein the circuitry is further configured to obtain the first or the second multispectral image data as the output multispectral image data.
- (3) The image processing device of (1) or (2), wherein the input image data include spectral image data.
- (4) The image processing device of (3), wherein the number of spectral channels of the output multispectral image data is larger than the number of spectral channels of the input image data.
- (5) The image processing device of any one of (1) to (4), wherein a spatial resolution of the first multispectral image data is higher than a spatial resolution of the second multispectral image data.
- (6) The image processing device of (5), wherein the output multispectral image data is generated based on a predetermined relationship between the spatial resolution and the number of spectral channels.
- (7) The image processing device of any one of (1) to (6), wherein the neural network is a convolutional neural network.
- (8) The image processing device of (7), wherein the convolutional neural network is trained to generate the first multispectral image data from the input image data and the second multispectral image data from the first multispectral image data.
- (9) The image processing device of (7), wherein the convolutional neural network is trained based on RGB image data and on multispectral image data.
- (10) The image processing device of any one of (1) to (9), wherein the circuitry is further configured to perform object recognition.
- (11) An image processing method comprising:
- obtaining input image data being represented by a number of color channels;
- inputting the input image data into a neural network for generating output multispectral image data, wherein the neural network is configured to generate at least first and second multispectral image data on the basis of the input image data, wherein a number of spectral channels of the second multispectral image data is larger than the number of spectral channels of the first multispectral image data.
- (12) The image processing method of (11), further comprising obtaining the first or the second multispectral image data as the output multispectral image data.
- (13) The image processing method of (11) or (12), wherein the input image data include spectral image data.
- (14) The image processing method of (13), wherein the number of spectral channels of the output multispectral image data is larger than the number of spectral channels of the input image data.
- (15) The image processing method of any one of (11) to (14), wherein a spatial resolution of the first multispectral image data is higher than a spatial resolution of the second multispectral image data.
- (16) The image processing method of (15), wherein the output multispectral image data is generated based on a predetermined relationship between the spatial resolution and the number of spectral channels.
- (17) The image processing method of any one of (11) to (16), wherein the neural network is a convolutional neural network.
- (18) The image processing method of (17), wherein the convolutional neural network is trained to generate the first multispectral image data from the input image data and the second multispectral image data from the first multispectral image data.
- (19) The image processing method of (17), wherein the convolutional neural network is trained based on RGB image data and on multispectral image data.
- (20) The image processing method of any one of (11) to (19), further comprising performing object recognition.
- (21) A computer program comprising program code causing a computer to perform the method according to any one of (11) to (20), when being carried out on a computer.
- (22) A non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method according to any one of (11) to (20) to be performed.
Claims (20)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19204783 | 2019-10-23 | ||
| EP19204783.5 | 2019-10-23 | ||
| PCT/EP2020/079090 WO2021078629A1 (en) | 2019-10-23 | 2020-10-15 | Image processing device and image processing method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240303773A1 (en) | 2024-09-12 |
| US12536616B2 (en) | 2026-01-27 |
Family
ID=68342612
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/768,853 Active 2042-09-01 US12536616B2 (en) | 2019-10-23 | 2020-10-15 | Image processing device and image processing method |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12536616B2 (en) |
| CN (1) | CN114556428A (en) |
| WO (1) | WO2021078629A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240242380A1 (en) * | 2023-01-13 | 2024-07-18 | Maya Heat Transfer Technologies Ltd. | System for generating an image dataset for training an artificial intelligence model for object recognition, and method of use thereof |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106485688A (en) | 2016-09-23 | 2017-03-08 | 西安电子科技大学 | High spectrum image reconstructing method based on neutral net |
| CN108830796A (en) | 2018-06-20 | 2018-11-16 | 重庆大学 | Based on the empty high spectrum image super-resolution reconstructing method combined and gradient field is lost of spectrum |
| US20190096049A1 (en) | 2017-09-27 | 2019-03-28 | Korea Advanced Institute Of Science And Technology | Method and Apparatus for Reconstructing Hyperspectral Image Using Artificial Intelligence |
| US20200302249A1 (en) * | 2019-03-19 | 2020-09-24 | Mitsubishi Electric Research Laboratories, Inc. | Systems and Methods for Multi-Spectral Image Fusion Using Unrolled Projected Gradient Descent and Convolutinoal Neural Network |
| US20210250526A1 (en) * | 2017-09-12 | 2021-08-12 | Carbon Bee | Device for capturing a hyperspectral image |
| US20210350590A1 (en) * | 2019-01-29 | 2021-11-11 | Korea Advanced Institute Of Science And Technology | Method and device for imaging of lensless hyperspectral image |
| US11354804B1 (en) * | 2019-09-27 | 2022-06-07 | Verily Life Sciences Llc | Transforming multispectral images to enhanced resolution images enabled by machine learning |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3483108B2 (en) * | 1997-11-25 | 2004-01-06 | 株式会社日立製作所 | Multispectral image processing apparatus and recording medium storing program for the same |
| CN104112263B (en) * | 2014-06-28 | 2018-05-01 | 南京理工大学 | The method of full-colour image and Multispectral Image Fusion based on deep neural network |
| CN108805874B (en) * | 2018-06-11 | 2022-04-22 | 中国电子科技集团公司第三研究所 | Multispectral image semantic cutting method based on convolutional neural network |
| CN109003239B (en) * | 2018-07-04 | 2022-03-29 | 华南理工大学 | Multispectral image sharpening method based on transfer learning neural network |
2020
- 2020-10-15 CN CN202080072571.XA patent/CN114556428A/en active Pending
- 2020-10-15 US US17/768,853 patent/US12536616B2/en active Active
- 2020-10-15 WO PCT/EP2020/079090 patent/WO2021078629A1/en not_active Ceased
Non-Patent Citations (16)
| Title |
|---|
| Can et al., "An Efficient CNN for Spectral Reconstruction from RGB Images", arxiv.org, Cornell University Library, Apr. 12, 2018, 5 pages. |
| Galliani et al., "Learned Spectral Super-Resolution", arXiv:1703.09470v1, Mar. 28, 2017, 10 pages. |
| Han et al: "Reconstruction From Multispectral to Hyperspectral Image Using Spectral Library-Based Dictionary Learning", IEEE Transactions on Geoscience and Remote Sensing, vol. 57, No. 3, Mar. 2019, pp. 1325-1335. |
| International Search Report and Written Opinion mailed on Dec. 14, 2020, received for PCT Application PCT/EP2020/079090, Filed on Oct. 15, 2020, 10 pages. |
| Li et al., "Hyperspectral image super-resolution using deep convolutional neural network", Journal of Latex Templates, Neurocomputing, Available Online at: https://www.researchgate.net/publication/317024713_Hyperspectral_image_super-resolution_using_deep_convolutional_neural_network May 3, 2017, pp. 1-35. |
| Mei et al., "Hyperspectral Image Spatial Super-Resolution via 3D Full Convolutional Neural Network", Remote Sensing, vol. 9, Available Online at: https://www.researchgate.net/publication/320913192_Hyperspectral_Image_Spatial_Super-Resolution_via_3D_Full_Convolutional_Neural_Network Nov. 7, 2017, pp. 1-22. |
| Miao et al: "lambda-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Oct. 27, 2019, pp. 4058-4068. |
| Yang et al., "Hyperspectral and Multispectral Image Fusion via Deep Two-Branches Convolutional Neural Network," May 21, 2018 (Year: 2018). * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2021078629A1 (en) | 2021-04-29 |
| CN114556428A (en) | 2022-05-27 |
| US20240303773A1 (en) | 2024-09-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Zhang et al. | Cloud detection method using CNN based on cascaded feature attention and channel attention | |
| Cao et al. | PanCSC-Net: A model-driven deep unfolding method for pansharpening | |
| US11551333B2 (en) | Image reconstruction method and device | |
| Zhang et al. | LR-Net: Low-rank spatial-spectral network for hyperspectral image denoising | |
| Chang et al. | HSI-DeNet: Hyperspectral image restoration via convolutional neural network | |
| Romero et al. | Unsupervised deep feature extraction for remote sensing image classification | |
| Jiang et al. | Multi-spectral RGB-NIR image classification using double-channel CNN | |
| Xie et al. | Hyperspectral image super-resolution using deep feature matrix factorization | |
| EP4163832B1 (en) | Neural network training method and apparatus, and image processing method and apparatus | |
| Li et al. | DMNet: A network architecture using dilated convolution and multiscale mechanisms for spatiotemporal fusion of remote sensing images | |
| Kemker et al. | Self-taught feature learning for hyperspectral image classification | |
| CN112819910A (en) | Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network | |
| CN115565045B (en) | Hyperspectral and multispectral image fusion method based on multiscale spatial spectrum transformation | |
| US20220180476A1 (en) | Systems and methods for image feature extraction | |
| Ahmed et al. | PIQI: perceptual image quality index based on ensemble of Gaussian process regression | |
| US20210374527A1 (en) | Information processing apparatus, information processing method, and storage medium | |
| US12340540B2 (en) | Imaging sensor, an image processing device and an image processing method | |
| Wang et al. | Pixel-to-abundance translation: Conditional generative adversarial networks based on patch transformer for hyperspectral unmixing | |
| Confalonieri et al. | An end-to-end framework for the classification of hyperspectral images in the wood domain | |
| El-gabri et al. | Dlra-net: Deep local residual attention network with contextual refinement for spectral super-resolution | |
| US12536616B2 (en) | Image processing device and image processing method | |
| CN109961083B (en) | Method and image processing entity for applying a convolutional neural network to an image | |
| Wang et al. | BDPartNet: Feature decoupling and reconstruction fusion network for infrared and visible image | |
| CN113256556B (en) | Image selection method and device | |
| Chen et al. | Hyperspectral remote sensing IQA via learning multiple kernels from mid-level features |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: SONY GROUP CORPORATION, JAPAN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SARTOR, PIERGIORGIO;GATTO, ALEXANDER;UEMORI, TAKESHI;AND OTHERS;SIGNING DATES FROM 20221025 TO 20230803;REEL/FRAME:064575/0670 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED; Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |