Object Recognition with Description Using Convolutional Neural Network: Informative Essay

Abstract –

Object recognition is a sub-field of computer vision and is based on machine learning. In the past several years, machine learning has been dominated by neural networks, driven by improvements in computing power and data availability. A sub-type of neural network, the Convolutional Neural Network (CNN), is well suited for image-related tasks. A CNN is trained to look for features such as spatial layout, appearance, structure, edges, and corners across the image and to combine them into more complex structures. In object detection, the system needs both localization of probable objects and classification of these features. Convolutional object recognition is a viable technology compared with other object detection methods. It is possible to create a functional implementation of a neural network without access to specialized hardware. It also decreases the training time. We have implemented the Convolutional Neural Network in Java, since the neural network is more effective and relatively more precise than other object detection methods. Combining the Convolutional Neural Network with a stream mining algorithm for object recognition makes it work faster, as the stream mining algorithm processes the large stream of data and the descriptions associated with it. For stream mining and the Convolutional Neural Network, we have used RapidMiner software, which uses Java for machine learning, for efficient outcomes.

Keywords – Convolutional Neural Network, Stream mining, Correlation matching, Semantic matching, Object Recognition.

Introduction

Object recognition is widely studied in image mining. Sketch-based object search is a challenging problem mainly because of three difficulties: (1) how to match the primary sketch query with the colorful image, (2) how to locate the small object in a large image that is similar to the sketch query, and (3) given a large image database, how to ensure an efficient search scheme that is reasonably scalable. To address these difficulties, we propose to use object proposals for object search and localization. However, rather than relying purely on sketch features, we propose to fully use the appearance features of object proposals to resolve the ambiguities between the matching sketch query and the object proposals. Our proposed query-adaptive search is formulated as a sub-graph selection problem, which can be solved by the maximum flow algorithm. By performing query expansion, it can accurately locate small target objects in a cluttered background or in densely drawn, deformation-intensive cartoon (Manga-like) images. To improve the computing efficiency of matching proposal candidates, the proposed Multi-View Spatially Constrained Proposal Selection (MVSCPS) encodes each identified object proposal in terms of a small local basis of anchor objects. The results on benchmark datasets validate the advantages of using both the sketch and appearance features for sketch-based search, while ensuring sufficient scalability at the same time. To make object recognition efficient, we used the Convolutional Neural Network algorithm to address the problem of sketch (or sketch-like object) based object search in a real-life image or video. A multi-view clustering approach to identifying a shortlist of spatially distinct proposals is also capable of handling the content-level (spatial and structural) differences among proposals within a single formulation.

Related work

In 2018, Mingxing Zhang, Yang Yang, and Hanwang Zhang presented precise and detailed image captioning using Online Positive Recall and Missing Concepts Mining. Great progress in automatic image captioning has been achieved by using semantic concepts detected from the image. However, the authors argue that the existing concepts-to-caption framework, in which the concept detector is trained using image-caption pairs to minimize the vocabulary discrepancy, suffers from insufficient concepts. The reasons are two-fold: 1) the extreme imbalance between the number of positive and negative samples of each concept; and 2) the incomplete labeling in training captions caused by biased annotation and the use of synonyms. To overcome these problems, they propose a method termed Online Positive Recall and Missing Concepts Mining (OPR-MCM). The method adaptively re-weights the loss of different samples according to their predictions for online positive recall and uses a two-stage optimization strategy for missing concept mining. In this way, more semantic concepts can be detected and higher accuracy can be expected. In the caption generation stage, an element-wise selection process automatically chooses the most suitable concepts at each time step, so the method can generate more precise and detailed captions to describe the image. Extensive experiments on the MSCOCO image captioning dataset and the MSCOCO online test server show that the method achieves superior image captioning performance compared with other competitive methods.

In 2014, Chenggang Yan and Yongdong Zhang presented a highly parallel framework for HEVC coding unit partitioning tree decisions on many-core processors. High-Efficiency Video Coding (HEVC) uses a very flexible tree structure to organize coding units, which leads to superior coding efficiency compared with previous video coding standards. However, such a flexible coding unit tree structure also poses a great challenge for encoders: to fully exploit the coding efficiency brought by this structure, an encoder needs a huge amount of computation to decide the optimal coding unit tree for each image block. One way to meet this demand is to use parallel computing enabled by many-core processors. The authors analyze the challenges of using many-core processors to make coding unit tree decisions and, through an in-depth understanding of the dependency among different coding units, propose a parallel framework to decide on coding unit trees. Experimental results show that, on the Tile64 platform, the method achieves an average speedup of more than 11 and 16 times for 1920×1080 and 2560×1600 video sequences, respectively, without any coding efficiency degradation.

In 2017, Chenggang Yan and Hongtao Xie presented effective Uyghur language text detection in complex background images for traffic prompt identification. Text detection in complex background images is a challenging task for intelligent vehicles. Almost all widely used systems focus on commonly used languages, while text detection for some minority languages, such as Uyghur, has received less attention. The authors propose an effective Uyghur language text detection system for complex background images. First, a new channel-enhanced maximally stable extremal regions (MSERs) algorithm is put forward to detect component candidates. Second, a two-layer filtering mechanism is designed to remove most non-character regions. Third, the remaining component regions are connected into short chains, and the short chains are extended by a novel extension algorithm to connect the missed MSERs. Finally, a two-layer chain elimination filter is proposed to prune the non-text chains. To evaluate the system, they build a new data set of various Uyghur texts with complex backgrounds. Extensive experimental comparisons show that the system is effective for Uyghur language text detection in complex background images. The F-measure is 85%, which is much better than the state-of-the-art performance of 75.5%.

In 2014, Chenggang Yan and Yongdong Zhang proposed an efficient parallel framework for HEVC motion estimation on many-core processors. High Efficiency Video Coding (HEVC) provides superior coding efficiency compared with previous video coding standards, at the cost of increased encoding complexity. The complexity increase of the motion estimation (ME) procedure is rather significant, especially when considering the complicated partitioning structure of HEVC, so fully exploiting the coding efficiency brought by HEVC requires a huge amount of computation. The authors analyze the ME structure in HEVC and propose a parallel framework to decouple ME for different partitions on many-core processors. Based on the local parallel method (LPM), they first use a directed acyclic graph (DAG)-based order to parallelize coding tree units (CTUs) and adopt an improved LPM (ILPM) within each CTU (DAGILPM), which exploits CTU-level and prediction unit (PU)-level parallelism. They then find that there exist completely independent PUs (CIPUs) and partially independent PUs (PIPUs). When the degree of parallelism (DP) is smaller than the maximum DP of DAGILPM, the CIPUs and PIPUs are processed, which further increases the DP. The data dependencies and coding efficiency stay the same as in LPM. Experiments show that on a 64-core system, compared with serial execution, the proposed scheme achieves more than 30 and 40 times speedup for 1920×1080 and 2560×1600 video sequences, respectively.

In 2015, Cem Tekin presented active learning in context-driven stream mining with an application to image mining. In this setting, images arrive with contexts (metadata) and need to be processed in real time by the image mining system (IMS), which must make predictions and derive actionable intelligence from these streams. After extracting the features of the image by preprocessing, the IMS determines online, using the context of the image, which of its available classifiers it should apply to the extracted features to make a prediction. A key challenge associated with stream mining is that the prediction accuracy of the classifiers is unknown, since the image source is unknown; these accuracies therefore need to be learned online. Another key challenge is that learning can only be done by observing the true label, which is costly to obtain. To address these challenges, the image stream mining problem is modeled as an active, online contextual experts problem, where the context of the image is used to guide the classifier selection decision. The author develops an active learning algorithm and shows that it achieves regret sublinear in the number of images observed so far. To further illustrate and assess the performance of the proposed methods, they are applied to diagnose breast cancer from images of cellular samples obtained from fine needle aspirate (FNA) of a breast mass. The findings show that very high diagnosis accuracy can be achieved by actively obtaining only a small fraction of true labels through surgical biopsies. Other applications include video surveillance and video traffic monitoring.

Correlation matching mechanism

Correlation is widely used as an effective similarity measure in matching tasks. We use a correlation-based method for matching two images. This method is based on the rotation- and scale-invariant normalized cross-correlation. Both the size and the orientation of the correlation windows are determined according to the characteristic scale and the dominant direction of the interest points.

Using this algorithm, we specify a search-based technique: we give a text keyword and search for the related images based on that keyword. Such methods are members of a broader class of learning algorithms, called subspace learning, which are computationally efficient and produce linear transformations that are easy to conceptualize, implement, and deploy. The formula for the similarity between a pair of signals $X_1(t)$ and $X_2(t)$ is

$$\int X_1(t)\, X_2(t)\, dt$$
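As a rough illustration of correlation as a similarity measure, the sketch below computes a zero-mean normalized cross-correlation between two equally sized image patches, the discrete counterpart of the integral above. The function name `normalized_cross_correlation` and the nested-list patch format are assumptions made for this example. It does not include the rotation and scale handling described in the text, which would require choosing the window size and orientation from the interest points first.

```python
import math


def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size 2D patches.

    Returns a value in [-1, 1]; 1 means the patches differ only by
    brightness offset and positive contrast scaling.
    """
    # Flatten both patches into 1D lists of pixel values.
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")

    # Subtract the mean so a constant brightness offset has no effect.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]

    # Discrete correlation (sum of products), normalized by both energies.
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat patch has no structure to correlate; treat it as score 0.
    return 0.0 if denominator == 0 else numerator / denominator
```

A patch compared with a brighter, higher-contrast copy of itself scores close to 1, while a patch compared with its negative scores close to -1.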

Semantic matching mechanism

Semantic matching is a technique used in computer science to identify information that is semantically related. Given any two graph-like structures, e.g. classifications, taxonomies, database or XML schemas, and ontologies, matching is an operator that identifies those nodes in the two structures which semantically correspond to one another. We determine semantic relations (such as equivalence) by analyzing the meaning of the nodes (concepts, not labels). Of the various similarity measures available, we used the following.

Jaccard similarity:

The Jaccard coefficient, which is sometimes referred to as the Tanimoto coefficient, measures similarity as the intersection divided by the union of the objects. For text documents, the Jaccard coefficient compares the sum weight of shared terms to the sum weight of terms that are present in either of the two documents but are not shared terms.

The Jaccard coefficient is a similarity measure and ranges between 0 and 1. It is 1 when $T_a = T_b$ and 0 when $T_a$ and $T_b$ are disjoint, where 1 means the two objects are the same and 0 means they are completely different.

The corresponding distance measure is $D_J = 1 - \mathrm{SIM}_J$, and we will use $D_J$ instead in subsequent experiments.
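The set form of the Jaccard coefficient, and the distance derived from it, can be sketched as follows. This is a minimal unweighted version over sets of terms; the function names are hypothetical, and returning 1.0 for two empty inputs is a convention assumed here, not something the text specifies. The weighted variant for term vectors mentioned above is not shown.

```python
def jaccard_similarity(terms_a, terms_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(terms_a), set(terms_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty term sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def jaccard_distance(terms_a, terms_b):
    """Distance is one minus the similarity, so 0 means identical."""
    return 1.0 - jaccard_similarity(terms_a, terms_b)
```

For example, the term lists `["cat", "dog"]` and `["dog", "bird"]` share one term out of three distinct terms, giving a similarity of 1/3.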

Stream mining algorithm

In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes (typically just one). In most models, these algorithms have access to limited memory (generally logarithmic in the size of and/or the maximum value in the stream). They may also have limited processing time per item.
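To make the single-pass, limited-memory constraint concrete, the sketch below keeps a running mean and variance of a stream using Welford's update. This is only an illustration of the streaming model; the text does not say which stream mining algorithm was actually used, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance with constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Consume one stream item; the item itself is not stored."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the old delta and the updated mean (Welford's method).
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the items seen so far (0.0 if none)."""
        return self._m2 / self.count if self.count else 0.0
```

Each item is examined once and then discarded, so memory use stays constant no matter how long the stream runs.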

Convolutional Neural Network

A convolutional neural network (CNN), also known as a ConvNet, is a specific type of artificial neural network that uses perceptrons, a basic machine learning unit, for supervised learning to analyze data. CNNs are applied to image processing, natural language processing, and other kinds of cognitive tasks.

A regular neural network, or multi-layer perceptron (MLP), is made up of input and output layers and a series of hidden layers. The simplest example is an MLP that contains only one hidden layer. Each hidden layer is fully connected to the previous layer, i.e., each neuron in the hidden layer connects to all neurons in the previous layer.
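A forward pass through an MLP with one hidden layer might look like the sketch below. The sigmoid activation, the function names, and the weight layout (one row of weights per neuron) are assumptions chosen for illustration; the text does not specify them.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every neuron reads every input value."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = dense(inputs, hidden_w, hidden_b, sigmoid)
    return dense(hidden, out_w, out_b, sigmoid)
```

With two inputs, three hidden neurons, and one output neuron, `hidden_w` holds three rows of two weights each and `out_w` holds one row of three weights, which is the full connectivity described above.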

In purely mathematical terms, convolution is a function derived from two given functions by integration, which expresses how the shape of one is modified by the other. The convolution formula is:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

The convolution operation:

  • Input image
  • Feature detector
  • Feature map

Steps:

  1. Starting from the top-left corner of the input image, take a patch of the same size as the feature detector and count the number of cells in which the feature detector matches the input image.
  2. Insert the number of matching cells in the top-left cell of the feature map.
  3. Move the feature detector one cell to the right and do the same thing. This movement is called a stride, and since we are moving the feature detector one cell at a time, it is a stride of one pixel.
  4. For example, if at this position only the feature detector’s middle-left cell, which contains a 1, matches the cell it is standing over in the input image, that is the only matching cell, so you write 1 in the next cell of the feature map, and so on.
  5. After you have gone through the whole first row, move down to the next row and repeat the same process.
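The steps above can be sketched as a small function. It assumes a binary (0/1) input image and feature detector, and it assumes that "matching" means both cells contain a 1, so the count equals the sum of cell-by-cell products. The name `feature_map` and the `stride` parameter are illustrative.

```python
def feature_map(image, detector, stride=1):
    """Slide `detector` over `image` and count matching 1-cells per position."""
    d_rows, d_cols = len(detector), len(detector[0])
    result = []
    # Move top to bottom, one row of feature-map cells at a time.
    for top in range(0, len(image) - d_rows + 1, stride):
        out_row = []
        # Move left to right by `stride` cells.
        for left in range(0, len(image[0]) - d_cols + 1, stride):
            # For 0/1 values the product is 1 only where both cells are 1.
            matches = sum(
                image[top + i][left + j] * detector[i][j]
                for i in range(d_rows)
                for j in range(d_cols)
            )
            out_row.append(matches)
        result.append(out_row)
    return result
```

With a stride of one, a 4×4 image and a 2×2 detector give a 3×3 feature map, so the feature map is smaller than the input image.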

The image is extracted as a matrix, and each pixel is compared to obtain the best feature extraction.

A convolution matrix can also be used to adjust an image. Filters such as sharpening, blurring, and edge detection are applied to images using such matrices.
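As an illustration of adjusting an image with a convolution matrix, the sketch below applies a square kernel to a grayscale image stored as nested lists of values from 0 to 255. The `SHARPEN` and `EDGE` kernels are common textbook examples, not values taken from this work, and `apply_kernel` is a hypothetical helper. It does not flip the kernel, so strictly it computes a cross-correlation; for symmetric kernels such as these the result is the same as a convolution.

```python
# Common 3x3 example kernels (illustrative values, not from the original work).
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]


def apply_kernel(image, kernel):
    """Filter a grayscale image with a square kernel; border pixels are dropped."""
    size = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for top in range(rows - size + 1):
        out_row = []
        for left in range(cols - size + 1):
            # Weighted sum of the pixels under the kernel.
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(size)
                for j in range(size)
            )
            # Clamp to the valid 8-bit intensity range.
            out_row.append(max(0, min(255, int(round(total)))))
        output.append(out_row)
    return output
```

On a perfectly flat image the sharpening kernel, whose weights sum to one, leaves the intensity unchanged, while the edge kernel, whose weights sum to zero, produces zeros because there are no edges to detect.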

Conclusion

In this work, image recognition using the Convolutional Neural Network is introduced with multi-view feature extraction. Different model architectures are proposed by incorporating different prior CNN designs. The parameters that can influence model performance are discussed extensively, and a deeper exploration of the parameters suited to the CNN recognition model is presented as well. The stream mining algorithm plays a vital role in extracting quality information. Regarding the implementation, we also learned that there are no easy out-of-the-box solutions for effectively implementing a convolutional neural network. Regarding precision, the results were promising. This shows how a system trained on general image data can be used to detect objects in a specific task, thus demonstrating the adaptability of the methods. One of the strengths of Convolutional Neural Networks is their inherent translation invariance. Instead of taking the whole image into consideration, it potentially creates an even more precise system. Deeper neural networks of a similar kind could learn the probabilities of finding an object in a certain part of the scene. However, time will tell whether this is the route that image research will take. As a Danish proverb says, it is difficult to make predictions, especially about the future.

References

  1. Chenggang Yan, Yongdong Zhang, ‘Efficient Parallel Framework for HEVC Motion Estimation on Many-Core Processors’, IEEE, 2014.
  2. Cem Tekin, ‘Active Learning in Context-Driven Stream Mining with an Application to Image Mining’, IEEE, 2015.
  3. Chenggang Yan, Hongtao Xie, ‘Effective Uyghur Language Text Detection in Complex Background Images for Traffic Prompt Identification’, IEEE, 2017.
  4. Mingxing Zhang, Yang Yang, Hanwang Zhang, ‘More is Better: Precise and Detailed Image Captioning using Online Positive Recall and Missing Concepts Mining’, IEEE, 2018.
  5. Chin-Chuan Han, Hsu-Liang Cheng, et al., ‘Personal authentication using palm-print features’, Pattern Recognition, vol. 36, pp. 371-381, 2003.
  6. J.T. Anthony Lee, Ruey-Wen Hong, et al., ‘Mining spatial association rules in image databases’, Information Sciences, vol. 177, pp. 1593-1608, 2007.
  7. P. Rajendran, M. Madheswaran, ‘An Improved Image Mining Technique For Brain Tumour Classification Using Efficient classifier’, (IJCSIS) International Journal of Computer Science and Information Security, vol. 6, no. 3, 2009.
  8. Shrey Dutta, Naveen Sankaran, Pramod Sankar K, C.V. Jawahar, ‘Robust Recognition of Degraded Documents Using Character N-Grams’, IEEE, 2012.
  9. Konstantinos Ntirogiannis, Basilis Gatos, Ioannis Pratikakis, ‘A Performance Evaluation Methodology for Historical Document Image Binarization’, IEEE International Conference on Document Analysis and Recognition, 2013.
  10. A. Krizhevsky, I. Sutskever, G. E. Hinton, ‘Imagenet classification with deep convolutional neural networks’, Advances in neural information processing systems, pp. 1097-1105, 2012.
  11. N. Kalchbrenner, E. Grefenstette, P. Blunsom, ‘A convolutional neural network for modeling sentences’, 2014.
  12. K. Asanobu, ‘Data Mining for Typhoon Image’, Proc on MDM/KDD 2001, vol. 68277, 2001.
  13. E Chang, Chen Li, et al., ‘Searching Near-Replicas of Images via Clustering’, Proc of SPIE, vol. 2812292, 1999.
