The paper "Multimedia Data Fusion" is a great example of a report on media. Hybrid level multimodal fusion aims to combine the advantages of both decision level and feature level multimodal fusion. In this case, the features are first fused by a feature fusion (FF) unit, and the resulting vector is then analyzed by an analysis unit (AU). The individual features are also analyzed by separate AUs, and the resulting decisions are combined using decision fusion (DF) units (Sharma & Kaur, 2013). In the later stages, all the decisions obtained are fused again to produce the final decision.
Fig 4: Hybrid level multimodal fusion (feature inputs labelled F1, F2, Fn-1). Source: (Atrey, Hossain, El Saddik & Kankanhalli, 2010)
Technical Review in Fusion Text and Image for Mining in Social Media
The increased use of social media such as Twitter, Facebook, and Instagram has increased the volume of data that must be analyzed and from which information must be extracted.
Social network content is multimedia: it combines images and text. To extract this information from Twitter, text mining techniques are used to detect the posted messages automatically.
Once the messages have been tweeted (written), they are filtered to retain only the English tweets, although some Spanish and Dutch tweets still remain at this stage (He, Zha & Li, 2013). The remaining tweets are then processed in four steps: tokenization, which converts each string into tokens based on whitespace and removes punctuation; stop-word filtering, which eliminates common words that carry little meaning; stemming, which reduces words to their stems by removing suffixes and prefixes; and indexing with TF-IDF, which weighs tweet features by the frequency of each word in a single tweet compared with its use across the overall set of tweets (Sun, Wang, Cheng & Fu, 2014).
To ensure accurate image mining, three feature vectors can be used, including the histogram of oriented gradients (HOG) and the grey-level co-occurrence matrix (GLCM), to describe properties such as color and texture.
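The text pre-processing steps described above (tokenization, stop-word filtering, stemming, and TF-IDF indexing) can be illustrated with a minimal sketch. It is not the cited authors' implementation: the stop-word list, the suffix list, and all function names are assumptions made for illustration, the toy stemmer strips suffixes only, and the TF-IDF variant shown (term frequency times the logarithm of the inverse document frequency) is one common formulation among several.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "and", "to"}

# Suffixes stripped by the toy stemmer (an assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(tweet):
    """Lower-case the tweet, drop punctuation, and split on whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", tweet.lower())
    return cleaned.split()


def remove_stop_words(tokens):
    """Drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Tokenize, filter stop words, then stem."""
    return [stem(t) for t in remove_stop_words(tokenize(tweet))]


def tf_idf(tweets):
    """Return one {term: weight} dict per tweet.

    tf  = occurrences of the term in the tweet / number of terms in the tweet
    idf = log(number of tweets / number of tweets containing the term)
    """
    docs = [preprocess(t) for t in tweets]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty tweets
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tweets = ["Fire in the city centre!", "Flooding in the city", "Fires burning"]
weights = tf_idf(tweets)
```

In this sketch a term that appears in every tweet receives a weight of zero, while a term that is rare across the collection is weighted more heavily, which is the behaviour the indexing step relies on.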
HOG descriptors are used in computer vision and image processing for object detection. They rest on the idea that the appearance and shape of an object in an image can be described by the distribution of intensity gradients. GLCM, in contrast, is important for texture description and is mostly applied to measuring surface textures.
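To make the GLCM idea concrete, the sketch below counts how often pairs of grey levels occur next to each other and derives one texture measure (contrast) from the normalized counts. It is a simplified, hypothetical illustration: the function names, the single horizontal offset, and the tiny 4x4 image are assumptions, and HOG is not covered here.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def normalize(matrix):
    """Turn pair counts into probabilities that sum to one."""
    total = sum(sum(row) for row in matrix) or 1  # guard against no pairs
    return [[value / total for value in row] for row in matrix]


def contrast(p):
    """Sum of p(i, j) * (i - j)^2; zero for a perfectly flat texture."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4x4 image with four grey levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, levels=4)
texture_contrast = contrast(normalize(counts))
```

A uniform image yields a contrast of zero, while images with frequent jumps between distant grey levels yield larger values, which is why GLCM statistics are commonly used as texture features.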
Fusion of text and image is therefore achieved by simply combining the image and text features (Kompatsiaris & Hobson, 2008). Note that in this fusion method, when the text mining score falls below the threshold, the text mining result cannot be relied on, so the tweet is classified using the image alone, and vice versa (M. Alqhtani, Luo & Regan, 2015).
The Internet has increased the demand for digital multimedia information.
The most common content is images and text, often combined in a single item. Fusing features in multimedia therefore takes one of two paths: late fusion or early fusion. Late fusion handles multiple features separately and applies the fusion strategy to their different candidate results, although correlation between the original features may cause this strategy to underperform. Early fusion improves similarity evaluation by mapping the different features into a unified space. However, such a system suffers from problems such as high cost, because its unified feature space is built from global statistical information.
Building a large database with this technique is expensive because of the diversity of the web content involved in social media such as Twitter (Liu & Qin, 2014).
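The fusion options discussed in this section can be contrasted in a short sketch. It is illustrative only and rests on several assumptions: early fusion is reduced to plain concatenation of feature vectors, late fusion to a weighted average of per-modality scores, and the threshold rule follows the description of falling back to the other modality when one score is too low. All function names, the default weight, and the default threshold are hypothetical, and the behaviour when both scores fall below the threshold is not specified in the text.

```python
def early_fusion(text_features, image_features):
    """Place both modalities in one feature space by concatenation."""
    return list(text_features) + list(image_features)


def late_fusion(text_score, image_score, text_weight=0.5):
    """Combine per-modality candidate scores with a weighted average."""
    return text_weight * text_score + (1.0 - text_weight) * image_score


def fuse_with_fallback(text_score, image_score, threshold=0.5):
    """Ignore a modality whose score falls below the threshold.

    If only the text score is too low, use the image score alone, and
    vice versa. If both or neither pass, use late fusion (handling of
    the both-fail case is an assumption made for this sketch).
    """
    text_ok = text_score >= threshold
    image_ok = image_score >= threshold
    if image_ok and not text_ok:
        return image_score
    if text_ok and not image_ok:
        return text_score
    return late_fusion(text_score, image_score)


# Hypothetical feature vectors and classifier scores for one tweet.
fused_vector = early_fusion([0.1, 0.2], [0.9])
image_only = fuse_with_fallback(text_score=0.2, image_score=0.9)
combined = fuse_with_fallback(text_score=0.8, image_score=0.6)
```

The sketch also hints at the cost trade-off noted above: early fusion needs a single classifier trained over the whole concatenated space, whereas late fusion keeps one model per modality and only merges their outputs.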