Image captioning is the process of generating an explicit, coherent description of the contents of an image. It relies on recent deep learning techniques from computer vision and natural language processing to understand the image and give it an appropriate caption. Multiple datasets suitable for many applications have been proposed. The biggest challenge for natural language processing researchers is that these datasets are not available in every language. Researchers have therefore translated the best-known English datasets with Google Translate so that image content can be described in their native languages. This review aims to improve the understanding of image captioning strategies: it surveys previous research on image captioning, examines the most popular datasets in different languages (mostly English datasets translated into other languages), covers the latest models for describing images, and summarizes and compares evaluation measures.
The recent emergence of sophisticated Large Language Models (LLMs) such as GPT-4, Bard, and Bing has revolutionized the domain of scientific inquiry, particularly in the realm of large pre-trained vision-language models. This pivotal transformation is opening new frontiers in various fields, including image processing and digital media verification. At the heart of this evolution, our research focuses on the rapidly growing area of image authenticity verification, a field gaining immense relevance in the digital era. The study is specifically geared towards addressing the emerging challenge of distinguishing between authentic images and deepfakes, a task that has become critically important in a world increasingly reliant on digital med
FG Mohammed, HM Al-Dabbas, Iraqi Journal of Science, 2018 - Cited by 6
With the rapid development of smart devices, people's lives have become easier, especially for people with visual impairments or other special needs. New achievements in machine learning and deep learning let people identify and recognise the surrounding environment. In this study, the efficiency and high performance of deep learning architectures are used to build an image classification system for both indoor and outdoor environments. The proposed methodology starts with collecting two datasets (indoor and outdoor) from several separate datasets. In the second step, the collected dataset is split into training, validation, and test sets. The pre-trained GoogleNet and MobileNet-V2 models are trained using the indoor and outdoor se
The traditional city suffers from a decline in its urban image because of urban development and homogeneity with the city's urban context. Because no determinants govern the urban image, the traditional center of Kadhimiya suffers from a break in its urban image. The research therefore examines how to build a distinctive urban image for the center of the traditional city of Kadhimiya and achieve visual pleasure and comfort for the recipient. The urban image here means an image, not a picture, and it relates to several aspects, including physical, social, and psychological ones, as well as the collective memory of individuals and their rela
Text-based image clustering (TBIC) is an insufficient approach for clustering related web images. It is challenging to abstract the visual features of images with the support of textual information in a database. In content-based image clustering (CBIC), image data are clustered on the basis of specific features such as texture, color, boundaries, and shape. In this paper, an effective CBIC technique is presented that uses texture and statistical features of the images. The statistical features, or color moments (mean, skewness, standard deviation, kurtosis, and variance), are extracted from the images. These features are collected in a one-dimensional array, and then a genetic algorithm (GA) is applied for image clustering.
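The color-moment features listed in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `color_moments` and `feature_vector`, the input layout (a list of RGB tuples), and the use of population moments are all assumptions made for illustration, and the texture features and the genetic algorithm step are not covered.

```python
import math

def color_moments(channel):
    """Return [mean, variance, std, skewness, kurtosis] for one color channel.

    `channel` is a flat sequence of pixel intensities (assumed input layout).
    """
    n = len(channel)
    mean = sum(channel) / n
    # Central moments of order 2, 3 and 4 (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in channel) / n
    m3 = sum((v - mean) ** 3 for v in channel) / n
    m4 = sum((v - mean) ** 4 for v in channel) / n
    std = math.sqrt(m2)
    # Flat channels have undefined standardized moments; report 0.0 instead.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [mean, m2, std, skewness, kurtosis]

def feature_vector(pixels):
    """Concatenate the moments of the R, G and B channels into one 1-D list."""
    features = []
    for c in range(3):
        features.extend(color_moments([p[c] for p in pixels]))
    return features
```

Calling `feature_vector` on the pixels of one image yields a 15-element list (five moments for each of three channels), which is the kind of one-dimensional array a clustering step could consume.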
Nowadays, still images are used everywhere in the digital world. Limited storage capacity and transmission bandwidth make efficient compression solutions essential. The wavelet transform, a revolutionary mathematical tool, has already shown its power in image processing. The main topic of this paper is improving the compression of still images with a multiwavelet transform by estimating the high-frequency sub-band multiwavelet coefficients through interpolation instead of sending all multiwavelet coefficients. Compared with other compression methods, the proposed approach obtained good results.
The conventional fuzzy c-means (FCM) algorithm does not fully utilize the spatial information in the image. In this research, we use an FCM algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership functions in the neighborhood of each pixel under consideration. The advantages of the method are that it is less sensitive to noise than other techniques and that it yields more homogeneous regions than other methods. This technique is a powerful method for noisy image segmentation.
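The spatial function described in this abstract can be sketched as follows. The abstract only states that the spatial function sums the memberships in a pixel's neighborhood; the square window, the border clipping, the exponents `p` and `q`, and the combination rule u' = u^p * h^q normalized across clusters are assumptions chosen for illustration, not details confirmed by the source.

```python
def spatial_function(u, radius=1):
    """Sum the memberships of one cluster over a square window around each pixel.

    `u` is a 2-D list of membership values in [0, 1] for a single cluster.
    """
    rows, cols = len(u), len(u[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            # The window is clipped at the image border.
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        total += u[ni][nj]
            h[i][j] = total
    return h

def spatial_memberships(memberships, p=1, q=1, radius=1):
    """Reweight per-cluster membership maps by their spatial functions.

    Assumed combination rule: u' = u^p * h^q, normalized across clusters.
    """
    hs = [spatial_function(u, radius) for u in memberships]
    rows, cols = len(memberships[0]), len(memberships[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in memberships]
    for i in range(rows):
        for j in range(cols):
            weights = [u[i][j] ** p * h[i][j] ** q
                       for u, h in zip(memberships, hs)]
            norm = sum(weights)
            for k, w in enumerate(weights):
                # Fall back to the original membership if all weights vanish.
                out[k][i][j] = w / norm if norm > 0 else memberships[k][i][j]
    return out
```

In a full clustering loop, this reweighting would run after each ordinary membership update. A pixel whose neighbors mostly belong to one cluster is pulled toward that cluster, which is consistent with the reduced noise sensitivity the abstract reports.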
Although Wiener filtering is the optimal tradeoff between inverse filtering and noise smoothing, when the blurring filter is singular, Wiener filtering actually amplifies the noise. This suggests that a denoising step is needed to remove the amplified noise. A wavelet-based denoising scheme provides a natural technique for this purpose.
In this paper, a new image restoration scheme is proposed. The scheme contains two separate steps: Fourier-domain inverse filtering and wavelet-domain image denoising. The first stage is Wiener filtering of the input image; the filtered image is then input to adaptive threshold wavelet
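The two-step structure described in this abstract (Fourier-domain Wiener filtering followed by wavelet-domain denoising) can be sketched on a 1-D signal. This is a simplified illustration under stated assumptions, not the paper's scheme: the circular blur model, the regularization constant `k`, the one-level Haar transform, and the fixed soft threshold are all stand-ins, and the paper's adaptive threshold rule is not reproduced because the abstract is truncated before describing it.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse applies the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def wiener_deconvolve(blurred, kernel, k=0.01):
    """Apply G = conj(H) / (|H|^2 + k) in the Fourier domain (circular model)."""
    n = len(blurred)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    H, Y = dft(padded), dft(blurred)
    X = [y * h.conjugate() / (abs(h) ** 2 + k) for y, h in zip(Y, H)]
    return [v.real for v in dft(X, inverse=True)]

def haar_soft_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert.

    Assumes an even-length signal; the fixed threshold is a placeholder.
    """
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    shrunk = [max(abs(d) - threshold, 0.0) * (1 if d >= 0 else -1)
              for d in detail]
    out = []
    for a, d in zip(approx, shrunk):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def restore(blurred, kernel, k=0.01, threshold=0.1):
    """Step 1: Wiener filtering. Step 2: wavelet-domain soft thresholding."""
    return haar_soft_denoise(wiener_deconvolve(blurred, kernel, k), threshold)
```

The constant `k` keeps the division stable where the blur response is close to zero, and the thresholding step then suppresses whatever noise the inverse filter amplified. An image version would apply the same idea with 2-D transforms.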
Semantic segmentation is an exciting research topic in medical image analysis because it aims to detect objects in medical images. In recent years, approaches based on deep learning have shown more reliable performance than traditional approaches in medical image segmentation. The U-Net network is one of the most successful end-to-end convolutional neural networks (CNNs) presented for medical image segmentation. This paper proposes a multiscale residual dilated convolutional neural network (MSRD-UNet) based on U-Net. MSRD-UNet replaces the traditional convolution block with a novel deeper block that fuses multi-layer features using dilated and residual convolution. In addition, the squeeze and excitation attention mechanism (SE) and the s