This study explores the challenges that Artificial Intelligence (AI) systems face in generating image captions, a task that requires effective integration of computer vision and natural language processing techniques. It presents a comparative analysis of traditional approaches (such as retrieval-based methods and linguistic templates) and modern deep learning approaches (such as encoder-decoder models, attention mechanisms, and transformers). The theoretical results show that modern models perform better in accuracy and in the ability to generate more complex descriptions, while traditional methods are superior in speed and simplicity. The paper proposes a hybrid framework that combines the advantages of both approaches: conventional methods produce an initial description, which is then contextually refined using modern models. Preliminary estimates indicate that this approach could reduce the initial computational cost by up to 20% compared with relying entirely on deep models, while maintaining high accuracy. The study recommends further research to develop effective coordination mechanisms between traditional and modern methods, and to move to the experimental validation phase of the hybrid model in preparation for its application in environments that require a balance between speed and accuracy, such as real-time computer vision applications.
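The two-stage idea described above (a cheap conventional draft followed by a model-based refinement) can be sketched as follows. This is a minimal illustration only: the function names `template_caption`, `hybrid_caption`, and `toy_refiner` are hypothetical, and the refiner is a trivial string rewrite standing in for a deep model, not the paper's method.

```python
from typing import Callable, Sequence


def template_caption(objects: Sequence[str], scene: str) -> str:
    """Stage 1 (traditional): fill a fixed linguistic template with detected labels."""
    if not objects:
        return f"A photo of {scene}."
    if len(objects) == 1:
        listed = objects[0]
    else:
        listed = ", ".join(objects[:-1]) + " and " + objects[-1]
    return f"A photo of {listed} in {scene}."


def hybrid_caption(objects: Sequence[str], scene: str,
                   refine: Callable[[str], str]) -> str:
    """Stage 2 (modern): pass the template draft to a refinement model."""
    draft = template_caption(objects, scene)
    return refine(draft)


def toy_refiner(draft: str) -> str:
    """Placeholder for a deep model (assumption): rewrites only the opening phrase."""
    return draft.replace("A photo of", "The image shows")


print(hybrid_caption(["a dog", "a ball"], "a park", toy_refiner))
```

In a real system the `refine` callable would wrap an encoder-decoder or transformer model; keeping it as a parameter makes it possible to compare the draft-only and refined outputs on the same input.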
In this paper, a visible image watermarking algorithm based on the biorthogonal wavelet
transform is proposed. A binary watermark (logo) image is embedded in the gray-scale
host image using the coefficient bands of the host image in the biorthogonal wavelet
transform domain. The logo can be embedded in the top-left corner of the host image or
spread over the whole of it. A scaling value (α) in the frequency domain is introduced to
control how visible the logo is in the watermarked image. Experimental results show that
this algorithm gives a visible logo with no losses in the recovery of the original image,
as the calculated PSNR values confirm. Good robustness against attempts to remove the
watermark was s
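The role of the scaling value α can be illustrated with a small sketch. The abstract does not state the embedding rule, so the additive rule below (coefficient plus α times the logo bit) is an assumption, as are the function names `embed_visible` and `remove_visible`. The wavelet transform itself is omitted; `band` stands in for one coefficient band of the transformed host image.

```python
def embed_visible(band, logo, alpha):
    """Add the alpha-scaled binary logo to the top-left corner of a coefficient band.

    Assumed additive rule: marked[i][j] = band[i][j] + alpha * logo[i][j].
    """
    marked = [row[:] for row in band]  # copy so the input band is not modified
    for i, logo_row in enumerate(logo):
        for j, bit in enumerate(logo_row):
            marked[i][j] += alpha * bit
    return marked


def remove_visible(marked, logo, alpha):
    """Invert the assumed additive rule, given the same logo and alpha."""
    restored = [row[:] for row in marked]
    for i, logo_row in enumerate(logo):
        for j, bit in enumerate(logo_row):
            restored[i][j] -= alpha * bit
    return restored


# Hypothetical 3x3 coefficient band and 2x2 binary logo.
band = [[10, 12, 9], [11, 13, 8], [7, 6, 5]]
logo = [[1, 0], [0, 1]]
marked = embed_visible(band, logo, alpha=8)
print(marked)
print(remove_visible(marked, logo, alpha=8) == band)
```

Under this assumed rule, a larger α makes the logo more visible, and knowing the logo and α is enough to subtract it again, which is consistent with the lossless recovery the abstract reports.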
JPEG is the most popular image compression and encoding technique and is widely used in many applications (images, videos and 3D animations). Researchers remain very interested in developing this widely used technique to compress images at higher compression ratios while preserving as much image quality as possible. For this reason, this paper introduces a modified JPEG based on a fast DCT that removes most of the zeros in each transformed block while keeping their positions. Additionally, arithmetic coding is applied rather than Huffman coding. The results show that the proposed JPEG algorithm gives better image quality than traditional JPEG techniques.
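The zero-removal step can be sketched as storing only the nonzero coefficients of a block together with their positions. This is an illustrative assumption about the data layout, not the paper's exact scheme: the abstract says "most" zeros are removed, whereas this sketch drops all of them, and the names `pack_nonzeros` and `unpack_nonzeros` are hypothetical. The DCT, quantization, and arithmetic coding stages are omitted.

```python
def pack_nonzeros(block):
    """Return (rows, cols, positions, values) for the nonzero entries of a block.

    Positions are row-major indices into the flattened block.
    """
    rows, cols = len(block), len(block[0])
    flat = [c for row in block for c in row]
    positions = [i for i, c in enumerate(flat) if c != 0]
    values = [flat[i] for i in positions]
    return rows, cols, positions, values


def unpack_nonzeros(rows, cols, positions, values):
    """Rebuild the block by writing each value back at its stored position."""
    flat = [0] * (rows * cols)
    for pos, val in zip(positions, values):
        flat[pos] = val
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical 4x4 quantized block: energy concentrated in the top-left corner.
block = [[52, -3, 0, 0],
         [4, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
packed = pack_nonzeros(block)
print(packed)
print(unpack_nonzeros(*packed) == block)
```

The position and value lists would then be passed to the entropy coder; whether this saves space depends on how sparse the block is and how compactly the positions are coded.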
Image Fusion Using A Convolutional Neural Network
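The fields of the image and its kinetic signs constitute a semantic presence for semiotic communication and a widening of the dialectical bond between signifiers and their signifieds. The directorial vision establishes this bond in order to produce concealed meanings whose transitional essence is carried through ideas, which serve as the givens of the performance. Visual encoding seeks to convey a duality of meaning within the multiple fields of the theatrical performance. To understand the meaning that emerges from these visual encodings, the need arose to investigate how these encodings are formed and how they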
Subcutaneous vascular patterns have emerged as a new solution for identity management over the past few years. Systems based on dorsal hand veins are particularly promising for high-security settings. A dorsal hand vein recognition system comprises the following steps: acquiring images from the database and preprocessing them, locating the region of interest, and extracting and recognizing information from the dorsal hand vein pattern. This paper reviews several techniques for obtaining the dorsal hand vein area and identifying a person; it therefore provides only a comprehensive review of existing approaches. This model aims to offer the improvement in the accuracy rate of the system that was shown in previous studies and
A band ratioing method is applied to calculate the Salinity Index (SI) and the Normalized Multi-Band Drought Index (NMDI) as a pre-processing step to support agricultural decisions in these areas. To separate the land from other features in the scene, a classical classification method (maximum likelihood classification) is used to classify the study area into multiple classes: Healthy Vegetation (HV), Grasslands (GL), Water (W), Urban (U), and Bare Soil (BS). A Landsat 8 satellite image of an area in the south of Iraq is used, and the land cover is classified according to the indicator ranges of SI and NMDI.
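The two indices can be sketched per pixel as below. NMDI follows its commonly cited definition using one near-infrared and two shortwave-infrared reflectances; mapping these to Landsat 8 bands 5, 6, and 7 is an assumption, since the abstract does not list the bands. Several salinity indices exist in the literature, and the abstract does not say which one is used, so the square root of blue times red (bands 2 and 4) shown here is only one candidate.

```python
import math


def nmdi(nir, swir1, swir2):
    """Normalized Multi-Band Drought Index from surface reflectances.

    NMDI = (NIR - (SWIR1 - SWIR2)) / (NIR + (SWIR1 - SWIR2))
    """
    diff = swir1 - swir2
    denom = nir + diff
    if denom == 0:
        return 0.0  # avoid division by zero on empty or masked pixels
    return (nir - diff) / denom


def salinity_index(blue, red):
    """One common salinity index variant (assumption): sqrt(blue * red)."""
    return math.sqrt(blue * red)


# Hypothetical reflectance values for a single pixel.
print(nmdi(nir=0.4, swir1=0.3, swir2=0.2))
print(salinity_index(blue=0.04, red=0.09))
```

Each pixel's index values would then be compared against the chosen indicator ranges to assign a land cover class; those ranges are not given in the abstract and are not reproduced here.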
Security concerns in the transfer of medical images have recently drawn a lot of attention to medical image encryption. Furthermore, recent events have highlighted that medical images are constantly being produced and circulated online, necessitating safeguards against their inappropriate use. To improve the design of the AES standard algorithm for medical image encryption, this research presents several new criteria. The design is intended to meet the need for both higher security and higher performance. First, the pixels in the image are diffused to randomly mix them up and disperse them across the image. Rather than using rounds, the suggested technique utilizes a cascad