1. Thinh NV, Lang TV, Thanh VT. RGTranCNet: Effective image captioning model using cross-attention and semantic knowledge. Vietnam J. Sci. Technol. [Internet]. 2025 Jul. 15 [cited 2026 Jun. 1];64(1):123–138. Available from: https://vjst.vast.vn/jst/article/view/22381