[1] N. V. Thinh, T. V. Lang, and V. T. Thanh, “RGTranCNet: Effective image captioning model using cross-attention and semantic knowledge,” Vietnam J. Sci. Technol., vol. 63, no. 5, Jul. 2025.