[Paper Reading] Image Captioning using Deep Neural Architectures (arXiv: 1801.05568v1)
2024-08-27 06:37:49
Main Contributions:
- A brief introduction to two families of methods for the image captioning task: retrieval-based methods and generative methods.
- The authors implemented the classical Show and Tell model and analyzed it based on their experiments.
Excerpts:
- To achieve this goal, the Show & Tell model is created by hybridizing two different models. It takes the image as input and feeds it into the Inception-v3 model. At the end of the Inception-v3 model, a single fully connected layer is added. This layer transforms the output of Inception-v3 into a word embedding vector, which is then fed into a series of LSTM cells.
- For any given caption, we add two additional symbols as the start word and the stop word. Whenever the stop word is encountered, sentence generation stops, marking the end of the string.
- Show & Tell model uses Beam Search to find suitable words to generate captions.
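The start/stop-word convention in the excerpt above can be sketched as a simple decoding loop. This is a toy illustration, not the paper's code: `next_word` is a hypothetical stand-in for one LSTM decoding step, and the real model would also condition on the image features.

```python
# Greedy decoding loop illustrating the start/stop-word convention:
# the decoder is seeded with a start symbol and emits words until it
# produces the stop symbol (or hits a length cap).

START, STOP = "<s>", "</s>"

def generate_caption(next_word, max_len=20):
    words = [START]
    for _ in range(max_len):
        w = next_word(words)
        words.append(w)
        if w == STOP:  # stop word encountered: the caption is complete
            break
    # Strip the start symbol (and stop symbol, if emitted) from the result.
    return words[1:-1] if words[-1] == STOP else words[1:]

# Toy next-word function standing in for the trained LSTM decoder.
transitions = {"<s>": "a", "a": "giraffe", "giraffe": "standing", "standing": "</s>"}
caption = generate_caption(lambda ws: transitions[ws[-1]])
print(" ".join(caption))  # a giraffe standing
```

In the actual model the `next_word` step would be an LSTM cell whose initial state is derived from the Inception-v3 image embedding.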
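Beam search, mentioned in the last excerpt, keeps the `beam_width` most probable partial captions at each step instead of committing greedily to one word. A minimal sketch, assuming a toy `next_probs` function in place of the LSTM's next-word distribution:

```python
import math

def beam_search(next_probs, start, stop, beam_width=3, max_len=10):
    """Keep the beam_width most probable partial captions at each step,
    expanding each unfinished one with every candidate next word.
    next_probs(seq) returns a dict {word: probability}."""
    beams = [(0.0, [start])]  # each entry: (cumulative log-probability, word sequence)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == stop:  # finished captions are carried over unchanged
                candidates.append((logp, seq))
                continue
            for word, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [word]))
        # Prune to the beam_width highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == stop for _, seq in beams):
            break
    return beams[0][1]

# Toy bigram language model standing in for the decoder's distribution.
toy_lm = {
    "<s>": {"a": 0.7, "the": 0.3},
    "a": {"cat": 0.6, "dog": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 0.9, "sat": 0.1},
    "dog": {"</s>": 0.9, "ran": 0.1},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

best = beam_search(lambda seq: toy_lm[seq[-1]], "<s>", "</s>")
print(" ".join(best))  # <s> a cat </s>
```

With `beam_width=1` this degenerates to greedy decoding; wider beams trade compute for captions with higher overall probability.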