diff --git a/doc/jsk_perception/nodes/vqa_node.md b/doc/jsk_perception/nodes/vqa_node.md
index f31f0c0498..aa32d6667d 100644
--- a/doc/jsk_perception/nodes/vqa_node.md
+++ b/doc/jsk_perception/nodes/vqa_node.md
@@ -74,10 +74,12 @@ make
 In the remote GPU machine,
 ```shell
 cd jsk_recognition/jsk_perception/docker
-./run_jsk_vil_api --port (Your vacant port) --ofa_task caption --ofa_model_scale huge
+./run_jsk_vil_api ofa --port (Your vacant port) --ofa_task caption --ofa_model_scale huge
 ```
+You need to specify the model as the first argument; it should be either `ofa` or `clip`.
+
 `--ofa_task` should be `caption` or `vqa`. Empirically, the output results are more natural for VQA tasks with the Caption model than with the VQA model in OFA.
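
For reference, a concrete invocation of the patched command might look like the sketch below. The port number `8888` is only a placeholder for whichever port is free on the GPU machine, and the bare `clip` invocation assumes the `--ofa_*` options apply only to the OFA model.

```shell
cd jsk_recognition/jsk_perception/docker
# Launch the OFA captioning API server (8888 is a placeholder for a vacant port)
./run_jsk_vil_api ofa --port 8888 --ofa_task caption --ofa_model_scale huge
# Or launch the CLIP API server instead (assumes no OFA-specific options are needed)
./run_jsk_vil_api clip --port 8888
```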