[Docs Agent] Docs Agent release version 0.4.0 #533

Merged 2 commits on Oct 29, 2024
24 changes: 21 additions & 3 deletions examples/gemini/python/docs-agent/README.md
@@ -26,10 +26,10 @@ check out the [Set up Docs Agent][set-up-docs-agent] section below.

Docs Agent's `agent runtask` command allows you to run pre-defined chains of prompts,
which are referred to as **tasks**. These tasks simplify complex interactions by defining
a series of steps that the Docs Agent will execute. The tasks are defined in `.yaml` files
stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
a series of steps that the Docs Agent CLI will execute. The tasks are defined in `.yaml`
files stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
designed to be reusable and can be used to automate common workflows, such as generating
release notes, updating documentation, or analyzing complex information.
release notes, drafting overview pages, or analyzing complex information.
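
To make the idea of a task as a chain of prompts concrete, here is a minimal Python sketch. It is not Docs Agent's implementation: `run_task`, `fake_model`, and the choice to pass each response forward as context for the next step are all assumptions made for illustration.

```python
from typing import Callable, List


def run_task(steps: List[str], ask: Callable[[str], str]) -> List[str]:
    """Run each prompt in order, carrying the previous response forward."""
    responses: List[str] = []
    previous = ""
    for prompt in steps:
        # Assumption: later steps see the previous response as extra context.
        full_prompt = f"{previous}\n\n{prompt}" if previous else prompt
        previous = ask(full_prompt)
        responses.append(previous)
    return responses


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it only reports prompt length.
    return f"response({len(prompt)} chars)"


results = run_task(["Summarize the changes.", "Draft release notes."], fake_model)
print(results)
```

Swapping `fake_model` for a real model call is the only change a caller would need to make, which is the sense in which such a chain is reusable.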

A task file example:

@@ -101,6 +101,16 @@ The list below summarizes the tasks and features supported by Docs Agent:
agent runtask --task DraftReleaseNotes
```

- **Multi-modal support**: Docs Agent's `agent helpme` command can include image,
audio, and video files as part of a prompt to the Gemini 1.5 model, for example:

```sh
agent helpme Provide a concise, descriptive alt text for this PNG image --file ./my_image_example.png
```

You can use this feature for creating tasks as well. For example, see the
[DescribeImages][describe-images] task.
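
As a rough illustration of how a file passed with `--file` could be routed to image, audio, or video handling, the sketch below classifies a path by its guessed MIME type. The function name `classify_media` is hypothetical, and this is not necessarily how Docs Agent decides; it only uses Python's standard `mimetypes` module.

```python
import mimetypes


def classify_media(path: str) -> str:
    """Return "image", "audio", "video", or "unknown" for a file path."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return "unknown"
    # The part before the slash is the top-level media type.
    top_level = mime_type.split("/", 1)[0]
    return top_level if top_level in {"image", "audio", "video"} else "unknown"


print(classify_media("./my_image_example.png"))
```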

For more information on Docs Agent's architecture and features,
see the [Docs Agent concepts][docs-agent-concepts] page.

@@ -241,6 +251,13 @@ Clone the Docs Agent project and install dependencies:
**Important**: From this point, all `agent` command lines below need to
run in this `poetry shell` environment.

5. (**Optional**) To enable autocompletion of Docs Agent commands and
flags in your shell environment, run the following command:

```
source scripts/autocomplete.sh
```

### 5. Edit the Docs Agent configuration file

This guide uses the [open source Flutter documents][flutter-docs-src] as an example dataset,
@@ -458,3 +475,4 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
[chunking-process]: docs/chunking-process.md
[new-15-mode]: docs/config-reference.md#app_mode
[tasks-dir]: tasks/
[describe-images]: tasks/describe-images-for-alt-text-task.yaml
Original file line number Diff line number Diff line change
@@ -235,6 +235,6 @@ function convertDriveFolder(folderName, outputFolderName="", indexFile="") {
insertRichText(sheet, md_chip, "E", row_number);
insertRichText(sheet, folder_chip, "I", row_number);
}
return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
}
return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
}
30 changes: 30 additions & 0 deletions examples/gemini/python/docs-agent/docs/cli-reference.md
@@ -258,6 +258,20 @@ For example:
agent helpme write a concept doc covering all features in this project? --allfiles ~/my-project --new
```

### Ask the model to print the output in JSON

The command below prints the output from the model in JSON format:

```sh
agent helpme <REQUEST> --response_type json
```

For example:

```sh
agent helpme how do I cook pasta? --response_type json
```
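
If you capture the JSON output in a script, you will likely want to parse it. The sketch below is one way to do that, under two assumptions: the response may or may not be wrapped in a Markdown code fence, and the sample payload shape (`steps`) is invented for illustration since the actual structure depends on the model's answer.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker, built this way to avoid nesting


def parse_json_response(text: str):
    """Parse JSON text, tolerating an optional surrounding code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    return json.loads(cleaned)


# Hypothetical sample output; the real shape is up to the model.
sample = '{"steps": ["Boil water", "Add pasta"]}'
print(parse_json_response(sample)["steps"])
```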

### Ask the model to run a pre-defined chain of prompts

The command below runs a task (a sequence of prompts) defined in
@@ -297,6 +311,22 @@ For example:
agent runtask --task IndexPageGenerator --custom_input ~/my_example/docs/development/
```

### Ask the model to print the output in plain text

By default, the `agent runtask` command uses the Python Rich library's
console to format its output. You can disable this formatting by using
the `--plaintext` flag:

```sh
agent runtask --task <TASK> --plaintext
```

For example:

```sh
agent runtask --task DraftReleaseNotes --plaintext
```
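
When calling `agent runtask` from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch (`build_runtask_command` is not part of Docs Agent); it uses only the `--task`, `--custom_input`, and `--plaintext` flags shown in this reference.

```python
from typing import List, Optional


def build_runtask_command(
    task: str,
    custom_input: Optional[str] = None,
    plaintext: bool = False,
) -> List[str]:
    """Build the argv list for an `agent runtask` invocation."""
    command = ["agent", "runtask", "--task", task]
    if custom_input is not None:
        command += ["--custom_input", custom_input]
    if plaintext:
        # Plain text is easier to post-process than Rich-formatted output.
        command.append("--plaintext")
    return command


print(build_runtask_command("DraftReleaseNotes", plaintext=True))
```

The resulting list can be passed to `subprocess.run`, provided the script runs inside the `poetry shell` environment where the `agent` command is available.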

## Managing online corpora

### List all existing online corpora
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@
"""Docs Agent"""

import typing
import os, pathlib

from absl import logging
import google.api_core
@@ -573,6 +574,73 @@ def ask_content_model_to_fact_check_prompt(self, context: str, prev_response: st
    def generate_embedding(self, text, task_type: str = "SEMANTIC_SIMILARITY"):
        return self.gemini.embed(text, task_type)[0]

    # Generate a response to an image
    def ask_model_about_image(self, prompt: str, image):
        if not prompt:
            prompt = "Describe this image:"
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                # Adding prompt in the beginning allows long contextual
                # information to be added.
                response = self.gemini.generate_content([prompt, image])
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't read an image.")
            response = None
            exit(1)
        return response

    # Generate a response to audio
    def ask_model_about_audio(self, prompt: str, audio):
        if not prompt:
            prompt = "Describe this audio clip:"
        audio_size = os.path.getsize(audio)
        # Limit is 20MB
        if audio_size > 20000000:
            logging.error(f"The audio clip {audio} is too large: {audio_size} bytes.")
            exit(1)
        # Get the mime type of the audio file and trim the . from the extension.
        # suffix[1:] drops the leading dot (suffix[:1] would keep only the dot).
        mime_type = "audio/" + pathlib.Path(audio).suffix[1:]
        audio_clip = {
            "mime_type": mime_type,
            "data": pathlib.Path(audio).read_bytes()
        }
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                response = self.gemini.generate_content([prompt, audio_clip])
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't read an audio clip.")
            exit(1)
        return response

    # Generate a response to video
    def ask_model_about_video(self, prompt: str, video):
        if not prompt:
            prompt = "Describe this video clip:"
        video_size = os.path.getsize(video)
        # Limit is 2GB
        if video_size > 2147483648:
            logging.error(f"The video clip {video} is too large: {video_size} bytes.")
            exit(1)
        request_options = {
            "timeout": 600
        }
        # suffix[1:] drops the leading dot from the extension.
        mime_type = "video/" + pathlib.Path(video).suffix[1:]
        video_clip_uploaded = self.gemini.upload_file(video)
        video_clip = self.gemini.get_file(video_clip_uploaded)
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                response = self.gemini.generate_content([prompt, video_clip],
                                                        request_options=request_options)
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't see video clips.")
            exit(1)
        return response

# Function to give an embedding function for gemini using an API key
def embedding_function_gemini_retrieval(api_key, embedding_model: str):
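
The MIME type derivation and size check used by `ask_model_about_audio` can be illustrated in isolation. The sketch below is a standalone approximation, not the project's code: the helper names `audio_mime_type` and `is_within_limit` are hypothetical, and the 20,000,000-byte limit is taken from the comment in the method above. Note how the slice matters: `suffix[1:]` drops the leading dot, whereas `suffix[:1]` keeps only the dot.

```python
import pathlib

AUDIO_LIMIT_BYTES = 20_000_000  # "Limit is 20MB" per the method's comment


def audio_mime_type(path: str) -> str:
    """Build an audio MIME type from a file extension, e.g. ".mp3" -> "audio/mp3"."""
    suffix = pathlib.Path(path).suffix  # includes the leading dot
    return "audio/" + suffix[1:]


def is_within_limit(size_bytes: int, limit_bytes: int = AUDIO_LIMIT_BYTES) -> bool:
    """Return True when a file of this size is small enough to send inline."""
    return size_bytes <= limit_bytes


print(audio_mime_type("clip.mp3"))
print(is_within_limit(25_000_000))
```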
Original file line number Diff line number Diff line change
@@ -101,10 +101,16 @@ from your `$HOME` directory.
poetry shell
```

Entering the `poetry shell` environment is **required** for
running the `agent` command.
**Important**: You must always enter the `poetry shell` environment
to run the `agent` command.

2. Run the `agent helpme` command, for example:
2. Enable autocomplete for Docs Agent CLI options in your environment:

```
source scripts/autocomplete.sh
```

3. Run the `agent helpme` command, for example:

```
agent helpme how do I cook pasta?
@@ -113,7 +119,7 @@ from your `$HOME` directory.
This command returns the Gemini model's response to your input prompt
`how do I cook pasta?`.

3. View the list of Docs Agent tasks available in your setup:
4. View the list of Docs Agent tasks available in your setup:

```
agent runtask
@@ -122,7 +128,7 @@ from your `$HOME` directory.
This command prints a list of Docs Agent tasks that you can run.
(See the `tasks` directory in your local Docs Agent setup.)

4. Run the `agent runtask` command, for example:
5. Run the `agent runtask` command, for example:

```
agent runtask --task IndexPageGenerator