Tidy Nitro docs #174

Merged
merged 7 commits into from
Nov 22, 2023
84 changes: 41 additions & 43 deletions README.md
@@ -17,11 +17,9 @@
- Quick Setup: Approximately 10-second initialization for swift deployment.
- Enhanced Web Framework: Incorporates drogon cpp to boost web service efficiency.

## Documentation

## About Nitro

Nitro is a light-weight integration layer (and soon to be inference engine) for cutting edge inference engine, make deployment of AI models easier than ever before!
Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.

The zipped Nitro binary is only ~3 MB, with few or no dependencies (CUDA if you use a GPU, for example), which makes it well suited to any edge or server deployment 👍.

@@ -40,37 +38,57 @@ The binary of nitro after zipped is only ~3mb in size with none to minimal depen

## Quickstart

**Step 1: Download Nitro**
**Step 1: Install Nitro**

To use Nitro, download the released binaries from the release page below:
- For Linux and MacOS

[![Download Nitro](https://img.shields.io/badge/Download-Nitro-blue.svg)](https://github.com/janhq/nitro/releases)
```bash
curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
```

After downloading the release, double-click on the Nitro binary.
- For Windows

**Step 2: Download a Model**
```bash
powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
```

Download a llama model to try running the llama C++ integration. You can find a "GGUF" model on The Bloke's page below:
**Step 2: Downloading a Model**

[![Download Model](https://img.shields.io/badge/Download-Model-green.svg)](https://huggingface.co/TheBloke)
```bash
mkdir model && cd model
wget -O llama-2-7b-model.gguf 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true'
```
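
The download step can also be scripted. The following is a minimal sketch in Python, assuming the Hugging Face `resolve` URL pattern seen in the `wget` command; the helper names `hf_resolve_url` and `download` are hypothetical and not part of Nitro.

```python
import shutil
import urllib.request


def hf_resolve_url(repo, filename, revision="main"):
    """Build a direct-download URL following the pattern used in the wget command."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"


def download(url, dest):
    """Stream the file at `url` to `dest` without loading it all into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


if __name__ == "__main__":
    # Repository and file name are copied from the step above; the file is several GB.
    url = hf_resolve_url("TheBloke/Llama-2-7B-Chat-GGUF", "llama-2-7b-chat.Q5_K_M.gguf")
    download(url, "llama-2-7b-model.gguf")
```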

**Step 3: Run Nitro**
**Step 3: Run Nitro server**

Double-click on Nitro to run it. After downloading your model, make sure it's saved to a specific path. Then, make an API call to load your model into Nitro.
```bash title="Run Nitro server"
nitro
```

**Step 4: Load model**

```zsh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
```bash title="Load model"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
"llama_model_path": "/path/to/your_model.gguf",
"ctx_len": 2048,
"llama_model_path": "/model/llama-2-7b-model.gguf",
"ctx_len": 512,
"ngl": 100,
"embedding": true,
"n_parallel": 4,
"pre_prompt": "A chat between a curious user and an artificial intelligence",
"user_prompt": "USER: ",
"ai_prompt": "ASSISTANT: "
}'
```
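
The same load request can be issued from a script. This is a minimal sketch in Python, assuming the server listens on `localhost:3928` as in the curl command and sending only the `llama_model_path`, `ctx_len`, and `ngl` fields; the helper names `build_load_request` and `load_model` are hypothetical.

```python
import json
import urllib.request

# Endpoint copied from the curl command; adjust host and port if Nitro runs elsewhere.
LOAD_URL = "http://localhost:3928/inferences/llamacpp/loadmodel"


def build_load_request(model_path, ctx_len=512, ngl=100, url=LOAD_URL):
    """Build (but do not send) the POST request that asks Nitro to load a model."""
    payload = {
        "llama_model_path": model_path,
        "ctx_len": ctx_len,
        "ngl": ngl,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(model_path, **options):
    """Send the request and return the raw response text (needs a running server)."""
    with urllib.request.urlopen(build_load_request(model_path, **options)) as resp:
        return resp.read().decode("utf-8")
```

Building the body with `json.dumps` avoids hand-written JSON mistakes such as trailing commas.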

**Step 5: Making an Inference**

```bash title="Nitro Inference"
curl http://localhost:3928/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "Who won the world series in 2020?"
}
]
}'
```
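
For clients that prefer a script over curl, here is a minimal sketch in Python of the same chat request. It assumes the `/v1/chat/completions` endpoint on `localhost:3928` shown above and returns the raw response body without assuming its shape; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint copied from the curl command above.
CHAT_URL = "http://localhost:3928/v1/chat/completions"


def build_chat_request(user_message, url=CHAT_URL):
    """Build (but do not send) a chat completion request with one user message."""
    payload = {"messages": [{"role": "user", "content": user_message}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send the request and return the raw response text (needs a loaded model)."""
    with urllib.request.urlopen(build_chat_request(user_message)) as resp:
        return resp.read().decode("utf-8")
```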

@@ -89,7 +107,6 @@ Table of parameters
| `system_prompt` | String | The prompt to use for system rules. |
| `pre_prompt` | String | The prompt to use for internal configuration. |


***OPTIONAL***: You can run Nitro on a different port, such as 5000 instead of 3928, by starting it manually in a terminal:
```zsh
# Usage: ./nitro [thread_num] [host] [port]
./nitro 1 127.0.0.1 5000
@@ -98,32 +115,13 @@ Table of parameters
- host : the address to bind to, normally 127.0.0.1 or 0.0.0.0
- port : the port that Nitro listens on

**Step 4: Perform Inference on Nitro for the First Time**

```zsh
curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--header 'Access-Control-Allow-Origin: *' \
--data '{
"messages": [
{"content": "Hello there 👋", "role": "assistant"},
{"content": "Can you write a long story", "role": "user"}
],
"stream": true,
"model": "gpt-3.5-turbo",
"max_tokens": 2000
}'
```

Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
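
As an illustration of what that compatibility means for a client, the sketch below parses responses in the OpenAI chat format. The field names (`choices`, `message`, `delta`, `content`) and the `data:` / `[DONE]` framing come from the OpenAI API; that Nitro's output matches them exactly is an assumption based on the statement above, and the function names are hypothetical.

```python
import json


def extract_reply(body):
    """Return the assistant text from a non-streaming OpenAI-style response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker used by the OpenAI format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the fragments from `iter_stream_text` gives the full reply once the stream ends.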

## Compile from source
To compile Nitro, see [Compile from source](docs/manual_install.md).
To compile Nitro, see [Compile from source](docs/new/build-source.md).

### Contact

- For support, please file a GitHub ticket.
- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
- For long-form inquiries, please email [email protected].

- For long-form inquiries, please email [email protected].
48 changes: 44 additions & 4 deletions docs/docs/examples/chatbox.md
@@ -2,10 +2,50 @@
title: Nitro with Chatbox
---

:::info COMING SOON
:::
This guide demonstrates how to integrate Nitro with Chatbox, showcasing the compatibility of Nitro with various platforms.

<!--
## What is Chatbox?
Chatbox is a versatile desktop client that supports multiple cutting-edge Large Language Models (LLMs). It is available for Windows, Mac, and Linux operating systems.

## How to use Nitro as backend -->
For more information, please visit the [Chatbox official GitHub page](https://github.com/Bin-Huang/chatbox).


## Downloading and Installing Chatbox

To download and install Chatbox, follow the instructions available at this [link](https://github.com/Bin-Huang/chatbox#download).

## Using Nitro as a Backend

1. Start Nitro server

Open your command line tool and enter:
```
nitro
```

> Ensure you are using the latest version of [Nitro](new/install.md)

2. Run the Model

To load the model, use the following command:

```
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
"llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
"ctx_len": 512,
"ngl": 100
}'
```

3. Configure Chatbox
Adjust the `settings` in Chatbox to connect with Nitro. Change your settings to match the configuration shown in the image below:

![Settings](img/chatbox.PNG)

4. Chat with the Model

Once the setup is complete, you can start chatting with the model using Chatbox. All functions of Chatbox are now enabled with Nitro as the backend.

## Video demo
Binary file added docs/docs/examples/img/chatbox.PNG
6 changes: 1 addition & 5 deletions docs/docs/new/about.md
@@ -1,6 +1,6 @@
---
title: About Nitro
slug: /docs
slug: /about
---

Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
@@ -119,7 +119,3 @@ Nitro welcomes contributions in various forms, not just coding. Here are some wa

- [drogon](https://github.com/drogonframework/drogon): The fast C++ web framework
- [llama.cpp](https://github.com/ggerganov/llama.cpp): Inference of LLaMA model in pure C/C++

## FAQ
:::info COMING SOON
:::
1 change: 1 addition & 0 deletions docs/docs/new/architecture.md
@@ -1,5 +1,6 @@
---
title: Architecture
slug: /achitecture
---

![Nitro Architecture](img/architecture.drawio.png)
1 change: 1 addition & 0 deletions docs/docs/new/build-source.md
@@ -1,5 +1,6 @@
---
title: Build From Source
slug: /build-source
---

This guide provides step-by-step instructions for building Nitro from source on Linux, macOS, and Windows systems.
20 changes: 20 additions & 0 deletions docs/docs/new/faq.md
@@ -0,0 +1,20 @@
---
title: FAQs
slug: /faq
---

### 1. Is Nitro the same as Llama.cpp with an API server?

Yes, that's correct. However, Nitro isn't limited to just Llama.cpp; it will soon integrate multiple other models like Whisper, Bark, and Stable Diffusion, all in a single binary. This eliminates the need for you to develop a separate API server on top of AI models. Nitro is a comprehensive solution, designed for ease of use and efficiency.

### 2. Is Nitro simply Llama-cpp-python?

No. Nitro isn't bound to Python, so you can run high-performance software that fully utilizes your system's capabilities. With Nitro, you don't need to learn how to deploy a Python web server or use FastAPI; the Nitro web server is already optimized.

### 3. Why should I switch to Nitro over Ollama?

While Ollama does provide similar functionalities, its design serves a different purpose. Ollama has a larger size (around 200MB) compared to Nitro's 3MB distribution. Nitro's compact size allows for easy embedding into subprocesses, ensuring minimal concerns about package size for your application. This makes Nitro a more suitable choice for applications where efficiency and minimal resource usage are key.

### 4. Why is the model named "chat-gpt-3.5"?

Many applications implement the OpenAI ChatGPT API, and we want Nitro to work with any AI client. You can use any model name, but if you're already using the ChatGPT API, switching to Nitro is seamless: replace api.openai.com with localhost:3928 in your client settings (for example Chatbox, SillyTavern, or Oobabooga), and it will work with Nitro.
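
For clients configured through a URL, the host swap described in this answer can be sketched as follows. This is a minimal Python illustration, assuming Nitro serves plain HTTP on `localhost:3928` and keeps the same request path; `to_nitro_url` is a hypothetical helper, not part of Nitro.

```python
from urllib.parse import urlsplit, urlunsplit


def to_nitro_url(openai_url, host="localhost", port=3928):
    """Point an OpenAI API URL at a local Nitro server, keeping path and query."""
    parts = urlsplit(openai_url)
    # Assumption: the local server speaks plain HTTP, so the scheme is replaced too.
    return urlunsplit(("http", f"{host}:{port}", parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_nitro_url("https://api.openai.com/v1/chat/completions"))
```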
1 change: 1 addition & 0 deletions docs/docs/new/model-cycle.md
@@ -1,5 +1,6 @@
---
title: Model Life Cycle
slug: /model-cycle
---

## Load model
1 change: 1 addition & 0 deletions docs/docs/new/quickstart.md
@@ -1,5 +1,6 @@
---
title: Quickstart
slug: /quickstart
---

## Step 1: Install Nitro
9 changes: 8 additions & 1 deletion docs/openapi/NitroAPI.yaml
@@ -437,6 +441,10 @@ components:
          default: true
          nullable: true
          description: Determines if output generation is in a streaming manner.
        cache_prompt:
          type: boolean
          default: true
          description: Optimize performance in repeated or similar requests.
        temp:
          type: number
          default: 0.7
@@ -577,7 +581,10 @@ components:
          min: 0
          max: 1
          description: Set probability threshold for more relevant outputs

        cache_prompt:
          type: boolean
          default: true
          description: Optimize performance in repeated or similar requests.
    ChatCompletionResponse:
      type: object
      description: Description of the response structure