latest update
hahuyhoang411 committed Nov 15, 2023
1 parent 279de3b commit cba402f
Showing 7 changed files with 99 additions and 42 deletions.
7 changes: 3 additions & 4 deletions docs/docs/features/prompt.md
@@ -2,10 +2,9 @@
title: Prompt Role Support
---

Understanding the roles of the different prompts (system, user, and assistant) is crucial for effectively using a Large Language Model. These prompts work together to create a coherent and functional conversational flow.

With Nitro, developers can easily configure the system prompt of a dialog or implement advanced prompt engineering such as [few-shot learning](https://arxiv.org/abs/2005.14165).

## System prompt
- The system prompt is foundational in setting up the assistant's behavior. You can configure it under `pre_prompt`.
@@ -35,7 +34,7 @@ curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
}'
```

To test the assistant, send an inference request:

```zsh title="Pirate Assistant"
curl -X POST 'http://localhost:3928/inferences/llamacpp/chat_completion' \
```
76 changes: 52 additions & 24 deletions docs/docs/new/about.md
@@ -3,63 +3,91 @@ title: About Nitro
slug: /docs
---

Nitro is a fast, lightweight (3 MB) inference server that can be embedded in apps to run local AI. It runs a variety of popular open-source AI models and provides an OpenAI-compatible API.

Nitro powers [Jan](https://jan.ai), an open-source alternative to OpenAI's platform that can be run on your own computer or server. Developed in C++, Nitro is optimized for edge computing and ready for deployment in products.

Learn more on [GitHub](https://github.com/janhq/nitro).

## Why Nitro?

### Lightweight & Fast
- **Fast Inference:** Built on the `drogon` C++17/20 HTTP framework, known for its speed; its non-blocking socket I/O ensures rapid data processing for real-time applications.
- **Lightweight:** At a mere 3 MB, Nitro is an ideal choice for resource-sensitive environments.
- **Easily Embeddable:** Designed to blend into existing applications without restricting the use of other tools.
- **Quick Setup:** Up and running in about 10 seconds, so you can focus on development rather than installation.
- **Feature Control:** Background process management and model unloading keep resource usage and performance optimized.
- **[Batching Inference](features/batch):** Combines multiple inference requests into one batch for quicker responses and higher throughput.

### OpenAI-compatible API

Nitro is OpenAI-compatible, which means you can use the same `curl` calls you would send to OpenAI, but against a local Large Language Model.
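As a loose sketch of that compatibility, the helper below builds an OpenAI-style chat-completion payload aimed at the local Nitro endpoint. The endpoint path is taken from the examples elsewhere in these docs; nothing is sent over the network here, so it is an illustration of the request shape rather than a client implementation.

```python
import json

# Chat endpoint shown in the docs' curl examples; the JSON body mirrors
# OpenAI's chat completion format, so only the URL needs to change.
NITRO_CHAT_URL = "http://localhost:3928/inferences/llamacpp/chat_completion"

def build_chat_request(messages, base_url=NITRO_CHAT_URL):
    """Return the (url, body) pair for an OpenAI-style chat completion call."""
    body = json.dumps({"messages": messages})
    return base_url, body

url, body = build_chat_request(
    [{"role": "user", "content": "Who won the world series in 2020?"}]
)
print(url)   # http://localhost:3928/inferences/llamacpp/chat_completion
print(json.loads(body)["messages"][0]["role"])  # user
```

To actually run it, you would POST `body` to `url` with a `Content-Type: application/json` header, exactly as the curl examples do.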


<!-- <div style={{ width: '50%', float: 'left', clear: 'left' }}>
Nitro API
```
curl -X POST 'http://localhost:3928/inferences/llamacpp/chat_completion' \
-H "Content-Type: application/json" \
-d '{
"llama_model_path": "/path/to/your_model.gguf",
"messages": [
{
"role": "user",
"content": "Who won the world series in 2020?"
}
]
}'
```
</div>
<div style={{ width: '50%', float: 'right', clear: 'right' }}>
OpenAI API
```
check
```
</div> -->

Nitro also extends OpenAI's API with helpful model methods, for example:

- [Loading and unloading models](features/load-unload)
- Checking model status

With Nitro, you gain more control over `llama.cpp` features: you can stop background slot processing and unload models as needed. This level of control optimizes resource usage and enhances application performance.

Note that some OpenAI platform features, such as Assistants and Tools, are not covered by Nitro; see [Jan](https://jan.ai) for those.

### Cross-Platform

Nitro supports Windows, Linux, and macOS, along with a wide variety of CPUs (ARM, x86) and GPUs (NVIDIA, AMD).

### Multi-modal

After the rise of Large Language Models, Large Multimodal Models are the next wave. Multi-modal support is coming soon; stay tuned to see Nitro think, draw, see, and speak.

## Architecture

For a deep understanding of how Nitro works, refer to the [Specifications](architecture.md).

## Support
- If you encounter problems with Nitro, create a [GitHub issue](https://github.com/janhq/nitro). Describe the issue in detail, including error logs and steps to reproduce it.
- Join the [#nitro-dev](https://discord.gg/FTk2MvZwJH) channel on [Discord](https://discord.gg/FTk2MvZwJH) to discuss all things Nitro development. You can also be of great help by assisting other users in the help channel.

## Contributing

There are many ways to contribute to Nitro, and not all involve coding. Here are a few ideas to get started:

- Begin by going through the [Getting Started](nitro/overview) guide. If you encounter issues or have suggestions, let us know by [opening an issue](https://github.com/janhq/nitro/issues).

- Browse [open issues](https://github.com/janhq/nitro/issues). You can offer workarounds, clarification, or suggest labels to help organize issues. If you find an issue you’d like to resolve, feel free to [open a pull request](https://github.com/janhq/nitro/pulls). Start with issues tagged as `Good first issue`.

- Read through Nitro's documentation. If something is confusing or can be improved, click “Edit this page” at the bottom of most docs to propose changes directly on GitHub.

- Check out feature requests from the community. You can contribute by opening a [pull request](https://github.com/janhq/nitro/pulls) for something you’re interested in working on.

## Acknowledgements

7 changes: 0 additions & 7 deletions docs/docs/new/architecture.md

This file was deleted.

38 changes: 38 additions & 0 deletions docs/docs/new/architecture.mdx
@@ -0,0 +1,38 @@
---
title: Architecture
---

![Nitro Architecture](img/architecture.drawio.png)


## Key Concepts

### Inference Server

An inference server is a server designed to process requests for running large language models and to return predictions. It acts as the backbone for AI-powered applications, providing real-time execution of models to analyze data and make decisions.
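The core loop of an inference server can be reduced to a tiny sketch: accept a request, run a model, return a prediction. The model here is a trivial placeholder; a real server such as Nitro wraps this loop in an HTTP layer and a real model runtime.

```python
# Toy "inference server" core: receive a request, run a (placeholder)
# model, and return a prediction.
def fake_model(prompt: str) -> str:
    # Stand-in for a large language model forward pass.
    return f"echo: {prompt}"

def handle_request(request: dict) -> dict:
    prompt = request["prompt"]
    return {"prediction": fake_model(prompt)}

print(handle_request({"prompt": "hello"}))  # {'prediction': 'echo: hello'}
```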

### Batching

Batching refers to grouping several tasks and processing them as a single unit. In large language model inference, this means combining multiple inference requests into one batch to improve computational efficiency, leading to quicker response times and higher throughput.
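The efficiency gain is easy to see with a toy cost model: if every model invocation pays a fixed overhead, one batched invocation amortizes that overhead across all requests. The cost numbers below are arbitrary illustration, not measurements.

```python
# Toy cost model: each model invocation pays a fixed overhead, so grouping
# requests into one batch costs far less than running them one by one.
FIXED_OVERHEAD = 10  # arbitrary cost units per model invocation
PER_ITEM_COST = 1    # arbitrary cost units per request

def run_individually(requests):
    # One invocation (and one overhead payment) per request.
    return sum(FIXED_OVERHEAD + PER_ITEM_COST for _ in requests)

def run_as_batch(requests):
    # A single invocation whose overhead is shared by the whole batch.
    return FIXED_OVERHEAD + PER_ITEM_COST * len(requests)

requests = ["q1", "q2", "q3", "q4"]
print(run_individually(requests))  # 44
print(run_as_batch(requests))      # 14
```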

### Parallel Processing

Parallel processing involves executing multiple computations simultaneously. For web servers and applications, this enables the handling of multiple requests at the same time, ensuring high efficiency and preventing delays in request processing.
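A minimal sketch of the idea, using a thread pool to serve several simulated requests at once: four requests that each take 0.1 s of waiting finish in roughly the time of one, instead of 0.4 s sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def handle(request_id: int) -> str:
    time.sleep(0.1)  # stand-in for I/O or model work
    return f"done-{request_id}"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # All four requests are in flight at the same time.
    results = list(pool.map(handle, range(4)))
elapsed = time.perf_counter() - start

print(results)  # ['done-0', 'done-1', 'done-2', 'done-3']
```

Because the work here is sleeping (like waiting on I/O), threads overlap cleanly; CPU-bound work would need processes or a runtime without a global interpreter lock.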

### Drogon Framework

Drogon is an HTTP application framework based on C++14/17, designed for speed and simplicity. Using non-blocking I/O and an event-driven architecture, Drogon manages HTTP requests efficiently for high-performance, scalable applications.

- **Event Loop**: Drogon uses an event loop to wait for and dispatch events or messages within a program. This allows for handling many tasks asynchronously, without relying on multi-threading.

- **Threads**: While the event loop allows for efficient task management, Drogon also employs threads to handle parallel operations. These "drogon threads" process multiple tasks concurrently.

- **Asynchronous Operations**: The framework supports non-blocking operations, permitting the server to continue processing other tasks while awaiting responses from databases or external services.

- **Scalability**: Drogon's architecture is built to scale, capable of managing numerous connections at once, suitable for applications with high traffic loads.
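Drogon itself is C++, but the event-loop pattern described above can be sketched in Python's asyncio as a loose analogy: while one handler awaits a slow response, the loop runs other handlers on the same thread instead of blocking.

```python
import asyncio

async def handle(request_id: int) -> str:
    # Non-blocking wait, e.g. a database call or upstream request;
    # the event loop services other handlers in the meantime.
    await asyncio.sleep(0.1)
    return f"response-{request_id}"

async def main():
    # Three handlers run concurrently on one thread, one event loop.
    return await asyncio.gather(*(handle(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # ['response-0', 'response-1', 'response-2']
```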



We should have only one architecture document:
- [ ] Refactor `system/architecture`
- [ ] Refactor `system/key-concepts`
File renamed without changes
1 change: 0 additions & 1 deletion docs/docs/system/architecture.md
@@ -6,7 +6,6 @@ title: Architecture
This document is being updated. Please stay tuned.
:::

![Nitro Architecture](img/architecture.drawio.png)

### Components

12 changes: 6 additions & 6 deletions docs/docusaurus.config.js
@@ -143,12 +143,12 @@ const config = {
position: "left",
label: "API Reference",
},
{
type: "docSidebar",
sidebarId: "communitySidebar",
position: "left",
label: "Community",
},
// {
// type: "docSidebar",
// sidebarId: "communitySidebar",
// position: "left",
// label: "Community",
// },
// Navbar right
// {
// type: "docSidebar",
