diff --git a/docs/docs/features/prompt.md b/docs/docs/features/prompt.md
index 64872cb64..901a8d3c6 100644
--- a/docs/docs/features/prompt.md
+++ b/docs/docs/features/prompt.md
@@ -2,10 +2,9 @@
 title: Prompt Role Support
 ---
 
-Understanding the roles of different prompts—system, user, and assistant is crucial for effectively utilizing the Large Language Model. These prompts work together to create a coherent and functional conversational flow.
-
-With Nitro, developers can easily config the dialog for "system prompt" or implement advanced prompt engineering like [few-shot learning](https://arxiv.org/abs/2005.14165).
+The system, user, and assistant prompts are crucial for effectively utilizing the Large Language Model. These prompts work together to create a coherent and functional conversational flow.
+With Nitro, developers can easily configure the dialog or implement advanced prompt engineering like [few-shot learning](https://arxiv.org/abs/2005.14165).
 
 ## System prompt
 - The system prompt is foundational in setting up the assistant's behavior. You can config it under `pre_prompt`.
 
@@ -35,7 +34,7 @@ curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
 }'
 ```
 
-For testing the assistant
+To test the assistant inference:
 
 ```zsh title="Pirate Assistant"
 curl -X POST 'http://localhost:3928/inferences/llamacpp/chat_completion' \
diff --git a/docs/docs/new/about.md b/docs/docs/new/about.md
index f752db772..0544216fb 100644
--- a/docs/docs/new/about.md
+++ b/docs/docs/new/about.md
@@ -3,56 +3,75 @@ title: About Nitro
 slug: /docs
 ---
 
-Nitro is a fast, lightweight (3mb) inference server that can be embedded in apps to run local AI. Nitro can be used to run a variety of popular open source AI models, and provides an OpenAI-compatible API.
+Nitro is a high-efficiency C++ inference engine for edge computing, supporting [Jan](https://jan.ai/). It's lightweight and embeddable, making it ideal for product integration.
-Nitro is used to power [Jan](https://jan.ai), a open source alternative to OpenAI's platform that can be run on your own computer or server.
-
-
-Nitro is a fast, lightweight, and embeddable inference engine, powering [Jan](https://jan.ai/). Developed in C++, it's specially optimized for use in edge computing and is ready for deployment in products.
-
-⚡ Discover more about Nitro on [GitHub](https://github.com/janhq/nitro)
+Learn more on [GitHub](https://github.com/janhq/nitro).
 
 ## Why Nitro?
 
-### Lightweight & Fast
+- **Fast Inference:** Built on the `drogon` C++17/20 framework, ensuring rapid data processing for real-time applications.
+- **Lightweight:** Only 3MB, ideal for resource-sensitive environments.
+- **Easily Embeddable:** Integrates simply into existing applications, offering flexibility.
+- **Quick Setup:** Approximately 10-second initialization for swift deployment.
+- **Enhanced Web Framework:** Incorporates `drogon cpp`, boosting web service efficiency.
+- **Feature Control:** Offers features like background process management and model unloading for optimized performance.
 
-- Old materials
-  - At a mere 3MB, Nitro is a testament to efficiency. This stark difference in size makes Nitro an ideal choice for applications.
-  - Nitro is designed to blend seamlessly into your application without restricting the use of other tools. This flexibility is a crucial advantage.
-- **Quick Setup:**
-Nitro can be up and running in about 10 seconds. This rapid deployment means you can focus more on development and less on installation processes.
+### OpenAI-compatible API
 
-- Old material
-  - Nitro uses the `drogon` C++17/20 HTTP application framework, which makes a significant difference. This framework is known for its speed, ensuring that Nitro processes data swiftly. This means your applications can make quick decisions based on complex data, a crucial factor in today's fast-paced digital environment.
-  - Nitro elevates its game with drogon cpp, a C++ production-ready web framework. Its non-blocking socket IO ensures that your web services are efficient, robust, and reliable.
-  - [Batching Inference](features/batch)
-  - Non-blocking Socket IO
+Nitro is OpenAI-compatible, which means you can reuse the same curl calls you would send to OpenAI, pointed at a local Large Language Model.
+
+
+
-### OpenAI-compatible API
 - [ ] OpenAI-compatible
-- [ ] Given examples
+- [ ] Given examples (make a column to compare)
 - [ ] What is not covered? (e.g. Assistants, Tools -> See Jan)
 - Extends OpenAI's API with helpful model methods
   - e.g. Load/Unload model
   - e.g. Checking model status
 - [Unload model](features/load-unload)
-- With Nitro, you gain more control over `llama.cpp` features. You can now stop background slot processing and unload models as needed. This level of control optimizes resource usage and enhances application performance.
 
 ### Cross-Platform
-- [ ] Cross-platform
+- [ ] Cross-platform: Supports Windows, Linux, and macOS, as well as a wide variety of CPUs (ARM, x86) and GPUs (Nvidia, AMD).
 
 ### Multi-modal
-- [ ] Hint at what's coming
+- [ ] Hint at what's coming: After the rise of Large Language Models, Large Multimodal Models are the next wave. Multi-modal support is coming soon. Stay tuned to see Nitro think, draw, see, and speak.
 
 ## Architecture
-- [ ] Link to Specifications
+- [ ] Link to Specifications: For a deeper understanding of how Nitro works, please refer to the [Specifications](architecture.md).
 
 ## Support
+- If you encounter problems with Nitro, create a [GitHub issue](https://github.com/janhq/nitro).
+- Describe the issue in detail, including error logs and steps to reproduce it.
+
+- We have the [#nitro-dev](https://discord.gg/FTk2MvZwJH) channel on [Discord](https://discord.gg/FTk2MvZwJH) to discuss all things related to Nitro development. You can also be of great help by helping other users in the help channel.
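The OpenAI-compatible API described above can be made concrete with a short sketch. This is an illustrative Python fragment, not Nitro's own tooling: the endpoint path is the one used in the prompt.md curl examples, and the payload mirrors OpenAI's chat format (a `messages` array of `role`/`content` pairs).

```python
import json

# Local Nitro endpoint, taken from the curl examples in prompt.md.
NITRO_CHAT_URL = "http://localhost:3928/inferences/llamacpp/chat_completion"

def build_chat_request(messages):
    """Serialize an OpenAI-style chat completion payload."""
    return json.dumps({"messages": messages})

body = build_chat_request([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(body)
# Posting `body` to NITRO_CHAT_URL (e.g. with urllib.request and a
# Content-Type: application/json header) requires a running Nitro server,
# so this snippet only builds and prints the payload.
```

Because the request shape matches OpenAI's, existing OpenAI client code can typically be redirected to the local server by swapping the base URL.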
 - [ ] File a Github Issue
 - [ ] Go to Discord
@@ -60,6 +79,15 @@ Nitro can be up and running in about 10 seconds. This rapid deployment means you
 
 ## Contributing
 - [ ] Link to Github
+There are many ways to contribute to Nitro, and not all involve coding. Here are a few ideas to get started:
+
+- Begin by going through the [Getting Started](nitro/overview) guide. If you encounter issues or have suggestions, let us know by [opening an issue](https://github.com/janhq/nitro/issues).
+
+- Browse the [open issues](https://github.com/janhq/nitro/issues). You can offer workarounds or clarification, or suggest labels to help organize issues. If you find an issue you’d like to resolve, feel free to [open a pull request](https://github.com/janhq/nitro/pulls). Start with issues tagged as `Good first issue`.
+
+- Read through Nitro's documentation. If something is confusing or can be improved, click “Edit this page” at the bottom of most docs to propose changes directly on GitHub.
+
+- Check out feature requests from the community. You can contribute by opening a [pull request](https://github.com/janhq/nitro/pulls) for something you’re interested in working on.
 
 ## Acknowledgements
diff --git a/docs/docs/new/architecture.md b/docs/docs/new/architecture.md
deleted file mode 100644
index c7a88ae6f..000000000
--- a/docs/docs/new/architecture.md
+++ /dev/null
@@ -1,7 +0,0 @@
----
-title: Architecture
----
-
-We should only have 1 document
-- [ ] Refactor system/architecture
-- [ ] Refactor system/key-concepts
\ No newline at end of file
diff --git a/docs/docs/new/architecture.mdx b/docs/docs/new/architecture.mdx
new file mode 100644
index 000000000..510b4e103
--- /dev/null
+++ b/docs/docs/new/architecture.mdx
@@ -0,0 +1,38 @@
+---
+title: Architecture
+---
+
+![Nitro Architecture](img/architecture.drawio.png)
+
+## Key Concepts
+
+### Inference Server
+
+An inference server is a type of server designed to process requests for running large language models and to return predictions. This server acts as the backbone for AI-powered applications, providing real-time execution of models to analyze data and make decisions.
+
+### Batching
+
+Batching refers to the process of grouping several tasks and processing them as a single batch. In large language model inference, this means combining multiple inference requests into one batch to improve computational efficiency, leading to quicker response times and higher throughput.
+
+### Parallel Processing
+
+Parallel processing involves executing multiple computations simultaneously. For web servers and applications, this enables the handling of multiple requests at the same time, ensuring high efficiency and preventing delays in request processing.
+
+### Drogon Framework
+
+Drogon is an HTTP application framework based on C++14/17, designed for its speed and simplicity. Utilizing a non-blocking I/O and event-driven architecture, Drogon manages HTTP requests efficiently for high-performance and scalable applications.
+
+- **Event Loop**: Drogon uses an event loop to wait for and dispatch events or messages within a program.
This allows for handling many tasks asynchronously, without relying on multi-threading.
+
+- **Threads**: While the event loop allows for efficient task management, Drogon also employs threads to handle parallel operations. These "drogon threads" process multiple tasks concurrently.
+
+- **Asynchronous Operations**: The framework supports non-blocking operations, permitting the server to continue processing other tasks while awaiting responses from databases or external services.
+
+- **Scalability**: Drogon's architecture is built to scale, capable of managing numerous connections at once, suitable for applications with high traffic loads.
+
+
+
+We should only have 1 document
+- [ ] Refactor system/architecture
+- [ ] Refactor system/key-concepts
\ No newline at end of file
diff --git a/docs/docs/system/img/architecture.drawio.png b/docs/docs/new/img/architecture.drawio.png
similarity index 100%
rename from docs/docs/system/img/architecture.drawio.png
rename to docs/docs/new/img/architecture.drawio.png
diff --git a/docs/docs/system/architecture.md b/docs/docs/system/architecture.md
index a127910ca..e280d2646 100644
--- a/docs/docs/system/architecture.md
+++ b/docs/docs/system/architecture.md
@@ -6,7 +6,6 @@ title: Architecture
 
 This document is being updated. Please stay tuned.
 :::
 
-![Nitro Architecture](img/architecture.drawio.png)
 
 ### Components
diff --git a/docs/docusaurus.config.js b/docs/docusaurus.config.js
index 1bcb36878..9bcd849d7 100644
--- a/docs/docusaurus.config.js
+++ b/docs/docusaurus.config.js
@@ -143,12 +143,12 @@ const config = {
       position: "left",
       label: "API Reference",
     },
-    {
-      type: "docSidebar",
-      sidebarId: "communitySidebar",
-      position: "left",
-      label: "Community",
-    },
+    // {
+    //   type: "docSidebar",
+    //   sidebarId: "communitySidebar",
+    //   position: "left",
+    //   label: "Community",
+    // },
     // Navbar right
     // {
     //   type: "docSidebar",
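The batching and event-loop concepts introduced in the new architecture.mdx can be sketched in a few lines. This is a toy Python model of the idea, assuming nothing about Nitro's actual C++ internals: incoming requests are grouped, and each group is served by a single (simulated) model call running on an event loop, so the fixed cost of a forward pass is paid once per batch rather than once per request.

```python
import asyncio

async def run_model_on_batch(prompts):
    # Stand-in for one batched model call; real inference would go here.
    await asyncio.sleep(0.01)  # simulate a single fixed-cost forward pass
    return [f"completion for: {p}" for p in prompts]

async def batched_server(prompts, batch_size=4):
    """Group requests into batches and serve each batch with one call."""
    results = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]                 # group requests
        results.extend(await run_model_on_batch(batch))   # one call per batch
    return results

completions = asyncio.run(batched_server([f"prompt {i}" for i in range(10)]))
print(len(completions))  # → 10
```

With `batch_size=4`, the ten requests above cost three simulated model calls instead of ten, which is the throughput win batching is after; the event loop lets the server keep accepting work while a batch is in flight.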