This tool takes a text document (PDF or TXT) or a YouTube transcript and generates a concise summary using GPT-4 or GPT-3.5-turbo. It can accurately summarize hundreds of pages of text. It is built with Python and Streamlit and uses the langchain library for text processing. Although the final output is generated by either GPT-3.5-turbo or GPT-4 (the LLMs that power ChatGPT), only a small portion of the overall document is used in the prompts. Before any call is made to either LLM, the document is split into small sections, and only the sections that carry most of the document's meaning are kept.
Demo it here: https://gptdoc-summarizer.streamlit.app/
- Supports PDF and TXT file formats
- Utilizes GPT-4 or GPT-3.5-turbo for generating summaries
- Automatic clustering of the input document to identify key sections
- Customizable number of clusters for the summarization process
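The clustering feature listed above can be illustrated with a minimal sketch. It assumes each section of the document has already been turned into an embedding vector, groups the vectors with a plain k-means loop, and keeps the section nearest each cluster centre. The README does not state which clustering algorithm or embedding model the app uses, so k-means, the toy two-dimensional vectors, and the helper names `kmeans` and `representative_indices` are all assumptions made for illustration.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means over lists of floats; returns the final centroids."""
    # Assumption: deterministic, evenly spaced starting points keep the sketch
    # reproducible. Libraries usually pick smarter starts (e.g. k-means++).
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def representative_indices(vectors, k):
    """Return, in document order, the index of the section nearest each centroid."""
    picks = {
        min(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))
        for c in kmeans(vectors, k)
    }
    return sorted(picks)


# Toy stand-ins for real section embeddings: three well-separated groups.
sections = [f"section {i}" for i in range(9)]
toy_embeddings = [
    [0.0, 0.0], [1.0, 0.0], [0.4, 0.3],
    [10.0, 10.0], [10.5, 10.2], [11.0, 10.0],
    [20.0, 0.0], [21.0, 1.0], [20.4, 0.6],
]
selected = [sections[i] for i in representative_indices(toy_embeddings, k=3)]
print(selected)
```

Under these assumptions, `k` plays the role of the customizable number of clusters: a larger `k` sends more sections to the LLM, which costs more tokens but covers more of the document.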
- Launch the Streamlit app by running `streamlit run main.py`.
- Upload a document (TXT or PDF) to summarize.
- Enter your OpenAI API key if the free usage cap has been hit.
- Choose whether to use GPT-4 for the summarization (recommended, requires GPT-4 API access).
- Click the "Summarize" button and wait for the result.
- `main.py`: Streamlit app main file
- `utils.py`: Contains utility functions for document loading, token counting, and summarization
- `streamlit_app_utils.py`: Contains utility functions specifically for the Streamlit app
- `main()`: Entry point for the Streamlit app
- `process_summarize_button()`: Processes the "Summarize" button click and displays the generated summary
- `validate_input()`: Validates user input and displays warnings for invalid inputs
- `validate_doc_size()`: Validates the document size for token limits
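As a rough illustration of the kind of check an input validator might perform, the sketch below collects warnings for an uploaded file and an API key. It is not the project's `validate_input()`: the function name `collect_input_warnings`, its parameters, and the warning texts are hypothetical. Only the supported extensions and the rule that a key is needed once the free usage cap is hit come from this README. YouTube transcript input is left out for brevity.

```python
from pathlib import Path

# Taken from the feature list: PDF and TXT files are supported.
SUPPORTED_EXTENSIONS = {".pdf", ".txt"}


def collect_input_warnings(filename, api_key, free_cap_reached):
    """Return a list of warnings; an empty list means the input looks usable.

    Hypothetical helper: the real app's signature and messages may differ.
    """
    warnings = []
    if not filename:
        warnings.append("Please upload a document.")
    elif Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        warnings.append("Only PDF and TXT files are supported.")
    # A key is only required once the free usage cap has been hit.
    if free_cap_reached and not api_key:
        warnings.append("Please enter your OpenAI API key.")
    return warnings


print(collect_input_warnings("notes.docx", api_key="", free_cap_reached=True))
```

Returning the warnings as a list, rather than displaying them directly, keeps the check easy to test outside the Streamlit UI. The app itself may do this differently.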
- `doc_loader()`: Loads a document from a file path
- `token_counter()`: Counts the number of tokens in a text string
- `doc_to_text()`: Converts a langchain Document object to a text string
- `doc_to_final_summary()`: Generates the final summary for a given document
- `summary_prompt_creator()`: Creates a summary prompt list for the langchain summarize chain
- `pdf_to_text()`: Converts a PDF file to a text string
- `check_gpt_4()`: Checks if the user has access to GPT-4
- `token_limit()`: Checks if a document has more tokens than a specified maximum
- `token_minimum()`: Checks if a document has more tokens than a specified minimum
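The token helpers described above can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not the project's code: `count_tokens`, `exceeds_token_limit`, and `above_token_minimum` are hypothetical names, and the regex tokenizer only roughly imitates a model tokenizer. Counts from an OpenAI tokenizer such as tiktoken will differ.

```python
import re


def count_tokens(text):
    """Approximate token count: each word and each punctuation mark is one token.

    Assumption: a stand-in for a real model tokenizer, which splits text differently.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def exceeds_token_limit(text, max_tokens):
    """True if the text has more tokens than the allowed maximum."""
    return count_tokens(text) > max_tokens


def above_token_minimum(text, min_tokens):
    """True if the text has more tokens than the required minimum."""
    return count_tokens(text) > min_tokens


sample = "Hello, world!"
print(count_tokens(sample))            # words and punctuation marks counted separately
print(exceeds_token_limit(sample, 3))  # compares the count against a maximum
print(above_token_minimum(sample, 4))  # compares the count against a minimum
```

Checks like these are typically run before any LLM call, so that a document too short to need summarizing or too long to process is rejected early.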