chore(embeddings): use framework embeddings, refactor ai providers #143

jezekra1 · 2024-12-20T14:17:24Z

BREAKING CHANGE:

unification of LLM_BACKEND, EMBEDDING_BACKEND -> AI_BACKEND

open for discussion
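For anyone migrating a deployment, here is a minimal sketch of what the unified variable could mean at startup. It is not taken from this PR: the `resolveAiBackend` helper, the choice to reject the legacy variables outright, and the `"ollama"` value in the usage line are all assumptions for illustration.

```typescript
// Hypothetical sketch: the helper name and the fail-fast behaviour are assumptions,
// not code from this PR.
type Env = Record<string, string | undefined>;

// Variables replaced by AI_BACKEND according to the PR description.
const LEGACY_KEYS = ["LLM_BACKEND", "EMBEDDING_BACKEND"];

function resolveAiBackend(env: Env): string {
  // Collect any legacy variables that are still set.
  const legacy = LEGACY_KEYS.filter((key) => env[key] !== undefined);
  if (legacy.length > 0) {
    throw new Error(`${legacy.join(", ")} no longer supported; set AI_BACKEND instead`);
  }
  const backend = env["AI_BACKEND"];
  if (!backend) {
    throw new Error("AI_BACKEND is not set");
  }
  return backend;
}

// Usage with a hypothetical backend name.
const selectedBackend = resolveAiBackend({ AI_BACKEND: "ollama" });
```

Failing fast on the old names is only one option; silently mapping them to AI_BACKEND would be less disruptive but hides the breaking change.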

.env.example

jezekra1 · 2024-12-20T15:07:36Z

src/runs/execution/tools/wikipedia-tool.ts

+
+  const wikipedia = new WikipediaTool({
+    filters: { minPageNameSimilarity: 0.25, excludeOthersOnExactMatch: false },
+    output: { maxSerializedLength: MAX_CONTENT_LENGTH_CHARS }


maxSerializedLength is not affecting the markdown output.

I removed the restrictions on max content length from markdown as well as the simplified extraction:

extraction: { fields: { markdown: {} } },

Previously we had table extraction disabled and the output was truncated to 25k characters due to slow embeddings. That issue has mostly been resolved, so we can include this data again.

Also this makes our implementation more aligned with framework defaults.

The maxSerializedLength property affects only the serialized output, which contains the markdown output and it works correctly.

const instance = new WikipediaTool({
  output: { maxSerializedLength: 100 },
});
const response = await instance.run({ query: "ice hockey" });
expect(response.getTextContent()).toHaveLength(100);

Source: https://github.com/i-am-bee/bee-agent-framework/blob/6332b0c3a6cb82310fdcef7f576c4a5a6e2d40fd/src/tools/search/wikipedia.ts#L100-L111

But getTextContent aggregates text from all documents, which is not what we want here, because we add additional information about the source to each chunk.
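To illustrate why an aggregated text limit does not fit here, a rough sketch of per-document chunking with a source label on every chunk. The `WikiDocument` shape, the `splitText` stand-in (fixed-size slices, not the project's `splitString`), and the `Source:` prefix format are assumptions, not the actual implementation.

```typescript
// Hypothetical document shape; the framework's real result type may differ.
interface WikiDocument {
  title: string;
  markdown: string;
}

// Hypothetical stand-in for the project's splitString: fixed-size slices, no overlap.
function splitText(text: string, size: number): string[] {
  const parts: string[] = [];
  for (let start = 0; start < text.length; start += size) {
    parts.push(text.slice(start, start + size));
  }
  return parts;
}

// Each chunk keeps a reference to the document it came from,
// which a single aggregated string cannot provide.
function toAnnotatedChunks(documents: WikiDocument[], chunkSize: number): string[] {
  const chunks: string[] = [];
  for (const document of documents) {
    for (const part of splitText(document.markdown, chunkSize)) {
      chunks.push(`Source: ${document.title}\n${part}`);
    }
  }
  return chunks;
}

const annotatedChunks = toAnnotatedChunks(
  [
    { title: "Ice hockey", markdown: "abcdefgh" },
    { title: "Hockey puck", markdown: "xyz" },
  ],
  4,
);
```

With the sample input this yields three chunks, two labelled "Ice hockey" and one labelled "Hockey puck".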

Tomas2D

Just a few suggestions. Good work 👍🏻

Tomas2D · 2024-12-23T15:53:50Z

src/runs/execution/tools/wikipedia-tool.ts

+
+  const wikipedia = new WikipediaTool({
+    filters: { minPageNameSimilarity: 0.25, excludeOthersOnExactMatch: false },
+    output: { maxSerializedLength: MAX_CONTENT_LENGTH_CHARS }


The maxSerializedLength property affects only the serialized output, which contains the markdown output and it works correctly.

const instance = new WikipediaTool({
  output: { maxSerializedLength: 100 },
});
const response = await instance.run({ query: "ice hockey" });
expect(response.getTextContent()).toHaveLength(100);

Source: https://github.com/i-am-bee/bee-agent-framework/blob/6332b0c3a6cb82310fdcef7f576c4a5a6e2d40fd/src/tools/search/wikipedia.ts#L100-L111

src/runs/execution/tools/file-search-tool.ts

src/runs/execution/provider.ts

.env.example

pilartomas · 2025-01-02T08:21:54Z

src/runs/execution/tools/wikipedia-tool.ts

+      query: input.question,
+      documents: output.results.flatMap((document, idx) =>
+        Array.from(
+          splitString(document.fields.markdown as string, {


This is weird, why isn't markdown a string already? If it can be something else, we must handle it or fail.

Also, is markdown a good input type for splitString? 🤔 Splitting tags might cause problems.

It is typed as unknown, but we already have this in the examples:

https://github.com/i-am-bee/bee-agent-framework/blob/33f1db5971d269935878a4a83a484a44371c3cdd/examples/tools/custom/piping.ts#L29

That might be just for simplicity. We need to make sure it is type-safe here, with a typeof check that throws otherwise if necessary.
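A minimal sketch of the kind of guard meant here, assuming the field arrives as `unknown`. The `assertString` helper and its error message are hypothetical, not existing project or framework code.

```typescript
// Hypothetical helper: narrows an `unknown` value to `string` or fails loudly,
// instead of relying on an unchecked `as string` cast.
function assertString(value: unknown, fieldName: string): string {
  if (typeof value !== "string") {
    const actual = value === null ? "null" : typeof value;
    throw new TypeError(`Expected "${fieldName}" to be a string, got ${actual}`);
  }
  return value;
}

// Usage: the record mimics a result whose fields are typed as unknown.
const sampleFields: Record<string, unknown> = { markdown: "# Ice hockey" };
const checkedMarkdown: string = assertString(sampleFields["markdown"], "markdown");
```

Unlike the cast, this fails at the point where the bad value enters, rather than somewhere inside the splitting code.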

pilartomas

Please make sure the markdown handling is safe at runtime, otherwise LGTM 👍

jezekra1 · 2025-01-06T08:19:20Z

Please make sure the markdown handling is safe at runtime, otherwise LGTM 👍

Actually, it could be null (src); the document will now be skipped in that case.
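Roughly, skipping such documents could look like the sketch below. The `WikiSearchResult` shape and the `collectMarkdown` name are assumptions for illustration; the actual change in this PR may be structured differently.

```typescript
// Hypothetical result shape with fields typed as unknown, as discussed above.
interface WikiSearchResult {
  fields: Record<string, unknown>;
}

// Returns the markdown of every result that actually has a string value,
// silently skipping null, missing, or otherwise non-string fields.
function collectMarkdown(results: WikiSearchResult[]): string[] {
  const texts: string[] = [];
  for (const result of results) {
    const markdown = result.fields["markdown"];
    if (typeof markdown !== "string") {
      continue; // skip the document instead of casting
    }
    texts.push(markdown);
  }
  return texts;
}

const usableMarkdown = collectMarkdown([
  { fields: { markdown: "# Ice hockey" } },
  { fields: { markdown: null } },
  { fields: {} },
]);
```

The `typeof` check covers both the null case and the `unknown` typing, so no cast is needed after it.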

pilartomas · 2025-01-06T08:29:38Z

It is still typed as unknown though, the typecast is just not safe there.

jezekra1 · 2025-01-06T09:17:38Z

It is still typed as unknown though, the typecast is just not safe there.

@Tomas2D is this fix correct?

i-am-bee/bee-agent-framework#267

Signed-off-by: Radek Ježek <[email protected]>

jezekra1 requested a review from a team as a code owner December 20, 2024 14:17

jezekra1 force-pushed the use-framework-embedding branch from 25173d7 to b72da10 Compare December 20, 2024 14:18

jezekra1 added the env label Dec 20, 2024

jezekra1 force-pushed the use-framework-embedding branch from b72da10 to 4e37f54 Compare December 20, 2024 14:21

jezekra1 commented Dec 20, 2024

.env.example

jezekra1 force-pushed the use-framework-embedding branch from 4e37f54 to a2ba5d8 Compare December 20, 2024 15:04

jezekra1 commented Dec 20, 2024

jezekra1 force-pushed the use-framework-embedding branch from a2ba5d8 to fab2e95 Compare December 20, 2024 15:14

Tomas2D approved these changes Dec 23, 2024

pilartomas reviewed Jan 2, 2025

src/runs/execution/provider.ts (outdated)

.env.example

pilartomas reviewed Jan 2, 2025

jezekra1 requested a review from pilartomas January 2, 2025 14:50

pilartomas requested changes Jan 3, 2025

jezekra1 force-pushed the use-framework-embedding branch from 34d7a8d to 0eaa1a2 Compare January 6, 2025 08:05

jezekra1 mentioned this pull request Jan 6, 2025

fix(tools): add proper wikipedia result types i-am-bee/bee-agent-framework#267

Merged

jezekra1 added 2 commits January 7, 2025 13:21

chore(embeddings): use framework embeddings, refactor ai providers

67577fd

Signed-off-by: Radek Ježek <[email protected]>

resolve PR comments

44b6f53

jezekra1 force-pushed the use-framework-embedding branch from d2eae5b to 162a708 Compare January 7, 2025 12:21

fixup! resolve PR comments

a20b6af

Signed-off-by: Radek Ježek <[email protected]>

jezekra1 force-pushed the use-framework-embedding branch from 162a708 to a20b6af Compare January 8, 2025 12:27

jezekra1 requested a review from pilartomas January 8, 2025 12:27

pilartomas approved these changes Jan 8, 2025

jezekra1 merged commit b611ab9 into main Jan 8, 2025
6 checks passed

jezekra1 deleted the use-framework-embedding branch January 8, 2025 13:57
