Embeddings included in Memory index OR serialized JSON index #834
Code based on the current documentation to reproduce the error:

import { create, insert, search } from '@orama/orama'
import { pluginEmbeddings } from '@orama/plugin-embeddings'
import '@tensorflow/tfjs-node'

const enableVectors = await pluginEmbeddings({
  embeddings: {
    defaultProperty: 'embedding',
    onInsert: {
      generate: true,
      properties: ['title'],
      verbose: true,
    },
  },
})

const db = create({
  schema: {
    title: 'string',
    embedding: 'vector[512]',
  },
  plugins: [enableVectors],
})

// When using this plugin, document insertion becomes async
await insert(db, { title: 'The quick brown fox jumps over the lazy dog' })
await insert(db, {
  title: "I've seen a lazy dog dreaming of jumping over a quick brown fox",
})
console.log('Async Indexing complete')

// const index = await persist(db, 'json')
// console.log('Saved to disk', JSON.stringify(index, null, 2))

// This fails whether index is saved or not
const results = await search(db, {
  mode: 'vector',
  term: 'dog',
  includeVectors: true, // Defaults to `false`
  // similarity: 0.85, // Minimum similarity. Defaults to `0.8`
})
console.log(JSON.stringify(results, null, 2))
@micheleriva Looks like you're in active development on some of the code related to this issue. The snippet above should allow you to reproduce the issue easily. I originally thought that the bug was in the persisted version of the index, but it looks like it affects the index in general (in-memory as well). I've updated the title accordingly. Happy to help diagnose this further or assist with validating any fixes.
Hi @drush, thank you so much for opening this. It looks like a bug. We're on it.
Describe the bug
When building an index with plugin-embeddings, the embeddings are not persisted to disk on save().
We build the index with plugin-astro, which we have extended to generate embeddings at indexing time. The index.json file is the same size regardless of whether embeddings are specified. We have verified that embeddings are generated (by adding diagnostic logging), but the index file does not serialize the 'embeddings' key or any other embeddings data, apart from the 'embeddings' key declared in the schema.
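One way to check this independently of Orama is to walk the parsed index.json and count arrays that look like embedding vectors. This is only a sketch: it assumes vectors would be serialized as plain numeric arrays of the schema dimension (512 here), which may not match the actual on-disk layout, and `countVectorLikeArrays` is a hypothetical helper, not part of Orama.

```javascript
// Hypothetical helper (not part of Orama): counts arrays inside a parsed
// JSON value that contain exactly `dim` finite numbers.
// Assumption: embeddings would be serialized as plain numeric arrays.
function countVectorLikeArrays(value, dim) {
  if (Array.isArray(value)) {
    const isVector =
      value.length === dim &&
      value.every((x) => typeof x === 'number' && Number.isFinite(x))
    if (isVector) return 1
    // Not a vector itself: look inside its elements.
    return value.reduce((n, item) => n + countVectorLikeArrays(item, dim), 0)
  }
  if (value !== null && typeof value === 'object') {
    return Object.values(value).reduce(
      (n, item) => n + countVectorLikeArrays(item, dim),
      0
    )
  }
  // Strings, numbers, booleans, and null cannot contain vectors.
  return 0
}
```

Running this over the parsed contents of index.json with `dim` set to 512 and getting 0 would be consistent with what we observe, under the assumption above.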
To Reproduce
pnpm build
No embeddings can be found.
Expected behavior
I expected embeddings to be returned in search results, or the persisted file to be significantly larger.
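For a rough sense of "significantly larger", the sketch below estimates the JSON size of a single 512-dimension vector. It assumes each vector would be written as a plain JSON array of floats; the values are random placeholders, not real embeddings, and `estimateVectorJsonBytes` is a hypothetical helper.

```javascript
// Hypothetical helper: approximate size in bytes of one vector once
// serialized to JSON.
// Assumption: the vector is stored as a plain JSON array of numbers.
function estimateVectorJsonBytes(dim) {
  // Random placeholder values in [-1, 1); real embeddings will differ.
  const vector = Array.from({ length: dim }, () => Math.random() * 2 - 1)
  return Buffer.byteLength(JSON.stringify(vector), 'utf8')
}

const perDocument = estimateVectorJsonBytes(512)
console.log(`~${perDocument} bytes per document for a 512-dimension vector`)
```

Under that assumption each indexed document should add on the order of several kilobytes, so an index.json whose size does not change at all when embeddings are enabled suggests the vectors are not being written.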
Environment Info
Affected areas
Search, Serialization
Additional context
No response