512MB seems to be the max supported file size for disk persistence plugin #851
Comments
Hi @bennyzen,
Ciao Michele, first of all, thank you for this amazing project. I haven't yet studied the internals of Orama, so I simply followed the instructions in the docs, calling the provided […] There's a real chance that I was delusional to ingest so much data into the db, as it may simply not be made for such volumes. BTW: Has anyone successfully stored, persisted, and restored more than 512MB of data, or is it just me having this kind of issue?
Can you try persisting this data in JSON format? Using the […] As far as I know, 512MB shouldn't really be a problem, especially in JSON!
Yes, I'll surely try to persist using JSON, but it will take some time to embed and ingest all those records again to reach that volume. The only thing that still puzzles me is what I've come across here. If I understand it correctly, the maximum string length has regressed back to 0.5GB. But as always, please correct me if I'm wrong.
Here's a quick'n'dirty bare-bones reproduction using both […]:

import { create, insert } from '@orama/orama'
import {
  persistToFile,
  restoreFromFile,
} from '@orama/plugin-data-persistence/server'

const inserts = 512 * 10
const blockSize = 1048576 / 10 // 1MB / 10, as a whole 1MB block would cause another error
const mode = 'json'

const payload = () => {
  let payload = ''
  for (let i = 0; i < blockSize; i++) {
    payload += 'a'
  }
  return payload
}

const db = create({
  schema: {
    payload: 'string',
  },
})

console.time('inserting')
for (let i = 0; i < inserts; i++) {
  await insert(db, {
    payload: payload(),
  })
}
console.timeEnd('inserting')

// persist the database to disk
console.time('persisting')
const path = await persistToFile(db, mode, 'db.dat')
console.timeEnd('persisting')

// restore the database from disk
console.time('restoring')
const restored = await restoreFromFile(mode, path)
console.timeEnd('restoring')

JSON mode yields this error:

inserting: 21.506s
file:///home/ben/tmp/orama-persist-limit/node_modules/.pnpm/@[email protected]/node_modules/@orama/plugin-data-persistence/dist/index.js:50
serialized = JSON.stringify(dbExport);
^
RangeError: Invalid string length
at JSON.stringify (<anonymous>)
at persist (file:///home/ben/tmp/orama-persist-limit/node_modules/.pnpm/@[email protected]/node_modules/@orama/plugin-data-persistence/dist/index.js:50:31)
at async persistToFile (file:///home/ben/tmp/orama-persist-limit/node_modules/.pnpm/@[email protected]/node_modules/@orama/plugin-data-persistence/dist/server.js:16:24)
at async file:///home/ben/tmp/orama-persist-limit/main.mjs:35:14
Node.js v22.11.0

BINARY mode yields this error:

inserting: 21.573s
node:buffer:711
slice: (buf, start, end) => buf.hexSlice(start, end),
^
Error: Cannot create a string longer than 0x1fffffe8 characters
at Object.slice (node:buffer:711:37)
at Buffer.toString (node:buffer:863:14)
at persist (file:///home/ben/tmp/orama-persist-limit/node_modules/.pnpm/@[email protected]/node_modules/@orama/plugin-data-persistence/dist/index.js:60:45)
at async persistToFile (file:///home/ben/tmp/orama-persist-limit/node_modules/.pnpm/@[email protected]/node_modules/@orama/plugin-data-persistence/dist/server.js:16:24)
at async file:///home/ben/tmp/orama-persist-limit/main.mjs:34:14 {
code: 'ERR_STRING_TOO_LONG'
}
Node.js v22.11.0

So yes, the limit seems to be 512MB. Correct?
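For reference, the numbers in the reproduction can be checked with a little arithmetic. The sketch below compares the limit quoted in the BINARY-mode error (`0x1fffffe8` characters) with the raw payload the script inserts. It counts only the `'a'` characters and ignores whatever index and JSON overhead the export adds, so the real serialized string would be larger still. The variable names are made up for this illustration.

```javascript
// Limit quoted in the BINARY-mode error message above.
const maxStringLength = 0x1fffffe8

// The reproduction performs 512 * 10 inserts. Each payload is built by a loop
// that runs while i < 1048576 / 10 (= 104857.6), i.e. 104858 iterations.
const insertCount = 512 * 10
const charsPerInsert = Math.ceil(1048576 / 10)
const payloadChars = insertCount * charsPerInsert

console.log(maxStringLength)                // 536870888, i.e. 2 ** 29 - 24
console.log(payloadChars)                   // 536872960
console.log(payloadChars > maxStringLength) // true
```

So the raw payload alone is already slightly past the limit before any serialization overhead. The BINARY-mode stack trace also goes through `buf.hexSlice`, which suggests the buffer is being hex-encoded; if so, every byte becomes two characters and that mode would hit the same ceiling at roughly half the data size.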
It shouldn't be. We're investigating and will keep you posted (cc @matijagaspar, @faustoq).
It's just an assumption and probably too vague to be useful, but couldn't this be mitigated by using e.g. a streaming NDJSON parser/serializer? It would surely involve significant rework of the current codebase, but IMHO it would remove these constraining "limitations" and significantly reduce memory consumption on larger data volumes.
Another thing I noticed during testing: field size seems to be limited to 100KB (see the rudimentary code above). Sure, what sane person puts 100KB of data into a single field? But that's maybe a topic for another issue.
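To make the NDJSON suggestion a bit more concrete, here is a minimal sketch of the idea, assuming the data can be split into independent records. `toNdjson` and `fromNdjson` are hypothetical helpers, not part of Orama or its persistence plugin, and the sketch says nothing about how Orama's internal index would actually be split up. The point is only that each record is stringified on its own, so no single string ever has to hold the whole dataset.

```javascript
// Hypothetical helpers for illustration; not part of Orama or its plugins.

// Yield one JSON-encoded line per record instead of one giant string.
function* toNdjson(records) {
  for (const record of records) {
    // JSON.stringify escapes newlines inside strings, so each record stays on one line.
    yield JSON.stringify(record) + '\n'
  }
}

// Parse NDJSON that arrives in arbitrary chunks, carrying partial lines over.
function* fromNdjson(chunks) {
  let buffered = ''
  for (const chunk of chunks) {
    buffered += chunk
    const lines = buffered.split('\n')
    buffered = lines.pop() // the last element is an incomplete line (or '')
    for (const line of lines) {
      if (line !== '') yield JSON.parse(line)
    }
  }
  if (buffered !== '') yield JSON.parse(buffered)
}

// In-memory round trip; in practice the lines would go to a file stream.
const sampleRecords = [{ payload: 'a' }, { payload: 'b\nc' }, { payload: 'd' }]
const ndjsonLines = [...toNdjson(sampleRecords)]
const roundTripped = [...fromNdjson(ndjsonLines)]
console.log(roundTripped.length) // 3
```

With this shape, the largest string in memory is bounded by the largest single record rather than by the total database size, which is where the memory savings mentioned above would come from.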
Describe the bug
Using Orama with the persistence plugin, I seem to have hit a wall. While indexing some publications, everything was fine until the database grew. Now I keep getting the same error while trying to read from the persisted file:
To Reproduce
Expected behavior
Being able to reach more than 512MB in database size.
Environment Info
Affected areas
Initialization, Data Insertion
Additional context
Only tried Linux so far, as it's my daily driver.