
Improving Self-Hosting and Removing 3rd Party dependencies. #4465

Closed
wants to merge 77 commits into from
Changes from 13 commits
Commits
77 commits
60303ff
Self-Hosting Changes
Podginator Oct 30, 2024
bb7b3c9
Fix Minio Environment Variable
Podginator Oct 30, 2024
593bac0
Just make pdfs successful, due to lack of PDFHandler
Podginator Oct 30, 2024
d4710a8
Fix issue where flag was set wrong
Podginator Oct 30, 2024
26c5ef3
Added an NGINX Example file
Podginator Oct 31, 2024
4607032
Add some documentation for self-hosting via Docker Compose
Podginator Oct 31, 2024
ae66e2e
Make some adjustments to Puppeteer due to failing sites.
Podginator Oct 31, 2024
b350fbd
adjust timings
Podginator Oct 31, 2024
322ec68
Add start of Mail Service
Podginator Nov 1, 2024
6f1ee6b
Fix Docker Files
Podginator Nov 1, 2024
222ba06
More email service stuff
Podginator Nov 2, 2024
34e039e
Add Guide to use Zapier for Email-Importing.
Podginator Nov 2, 2024
8b845b5
Ensure that if no env is provided it uses the old email settings
Podginator Nov 2, 2024
e557fd0
Add some instructions for self-hosted email
Podginator Nov 3, 2024
b8226db
Add SNS Endpoints for Mail Watcher
Podginator Nov 3, 2024
af70b25
Add steps and functionality for using SES and SNS for email
Podginator Nov 3, 2024
2e3134c
Uncomment a few jobs.
Podginator Nov 3, 2024
ab51fc9
Added option for Firefox for parser. Was having issues with Chromium …
Podginator Nov 4, 2024
0e6c675
Add missing space.
Podginator Nov 5, 2024
6b7f170
Fix some wording on the Guide
Podginator Nov 6, 2024
9d41cc5
Fix Package
Podginator Nov 11, 2024
a66f92b
Fix MV
Podginator Nov 13, 2024
c27af01
Do raw handlers for Medium
Podginator Nov 22, 2024
7bebb45
Fix images in Medium
Podginator Nov 22, 2024
7bdf222
Update self-hosting/GUIDE.md
Podginator Nov 25, 2024
d42656b
Update Guide with other variables
Podginator Nov 27, 2024
685f542
Merge
Podginator Nov 27, 2024
be7102b
Add The Verge to JS-less handlers
Podginator Nov 28, 2024
55ba7b0
Update regex and image-proxy
Podginator Nov 28, 2024
e729225
Update self-hosting/nginx/nginx.conf
Podginator Nov 28, 2024
7fd4095
Update regex and image-proxy
Podginator Nov 28, 2024
a0f6f14
Update regex and image-proxy
Podginator Nov 28, 2024
99ed2bb
Update self-hosting/docker-compose/docker-compose.yml
Podginator Nov 28, 2024
e423885
Fix Minio for Export
Podginator Nov 29, 2024
ad6a997
Revert yarn lock removal
Podginator Nov 29, 2024
da6ab7a
Merge to main
Podginator Nov 29, 2024
f16085f
Update GUIDE with newer NGINX
Podginator Dec 2, 2024
efe7e61
Update nginx config to include api/save route
Podginator Dec 3, 2024
ea69eb6
Enable Native PDF View for PDFS
Podginator Dec 8, 2024
eab1c2a
Enable Native PDF View for PDFS
Podginator Dec 8, 2024
b3fc97f
feat:lover packages test
m1xxos Dec 9, 2024
bc1aef7
feat:working build
m1xxos Dec 9, 2024
df80ae4
feat:alpine build
m1xxos Dec 9, 2024
ee32e16
docs:api dockerfile docs
m1xxos Dec 11, 2024
93d05c1
Write a PDF.js wrapper to replace pspdfkit
Podginator Dec 12, 2024
bc41ca2
Merge pull request #7 from m1xxos/self-host-updates
Podginator Dec 13, 2024
d9d0f01
Revert changes for replication, set settings to have default mode
Podginator Dec 13, 2024
0a00a45
Merge
Podginator Dec 13, 2024
a6074a9
build folder got removed due to gitignore on pdf
Podginator Dec 13, 2024
bb7c7c8
Add Box shadow to pdf pages
Podginator Dec 13, 2024
c7fa382
Add Toggle for Progress in PDFS, enabled native viewer toggle
Podginator Dec 13, 2024
e5f0c07
Update node version to LTS
Podginator Dec 14, 2024
9b9bf96
Update node version to LTS
Podginator Dec 14, 2024
9b926fe
Fix Linting issues
Podginator Dec 14, 2024
893d65d
Fix Linting issues
Podginator Dec 14, 2024
fe851a2
Make env variable nullable
Podginator Dec 14, 2024
39d6a61
Add touchend listener for mobile
Podginator Dec 14, 2024
3215435
Make changes to PDF for mobile
Podginator Dec 14, 2024
7b968c4
fix(android): change serverUrl to selfhosted first
weeebdev Dec 14, 2024
8a2906c
Merge branch 'self-host-updates' of github.com:Akasha-Chronicles/omni…
weeebdev Dec 14, 2024
735e396
Merge pull request #8 from Akasha-Chronicles/self-host-updates
Podginator Dec 14, 2024
d81f1bf
feat:2 stage alpine content fetch
m1xxos Dec 18, 2024
5785015
feat:separated start script
m1xxos Dec 18, 2024
7687bb0
fix:changed to node 22
m1xxos Dec 18, 2024
17ed3e3
Merge pull request #9 from m1xxos/self-host-updates
Podginator Dec 20, 2024
c9866da
Add back youtube functionality and add guide
Podginator Jan 1, 2025
2ea293d
Merge branch 'self-host-updates' of https://github.com/Podginator/omn…
Podginator Jan 4, 2025
b9e8aae
trigger build
Podginator Jan 5, 2025
63b8e48
Fix cache issue on YouTube
Podginator Jan 5, 2025
f9f2267
Allow empty AWS_S3_ENDPOINT
Podginator Jan 17, 2025
72a6419
Allow empty AWS_S3_ENDPOINT
Podginator Jan 17, 2025
2c17be6
Add GCHR for all images
Podginator Jan 17, 2025
72457ff
Add GCHR For self hosting.
Podginator Jan 17, 2025
e3e24a4
Add GCHR For self hosting.
Podginator Jan 17, 2025
0ac1cc6
Test prebuilt.
Podginator Jan 17, 2025
1a61453
Test prebuilt
Podginator Jan 17, 2025
4ae68a7
Test prebuilt...
Podginator Jan 17, 2025
Binary file added docs/guides/images/cloudflare-tunnel.png
Binary file added docs/guides/images/create-new-email.png
Binary file added docs/guides/images/imported-email.png
Binary file added docs/guides/images/zapier-email-webhook.png
Binary file added docs/guides/images/zapier-javascript-step.png
Binary file added docs/guides/images/zapier-webhook-step.png
5 changes: 4 additions & 1 deletion packages/api/package.json
@@ -118,7 +118,10 @@
"voca": "^1.4.0",
"winston": "^3.3.3",
"yaml": "^2.4.1",
"youtubei": "^1.5.4"
"youtubei": "^1.5.4",
"@aws-sdk/client-s3": "^3.679.0",
"@aws-sdk/s3-request-presigner": "^3.679.0",
"@aws-sdk/lib-storage": "^3.679.0"
},
"devDependencies": {
"@istanbuljs/nyc-config-typescript": "^1.0.2",
61 changes: 61 additions & 0 deletions packages/api/queue-processor/Dockerfile
@@ -0,0 +1,61 @@
FROM node:18.16 as builder

WORKDIR /app

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN apt-get update && apt-get install -y g++ make python3

COPY package.json .
COPY yarn.lock .
COPY tsconfig.json .
COPY .prettierrc .
COPY .eslintrc .

COPY /packages/readabilityjs/package.json ./packages/readabilityjs/package.json
COPY /packages/api/package.json ./packages/api/package.json
COPY /packages/text-to-speech/package.json ./packages/text-to-speech/package.json
COPY /packages/content-handler/package.json ./packages/content-handler/package.json
COPY /packages/liqe/package.json ./packages/liqe/package.json
COPY /packages/utils/package.json ./packages/utils/package.json

RUN yarn install --pure-lockfile

ADD /packages/readabilityjs ./packages/readabilityjs
ADD /packages/api ./packages/api
ADD /packages/text-to-speech ./packages/text-to-speech
ADD /packages/content-handler ./packages/content-handler
ADD /packages/liqe ./packages/liqe
ADD /packages/utils ./packages/utils

RUN yarn workspace @omnivore/utils build
RUN yarn workspace @omnivore/text-to-speech-handler build
RUN yarn workspace @omnivore/content-handler build
RUN yarn workspace @omnivore/liqe build
RUN yarn workspace @omnivore/api build

# After building, fetch the production dependencies
RUN rm -rf /app/packages/api/node_modules
RUN rm -rf /app/node_modules
RUN yarn install --pure-lockfile --production

FROM node:18.16 as runner
LABEL org.opencontainers.image.source="https://github.com/omnivore-app/omnivore"

RUN apt-get update && apt-get install -y netcat-openbsd

WORKDIR /app

ENV NODE_ENV production

COPY --from=builder /app/packages/api/dist /app/packages/api/dist
COPY --from=builder /app/packages/readabilityjs/ /app/packages/readabilityjs/
COPY --from=builder /app/packages/api/package.json /app/packages/api/package.json
COPY --from=builder /app/packages/api/node_modules /app/packages/api/node_modules
COPY --from=builder /app/node_modules /app/node_modules
COPY --from=builder /app/package.json /app/package.json
COPY --from=builder /app/packages/text-to-speech/ /app/packages/text-to-speech/
COPY --from=builder /app/packages/content-handler/ /app/packages/content-handler/
COPY --from=builder /app/packages/liqe/ /app/packages/liqe/
COPY --from=builder /app/packages/utils/ /app/packages/utils/

CMD ["yarn", "workspace", "@omnivore/api", "start_queue_processor"]
13 changes: 12 additions & 1 deletion packages/api/src/jobs/export.ts
@@ -12,13 +12,24 @@ import { sendExportJobEmail } from '../services/send_emails'
import { findActiveUser } from '../services/user'
import { logger } from '../utils/logger'
import { highlightToMarkdown } from '../utils/parser'
import { contentFilePath, createGCSFile } from '../utils/uploads'
import { contentFilePath } from '../utils/uploads'
import { env } from '../env'
import { File, Storage } from '@google-cloud/storage'

export interface ExportJobData {
userId: string
exportId: string
}

export const storage = env.fileUpload?.gcsUploadSAKeyFilePath
? new Storage({ keyFilename: env.fileUpload.gcsUploadSAKeyFilePath })
: new Storage()
const bucketName = env.fileUpload.gcsUploadBucket

const createGCSFile = (filename: string): File => {
return storage.bucket(bucketName).file(filename)
}

export const EXPORT_JOB_NAME = 'export'

const itemStateMappping = (state: LibraryItemState) => {
38 changes: 26 additions & 12 deletions packages/api/src/queue-processor.ts
@@ -56,7 +56,10 @@ import {
PROCESS_YOUTUBE_VIDEO_JOB_NAME,
} from './jobs/process-youtube-video'
import { pruneTrashJob, PRUNE_TRASH_JOB } from './jobs/prune_trash'
import { refreshAllFeeds } from './jobs/rss/refreshAllFeeds'
import {
REFRESH_ALL_FEEDS_JOB_NAME,
refreshAllFeeds,
} from './jobs/rss/refreshAllFeeds'
import { refreshFeed } from './jobs/rss/refreshFeed'
import { savePageJob } from './jobs/save_page'
import {
@@ -159,17 +162,17 @@ export const createWorker = (connection: ConnectionOptions) =>
async (job: Job) => {
const executeJob = async (job: Job) => {
switch (job.name) {
// case 'refresh-all-feeds': {
// const queue = await getQueue()
// const counts = await queue?.getJobCounts('prioritized')
// if (counts && counts.wait > 1000) {
// return
// }
// return await refreshAllFeeds(appDataSource)
// }
// case 'refresh-feed': {
// return await refreshFeed(job.data)
// }
case 'refresh-all-feeds': {
const queue = await getQueue()
const counts = await queue?.getJobCounts('prioritized')
if (counts && counts.wait > 1000) {
return
}
return await refreshAllFeeds(appDataSource)
}
case 'refresh-feed': {
return await refreshFeed(job.data)
}
case 'save-page': {
return savePageJob(job.data, job.attemptsMade)
}
@@ -260,6 +263,17 @@ const setupCronJobs = async () => {
},
}
)

await queue.add(
REFRESH_ALL_FEEDS_JOB_NAME,
{},
{
priority: getJobPriority(REFRESH_ALL_FEEDS_JOB_NAME),
repeat: {
every: 14_400_000, // 4 Hours
},
}
)
}

const main = async () => {
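The cron setup above re-registers refresh-all-feeds as a repeatable job with `every: 14_400_000`. A small sanity sketch of that interval in plain JavaScript (not BullMQ itself; the `repeatOptions` helper is hypothetical, for illustration only):

```javascript
// The PR schedules refresh-all-feeds every 14_400_000 ms.
// Spelling the constant out as hours makes the intent auditable.
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000

// Hypothetical helper mirroring the repeat options passed to queue.add().
const repeatOptions = (everyMs) => ({ repeat: { every: everyMs } })

console.log(repeatOptions(FOUR_HOURS_MS).repeat.every) // 14400000
```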
71 changes: 71 additions & 0 deletions packages/api/src/repository/storage/GcsStorageClient.ts
@@ -0,0 +1,71 @@
import { SignedUrlParameters, StorageClient, File } from './StorageClient'
import { Storage, File as GCSFile } from '@google-cloud/storage'

export class GcsStorageClient implements StorageClient {
private storage: Storage

constructor(keyFilename: string | undefined) {
this.storage = new Storage({
keyFilename,
})
}

private convertFileToGeneric(gcsFile: GCSFile): File {
return {
isPublic: async () => {
const [isPublic] = await gcsFile.isPublic()
return isPublic
},
exists: async () => (await gcsFile.exists())[0],
download: async () => (await gcsFile.download())[0],
bucket: gcsFile.bucket.name,
publicUrl: () => gcsFile.publicUrl(),
getMetadataMd5: async () => {
const [metadata] = await gcsFile.getMetadata()
return metadata.md5Hash
},
}
}

downloadFile(bucket: string, filePath: string): Promise<File> {
const file = this.storage.bucket(bucket).file(filePath)
return Promise.resolve(this.convertFileToGeneric(file))
}

async getFilesFromPrefix(bucket: string, prefix: string): Promise<File[]> {
const [filesWithPrefix] = await this.storage
.bucket(bucket)
.getFiles({ prefix })

return filesWithPrefix.map((it: GCSFile) => this.convertFileToGeneric(it))
}

async signedUrl(
bucket: string,
filePath: string,
options: SignedUrlParameters
): Promise<string> {
const [url] = await this.storage
.bucket(bucket)
.file(filePath)
.getSignedUrl({ ...options, version: 'v4' })

return url
}

upload(
bucket: string,
filePath: string,
data: Buffer,
options: {
contentType?: string
public?: boolean
timeout?: number
}
): Promise<void> {
return this.storage
.bucket(bucket)
.file(filePath)
.save(data, { timeout: 30000, ...options })
}
}
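The `@google-cloud/storage` client returns single-element tuples from calls like `exists()` and `getMetadata()`; the wrapper above unwraps them into plain promises behind the generic `File` interface. A minimal, self-contained illustration of that normalization pattern (`fakeGcsFile` is a stand-in object, not the real GCS API):

```javascript
// Stand-in mimicking @google-cloud/storage's tuple-returning style,
// where each call resolves to a one-element array.
const fakeGcsFile = {
  exists: async () => [true],
  getMetadata: async () => [{ md5Hash: 'abc123' }],
}

// Normalization used by the wrapper: unwrap the tuple into a plain value.
const toGeneric = (file) => ({
  exists: async () => (await file.exists())[0],
  getMetadataMd5: async () => {
    const [metadata] = await file.getMetadata()
    return metadata.md5Hash
  },
})

const generic = toGeneric(fakeGcsFile)
generic.exists().then((v) => console.log(v)) // true
```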
140 changes: 140 additions & 0 deletions packages/api/src/repository/storage/S3StorageClient.ts
@@ -0,0 +1,140 @@
import { SignedUrlParameters, StorageClient, File } from './StorageClient'
import {
GetObjectCommand,
GetObjectCommandOutput,
S3Client,
ListObjectsV2Command,
PutObjectCommand,
} from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
import type { Readable } from 'stream'

// While this is listed as S3, for self hosting we will use MinIO, which is
// S3 Compatible.

Review thread on this hunk:

@lovebes commented Nov 2, 2024:
Can we use Cloudflare R2 as well for self-hosting? What was the decision behind using MinIO? Asking because R2 is also S3-compatible.

@Podginator (Contributor, Author) replied:
I'm not actually familiar with R2, but anything that is S3-compatible should work. Let me take a look later to see whether or not the storage client I built works with it.
MinIO was chosen because it can be self-hosted along with the rest of the application. There is a Docker image, and it can all run on the same server without relying on anything external. I'm trying to ensure everything here can be run self-contained, without any need for external services.
That said, as with some of the email changes, I am looking into ways to simplify parts of it too, and having some external services is OK with me.

@Mikilio commented Nov 2, 2024:
To find suitable services, I recommend consulting r/self-hosted. Love the work so far.

@volker-fr commented Nov 2, 2024:
S3 is a nice idea and provides various options, including self-hosted ones. How about local storage? That would reduce the required dependencies by one.

@lovebes replied:
Oh wow, I didn't know MinIO can be self-hosted! That sounds like a good idea.

@Podginator (Contributor, Author) replied:
> S3 is a nice idea and provides various options, including self-hosted ones. How about local storage?

The uploads are done via signed URLs, so while local storage would be feasible, it'd require a bit more development work.

export class S3StorageClient implements StorageClient {
private s3Client: S3Client
private urlOverride: string | undefined

constructor(urlOverride: string | undefined) {
this.urlOverride = urlOverride
this.s3Client = new S3Client({
forcePathStyle: true,
endpoint: urlOverride,
})
}

private convertFileToGeneric(
s3File: GetObjectCommandOutput
): Omit<File, 'bucket' | 'publicUrl'> {
return {
exists: () => {
return Promise.resolve(s3File.$metadata.httpStatusCode == 200)
},
isPublic: async () => Promise.resolve(true),
download: async () => this.getFileFromReadable(s3File.Body as Readable),
getMetadataMd5: () => Promise.resolve(s3File.ETag),
}
}

private getFileFromReadable(stream: Readable): Promise<Buffer> {
return new Promise<Buffer>((resolve, reject) => {
const chunks: Buffer[] = []
stream.on('data', (chunk) => chunks.push(chunk))
stream.once('end', () => resolve(Buffer.concat(chunks)))
stream.once('error', reject)
})
}

async downloadFile(bucket: string, filePath: string): Promise<File> {
const s3File = await this.s3Client.send(
new GetObjectCommand({
Bucket: bucket,
Key: filePath, // path to the file you want to download,
})
)

return {
...this.convertFileToGeneric(s3File),
bucket: bucket,
publicUrl: () => `${this.urlOverride ?? ''}/${bucket}/${filePath}`,
}
}

async getFilesFromPrefix(bucket: string, prefix: string): Promise<File[]> {
const s3PrefixedFiles = await this.s3Client.send(
new ListObjectsV2Command({
Bucket: bucket,
Prefix: prefix, // list all objects under this prefix
})
)

// Without a Delimiter, matching objects come back in Contents;
// CommonPrefixes is only populated for grouped (delimited) listings.
const objectKeys = (s3PrefixedFiles.Contents || []).map(({ Key }) => Key)

return objectKeys.map((key) => {
return {
exists: () => Promise.resolve(true),
isPublic: async () => Promise.resolve(true),
download: async () => {
const s3File = await this.s3Client.send(
new GetObjectCommand({
Bucket: bucket,
Key: key, // path to the file you want to download,
})
)

return this.getFileFromReadable(s3File.Body as Readable)
},
getMetadataMd5: () => Promise.resolve(key),
bucket: bucket,
publicUrl: () => `${this.urlOverride ?? ''}/${bucket}/${key}`,
}
})
}

async signedUrl(
bucket: string,
filePath: string,
options: SignedUrlParameters
): Promise<string> {
const command =
options.action == 'read'
? new GetObjectCommand({
Bucket: bucket,
Key: filePath, // path to the file you want to download,
})
: new PutObjectCommand({
Bucket: bucket,
Key: filePath, // path the uploaded object will be stored under
})

// eslint-disable-next-line @typescript-eslint/no-unsafe-call
const url = await getSignedUrl(this.s3Client, command, {
expiresIn: 900,
})

return url
}

upload(
bucket: string,
filePath: string,
data: Buffer,
options: {
contentType?: string
public?: boolean
timeout?: number
}
): Promise<void> {
return this.s3Client
.send(
new PutObjectCommand({
Bucket: bucket,
Key: filePath,
Body: data,
})
)
.then(() => {})
}
}