
[idea] Custom Resource Definition of distributed spider in Kubernetes #29

Open
gaocegege opened this issue May 10, 2018 · 4 comments

@gaocegege
Member

With Kubernetes, it should be easy to launch a distributed crawling task. And if we implement it as a CRD, it becomes very Kubernetes-native and gains better scalability. We could try building on https://github.com/rmax/scrapy-redis

skillset: familiar with Kubernetes, familiar with scrapy, some knowledge of redis
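A minimal sketch of what a manifest for such a CRD might look like. All of the group, kind, and field names below are hypothetical, assuming a custom controller built on scrapy-redis that fans spider pods out over a shared Redis request queue:

```yaml
# Hypothetical CRD instance; the apiVersion, kind, and spec fields
# are illustrative only, assuming a scrapy-redis-backed controller.
apiVersion: crawler.example.com/v1alpha1
kind: DistributedSpider
metadata:
  name: example-spider
spec:
  # Container image packaging the scrapy project and spider code
  image: registry.example.com/my-scrapy-project:latest
  # Name of the scrapy spider to run inside the image
  spider: quotes
  # Number of worker pods sharing the redis-backed request queue
  replicas: 3
  # Connection to the redis instance used by scrapy-redis
  redis:
    host: redis.default.svc.cluster.local
    port: 6379
```

The controller would reconcile this into a Deployment of `replicas` identical spider pods, all pointed at the same Redis queue, so scaling the crawl is just a matter of editing `spec.replicas`.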

@gaocegege
Member Author

Crawler as a service 🤔

@xplorld
Contributor

xplorld commented Jul 30, 2018

So such a crawler service would let users start a distributed crawling task without writing any code, just YAML? For example, could transformation rules be specified in the YAML?
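For purely declarative extraction, one could imagine rules like the following in the spec. This is a hypothetical sketch, not part of any proposal above; field names and selectors are invented for illustration:

```yaml
# Hypothetical declarative extraction rules: each output field
# maps to a CSS selector applied to the fetched page.
spec:
  startUrls:
    - https://quotes.example.com/page/1/
  extract:
    fields:
      text: "span.text::text"
      author: "small.author::text"
    # Selector for links to follow to the next page
    follow: "li.next a::attr(href)"
```

Rules of this shape can only express fixed selector-based scraping; anything requiring computation (conditional logic, custom parsing, deduplication heuristics) would still need real code.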

@xplorld
Contributor

xplorld commented Jul 30, 2018

🤔

@gaocegege
Member Author

I don't think specifying the rules in YAML is feasible, since YAML is not Turing-complete.
