Colly
Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
Features
Clean API
Fast (>1k requests/sec on a single core)
Manages request delays and maximum concurrency per domain
Automatic cookie and session handling
Sync/async/parallel scraping
Caching
Automatic encoding of non-Unicode responses
Robots.txt support
Distributed scraping
Configuration via environment variables
Extensions
Example
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Print each URL before it is requested
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
See the examples folder for more detailed examples.
Installation
Add colly to your go.mod file:
module github.com/x/y

go 1.14

require (
	github.com/gocolly/colly/v2 latest
)
Bugs
Bugs or suggestions? Visit the issue tracker or join #colly on freenode.
Other Projects Using Colly
Below is a list of public, open source projects that use Colly:
greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
altsab/gowap Wappalyzer implementation in Go.
jesuiscamille/goquotes A quotes scraper, making your day a little better!
jivesearch/jivesearch A search engine that doesn't track you.
Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games from the command line.
yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.
gamedb/gamedb A database of Steam games.
lawzava/scrape CLI for email scraping from any website.
eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
Go-phie/gophie Search, download and stream movies from your terminal.
imthaghost/goclone Clone websites to your computer within seconds.
If you are using Colly in a project, please send a pull request to add it to the list.
Contributors
This project exists thanks to all the people who contribute. [Contribute].
Backers
Thank you to all our backers!
🙏 [Become a backer]
Sponsors
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]
License