Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.


Features

Clean API

Fast (>1k requests/sec on a single core)

Manages request delays and maximum concurrency per domain

Automatic cookie and session handling

Sync/async/parallel scraping

Caching

Automatic encoding of non-unicode responses

Robots.txt support

Distributed scraping

Configuration via environment variables

Extensions

Example

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log each request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}

See examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

module github.com/x/y

go 1.14

require (
	github.com/gocolly/colly/v2 latest
)

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.

altsab/gowap Wappalyzer implementation in Go.

jesuiscamille/goquotes A quotes scraper, making your day a little better!

jivesearch/jivesearch A search engine that doesn't track you.

Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.

lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games using the command line.

yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.

gamedb/gamedb A database of Steam games.

lawzava/scrape CLI for email scraping from any website.

eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.

Go-phie/gophie Search, download, and stream movies from your terminal.

imthaghost/goclone Clone websites to your computer within seconds.

If you are using Colly in a project, please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers!

🙏 [Become a backer]


Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]


License
