As an exercise, I tried to find Slovenian companies that export their goods to Austria. Let’s say we are interested in the 10 companies with the highest income. It seems like an easy task, doesn’t it? In fact, all the data is publicly available at http://sloexport.si, but unfortunately:
- the website cannot sort companies by income
- data export is limited to 100 records at most.
If we collected the data manually, it would probably take us several days to finish the task. Can Rails help us do this? Read on …
What is the biggest problem?
Heard of Capybara?
We are attaching a video of the scraping in action and the code required to do it (fewer than 60 lines). The final result of the process is an SQLite database file, which we can use to sort and filter the data we care about.
The code of the scraping spider is available here: https://gist.github.com/knagode/b0be5225e028d1d3c152
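
Once the records are collected, picking the top 10 is the easy part. Here is a minimal sketch of that last step in plain Ruby. It is not taken from the gist: the `Company` struct, its field names, the `top_exporters` helper and the sample rows are all assumptions made up for illustration, and the real column names depend on how the spider stores its results.

```ruby
# Hypothetical record shape; the real schema depends on the scraper.
Company = Struct.new(:name, :export_country, :income)

# Return the `limit` companies with the highest income that export to `country`.
# max_by(n) returns the n largest elements, largest first.
def top_exporters(companies, country:, limit: 10)
  companies
    .select { |c| c.export_country == country }
    .max_by(limit) { |c| c.income }
end

# Made-up sample rows, for illustration only.
sample = [
  Company.new("Alpha d.o.o.", "Austria", 1_200_000),
  Company.new("Beta d.d.", "Germany", 9_500_000),
  Company.new("Gama d.o.o.", "Austria", 4_300_000),
  Company.new("Delta s.p.", "Austria", 250_000)
]

top_exporters(sample, country: "Austria", limit: 2).each do |c|
  puts "#{c.name}: #{c.income}"
end
```

With the records in a database, the same filter and ordering could of course be expressed as a query instead of in-memory Ruby.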