The short answer, considering the recent decision of the U.S. District Court in the HiQ Labs v. LinkedIn case, is that web scraping is illegal only if the data you are collecting is protected by some sort of barrier (such as a password or login) and is sensitive information (such as users’ personal data).
Over the years, individuals and businesses have deployed crawlers to source relevant information for personal or commercial ends, whether that means research, competitor analysis, tracking potential leads, or gaining real-time access to customer data.
Website scraping has proved an efficient way to extract and collate relevant data from multiple websites without spending hours manually copying and pasting data into an Excel file. Done properly, it can help you stay ahead of the curve and of the competition within your industry.
The concept of web scraping is almost as old as the World Wide Web itself.
The World Wide Web was originally created to share information between scientists around the world. The concept of scraping was born less than a decade later through the creation of JumpStation, the first crawler-based web search engine, in 1993.
Ever since, website scraping has advanced in leaps and bounds, forming an essential business strategy for many. If you’re new to the concept, this section gives you a brief insight into what to expect from this method of data collection.
Web scraping simply refers to the process of extracting large amounts of data from one or more websites.
It’s important not to confuse web scraping with web crawling, which is about finding the websites and pages to collect data from. Web scraping and web crawling work hand-in-hand as tools to harvest data from the internet.
Web crawlers (also called spiders) locate the relevant content you need and index the pages to be scraped, while scraping bots extract and process the data obtained from those crawled pages.
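To make the distinction concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a production tool: the `PAGES` dictionary is a made-up, in-memory stand-in for a real website (so the example runs without network access), and the names `crawl`, `scrape`, `LinkParser`, and `HeadingParser` are hypothetical. The crawler only discovers URLs by following links; the scraper only extracts data (here, `<h1>` text) from a page it is given.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<h1>Page A</h1> <a href="/b">B</a>',
    "https://example.com/b": '<h1>Page B</h1> <a href="/">Home</a>',
}


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (the crawler's job)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class HeadingParser(HTMLParser):
    """Collects the text of <h1> tags (the scraper's job)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1 and data.strip():
            self.headings.append(data.strip())


def crawl(start_url):
    """Discover reachable URLs by following links, breadth-first."""
    seen, queue = [], [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue  # skip pages already visited or outside our fake site
        seen.append(url)
        parser = LinkParser()
        parser.feed(PAGES[url])
        # Resolve relative links against the current page's URL.
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen


def scrape(url):
    """Extract the data of interest (here, <h1> text) from one page."""
    parser = HeadingParser()
    parser.feed(PAGES[url])
    return parser.headings


if __name__ == "__main__":
    for url in crawl("https://example.com/"):
        print(url, scrape(url))
```

In a real project, the lookup in `PAGES` would be replaced by an actual HTTP request, but the division of labour stays the same: one component finds pages, the other pulls data out of them.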
Various methods exist for obtaining data from websites. You can use online services, dedicated APIs, or, if you’re tech-savvy, write your own web scraping code from scratch.
By indexing some of the biggest websites in a specific field, most search engines (including Google) already act as crawlers, so you simply need to use a web scraping tool to extract the necessary information.
It’s worth going over the different web scraping methods available to users today:
Manual collection is the oldest and slowest method of extracting data from the web.
Before automated scraping bots came along, individuals and businesses had to search websites by hand, locate the data they needed, and copy and paste it into a text file or Word document.
Scraping bots eliminate the need for this more traditional method by automating the process of data collection. Rather than poring through websites on your own, you can use a web scraper tool to harvest data quickly and more efficiently. Bots can come as desktop software, SaaS, APIs, or browser extensions.
Scraping a website with a bot typically involves a few steps. It starts by downloading web pages from which to collect data (a process called fetching). Once the bot has received the contents of these web pages, it will then extract, rearrange, and store the relevant data for your purpose. A search engine bot, for instance, aggregates data and displays the necessary information on a Search Engine Results Page (SERP).
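The fetch, extract, and store steps described above can be sketched in a few lines of Python with the standard library. Treat this as an illustration rather than a reference implementation: the `product` class, the `data-price` attribute, the `SAMPLE_HTML` snippet, and the `example-scraper/0.1` user agent are all assumptions invented for the example, and a real page would need its own extraction rules.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Made-up markup standing in for a downloaded page (assumption for the demo).
SAMPLE_HTML = """
<ul>
  <li class="product" data-price="9.99">Widget</li>
  <li class="product" data-price="19.50">Gadget</li>
</ul>
"""


def fetch(url):
    """Step 1 (fetching): download the raw HTML of a page."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ProductParser(HTMLParser):
    """Step 2 (extracting): pull name and price out of product <li> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._price = None  # set only while inside a product <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._price = attrs.get("data-price", "")

    def handle_data(self, data):
        if self._price is not None and data.strip():
            self.rows.append({"name": data.strip(), "price": self._price})

    def handle_endtag(self, tag):
        if tag == "li":
            self._price = None


def extract(html):
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def store(rows, out):
    """Step 3 (storing): write the extracted rows as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # The demo uses SAMPLE_HTML; swap in fetch("https://...") for a live page.
    buffer = io.StringIO()
    store(extract(SAMPLE_HTML), buffer)
    print(buffer.getvalue())
```

Keeping the three steps in separate functions means you can change where the HTML comes from, or where the results go, without touching the extraction rules.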
These are some of the best web scraping services you can find online:
Another popular method is to write your own web scraping code in a programming language (usually Python or NodeJS).
This method is the most difficult one. However, if you are an experienced coder, it leaves you more room to define your own scraping rules.
Pro tip: Although Python and NodeJS are the most popular languages for web scraping, programmers can also use different ones like PHP, Java, Ruby, and C++ to create their web scraping frameworks.
The first step is to look for existing work on GitHub: search for "Scraper" if you want a general-purpose scraper, or "Amazon Scraper" / "eBay Scraper" if you want something more specific. You will immediately notice the huge number of projects and the amount of code available online. This will save you a lot of time.
Then you could start learning how to build a scalable scraper by taking an online course on Coursera.
There are several grey areas and nuances that make this issue more complicated than a simple yes or no answer. A recent case that helped define when scraping is and is not legal is HiQ Labs v. LinkedIn.
In 2017, LinkedIn sent HiQ Labs a cease-and-desist letter demanding that it stop scraping publicly available LinkedIn profiles, which HiQ Labs used to sell clients a “crystal ball that helps you determine skills gaps or turnover risks months ahead of time.” HiQ Labs responded by taking LinkedIn to court.
The scraping posed a risk for employees: imagine your employer using your publicly accessible LinkedIn profile against you in the course of your employment.
Despite this apparent unfairness, Judge Edward Chen of the U.S. District Court sided with HiQ Labs. He agreed with HiQ Labs’ argument that LinkedIn had violated antitrust laws when it put up barriers to cut off the startup’s access to the data, and he ordered LinkedIn to lift the barriers within 24 hours. LinkedIn filed an appeal.
This ruling cuts against earlier decisions that had outlawed web scraping. On appeal, the Ninth Circuit upheld HiQ Labs’ right to scrape public data from LinkedIn’s servers for its own benefit.
In addition, the court rejected LinkedIn's argument that scraping data from its platform breached its terms of service and, in essence, also violated the Computer Fraud and Abuse Act. The court absolved HiQ Labs of wrongdoing by pointing out that the data scraped was already public.
Hence, web scraping did not constitute "breaking and entering" into the platform. Based on this, it seems safe to conclude that web scraping is not illegal in itself, but it’s important to note the court’s caveat: it all depends on how you scrape the data and what you do with it afterward.
PS: Data scraping is not the same as hacking a competitor's website. Hacking is a crime punishable by law.
Website scraping will not be deemed illegal as long as the data scraped from the websites is already available for public consumption. However, the position of the court changes when the data scraped is protected by user privacy and copyright laws.
If you are scraping data from a website without the owner’s permission, it could be considered illegal. In particular, if the data requires a password or access code to reach, or if it is sensitive user information whose collection could harm the people involved, scraping it will be deemed illegal.
The implication of this rule for e-commerce store owners is simple: once you have confirmed that the data you’ve scraped from social media or e-commerce stores is publicly available information, the law is on your side when you gather it.