How your personal data is being scraped from social media

August 31, 2021

How much personal information do you share on your social media profile pages?

Name, location, age, job role, marital status, headshot? The amount of information people are comfortable with posting online varies.

But most people accept that whatever we put on our public profile page is out in the public domain.

So, how would you feel if all your information was catalogued by a hacker and put into a monster spreadsheet with millions of entries, to be sold online to the highest paying cyber-criminal?

That’s what a hacker calling himself Tom Liner did last month “for fun” when he compiled a database of 700 million LinkedIn users from all over the world, which he is selling for around $5,000 (£3,600; €4,200).

The incident, and other similar cases of social media scraping, have sparked a fierce debate about whether or not the basic personal information we share publicly on our profiles should be better protected.

In the case of Mr Liner, his latest exploit was announced at 08:57 BST in a post on a notorious hacking forum.

It was a strangely civilised hour for hackers, but of course we have no idea which time zone, the hacker who calls himself Tom Liner, lives in.

“Hi, I have 700 million 2021 LinkedIn records”, he wrote.

Included in the post was a link to a sample of a million records and an invite for other hackers to contact him privately and make him offers for his database.

Understandably the sale caused a stir in the hacking world he is selling his haul to “multiple” happy customers for around $5,000 (£3,600; €4,200).

He won’t say who his customers are, or why they would want this information, but he says the data is likely being used for further malicious hacking campaigns.

The news has also set the cyber-security and privacy world alight with arguments about whether or not we should be worried about this growing trend of mega scrapes.

What’s important to understand here is that these databases aren’t being created by breaking into the servers or websites of social networks.

They are largely constructed by scraping the public-facing surface of platforms using automatic programmes to take whatever information is freely available about users.

In theory, most of the data being compiled could be found by simply picking through individual social media profile pages one-by-one. Although of course it would take multiple lifetimes to gather as much data together, as the hackers are able to do.

*1.3 million user records were scraped from audio-only social media app, Clubhouse*

So far this year, there have been at least three other major “scraping” incidents.

In April, a hacker sold another database of around 500 million records scraped from LinkedIn.

In the same week another hacker posted a database of scraped information from 1.3 million Clubhouse profiles on a forum for free.

Also in April, 533 million Facebook user details were compiled from a mixture of old and new scraping before being given away on a hacking forum with a request for donations.

The hacker who says he is responsible for that Facebook database, calls himself Tom Liner.

He said he created the 700 million LinkedIn database using “almost the exact same technique” that he used to create the Facebook list.

He said: “It took me several months to do. It was very complex. I had to hack the API of LinkedIn. If you do too many requests for user data in one time then the system will permanently ban you.”

‘No ambiguity’

But cyber-security expert Troy Hunt, who spends most of his working life poring over the contents of hacked databases for his website haveibeenpwned.com, is less concerned about the recent scraping incidents and says we need to accept them as part of our public profile-sharing.

“These are definitely not breaches, there’s no ambiguity here. Most of this data is public anyway.

“The question to ask, in each case though, is how much of this information is by user choice publicly accessible and how much is not expected to be publicly accessible.”

Troy agrees with Amir that controls on social network’s API programmes need to be improved and says we can’t brush off these incidents.

“I don’t disagree with the stance of Facebook and others but I feel that the response of ‘this isn’t a problem’ is, whilst possibly technically accurate, missing the sentiment of how valuable this user data is and their perhaps downplaying their own roles in the creation of these databases.”

Mr Liner’s actions would be likely to get him sued by social networks for intellectual property theft or copyright infringement. He probably wouldn’t face the full force of the law for his actions if he were ever found but, when asked if he was worried about getting arrested he said “no, anyone can’t find me” and ended our conversation by saying “have a nice time”.

How your personal data is being scraped from social media

‘No ambiguity’

Related Articles

29-Year-Old Awarded Brand-New Car During the Safaricom Hook Circle Bootcamp at Kenyatta University

Kenyan President defends Ksh104B SHA technology after auditor general’s expose

Kenyan ICT & Digital Economy Cabinet Secretary meets TikTok delegation to push for better earnings for Kenyans

Stay Connected

Latest Articles

29-Year-Old Awarded Brand-New Car During the Safaricom Hook Circle Bootcamp at Kenyatta University

Kenyan President defends Ksh104B SHA technology after auditor general’s expose

Kenyan ICT & Digital Economy Cabinet Secretary meets TikTok delegation to push for better earnings for Kenyans

M-Pesa Foundation commissions a KES 60 million maternity and newborn unit in Vihiga County

Safaricom Hook Kicks Off Second Edition of Youth Empowerment Bootcamp at Kenyatta University