Posted in May 2015

Data about data

“Metadata absolutely tells you everything about somebody’s life. If you have enough metadata you don’t really need content.”
– Stewart Baker

Many people do not feel safe when they know that another entity is collecting information about them. If an individual collects information about you and uses it to follow you, it is called stalking. But if an organization collects information about you and millions of others, it is often labeled a security measure. Organizations require a lot of storage space for this data. Earlier, that storage took the form of large rooms filled with paper; now it takes the form of hard drives. As the amount of data about each person grew, organizations that wanted to store it realized that they would either run out of space or have to spend more to add it. They had to come up with a way to store information in limited space.

Individuals who care about their privacy might be relieved to think that organizations may not, after all, collect data about them. But this is where metadata enters the picture. Metadata is data about data. For example, the data would be the recording of a phone call Bob made to Alice, while the metadata would include the time and duration of the call, the location of Bob’s phone, the location of Alice’s phone and, of course, both of their phone numbers and names. Storing the call itself would require many megabytes of space, while the metadata requires only a few kilobytes. Many might ask what organizations can do with phone metadata without knowing the contents of the call.

Phone metadata gives organizations enough information to map people’s lives. Your phone’s location is recorded in order to provide continuous connectivity, so you do not need to make a call to give away where you are. Now assume the organization collects only the location metadata of hundreds of users. It can trace their movements and basically draw a map. By looking at which users’ locations overlap at a particular time, it can infer with high probability that those users are associated with each other. If the location happens to be a football stadium owned by the local club, and the same group of users meets there every time a game is played, it can be inferred that they are followers of that club. Note that the users never volunteered the name of their favourite football club; it was learned just by tracking their location. This is a trivial example with only one kind of metadata. Much more can be learned if the organization also has other types of metadata, such as online search terms.
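To make the stadium example concrete, here is a minimal sketch in Python of how overlapping locations could be turned into guesses about who knows whom. Everything in it is an assumption made for illustration: the record layout (user, place, hourly time slot), the user names, the function names and the threshold of two shared sightings. It is not a description of how any real organization processes location data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical location metadata: (user, place, hourly time slot).
# All users, places and times are invented for this sketch.
records = [
    ("u1", "stadium", "2015-05-02T15"),
    ("u2", "stadium", "2015-05-02T15"),
    ("u3", "stadium", "2015-05-02T15"),
    ("u4", "stadium", "2015-05-02T15"),
    ("u1", "stadium", "2015-05-16T15"),
    ("u2", "stadium", "2015-05-16T15"),
    ("u3", "stadium", "2015-05-16T15"),
    ("u4", "mall", "2015-05-16T15"),
]


def co_location_counts(records):
    """Count how often each pair of users shares a place and time slot."""
    present = defaultdict(set)
    for user, place, slot in records:
        present[(place, slot)].add(user)

    counts = defaultdict(int)
    for users in present.values():
        # Every pair of users seen together in this slot gets one more hit.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)


def likely_associated(records, min_meetings=2):
    """Pairs seen together at least `min_meetings` times (assumed threshold)."""
    counts = co_location_counts(records)
    return {pair for pair, n in counts.items() if n >= min_meetings}


print(sorted(likely_associated(records)))
```

With this made-up data, u1, u2 and u3 are flagged as a group because they share the stadium on both match days, while u4, who was there only once, is not. A single shared sighting is treated as coincidence, which is why the threshold matters.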

A lot of people would not mind letting others know their favourite football club. But let us say that two people are in an intimate relationship and want to keep it a secret. They call each other often, late at night, and the calls frequently last more than an hour. After a few months the calls become shorter, and after a few more months they also become less frequent and irregular. From this kind of metadata, it can be inferred that there was a relationship between these two people and that it has ended. When there is metadata, who needs data?
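The same inference can be sketched from call records alone. The Python snippet below is only an illustration under stated assumptions: the records hold just a month and a duration in minutes for calls between one pair of numbers, the values are invented, and the rule "total talk time fell below half of the first month's" is an arbitrary threshold chosen for the example.

```python
from collections import defaultdict

# Hypothetical call metadata between two numbers: (month, duration in minutes).
# No call content is involved; all values are invented for this sketch.
calls = [
    ("2015-01", 70), ("2015-01", 65), ("2015-01", 80),
    ("2015-02", 75), ("2015-02", 60), ("2015-02", 90),
    ("2015-03", 20), ("2015-03", 15),
    ("2015-04", 5),
]


def monthly_summary(calls):
    """Return {month: (number of calls, average minutes)} in month order."""
    by_month = defaultdict(list)
    for month, minutes in calls:
        by_month[month].append(minutes)
    return {
        month: (len(durations), sum(durations) / len(durations))
        for month, durations in sorted(by_month.items())
    }


def contact_faded(calls, ratio=0.5):
    """True if the last month's total talk time is below `ratio` times the first month's."""
    summary = monthly_summary(calls)
    months = list(summary)
    if len(months) < 2:
        return False  # not enough history to compare
    first_count, first_avg = summary[months[0]]
    last_count, last_avg = summary[months[-1]]
    return last_count * last_avg < ratio * first_count * first_avg


for month, (count, avg) in monthly_summary(calls).items():
    print(month, count, round(avg, 1))
print("contact faded:", contact_faded(calls))
```

Both signals from the paragraph show up in the summary: the average call length drops first, then the number of calls per month.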

A Wealth of Information

“A wealth of information creates a poverty of attention.”
– Herbert Simon

In some parts of the world, it is assumed that most people who want internet access and can afford it have it. I see advertisements saying “Rent includes internet. Who lives without internet these days?” I would say that the person who placed the advertisement is living in a bubble. But does this person need to know about everything in the world? Over the past couple of years, hundreds of exabytes of information have been added.

Let us say that you are one of those who have the urge to know a little of everything that is happening in this world. You read news articles daily, probably watch the news on television, read a variety of books, follow “important” personalities on Twitter, regularly check your emails and instant messages, watch new and old movies, and search for other information on “popular” search engines. You are flooded with spam and advertisements as well. How much of this information do you need? What is the signal-to-noise ratio? In addition, you may not realise that you are living in a “filter bubble”.

Information pollution has an impact on individuals, businesses and society at large. But how can you reduce that impact? Until you have read a book or watched a movie, you do not know whether it is useful to you. Many corporations restrict their employees’ access to social media during working hours. The question arises: why does anyone want to access social media sites while working? This is one example of continuous partial attention (CPA), in which a person pays attention to multiple information sources simultaneously. Often it is an automatic process driven by habit rather than a conscious one.

The concept of the attention economy deals with the abundance of information and how its immediate availability strains the limited human ability to process it. This has led corporations to offer intangibles such as personalization, ease of access and immediacy to attract consumers, since reproducing information costs next to nothing.

In this ocean of information, how much is enough?

 

© 2011 TU Delft