University Post
University of Copenhagen
Independent of management


Big data, big challenges, big hopes

Machine learning promises to help us understand climate change, monitor threatened animals, improve quality of life, and in general increase our knowledge of society. The head of Google Research, NY, and University of Copenhagen honorary alumna believes there are huge opportunities in Big Data.

Advances in Big Data search, collection, and processing promise services to mankind that so far have not been imagined. The data can provide insight into medical research and health care, climate phenomena, and social inequalities. And it can reveal business ideas that nobody ever thought of before.

This is according to Corinna Cortes, head of Google Research, NY, and associate professor at the University of Copenhagen, who has just been awarded an honorary alumni prize. She will be in Copenhagen to receive the prize on 27 September at the Alumni Day event.

“Technically, Big Data is simply a lot of data,” Corinna Cortes explains to the University Post.

“Let me give an example. Every morning you may check the outdoor temperature, the barometric pressure, and look at how much it rained yesterday. That is small data. Add to this the continuous electronic records of these numbers from millions of locations around the world. Add again to this satellite imagery of all of the Earth and water flow measurements of our oceans and you get Big Data.”

Hurricane Sandy and street view

Big Data is defined by a bunch of V’s: It is high ‘Volume’ – there is lots of it. It has ‘Variety’ – it mixes text, dates, numbers, images, videos. It may not have ‘Veracity’ – meaning it is messy, raw, low-level, detailed and noisy data.

And then it has ‘Velocity’, or as Corinna puts it: “It keeps coming at us like open fire hoses”.

Here is an example. Combining temperature observations, ozone measurements, moisture maps, and satellite imagery poses immense computational challenges, but the payoff may be significant. If we can learn how to process all this data and form models, we may detect changes in our climate faster and be able to take corrective measures. Better short-term predictions, such as tsunami warnings, are also important.

Corinna Cortes offers a less dramatic example from her own life.

“Less than 2 years ago New York City was hit by Hurricane Sandy. The predictions focused on the wind strengths, but it was actually the rise in water level that caused the biggest problems. We were ourselves without water, heat, and electricity for 6 days, and many were without it for much longer.”

“Since then we have seen many visualization tools overlaying rising water levels on Street View images so one can ‘first hand’ inspect the consequences of flooding,” she says.

Endangered species

“We see environmental research beginning to show up as collaborations between universities and industries. Google is itself contributing with its Crisis Response Efforts providing early warnings and public alerts.”

“Another great example of how Big Data can help us understand environmental changes is projects that monitor threatened animals. Various non-profit organizations use near real-time Big Data analysis of camera data and climate sensors to generate trend data on endangered species.”

Great possibilities, but also great challenges from a computational perspective.

“One definition of Big Data is exactly that it is too big to fit on any conventional data analysis platform”, says Corinna Cortes, adding that “Big Data is so popular that a whole new education as a Data Scientist has emerged. Around 200 universities around the world are offering this new degree with a curriculum including elements from computer science, statistics, and business understanding.”

Search, but also Analytics, Gmail, Google Earth

Climate and nature are one thing. Another popular area for Big Data is what is referred to as Urban Data. Urban Data covers everything from transit analysis, noise level recordings, and hospital utilization to power distribution and trash collection.

“The goal is to make our cities more efficient and effective, to increase quality of life and prevent problems”, she says. “I recently heard of a study that – not surprisingly – found that building fires were more frequent in buildings that hadn’t had a fire inspection for some time.”

“The hope is that this new Data Scientist degree will teach the students to translate ill-posed concerns into data-oriented questions and extract the information from the Big Data,” says Corinna Cortes.

Google is almost synonymous with ‘search’, and access to information has so far been the biggest way that Google has changed our lives, by any reckoning. But Google’s other programmes range from Google Analytics, which lets editors and website managers track website traffic in real time, to the ubiquitous Gmail and Google Earth, which has turned us all into armchair travellers. To name just a few.

Fighting the spam

Corinna Cortes has made her academic career in the field of machine learning. At Google, this finds practical application in almost every part of the company.

“We have classifiers for speech recognition, image labeling, spam, and much, much more. Google is a paradise for machine learning,” she says.

“A recent learning problem from my team has been that of finding the good tables on the web. Most tables are formatting tables, and we were looking to find tables that are actually about something and have good information in them. It is reasonably easy for humans to tell good tables from bad tables, so we had a lot of tables labeled and trained classifiers to tell them apart. In the process we developed new learning algorithms,” she explains. A new Google structured snippets feature uses these principles.
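
For readers who want to see what "labeling tables and training a classifier" can look like in practice, here is a minimal sketch in Python. It is not the method Cortes's team used, which the interview does not describe. The two features (the share of numeric cells and whether all rows have the same length) and the nearest-centroid rule are assumptions chosen purely for illustration, as are the function names.

```python
from collections import defaultdict


def table_features(rows):
    """Hypothetical features: share of numeric cells, and whether row lengths agree."""
    cells = [cell for row in rows for cell in row]
    numeric = sum(1 for cell in cells if cell.replace(".", "", 1).isdigit())
    frac_numeric = numeric / len(cells) if cells else 0.0
    consistent = 1.0 if len({len(row) for row in rows}) == 1 else 0.0
    return (frac_numeric, consistent)


def train_centroids(examples):
    """Average the feature vectors of each label; examples is a list of (features, label)."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hand-labeled toy examples (invented for this sketch).
labeled = [
    (table_features([["city", "population"], ["Copenhagen", "602000"], ["Aarhus", "261000"]]), "good"),
    (table_features([["year", "rain_mm"], ["2012", "710"], ["2013", "655"]]), "good"),
    (table_features([["Home", "About", "Contact"], ["Welcome to our site"]]), "formatting"),
    (table_features([["logo"], ["menu", "search"]]), "formatting"),
]
centroids = train_centroids(labeled)
new_table = [["species", "count"], ["tiger", "3200"]]
label = predict(centroids, table_features(new_table))
```

A real system would need far more labeled tables and richer features, but the workflow is the same: humans label examples, the features are extracted, and a model learns to separate the two groups.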

The last time the University Post interviewed Corinna Cortes she was searching for the best method to teach computers how to find similarities between things like words and images. Since then her team has moved forward.

“We have developed algorithms for clustering of items in a changing setting. One of the first applications has been Gmail spam. New sorts of spam pop up all the time and we quickly have to find clusters of similar spam messages so we can start filtering them out. It is easy enough if you look at the end of the day to locate new clusters, but we have to detect them much, much faster. To do that we have developed new similarity schemes and the new algorithms reduce user complaints by hundreds of thousands a day.”
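
The idea of grouping similar messages as they arrive, rather than at the end of the day, can be sketched roughly as follows. This is an illustration only: the interview does not say which similarity schemes or algorithms Google uses. The sketch assumes Jaccard similarity over three-word shingles and a fixed threshold, and the class and function names are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a lower-cased message."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class OnlineClusterer:
    """Assign each incoming message to the first sufficiently similar cluster."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # Each cluster keeps the shingles of its first message as a representative.
        self.clusters = []

    def add(self, message):
        """Place one message immediately and return the index of its cluster."""
        signature = shingles(message)
        for index, cluster in enumerate(self.clusters):
            if jaccard(signature, cluster["rep"]) >= self.threshold:
                cluster["members"].append(message)
                return index
        self.clusters.append({"rep": signature, "members": [message]})
        return len(self.clusters) - 1


clusterer = OnlineClusterer(threshold=0.4)
first = clusterer.add("win a free phone today click here now")
second = clusterer.add("win a free phone today click here fast")
third = clusterer.add("meeting moved to three pm on friday")
```

Because every message is placed the moment it arrives, a new cluster becomes visible as soon as its second member shows up. Comparing against every existing cluster would not scale to real mail volumes, so a production system would need a much faster similarity lookup.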

To run New York City Marathon

For almost every activity there is an app – a Google one, even.

Outside the office, Corinna Cortes is into running. She is currently training for the New York City Marathon.

“This may be my only chance ever to beat Caroline Wozniacki, the Danish tennis player that this year will also run this marathon”.

For her training, “I just bring my phone and keys with me,” she told the University Post, the last time we interviewed her.

“But I love Gmap-pedometer built on Google Maps. I use it to map out my runs,” she says.

