Author:
Ravindra Bidnur

Big data is a current trend and the main source of input for ML and AI applications. As it gains traction and popularity, it becomes necessary to understand what big data is, and what its properties and value are. This blog tries to explain big data in simplified terms so that it is easy to understand.

What is Big data, and what are its associated properties?

Big data is commonly defined as data that contains greater Variety, arriving in increasing Volumes and with more Velocity. These properties are also known as the three Vs.

In other words, big data consists of larger, more complex data sets, especially from new data sources. These massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before. These data sets are so voluminous that traditional data processing software just can’t manage them. Hence a new domain of data analysis emerged that uses more compute power, known as Machine Learning, and that feeds decisions made by Artificial Intelligence with much less human intervention.

Let us look more closely at the three Vs (two more Vs were added later).

Volume

The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as data feeds from Twitter, Facebook or Google, clickstreams on a web page or a mobile app, or readings from sensor-enabled equipment. The amount of data varies from tens of terabytes to hundreds of petabytes, depending on the nature of the feeds for a specific target.

Velocity

Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action. This property puts pressure on the processing power and speed needed to analyze the data. If the analysis must happen in real time, it requires high-performance computing with minimal latency.
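To make the idea of acting on data in memory more concrete, here is a minimal Python sketch of a sliding-window average over a stream. The class name SlidingWindowAverage, the 5-second window and the temperature readings are all assumptions made for illustration; a real system would typically rely on a dedicated stream-processing platform.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep only the most recent readings in memory and report their mean."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Record one reading and return the mean over the current window."""
        self._events.append((timestamp, value))
        self._total += value
        # Evict readings that have fallen out of the time window.
        while self._events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


# Hypothetical temperature readings: (seconds since start, degrees Celsius).
readings = [(0, 20.0), (1, 22.0), (2, 24.0), (6, 30.0)]
window = SlidingWindowAverage(window_seconds=5)
averages = [window.add(t, v) for t, v in readings]
print(averages)  # [20.0, 21.0, 22.0, 27.0]
```

Each reading is processed as it arrives and old readings are discarded, so memory use stays bounded no matter how long the stream runs.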

Variety

Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semi-structured data types, such as text, audio, and video, require additional pre-processing to derive meaning and support metadata.
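The difference between these data types can be sketched in a few lines of Python. The sample CSV row, JSON event and review text below are invented for illustration, and the word count stands in for the much richer pre-processing that real text, audio or video would need.

```python
import csv
import io
import json
import re

# Structured: fixed columns, ready to load into a relational table.
csv_text = "customer_id,amount\n101,25.50\n"
structured = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON whose fields may be nested or missing.
json_text = '{"customer_id": 101, "device": {"os": "android"}}'
event = json.loads(json_text)
device_os = event.get("device", {}).get("os", "unknown")
app_version = event.get("app_version", "unknown")  # absent in this event

# Unstructured: free text must be pre-processed before it yields metadata.
review = "Great phone, great battery. Delivery was slow!"
tokens = re.findall(r"[a-z]+", review.lower())
word_counts = {}
for token in tokens:
    word_counts[token] = word_counts.get(token, 0) + 1

print(structured[0]["amount"], device_os, app_version, word_counts["great"])
# 25.50 android unknown 2
```

The structured row can be used as it is, the JSON event needs defaults for missing fields, and the free text only becomes useful after an extra extraction step.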

As time progressed, two more Vs were added to the above list of three: Value and Veracity.

Value

Data has intrinsic value. But it’s of no use until that value is discovered. It is a very important property, and based on the value many more inferences can be drawn during data analysis.

Today, big data has become a capital asset for companies. Think of some of the world’s biggest tech companies. A large part of the value they offer comes from their data, which they’re constantly analyzing to produce more efficiency and develop new products.

Finding value in big data isn’t only about analyzing it (which is a whole other benefit). It’s an entire discovery process that requires insightful analysts, business users, and executives who ask the right questions, recognize patterns, make informed assumptions, and predict behaviour.

Veracity

This refers to the quality of the collected data. If source data is not correct, analyses will be worthless. As the world moves toward automated decision-making, where computers make choices instead of humans, it becomes imperative that organizations be able to trust the quality of the data.
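One simple way to put veracity into practice is to validate records before they reach any analysis. The Python sketch below assumes a hypothetical SensorReading record and made-up plausibility limits (-50 to 150 degrees Celsius); real validation rules depend entirely on the data source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SensorReading:
    sensor_id: str
    temperature_c: Optional[float]


def is_valid(reading: SensorReading) -> bool:
    """Hypothetical rule: an ID is present and the value is physically plausible."""
    if not reading.sensor_id:
        return False
    if reading.temperature_c is None:
        return False
    return -50.0 <= reading.temperature_c <= 150.0


def quality_report(readings: List[SensorReading]) -> Tuple[List[SensorReading], float]:
    """Return the valid readings and the fraction of readings that passed."""
    valid = [r for r in readings if is_valid(r)]
    ratio = len(valid) / len(readings) if readings else 0.0
    return valid, ratio


readings = [
    SensorReading("s1", 21.5),
    SensorReading("s2", None),    # missing value
    SensorReading("", 19.0),      # missing sensor ID
    SensorReading("s3", 900.0),   # implausible value
]
valid_readings, valid_ratio = quality_report(readings)
print(len(valid_readings), valid_ratio)  # 1 0.25
```

Tracking the valid ratio over time gives an early warning when a source starts delivering data that automated decisions should not trust.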

What are the benefits of Big data?

There are many ways in which big data is beneficial. We can see its usefulness in almost all application areas, such as the communication, industrial, automotive and financial sectors. In general, we can say:

  • Big data makes it possible for you to gain more complete answers because you have more information.
  • More complete answers mean more confidence in the data, which means a completely different approach to tackling problems.

Big data can help you address a range of business activities, from customer experience to analytics. Here are just a few:

  • Product development
  • Predictive maintenance
  • Customer experience
  • Fraud and Compliance
  • Operational efficiency
  • Driving innovation

What are the challenges associated with Big data?

As this is an emerging domain, it naturally has associated challenges. Some of the main challenges are listed here:

  • The sheer volume of data, and storing it securely, is itself a huge challenge.
  • Data curation. Clean data, or data that’s relevant to the client and organized in a way that enables meaningful analysis, requires a lot of work. This is itself an art, and a data analyst’s job is to curate the data. Finding good, experienced data analysts is a challenge.
  • Keeping up with big data technology is an ongoing challenge.

What are the best practices for managing Big data?

Having covered what big data is, its associated properties and some of the challenges in managing it, we will now look at some of the best practices used in the industry.

  • Align big data with specific business goals – To determine if you are on the right track, ask how big data supports and enables your top business and IT priorities. Examples include understanding how to filter web logs to understand ecommerce behavior, deriving sentiment from social media and customer support interactions, and understanding statistical correlation methods and their relevance for customer, product, manufacturing, and engineering data.
  • Ease the skills shortage with standards and governance – The shortage of big data skills can be mitigated by ensuring that big data technologies, considerations, and decisions are added to your IT governance program. Standardizing your approach will allow you to manage costs and leverage resources.
  • Develop a center of excellence for knowledge optimization – Use a center of excellence approach to share knowledge, control oversight, and manage project communications.
  • Mix and match data – Gain even greater business insights by connecting and integrating low-density big data with the structured data you are already using today.
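As an illustration of the last practice, the Python sketch below connects raw click events (low-density data) with an existing customer table (structured data). The customer records, pages and the function name page_views_by_segment are all hypothetical; at real scale the same join would run on a distributed engine rather than in plain Python.

```python
from collections import Counter

# Structured data already in use: a hypothetical customer table keyed by ID.
customers = {
    101: {"name": "Asha", "segment": "premium"},
    102: {"name": "Ben", "segment": "standard"},
}

# Low-density big data: hypothetical raw click events from a web site.
clicks = [
    {"customer_id": 101, "page": "/phones"},
    {"customer_id": 102, "page": "/phones"},
    {"customer_id": 101, "page": "/laptops"},
    {"customer_id": 999, "page": "/phones"},  # no matching customer record
]


def page_views_by_segment(clicks, customers):
    """Count page views per (customer segment, page) pair."""
    counts = Counter()
    for click in clicks:
        customer = customers.get(click["customer_id"])
        segment = customer["segment"] if customer else "unknown"
        counts[(segment, click["page"])] += 1
    return counts


views = page_views_by_segment(clicks, customers)
print(views[("premium", "/phones")], views[("unknown", "/phones")])  # 1 1
```

On their own the click events say little, but once joined with the customer segment they show which group of customers is looking at which product category.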

What are some big data examples and trends?

Here are some real-world big data examples and trends:

  • Discovering consumer shopping habits
  • Personalized marketing
  • Finding new customer leads
  • Fuel optimization tools for the transportation industry
  • User demand prediction for ridesharing companies
  • Monitoring health conditions through data from wearables
  • Live road mapping for autonomous vehicles
  • Streamlined media streaming
  • Predictive inventory ordering
  • Personalized health plans for cancer patients
  • Real-time data monitoring and cybersecurity protocols