Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Big data has also been defined as ‘datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze’. In short, the term big data applies to information that cannot be processed or analyzed using traditional processes or tools.
Characteristics of Big Data:
Big data can be characterized by 3Vs: the extreme volume of data, the wide variety of types of data, and the velocity at which the data must be processed.
Volume:
Volume refers to the vast amounts of data generated every second. We are no longer talking about terabytes but zettabytes or brontobytes. These data sets are too large to store and analyze using traditional database technology. New big data tools use distributed systems, so that data can be stored and analyzed across databases dotted around anywhere in the world.
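The idea of spreading data across many databases can be sketched with hash-based sharding. This is an illustrative toy, not the mechanism of any particular big data tool; the key names and shard count are made up for the example.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to one of num_shards nodes deterministically."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Distribute sample record keys across 3 hypothetical nodes.
shards = {i: [] for i in range(3)}
for record_id in ["user:1", "user:2", "order:99", "sensor:42"]:
    shards[shard_for(record_id, 3)].append(record_id)
```

Because the mapping is deterministic, any node can recompute where a record lives without consulting a central index, which is one reason distributed stores scale to volumes a single server cannot hold.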
Variety:
Different Types: Variety describes different formats of data that do not lend themselves to storage in structured relational database systems. These include a long list of data such as documents, emails, social media text messages, video, still images, audio, graphs, and the output from all types of machine-generated data from sensors, devices, RFID tags, machine logs, cell phone GPS signals, DNA analysis devices, and more. This type of data is characterized as unstructured or semi-structured and has existed all along.
Different Sources: Variety is also used to mean data from many different sources, both inside and outside of the company.
Velocity:
Data-In-Motion: Data scientists like to talk about data-at-rest and data-in-motion. One meaning of Velocity is to describe data-in-motion, for example, the stream of readings taken from a sensor or the web log history of page visits and clicks by each visitor to a web site. This can be thought of as a fire hose of incoming data that needs to be captured, stored, and analyzed.
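One common way to cope with a fire hose of data-in-motion is to keep only a rolling window of recent readings rather than the full history. A minimal sketch, assuming a sensor feed of numeric values (the class name and window size are invented for illustration):

```python
from collections import deque

class RollingAverage:
    """Running average over the most recent `window` readings."""

    def __init__(self, window: int):
        self.readings = deque(maxlen=window)  # oldest values fall off automatically

    def add(self, value: float) -> float:
        """Ingest one reading and return the current window average."""
        self.readings.append(value)
        return sum(self.readings) / len(self.readings)

monitor = RollingAverage(window=3)
for reading in [10.0, 12.0, 14.0, 20.0]:
    latest = monitor.add(reading)
# After the last reading, only [12.0, 14.0, 20.0] remain in the window.
```

Real stream-processing systems are far more elaborate, but the principle is the same: capture, summarize, and discard, rather than store everything.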
Lifetime of Data Utility: A second dimension of Velocity is how long the data will be valuable. Is it permanently valuable, or does it rapidly age and lose its meaning and importance? Understanding this dimension of Velocity in the data you choose to store is important for discarding data that is no longer meaningful and may in fact mislead.
Value:
Although Value is frequently shown as the fourth leg of the Big Data stool, Value does not differentiate Big Data from not-so-big data. It is equally true of both big and small data that if we are making the effort to store and analyze it, then it must be perceived to have value.
There are at least four additional characteristics that pop up in the literature from time to time. All of these share the same definitional problem as Value: that is, they may describe data in general, but not uniquely Big Data.
Veracity:
What is the provenance of the data? Does it come from a reliable source? Is it accurate and, by extension, complete?
Variability:
There are several potential meanings for Variability. Is the data consistent in terms of availability or interval of reporting? Does it accurately portray the event reported? When data contains many extreme values it presents a statistical problem to determine what to do with these ‘outlier’ values and whether they contain a new and important signal or are just noisy data.
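The outlier question raised above is a standard statistical task. A minimal sketch using a z-score test (the two-standard-deviation threshold and the sample readings are assumptions chosen for illustration):

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 11, 95]  # 95 is an extreme value
extremes = find_outliers(readings)
```

Flagging a value this way only identifies it as extreme; deciding whether it is noise to discard or a new signal to investigate is the judgment call Variability refers to.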
Viscosity:
This term is sometimes used to describe the latency or lag time in the data relative to the event being described. We found that this is just as easily understood as an element of Velocity.
Virality:
Defined by some users as the rate at which the data spreads; how often it is picked up and repeated by other users or events.