H8W834 - Big Data: October 2018

8. Characteristics of big data analysis (including visualisation)

Big data analysis should be viewed from two perspectives:

Decision-oriented

Action-oriented

A decision-oriented analysis is more akin to traditional business intelligence. It looks at selective subsets and representations of larger data sources and tries to apply the results to the process of making business decisions. These decisions might well result in some kind of action or process change, but the purpose of the analysis is to augment decision making.

Action-oriented analysis is used for rapid response when a pattern emerges or specific kinds of data are detected and action is required. Taking advantage of big data through analysis, and triggering proactive or reactive behavior changes, offers great potential for early adopters.

Finding and utilizing big data by creating analysis applications can hold the key to extracting value sooner rather than later. To accomplish this, it is often more effective to build custom applications, either from scratch or by leveraging existing platforms and components.

First, look at some of the additional characteristics of big data analysis that make it different from traditional kinds of analysis aside from the three Vs of volume, velocity, and variety.

Bibliography: dummies. (2018). Characteristics of Big Data Analysis - dummies. [online] Available at: https://www.dummies.com/programming/big-data/data-science/characteristics-of-big-data-analysis/ [Accessed 11 Oct. 2018].

Image 1: Tehseen.dbsdataprojects.com. (2018). [online] Available at: http://tehseen.dbsdataprojects.com/wp-content/uploads/sites/107/2016/04/3vs1.jpg [Accessed 11 Oct. 2018].

Big Data visualization involves the presentation of data of almost any type in a graphical format that makes it easy to understand and interpret. But it goes far beyond typical corporate graphs, histograms and pie charts to more complex representations like heat maps and fever charts, enabling decision makers to explore data sets to identify correlations or unexpected patterns.

A defining feature of Big Data visualization is scale. Today's enterprises collect and store vast amounts of data that would take years for a human to read, let alone understand. But researchers have determined that the human retina can transmit data to the brain at a rate of about 10 megabits per second. Big Data visualization relies on powerful computer systems to ingest raw corporate data and process it to generate graphical representations that allow humans to take in and understand vast amounts of data in seconds.

Bibliography: Datamation.com. (2018). What is Big Data Visualization?. [online] Available at: https://www.datamation.com/big-data/big-data-visualization.html [Accessed 11 Oct. 2018].

Image 2: Static.timesofisrael.com. (2018). [online] Available at: https://static.timesofisrael.com/www/uploads/2014/08/israel-gaza-war-data-media-clusters-1407246759.22-148293.png [Accessed 11 Oct. 2018].

6. Traditional statistics (descriptive and inferential)

In today’s fast-paced world, statistics plays a major role in research, helping with the collection, analysis, and presentation of data in a measurable form. It can be hard to tell whether a piece of research relies on descriptive or inferential statistics, as people often lack knowledge of these two branches. As the name suggests, descriptive statistics describes the data under study.

Inferential statistics, on the other hand, is used to generalize about a population based on samples. The key difference between the two therefore lies in what you do with your data. The comparison below gives more detail on both.

BASIS FOR COMPARISON	DESCRIPTIVE STATISTICS	INFERENTIAL STATISTICS
Meaning	The branch of statistics concerned with describing the data under study.	The branch of statistics that draws conclusions about a population on the basis of sample analysis and observation.
What it does	Organizes, analyzes and presents data in a meaningful way.	Compares data, tests hypotheses and makes predictions.
Form of final result	Charts, graphs and tables	Probability
Usage	To describe a situation.	To explain the chances of an event occurring.
Function	Explains data that is already known, summarizing the sample.	Reaches conclusions about the population that extend beyond the data available.

Difference Between Descriptive and Inferential Statistics

May 9, 2016, by Surbhi S

Definition of Descriptive Statistics

Descriptive Statistics refers to a discipline that quantitatively describes the important characteristics of a dataset. To describe these properties, it uses measures of central tendency (mean, median and mode) and measures of dispersion (range, standard deviation, quartile deviation and variance).

The researcher summarises the data with numerical and graphical tools such as charts, tables and graphs, so that it is represented accurately. Text accompanies the diagrams to explain what they represent.
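
The measures named above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not a definitive implementation: the `orders` values are made up for illustration, `describe` is a hypothetical helper name, and the variance and standard deviation are the population versions (the data is treated as the whole dataset rather than a sample).

```python
import statistics

# Hypothetical dataset: daily order counts for a small shop (made-up values).
orders = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

def describe(values):
    """Return measures of central tendency and dispersion for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
        "quartile_deviation": (q3 - q1) / 2,       # half the interquartile range
    }

summary = describe(orders)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the values were a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` would be the more usual choice.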

Definition of Inferential Statistics

Inferential Statistics is all about generalising from the sample to the population, i.e. the results of analysing the sample are extended to the larger population from which the sample is taken. It is a convenient way to draw conclusions about the population when it is not possible to query each and every member of the universe. The sample chosen should be representative of the entire population, so it must reflect the population's important features.

Inferential Statistics uses probability theory to estimate properties of the population from the properties of the sample. The major inferential techniques are based on statistical models such as analysis of variance, the chi-square test, Student's t distribution and regression analysis.

The two main methods of inferential statistics are:

Estimation of parameters
Testing of hypotheses
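
Both methods can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the method used by the cited article: the `sample` values are generated, not real measurements; the function names are hypothetical; and a normal (z) approximation is used in place of Student's t distribution so that only the standard library is needed.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 40 delivery times in minutes (generated, not real data).
sample = [30 + (i * 7) % 11 for i in range(40)]

def mean_confidence_interval(values, confidence=0.95):
    """Estimate the population mean with a normal-approximation interval."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample std dev over sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # critical value of the standard normal
    return mean - z * std_err, mean + z * std_err

def z_test_p_value(values, hypothesised_mean):
    """Two-sided p-value for the null hypothesis that the population mean equals hypothesised_mean."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    z = (statistics.mean(values) - hypothesised_mean) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

low, high = mean_confidence_interval(sample)
p_value = z_test_p_value(sample, hypothesised_mean=40)
print(f"95% interval for the mean: {low:.2f} to {high:.2f}")
print(f"p-value for H0 mean = 40: {p_value:.4f}")
```

The interval is an estimate of a parameter (the population mean), while the p-value supports a decision about a hypothesis. For small samples, the t distribution would give wider and more appropriate intervals than the normal approximation assumed here.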

Key Differences Between Descriptive and Inferential Statistics

The difference between descriptive and inferential statistics can be drawn clearly on the following grounds:

Descriptive statistics is concerned with describing the data under study. Inferential statistics focuses on drawing conclusions about the population on the basis of sample analysis and observation.
Descriptive statistics collects, organises, analyzes and presents data in a meaningful way. Inferential statistics, by contrast, compares data, tests hypotheses and makes predictions about future outcomes.
In descriptive statistics the final result is presented as a diagram or table, whereas in inferential statistics it is expressed as a probability.
Descriptive statistics describes a situation, while inferential statistics explains the likelihood of an event occurring.
Descriptive statistics explains data that is already known, summarising the sample. Inferential statistics attempts to reach conclusions about the population that extend beyond the data available.

Bibliography: S, S. (2018). Difference Between Descriptive and Inferential Statistics (with Comparison Chart) - Key Differences. [online] Key Differences. Available at: https://keydifferences.com/difference-between-descriptive-and-inferential-statistics.html [Accessed 11 Oct. 2018].

Understanding Statistical Inference

Bibliography: YouTube. (2018). Understanding Statistical Inference. [online] Available at: https://www.youtube.com/watch?v=tFRXsngz4UQ [Accessed 11 Oct. 2018].

2. Historical development of big data

In 1989 British computer scientist Tim Berners-Lee invented what eventually became the World Wide Web. He wanted to facilitate the sharing of information via a ‘hypertext’ system. Little could he know at that moment what impact his invention would have.

From the ’90s onward, the creation of data was spurred as more and more devices were connected to the internet. In 1995 a supercomputer was built that could do as much work in a second as a calculator operated by a single person could do in 30,000 years.

In 2005 Roger Mougalas from O’Reilly Media coined the term Big Data, only a year after the company created the term Web 2.0. It refers to a large set of data that is almost impossible to manage and process using traditional business intelligence tools.

2005 is also the year that Hadoop was created by Yahoo!, built on Google’s MapReduce model. Its goal was to index the entire World Wide Web, and nowadays the open-source Hadoop is used by a lot of organizations to crunch through huge amounts of data. As more and more social networks appeared and Web 2.0 took flight, more and more data was created on a daily basis. Innovative startups slowly started to dig into this massive amount of data, and governments also started working on Big Data projects.

In 2009 the Indian government decided to take an iris scan, fingerprint, and photograph of all of its 1.2 billion inhabitants. All this data is stored in the largest biometric database in the world.

In 2010 Eric Schmidt spoke at the Techonomy conference in Lake Tahoe, California, and stated that "there were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days."

In 2011 the McKinsey report "Big data: The next frontier for innovation, competition, and productivity" stated that by 2018 the USA alone would face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers.

In the past few years there has been a massive increase in Big Data startups, all trying to deal with Big Data and help organizations understand it, and more and more companies are slowly adopting Big Data. However, while it may look as if Big Data has been around for a long time already, it is in fact only as far along as the internet was in 1993. The real Big Data revolution is still ahead of us, so a lot will change in the coming years. Let the Big Data era begin!

Bibliography: Datafloq.com. (2018). A Short History Of Big Data. [online] Available at: https://datafloq.com/read/big-data-history/239 [Accessed 11 Oct. 2018].

Image 3: Image.slidesharecdn.com. (2018). [online] Available at: https://image.slidesharecdn.com/nrfbigdatav14-present-140113110515-phpapp01/95/how-to-get-your-brand-elected-big-data-presentation-at-nrf-by-david-selinger-5-638.jpg?cb=1389612110 [Accessed 11 Oct. 2018].

1. Definition of Big Data

There is no single definition of Big Data.

Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or a tool, rather it involves many areas of business and technology.

One definition can be found on Wikipedia:

Big data is a term used to refer to the study and applications of data sets that are so big and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source.

Bibliography: En.wikipedia.org. (2018). Big data. [online] Available at: https://en.wikipedia.org/wiki/Big_data [Accessed 31 Aug. 2018].

Thursday, 11 October 2018