Friday, 18 January 2019

22. Google Trends - how does it work?

To check how Google Trends works and displays big data, I followed the search terms Brexit, Donald Trump and inflation from 31 August 2018 to 9 January 2019.


I was able to see the data that is available on the website and follow it on a week-by-week basis.


The averages from 31/08/2018 to 09/01/2019 were:
Brexit - 19
Donald Trump - 4
Inflation - 1




31/08/2018
Brexit – 42
Donald Trump – 68
Inflation – 4



07/09/2018
Brexit – 42
Donald Trump – 83
Inflation – 4







14/09/2018
Brexit – 78
Donald Trump – 81
Inflation – 7



21/09/2018
Brexit – 100
Donald Trump – 57
Inflation – 4


28/09/2018
Brexit – 51
Donald Trump – 16
Inflation – 3




05/10/2018
Brexit – 86
Donald Trump – 29
Inflation – 7



12/10/2018
Brexit – 94
Donald Trump – 25
Inflation – 6



19/10/2018
Brexit – 76
Donald Trump – 16
Inflation – 6




26/10/2018
Brexit – 76
Donald Trump – 16
Inflation – 6


02/11/2018
Brexit – 85
Donald Trump – 27
Inflation – 7





09/11/2018
Brexit – 90
Donald Trump – 28
Inflation – 6


16/11/2018
Brexit – 56
Donald Trump – 3
Inflation – 1





23/11/2018
Brexit – 36
Donald Trump – 6
Inflation – 2

30/11/2018
Brexit – 66
Donald Trump – 13
Inflation – 4


07/12/2018
Brexit – 72
Donald Trump – 13
Inflation – 3




14/12/2018
Brexit – 54
Donald Trump – 8
Inflation – 2





21/12/2018
Brexit – 58
Donald Trump – 17
Inflation – 3




28/12/2018
Brexit – 69
Donald Trump – 22
Inflation – 4




04/01/2019
Brexit – 100
Donald Trump – 22
Inflation – 6






06/01/2019
Brexit – 95
Donald Trump – 22
Inflation – 5
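
The weekly readings above can be summarised with a few lines of Python. This is a minimal sketch that uses only the first four weeks listed (31/08/2018 to 21/09/2018); the dictionary layout and the function name `summarise` are assumptions made for illustration, and the result is not claimed to reproduce the averages shown on the Google Trends website.

```python
from statistics import mean

# Weekly readings copied from the first four weeks listed above.
# The dictionary layout is an assumption made for this sketch.
weekly = {
    "Brexit": [42, 42, 78, 100],
    "Donald Trump": [68, 83, 81, 57],
    "Inflation": [4, 4, 7, 4],
}

def summarise(readings):
    """Return the mean and the peak of each term's weekly readings."""
    return {
        term: {"mean": mean(values), "peak": max(values)}
        for term, values in readings.items()
    }

summary = summarise(weekly)
for term, stats in summary.items():
    print(f"{term}: mean {stats['mean']:.2f}, peak {stats['peak']}")
```

Adding the remaining weeks to each list would extend the summary to the whole period.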

17. Strategies for limiting the negative effects of big data

People using Amazon, Facebook and Google often do not think about signing out when they have finished, which is a problem on public networks. To limit the negative impact of big data, users should use private networks and keep their passwords safe. From a security point of view, passwords should be complex and users should always log out when finished to end the session.





Wednesday, 16 January 2019

5. Value of data (including the future value)


THE EXPLOSION IN SOCIAL NETWORKING
Nowadays social media is available on mobile phones, laptops and tablets, so consumers have access to it 24/7. The enormous number of users gives an idea of how much data is stored on these portals. For example, if we follow a famous brand on Twitter for one hour, we can see exactly how it works. Social networking is available all over the world and people can communicate with each other.

MOBILE TECHNOLOGY

"Today, smart devices have revolutionized mobile computing by offering the big data solution. Citing the latest finding from BI Intelligence, Business Insider’s Marcelo Ballve revealed that mobile’s big data surge induces a positive impact on all aspects of the trade."
(Datasciencecentral.com. (2019). The Present and the Future of Big Data and Mobile Technology. [online] Available at: https://www.datasciencecentral.com/profiles/blogs/the-present-and-the-future-of-big-data-and-mobile-technology [Accessed 9 Jan. 2019].)

If we look at the past, we can see how much mobile technologies have changed. Customers can use their devices everywhere. Companies are creating better and faster devices for them. People are online at any time, and data is being stored and processed all the time.



Most users will have access to 3G coverage, whereas, in large built-up areas, the coverage is usually upgraded to 4G.  5G is being tested in London (Wired.co.uk. (2019). What is 5G and when will it come to the UK?. [online] Available at: https://www.wired.co.uk/article/what-is-5g-launch-date-mobile-networks-uk [Accessed 9 Jan. 2019].) which will allow for streaming and download speeds usually reserved for broadband.
The large operators are constantly adding more bandwidth to the current infrastructure, allowing newer technology to access the network(s) at faster speeds. There is a huge range of tariffs available to new customers and to customers who upgrade to newer handsets and packages. More and more customers are choosing all-inclusive packages where they do not have to monitor the amount of data being used. This allows customers to stream data, e.g. movies and TV shows, and to keep a constant on-demand connection to apps and social media.

INTERNET AND WWW
The Internet is available to almost everyone, almost everywhere. People can "surf" the internet to look for information or check social media. Young people can use the internet at school to work together. Ten years ago, access on this scale was not possible.

TELECOMMUNICATIONS - 24h PER DAY

Companies have invented new technology for their customers. We can see adverts that show us new technology such as mobile phones, tablets and laptops. People can communicate with each other around the world 24/7.

STORAGE COSTS

The study examined the cost of hardware, software, and support for 250TB of storage, growing 25% per year and used to store infrequently accessed data in an online archive.
(Suse.com. (2019). Managing the Data Explosion with a Total Cost of Ownership | SUSE. [online] Available at: https://www.suse.com/lp/data-storage-growth-total-cost-ownership/?campaign_description=FY19%20November%20Global%20GMOC%20Advertising%20online%20SEMPPC%20Data%20Explosion&campaign_id=GSDGNDX33768&gclid=Cj0KCQiA1NbhBRCBARIsAKOTmUvns-bQlgcxZ0B86T24r6UyLcB1KWLHgUwRkePWBzBCZKfiLmX0BJYaAhtAEALw_wcB [Accessed 9 Jan. 2019].)


3. Growth of data (including measures of data)



Big data has exploded because everyday life has changed and more data is being created. People use mobile phones, tablets and laptops 24/7 to search for information. Companies collect this information to be sure they meet their customers' expectations.

Facebook, one of the most popular social media platforms, is one of the best examples of data growth. Users are able to communicate with each other even if one lives in Australia and the other in Africa.


21. Application of big data techniques to a problem.

By applying data from a myriad of sources (Big Data), the analysis of current and past data allows the analyst to look into and solve problems.

20. Types of visualization.

Data visualisation is an important visual method for effective communication and analysing large datasets. Through data visualisations, we are able to draw conclusions from data that are sometimes not immediately obvious and interact with the data in an entirely different way.
Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports. Data visualization is a quick, easy way to convey concepts in a universal manner – and you can experiment with different scenarios by making slight adjustments.

Sas.com. (2019). Data Visualization: What it is and why it matters. [online] Available at: https://www.sas.com/en_gb/insights/big-data/data-visualization.html [Accessed 18 Jan. 2019].


19. Data mining methods.

Data mining involves “processing data and identifying patterns and trends in that information,” according to IBM. “Data mining principles have been around for many years, but, with the advent of big data, it is even more prevalent.”
Ninety percent of data in the world today has been created in the last two years alone, IBM estimates. Every day, people create 2.5 quintillion bytes of data, enough to fill 10 million Blu-ray Discs.
Data mining techniques help professionals provide insights into available data sets. The techniques can offer descriptive and predictive power for businesses and other organizations.

5 Data Mining Techniques

1.    Association

Association makes a correlation between two or more items to identify a pattern. For instance, a supermarket could determine that customers often purchase whipped cream when they buy strawberries and vice versa. Association is often used at point-of-sale systems to determine common tendencies among products.
“It’s a very simple method, but you’d be surprised how much intelligence and insight it can provide—the kind of information many businesses use on a daily basis to improve efficiency and generate revenue,” according to technology company Galvanize. Application areas include physical organization of items, marketing and the cross-selling and up-selling of products.
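
The strawberries and whipped cream example can be sketched as a simple co-occurrence count. The baskets below are invented for illustration, and the helper names `pair_counts` and `confidence` are assumptions of this sketch rather than part of any particular point-of-sale system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for this sketch.
baskets = [
    {"strawberries", "whipped cream", "milk"},
    {"strawberries", "whipped cream"},
    {"strawberries", "bread"},
    {"milk", "bread"},
]

def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

counts = pair_counts(baskets)
print(counts[("strawberries", "whipped cream")])
print(confidence(baskets, "strawberries", "whipped cream"))
```

A pair that appears together often, with a high confidence in either direction, is a candidate for cross-selling or for being placed side by side in the shop.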

2.    Classification

Multiple attributes can be used to identify a particular class of items. Classification assigns items into target categories or classes to accurately predict what will occur within the class.
Several industries use classification with customers. For instance, a banking company could use a classification model to identify loan applicants as low, medium or high credit risks. Other organizations classify current and target audiences into age and social groups for marketing campaigns.

3.    Clustering

“Clustering is the method by which like records are grouped together,” according to Alex Berson, Stephen Smith and Kurt Thearling in the book Building Data Mining Applications for CRM. “Usually this is done to give the end user a high level view of what is going on in the database.”
Seeing object groupings can help businesses in areas like marketing segmentation. Clustering can be used in this example to subdivide a market into subsets of customers. Each subset can then be targeted with a specific marketing strategy based on the attributes of the cluster, such as buying patterns for customers in one cluster vs. another cluster.
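
As a rough illustration of grouping like records together, the sketch below runs a very small one-dimensional k-means on invented monthly spending figures. The function name `kmeans_1d`, the starting centroids and the data are all assumptions; real segmentation would normally use more attributes and a dedicated library.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid to its group mean."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer, invented for this sketch.
monthly_spend = [12, 15, 14, 80, 85, 90]
centroids, clusters = kmeans_1d(monthly_spend, centroids=[0, 100])
print(clusters)
```

Each resulting group could then be targeted with its own marketing strategy.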

4.    Decision Trees

Decision trees are used to categorize or predict data. A decision tree starts with a simple question that has two or more answers. Each answer leads to a further question that is used to classify or identify data that can be categorized, or so that a prediction can be made based on each answer.
The graphic of a decision tree represents how a cell phone provider might classify customers who churn, or those who don’t renew their phone contracts. The authors of Building Data Mining Applications for CRM offer some interesting takeaways for the graphic.
  • It divides the data into each branch without losing any of the data. For instance, the total number of records in a parent node is equal to the sum of the records contained in its two children.
  • The number of churners and non-churners is conserved as you move up or down the tree.
  • It is fairly easy to understand how the model is being built.
  • The model would be pretty easy to use if you needed to target customers who are likely to churn with a marketing offer.
  • The company could develop intuition about its customer base; for instance, it could conclude that customers who have been with the provider for a couple of years and have up-to-date cell phones tend to be loyal.
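
The points above can be sketched with a hand-written tree. The customer records, the two questions and the thresholds are invented for illustration and are not taken from the book; the sketch only shows that a split conserves the records and how a small tree turns answers into a prediction.

```python
# Hypothetical customer records, invented for this sketch:
# (years_with_provider, has_new_phone, churned)
customers = [
    (0.5, False, True),
    (1.0, False, True),
    (3.0, True, False),
    (4.0, True, False),
    (2.5, False, True),
    (5.0, False, False),
]

def split(records, test):
    """Divide records into those that pass the test and those that fail it."""
    yes = [r for r in records if test(r)]
    no = [r for r in records if not test(r)]
    return yes, no

def predict_churn(years, has_new_phone):
    """A hand-written two-question tree; the thresholds are illustrative only."""
    if years >= 2:
        if has_new_phone:
            return False
        return years < 3
    return True

long_term, short_term = split(customers, lambda r: r[0] >= 2)
print(len(long_term), len(short_term), len(customers))
```

The two child nodes together hold exactly the records of the parent, which is the conservation property described in the first two bullet points.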

5.    Sequential Patterns

Sequential patterns identify trends or regular occurrences of similar events. This data mining technique is often used to understand user buying behaviors. Many retailers use data and sequential patterns to decide on the products they display.
“With customer data you can identify that customers buy a particular collection of products together at different times of the year,” according to IBM. “In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history.”

Datafloq.com. (2019). 5 Major Data Mining Techniques Being Used by Big Data. [online] Available at: https://datafloq.com/read/5-major-data-mining-techniques-being-used-big-data/3352 [Accessed 18 Jan. 2019].

18. Types of problem suited to big data analysis.

Here are 5 types of big data analytics:
Prescriptive Analytics
The most valuable and most underused big data analytics technique, prescriptive analytics gives you a laser-like focus to answer a specific question. It helps to determine the best solution among a variety of choices, given the known parameters and suggests options for how to take advantage of a future opportunity or mitigate a future risk. It can also illustrate the implications of each decision to improve decision-making. Examples of prescriptive analytics for customer retention include next best action and next best offer analysis.


  • Forward looking
  • Focused on optimal decisions for future situations
  • Simple rules to complex models that are applied on an automated or programmatic basis
  • Discrete prediction of individual data set members based on similarities and differences
  • Optimization and decision rules for future events
Diagnostic Analytics
Data scientists turn to this technique when trying to determine why something happened. It is useful when researching leading churn indicators and usage trends amongst your most loyal customers. Examples of diagnostic analytics include churn reason analysis and customer health score analysis. Key points:
  • Backward looking
  • Focused on causal relationships and sequences
  • Relative ranking of dimensions/variables based on inferred explanatory power
  • Target/dependent variable with independent variables/dimensions
  • Includes both frequentist and Bayesian causal inferential analyses
Descriptive Analytics
This technique is the most time-intensive and often produces the least value; however, it is useful for uncovering patterns within a certain segment of customers. Descriptive analytics provide insight into what has happened historically and will provide you with trends to dig into in more detail. Examples of descriptive analytics include summary statistics, clustering and association rules used in market basket analysis. Key points:
  • Backward looking
  • Focused on descriptions and comparisons
  • Pattern detection and descriptions
  • MECE (mutually exclusive and collectively exhaustive) categorization
  • Category development based on similarities and differences (segmentation)
Predictive Analytics
The most commonly used technique; predictive analytics use models to forecast what might happen in specific scenarios. Examples of predictive analytics include next best offers, churn risk and renewal risk analysis.
  • Forward looking
  • Focused on non-discrete predictions of future states, relationship, and patterns
  • Description of prediction result set probability distributions and likelihoods
  • Model application
  • Non-discrete forecasting (forecasts communicated in probability distributions)
Outcome Analytics
Also referred to as consumption analytics, this technique provides insight into customer behavior that drives specific outcomes. This analysis is meant to help you know your customers better and learn how they are interacting with your products and services.
  • Backward looking, Real-time and Forward looking
  • Focused on consumption patterns and associated business outcomes
  • Description of usage thresholds
  • Model application

Business 2 Community. (2019). 5 Types of Big Data Analytics and How They Help Customer Success. [online] Available at: https://www.business2community.com/big-data/5-types-big-data-analytics-help-customer-success-01519563 [Accessed 18 Jan. 2019].

16. Implications of big data for society.



Since McKinsey wrote a well-known report about Big Data in May 2011 and called it the next frontier for innovation, competition and productivity, we have come a long way. In the past year 12% of organisations have developed a Big Data strategy, while 70% of the respondents of a SAS survey said that they were planning or already started a Big Data project. More interesting however, is that 21% don’t know enough about Big Data, while 15% did not understand the benefits of Big Data for their organisation. So, while slowly more organisations are implementing a Big Data strategy, there is still a large part that has no clue about Big Data. When organisations do not understand Big Data, there probably is a large part of the consumers that have no clue about Big Data either.


And that is scary, as Big Data will have a gigantic impact on society, the way organisations are managed and operated, the way the government is organized and subsequently the global economy.

Big Data, as is with most of disruptive technologies happening at the moment, has at first an impact on organisations. The innovators and the early adopters among the organisations are already embracing Big Data. According to the Technology Adoption Cycle this group represents approximately 16% and with 12% having implemented a strategy we are almost bridging the gap to the rest of the organisations. Only when the technology has crossed the chasm, the rest of the organisations, the early and late majority and finally the laggards, will implement a Big Data strategy.


Obviously, an impact on the economy will affect society. Through the quantified-self movement, consumers are able to track and monitor their every move and thus gain a better understanding of their own lives.


Datafloq.com. (2019). Big Data Impacts Society, But What Will Be The Impact Of Society On Big Data?. [online] Available at: https://datafloq.com/read/big-data-will-have-a-big-impact-on-society/212 [Accessed 18 Jan. 2019].

15. Implications of big data for individuals.

The concept of Big Data is often the exclusive realm of big companies and international conglomerates, but it has such a huge impact on everyday people in ways that many don’t even realize.
From the most basic suggestion engines based on a few tags, through to the fertility of somebody who is trying to conceive a child, data is playing a significant part in everybody’s lives. 
So how exactly is it affecting the lives of people in the following areas?
Health & Fitness
Fitness monitors are not new, having been publicly available in some form or another for the past 10 years. What we are seeing through the increased use of Big Data though, is that the suggestions of what individuals should be doing to improve or track differently are having a profound difference. Where trackers could see how many steps had been taken, modern trackers can note anything from the pace to gradient, giving a far more accurate picture of fitness and how to improve it. 
The biggest impact may well come from how the healthcare industry uses data and how individuals will benefit from this in significant ways. Through pooling anonymized data and creating a vast database of case studies, it becomes possible to analyze millions of records to identify the best solutions for the individual patient. Through pooling this kind of data it is also possible to identify how diseases and viruses spread and what the most effective way to halt them may be.
One of the biggest personal impacts that Big Data can have is on the conception of a child. With many couples struggling to conceive within 12 months, apps like Kindara are hoping to help through asking women to input information such as details of their cycles, fitness activities, moods and temperature. This information is then analyzed and recommendations are given to help improve fertility. This kind of information can be invaluable for couples looking to start a family.
Travel
Even the simplest forms of travel today are powered by Big Data, so just moving from one place to another often comes down to how you and your apps are using data. Take Citymapper, one of the most popular transit apps in the world. The app takes GPS data of where you are, location data for public transport within a certain proximity and also the current state of them. So if a train line has delays, the app will know and be able to redirect you to another route that can get you there faster. 
Also by simply using public transport in many cities, people don’t realise that big data is helping to make sure that they are getting to their destinations effectively and safely. Sensors throughout many networks pinpoint potential safety hazards allowing technicians to be directed to the correct place and solve the issue quickly and efficiently.
Entertainment
When we watch Netflix, choosing a programme from the 36,000+ on there would be almost impossible unless there were some kind of personalized suggestions to point us in the right direction. This is exactly what happens, with suggestion engines looking at what you have previously watched and liked, then showing you similar programmes to help you choose. 
It is not only in the watching of programmes either, it is also in the creation of some of the best loved shows that networks make. House of Cards, for instance, was created because data showed that people liked Kevin Spacey and political dramas. Therefore the reimagining of a 1990s BBC miniseries was created and became one of the best loved shows of the last 5 years. 
Shopping
Companies like Amazon and Google have created entire businesses around their use of analytics to both predict what people may want to buy and to make it as simple as possible for consumers to do so. They do this through suggestion engines based on previous purchase history which is perhaps the most basic form of data use in shopping and one that people are more than aware of. 
Footfall analytics are now having a significant impact on how brick and mortar shops are both laid out and operated too. Through knowing the amount of people who have come through their doors whilst also knowing the sales done by these shops compared to the numbers of people arriving, means that they can see how effective their campaigns and shop layouts are. 
It is also personalizing shopping for many people, with custom offers and codes given to certain people based on their previous behavior. If somebody were to buy frequently from one site, then stopped, data would tell the marketing team to offer significant discounts to that person in order to get them back as a customer. Without this kind of data the company may lose these customers forever and it also allows the customer to receive specialist treatment to bring them back into the fold. 


Enterprise, B. and Hill, G. (2019). Big Data For Individuals | Articles | Big Data. [online] Channels.theinnovationenterprise.com. Available at: https://channels.theinnovationenterprise.com/articles/big-data-for-individuals [Accessed 18 Jan. 2019].

14. Limitations of predictive analytics.

The Limitations of the Data in Predictive Analytics. The data could be incomplete. Missing values, even the lack of a section or a substantial part of the data, could limit its usability. If you're using data from surveys, keep in mind that people don't always provide accurate information.

To determine the limitations of your data, be sure to:

  • Verify all the variables you’ll use in your model.
  • Assess the scope of the data, especially over time, so your model can avoid the seasonality trap.
  • Check for missing values, identify them, and assess their impact on the overall analysis.
  • Watch out for extreme values (outliers) and decide on whether to include them in the analysis.
  • Confirm that the pool of training and test data is large enough.
  • Make sure data type (integers, decimal values, or characters, and so forth) is correct and set the upper and lower bounds of possible values.
  • Pay extra attention to data integration when your data comes from multiple sources.
    • Choose a relevant dataset that is representative of the whole population.
    • Choose the right parameters for your analysis.
    • Any values missing from the data.
    • Any inconsistencies and/or errors existing in the data.
    • Any duplicates or outliers in the data.
    • Any normalization or other transformation of the data.
    • Any derived data needed for the analysis.
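
Several of the checks above (missing values, duplicates, and values outside plausible bounds) can be sketched in a few lines. The survey ages, the bounds and the function name `data_quality_report` are assumptions made for this illustration only.

```python
from statistics import median

# Hypothetical survey ages; None marks a missing answer.
ages = [34, 29, None, 41, 29, 250, 38, None, 36]

def data_quality_report(values, lower, upper):
    """Count missing values, repeated values and values outside plausible bounds."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "duplicates": len(present) - len(set(present)),
        "out_of_bounds": [v for v in present if not lower <= v <= upper],
        "median": median(present),
    }

report = data_quality_report(ages, lower=0, upper=120)
print(report)
```

Whether a repeated value is a genuine duplicate record or an outlier should be excluded still has to be decided by the analyst; the report only points to where to look.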

dummies. (2019). The Limitations of the Data in Predictive Analytics - dummies. [online] Available at: https://www.dummies.com/programming/big-data/data-science/the-limitations-of-the-data-in-predictive-analytics/ [Accessed 18 Jan. 2019].

13. Technological requirements of big data

BIG DATA is a term used for a collection of data sets so large and complex that it is difficult to process using traditional applications/tools. It is the data exceeding Terabytes in size. Because of the variety of data that it encompasses, big data always brings a number of challenges relating to its volume and complexity. A recent survey says that 80% of the data created in the world are unstructured. One challenge is how these unstructured data can be structured, before we attempt to understand and capture the most important data. Another challenge is how we can store it. Here are the top technologies used to store and analyse Big Data. We can categorise them into two (storage and Querying/Analysis).
1. Apache Hadoop
Apache Hadoop is a java based free software framework that can effectively store large amount of data in a cluster. This framework runs in parallel on a cluster and has an ability to allow us to process data across all nodes. Hadoop Distributed File System (HDFS) is the storage system of Hadoop which splits big data and distribute across many nodes in a cluster. This also replicates data in a cluster thus providing high availability.
2. Microsoft HDInsight
It is a Big Data solution from Microsoft powered by Apache Hadoop which is available as a service in the cloud. HDInsight uses Windows Azure Blob storage as the default file system. This also provides high availability with low cost.
3. NoSQL
While the traditional SQL can be effectively used to handle large amount of structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL databases store unstructured data with no particular schema. Each row can have its own set of column values. NoSQL gives better performance in storing massive amount of data. There are many open-source NoSQL DBs available to analyse big Data.
4. Hive
This is a distributed data management layer for Hadoop. It supports a SQL-like query language, HiveQL (HQL), to access big data. It is primarily used for data mining purposes and runs on top of Hadoop.
5. Sqoop
This is a tool that connects Hadoop with various relational databases to transfer data. This can be effectively used to transfer structured data to Hadoop or Hive.
6. PolyBase
This works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to access data stored in PDW. PDW is a data warehousing appliance built for processing any volume of relational data and provides an integration with Hadoop allowing us to access non-relational data as well.
7. Big data in EXCEL
As many people are comfortable in doing analysis in EXCEL, a popular tool from Microsoft, you can also connect data stored in Hadoop using EXCEL 2013. Hortonworks, which is primarily working in providing Enterprise Apache Hadoop, provides an option to access big data stored in their Hadoop platform using EXCEL 2013. You can use Power View feature of EXCEL 2013 to easily summarise the data.
Similarly, Microsoft’s HDInsight allows us to connect to Big data stored in Azure cloud using a power query option. 
8. Presto
Facebook has developed and recently open-sourced its Query engine (SQL-on-Hadoop) named Presto which is built to handle petabytes of data. Unlike Hive, Presto does not depend on MapReduce technique and can quickly retrieve data.
Top big data technologies used to store and analyse data - Crayon Blog. [online] Crayon Blog. Available at: https://www.crayondata.com/blog/top-big-data-technologies-used-store-analyse-data/ [Accessed 18 Jan. 2019].
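
The MapReduce technique mentioned above can be illustrated without a cluster. The sketch below is a single-process imitation in plain Python and does not use any Hadoop API; the function names and the sample text are assumptions, and each string only stands in for a block of a file that would be held on a different node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each string stands in for a block of a file held on a different node.
chunks = ["big data big cluster", "data node data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

On a real cluster the map step would run in parallel on the nodes that hold each block, and only the grouped intermediate results would move across the network.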

11. Contemporary applications of big data in society

Big data has an impact on many parts of our lives: social media, trading, romance, careers and shopping. It has changed every year since the 1970s, and more and more applications are available to us.



12. Future applications of big data


Early this century the rise of the relational database, public web access, wireless, and other technologies made the study and management of massive data sets a real and present challenge that needed a name. In July of 2013 the Oxford English Dictionary adopted the phrase, “big data,” but it’s been around since as early as World War II to apply to working with massive amounts of information.
Big data refers to data sets that are too large and complex for traditional data processing and data management applications. Big Data became more popular with the advent of mobile and IoT technology, with people producing more and more data (geolocation, social apps, fitness apps, etc.) and accessing digital data on their devices.
It has also become the catch-all term for gathering, analyzing, and using massive amounts of digital information to improve business operations. As data sets continue to grow, and applications continue to become more real-time, big data and big data processing are more and more moving to the cloud.


 Talend Real-Time Open Source Data Integration Software. (2019). What is Big Data and Where Is It Going? - Talend. [online] Available at: https://www.talend.com/resources/future-big-data/ [Accessed 18 Jan. 2019].
