Learn.

What is qualitative data?

Qualitative data is data (information about a population or a sample of a population) that is not numerical. An example is car colour.

Initial data analysis (IDA) is the first step in data analysis. It involves looking through your data to generate more research questions, to determine whether the data is sufficient to answer the intended research question, or to determine the data’s main qualities, such as whether it is univariate, bivariate or multivariate.

IDA involves data cleaning: removing potential errors from the data, since these can have a huge effect down the line. Next, data screening determines whether the data is usable, reliable and valid for your intended research question. The final step in IDA is data reporting for statistical analysis; ideally, the data is error-free by this point.
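As a rough illustration of the cleaning step, the sketch below drops records that have a missing value or an implausible top speed. The records, the field names and the 0 to 500 range are all made up for this example; a real cleaning rule depends on the data set in question.

```python
# Hypothetical car records; None marks a missing value.
cars = [
    {"colour": "red", "top_speed": 180},
    {"colour": "blue", "top_speed": None},   # missing value
    {"colour": "white", "top_speed": -20},   # impossible value
    {"colour": "red", "top_speed": 210},
]


def is_clean(record, low=0, high=500):
    """Return True if no field is missing and top_speed lies in [low, high]."""
    if any(value is None for value in record.values()):
        return False
    return low <= record["top_speed"] <= high


# Keep only the records that pass the (assumed) cleaning rule.
cleaned = [record for record in cars if is_clean(record)]
print(len(cars), "records before cleaning,", len(cleaned), "after")
```

Dropping records is only one option; depending on the research question, a suspicious value might instead be checked against the original source and corrected.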

Within our three steps of IDA we usually perform a background check (looking at the history of the data), analysis of the data structure (determining what type of information has been collected), data wrangling and a data summary.

A variable is a characteristic recorded for each item in a data set. For example, a data set about different cars may have the variables car colour, top speed, etc.

The number of variables in a data set = the number of dimensions of the data
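A minimal sketch of this rule, assuming a data set stored as a list of records where each key is one variable. The car data and the function name `describe_dimensions` are hypothetical, and "multivariate" is taken here to mean three or more variables.

```python
# Hypothetical data set: each record is one car, each key is one variable.
cars = [
    {"colour": "red", "top_speed": 180, "doors": 4},
    {"colour": "blue", "top_speed": 150, "doors": 2},
]


def describe_dimensions(records):
    """Count the distinct variables in the records and name the data set type."""
    variables = set()
    for record in records:
        variables.update(record.keys())
    n = len(variables)
    if n == 0:
        raise ValueError("the data set has no variables")
    if n == 1:
        kind = "univariate"
    elif n == 2:
        kind = "bivariate"
    else:
        kind = "multivariate"
    return n, kind


print(describe_dimensions(cars))
```

With the three variables above (colour, top speed, doors), the data set has three dimensions.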

These can be divided into types:

Representing data:

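A bar plot is used to summarize qualitative data: each bar shows how often one category occurs.

A double bar plot can be used to split each bar of a bar plot by a second qualitative variable, so that two variables are shown together.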
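The counts behind both plots can be sketched with the standard library alone. The sample below is invented, and the plots are drawn as rows of `#` characters rather than with a plotting library, purely to show what each bar represents.

```python
from collections import Counter

# Hypothetical sample: (colour, transmission) for each car.
cars = [
    ("red", "manual"), ("red", "automatic"), ("blue", "manual"),
    ("red", "manual"), ("white", "automatic"), ("blue", "automatic"),
]

# Bar plot: one bar per category of a single qualitative variable.
colour_counts = Counter(colour for colour, _ in cars)
for colour, count in sorted(colour_counts.items()):
    print(f"{colour:<6} {'#' * count}")

# Double bar plot: each colour gets one bar per category of a second variable.
pair_counts = Counter(cars)
for colour in sorted(colour_counts):
    for transmission in ("automatic", "manual"):
        count = pair_counts[(colour, transmission)]
        print(f"{colour:<6} {transmission:<9} {'#' * count}")
```

The first loop summarizes one variable (colour); the second splits every colour bar in two, one per transmission type.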

Graphically representing big data can be overwhelming. There are four V’s of big data:

- Volume – the size and scale of data – 6 billion out of 7 billion people have cell phones.
- Velocity – how fast the data is generated – 1 TB of data is produced every trading session for the New York Stock Exchange.
- Variety – the array of data – 30 billion pieces of content are shared on Facebook each month.
- Veracity – uncertainty in the data – 1 in 3 leaders don’t trust the information they use to make decisions.

Memorize.

Univariate = 1 variable

Bivariate = 2 variables

Multivariate = 3 or more variables

Master.

Question 1.

Draw out the divisions of data.

Question 2.

Define the dimensions of a data set.

Question 3.

What are the four V’s of big data?

Question 4.

What are a bar plot and a double bar plot useful for?
