# What are Statistics?

Statistics is a field that provides tools to analyze data sets, allowing you to describe a set in detail (descriptive statistics) or make inferences about a larger population (inferential statistics).

## 🤔 Understanding statistics

Statistics is a set of tools that researchers use to gather, examine, and draw conclusions from data. There are different methods of analyzing data, which are typically grouped into descriptive and inferential statistics. Descriptive statistics helps you learn about the features of an entire data set, such as average and spread, and examine how data points relate to each other. Inferential statistics allows you to make generalizations about a larger population through a smaller sample. Statistical methods are useful to analyze, evaluate, and summarize large volumes of data and also have several applications in financial analysis and investing. For example, the standard deviation, R-squared, and the Sharpe ratio are statistical measures that may help you evaluate the performance of individual stocks.

Let’s say you wanted to figure out the average return of an investment portfolio with a mix of assets. By using a weighted average statistic, you can take into account how much you’re investing in each asset type within a portfolio.

Let’s assume that you have the following portfolio:

| Asset type | Percentage invested in the asset type | Average return |
| ---------- | ---------- | ---------- |
| Equity index fund | 70% | 9% |
| Bond index fund | 20% | 3% |
| Money market fund | 5% | 1% |
| Real estate fund | 5% | 4% |

To calculate the average return of the investment portfolio as a whole, you need to account for the fact that you have allocated more money to some asset types. The weighted average return statistic helps you to achieve this goal.

The formula to calculate the weighted average return is:

$$\text{Weighted average return} = \sum_{i} W_i \times R_i$$

where:

- $R_i$ represents the return for a particular asset class
- $W_i$ represents the percentage (or weight) of that specific asset in the investment portfolio

Using this calculation, you can see that the weighted average return of the investment portfolio is 7.15%.
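The same calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the `portfolio` dictionary and the `weighted_average_return` function are names invented for this example, and the weights and returns are the ones from the portfolio described above, written as fractions.

```python
# Example portfolio: asset type -> (weight, average return), both as fractions.
portfolio = {
    "Equity index fund": (0.70, 0.09),
    "Bond index fund": (0.20, 0.03),
    "Money market fund": (0.05, 0.01),
    "Real estate fund": (0.05, 0.04),
}


def weighted_average_return(holdings):
    """Multiply each asset's weight by its return and add up the results."""
    total_weight = sum(weight for weight, _ in holdings.values())
    # Guard against weights that do not add up to 100%.
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return sum(weight * ret for weight, ret in holdings.values())


result = weighted_average_return(portfolio)
print(f"{result:.2%}")  # prints 7.15%
```

The equity fund contributes 70% × 9% = 6.3 percentage points on its own, which is why the overall figure sits much closer to 9% than a simple average of the four returns (4.25%) would suggest.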

## Takeaway

Statistics is like looking at bacteria through the lens of a microscope…

With a microscope, you can analyze structures you couldn’t see with the naked eye, assess whether or not there are any hidden dangers, and gain a competitive advantage over the bacteria lurking inside the slide. Similarly, statistics allows you to see patterns you otherwise might not notice, which can often help you tackle problems with your newfound knowledge.

New customers need to sign up, get approved, and link their bank account. The cash value of the stock rewards may not be withdrawn for 30 days after the reward is claimed. Stock rewards not claimed within 60 days may expire. See full terms and conditions at rbnhd.co/freestock. Securities trading is offered through Robinhood Financial LLC.

## What is statistics?

Statistics is the science of studying and learning from data. It provides a framework to tackle problems in a systematic way.

For example, a t-test is a statistical method that compares the averages of two groups to check whether the difference between them is likely to be real or could simply be the result of random chance.

The field of statistics lumps statistical methods into two main categories: descriptive statistics and inferential statistics.

Descriptive statistics quantitatively describes the features of a data set and provides insights into how each data point relates to others within the same set.

Meanwhile, inferential statistics analyzes a sample from a larger population and uses statistical methods to make inferences about the broader population.

The critical difference between descriptive and inferential statistics is that descriptive statistics reviews the properties of a single data set, and inferential statistics uses a small data set to learn more about a larger one.

### Example of descriptive statistics

Descriptive statistics provides you with more insights into the characteristics of a group. For example, suppose a fast food chain with several locations across a state has to abide by storage temperature guidelines set by the state. Management can keep track of storage temperatures in all locations and use descriptive statistics to determine how closely the company complies with those guidelines. The average of all storage temperatures would reveal the typical temperature at a location, and the standard deviation would signal how much temperatures across the chain vary around that average. Management could then compare both figures against a benchmark (e.g., the state’s lowest or highest allowed food storage temperature).

### Example of inferential statistics

Inferential statistics allows you to study a large group without having to take a look at every single item or person within the group. For example, a company that makes pasta sauce might want to know the taste preferences of every single American so the company can make more of the most popular sauce.

However, polling more than 330 million Americans isn’t feasible or economically viable. So the company could instead use statistical sampling methods to build a more manageable sample of 500 individuals who are representative of the entire U.S. population. It could then apply a range of statistical studies. For example, it could try testing a null hypothesis, which is the idea that there is no significant relationship between two variables (e.g., age and a preference for “chunky” pasta sauce).

## What are the types of statistics?

The two main types of statistics are descriptive statistics and inferential statistics. Within each broad type of statistics, there are additional categories as well.

### Types of descriptive statistics

#### Measures of central tendency

One of the most useful features of a data set is its typical, or average, value. Measures of central tendency focus on this feature in different ways. Let’s review some of them.

- Mean: Also known as the average, the mean is the sum of all values in a data set divided by the total number of values. For example, the mean of the set {3, 5, 6, 8, 9} is (3 + 5 + 6 + 8 + 9)/5 = 6.2.
- Median: The value that’s in the middle of the data set when you arrange the values in order. For example, the median of the set {3, 5, 6, 8, 9} is 6. When there are two middle numbers, you take the mean of those two. For example, the median of the set {3, 5, 6, 8} is (5 + 6)/2 = 5.5.
- Mode: The value that appears the most in a data set, so it’s the value that’s most likely to be sampled.
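These three measures can be checked with Python's built-in `statistics` module. The sketch below reuses the example sets from the list above. The set passed to `mode` is a hypothetical one added for illustration, since the sets above have no repeated value.

```python
import statistics

values = [3, 5, 6, 8, 9]

mean_value = statistics.mean(values)            # (3 + 5 + 6 + 8 + 9) / 5
median_odd = statistics.median(values)          # middle value of five numbers
median_even = statistics.median([3, 5, 6, 8])   # mean of the two middle values
mode_value = statistics.mode([3, 5, 5, 8, 9])   # hypothetical set with a repeat

print(mean_value)   # 6.2
print(median_odd)   # 6
print(median_even)  # 5.5
print(mode_value)   # 5
```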

#### Measures of spread

Measures of spread describe how the data points are distributed and relate to each other within the set. Let’s review some of these measures.

- Range: The difference between the highest and lowest values in a data set.
- Variance: A number that indicates how far the data points deviate from the mean. It’s calculated by squaring each data point’s difference from the mean, adding up those squared differences, and dividing the sum by the number of data points (or by one less than that number when the data is a sample from a larger population).
- Standard deviation: A number that measures how the data points spread out from the mean, calculated as the square root of the variance. That’s useful because, unlike variance, standard deviation is expressed in the same units as the data.
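As a rough illustration, the measures of spread can be computed for the set {3, 5, 6, 8, 9} with Python's `statistics` module. The sketch assumes the set is the whole population, so it uses `pvariance` and `pstdev`. For a sample drawn from a larger population, `variance` and `stdev` divide by one less than the number of data points instead.

```python
import statistics

values = [3, 5, 6, 8, 9]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# Population variance: average squared distance from the mean (6.2).
population_variance = statistics.pvariance(values)

# Population standard deviation: square root of the variance,
# expressed in the same units as the data.
population_std = statistics.pstdev(values)

# Sample variance divides by (n - 1) instead of n.
sample_variance = statistics.variance(values)

print(value_range)                    # 6
print(round(population_variance, 2))  # 4.56
print(round(population_std, 2))       # 2.14
print(round(sample_variance, 2))      # 5.7
```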

### Types of inferential statistics

#### Confidence interval

A confidence interval is a range of values, calculated from a sample, that is likely to contain the actual value of a feature of the population, such as the mean. The confidence level attached to it (e.g., 90% or 95%) describes how often intervals built this way would capture that value across repeated samples. For example, when the U.S. Census Bureau takes samples to make inferences about the entire population, it uses a 90% confidence level for a specific estimate within a single survey year.
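To make the idea concrete, here is a minimal sketch of a confidence interval for a mean. It rests on several assumptions: the `sample` values are hypothetical, the function name `mean_confidence_interval` is invented for this example, and the interval uses a normal approximation, which is reasonable for moderately large samples but not a description of the method any particular agency uses.

```python
import math
import statistics

# Hypothetical sample of 30 measurements (made-up values for illustration).
sample = [52, 48, 55, 60, 47, 51, 49, 58, 53, 50,
          46, 57, 54, 52, 49, 61, 45, 50, 56, 53,
          48, 52, 59, 51, 47, 54, 50, 55, 49, 53]


def mean_confidence_interval(data, confidence=0.90):
    """Return (low, high) for the mean using a normal approximation."""
    n = len(data)
    sample_mean = statistics.mean(data)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # z-score that leaves (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = mean_confidence_interval(sample, confidence=0.90)
print(f"90% confidence interval for the mean: {low:.2f} to {high:.2f}")
```

Raising the confidence level (say, to 95%) widens the interval: you become more confident that the range contains the population mean, at the cost of a less precise estimate.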

#### Hypothesis testing

Data analysts use hypothesis testing to check the validity of an idea. For example, a company could be interested in finding out whether customers would still buy a product if its price increased from $4 to $4.50. To perform hypothesis testing, the company would state a null hypothesis (e.g., the price increase has no effect on purchases), calculate a p-value (probability value), and evaluate it against a benchmark known as the significance level (commonly 0.05). If the p-value is below the significance level, the company rejects the null hypothesis. Otherwise, it fails to reject it. Either way, the decision is statistically based.
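The pricing example could be sketched as a one-sided test on a proportion. Everything numeric here is an assumption made for illustration: a baseline purchase rate of 30% at $4, and 100 purchases out of 400 customers shown the $4.50 price. The function name is also invented, and the p-value uses a normal approximation.

```python
import math
from statistics import NormalDist


def one_sided_proportion_p_value(successes, trials, baseline_rate):
    """P-value for the alternative that the true rate is below baseline_rate."""
    observed_rate = successes / trials
    # Standard error of the proportion if the null hypothesis is true.
    standard_error = math.sqrt(baseline_rate * (1 - baseline_rate) / trials)
    z = (observed_rate - baseline_rate) / standard_error
    # Probability of a rate this low or lower under the null hypothesis.
    return NormalDist().cdf(z)


# Hypothetical data: 30% bought at $4; 100 of 400 bought at $4.50.
significance_level = 0.05
p_value = one_sided_proportion_p_value(successes=100, trials=400, baseline_rate=0.30)
reject_null = p_value < significance_level

print(f"p-value: {p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

With these made-up numbers the p-value falls below 0.05, so the company would reject the null hypothesis that the higher price leaves the purchase rate unchanged.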

Inferential statistics provides several techniques (e.g., regression analysis, correlation analysis, structural equation modeling) and tests (e.g., t-test, chi-square test) to examine the relationships between data points and make inferences about a larger set.

## What is statistical analysis?

Statistical analysis is the use of statistical methods to draw conclusions from data objectively. By providing structure to every step of a research project, statistical analysis is a useful framework for researchers across many disciplines in both the private and public sectors.

Statistical analysis provides methodologies to collect, examine, evaluate, and draw conclusions from data. Over time, statisticians have developed different methods that are better suited to certain situations. The availability of many statistical methods allows analysts to take a look at an issue using different approaches.

## Why are statistics important?

Statistics are important because they provide a standardized and objective toolkit that allows individuals to analyze problems, develop numerical evidence, use validated evidence to draw conclusions, and improve decision making.

One of the many advantages of using statistics is that the evidence it produces can be fact-checked. Statisticians use checks and balances to prevent certain pitfalls, such as selecting an inappropriate sample size, creating a biased sample, choosing a sample that is not representative of a population, making an “apples-to-oranges” comparison, or mistaking correlation for causation.

Another advantage of using statistics is that the field is constantly evolving and being reassessed by practitioners and professional associations, such as the American Statistical Association.

## What are statistics used for?

Researchers use statistics in a wide variety of fields, including medicine, psychology, social sciences, and business.

### Example of statistics in medicine

Let’s say a pharmaceutical developer wants to determine which drug formula is more effective in reducing heartburn. The developer could commission a study in which one group of subjects receives one version of the formula, and another comparable group of subjects receives another version of the formula.

By using statistical methods, the researchers would follow standardized sampling procedures, ensure that the two groups of subjects are comparable, perform tests to compare the effects of the two formulas, and determine which formula reduces heartburn more effectively.

### Example of statistics in business

An investor can use statistics to perform research and analysis of the stock market and determine how to improve the performance of an investment portfolio. For example, an investor could perform hypothesis testing of a mutual fund’s claim that it can consistently deliver a 9% annual return.

In this case, the hypothesis is that the yearly return of the mutual fund is 9%. Assuming that the mutual fund has been in operation for 25 years, an investor can take a sample of five years, calculate the mean return of that sample, and run a statistical test to check whether the sample is consistent with the claimed 9%.
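One way to sketch such a check is a one-sample t-test. The five annual returns below are hypothetical, and the variable names are invented for this example. The critical value of 2.776 is the standard two-sided 5% cutoff from a t-table for four degrees of freedom (five observations minus one).

```python
import math
import statistics

claimed_return = 0.09
# Hypothetical returns for five sampled years (made-up values).
sampled_returns = [0.07, 0.11, 0.04, 0.10, 0.08]

n = len(sampled_returns)
sample_mean = statistics.mean(sampled_returns)
# Standard error of the mean, using the sample standard deviation.
standard_error = statistics.stdev(sampled_returns) / math.sqrt(n)

# t statistic: how many standard errors the sample mean is from the claim.
t_statistic = (sample_mean - claimed_return) / standard_error

# Two-sided 5% critical value for n - 1 = 4 degrees of freedom (t-table).
critical_value = 2.776
reject_claim = abs(t_statistic) > critical_value

print(f"t statistic: {t_statistic:.3f}")
print("Reject the 9% claim" if reject_claim else "Fail to reject the 9% claim")
```

With these made-up returns the sample mean is 8%, but the t statistic is well inside the critical value, so five years of data are not enough to reject the fund's claim. A sample this small gives the test little power, which is one reason sample size appears among the pitfalls statisticians watch for.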
