It helps you understand how changes in one variable are associated with changes in another.
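The pandas library is commonly used in Python for data manipulation and analysis. You can calculate the correlation between two columns with its built-in Series.corr() method, which computes the Pearson correlation coefficient by default. The example below applies it to a small economic dataset.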
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({
    'Country': ['USA', 'China', 'India', 'Canada', 'UK'],
    'Year': [2015, 2016, 2017, 2018, 2019],
    'GDP (Trillions USD)': [18.12, 11.20, 2.87, 1.85, 2.83],
    'Unemployment Rate (%)': [5.3, 4.0, 3.6, 5.9, 4.0]
})
# Display the dataset
print("Economic Data:")
print(data)
# Calculate the correlation between GDP and Unemployment Rate
correlation = data['GDP (Trillions USD)'].corr(data['Unemployment Rate (%)'])
print(f"Correlation between GDP and Unemployment Rate: {correlation:.2f}")
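# Visualize the data
# Note: each row is a different country in a different year, so these
# scatter plots compare countries rather than track one country over time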
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.scatter(data['Year'], data['GDP (Trillions USD)'])
plt.xlabel('Year')
plt.ylabel('GDP (Trillions USD)')
plt.title('GDP Over Time')
plt.subplot(1, 2, 2)
plt.scatter(data['Year'], data['Unemployment Rate (%)'], color='red')
plt.xlabel('Year')
plt.ylabel('Unemployment Rate (%)')
plt.title('Unemployment Rate Over Time')
plt.tight_layout()
plt.show()
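The .corr() call above returns Pearson's correlation coefficient, which is the covariance of the two variables divided by the product of their standard deviations. As a minimal sketch of what that means, the snippet below recomputes the value by hand for the same GDP and unemployment figures and compares it with the built-in result. The helper name pearson_r is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd

# Same GDP and unemployment figures as the economic example above
gdp = pd.Series([18.12, 11.20, 2.87, 1.85, 2.83])
unemployment = pd.Series([5.3, 4.0, 3.6, 5.9, 4.0])


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r computed from deviations from the mean."""
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of cross-deviations over the square root of the product of squared deviations
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


manual = pearson_r(gdp, unemployment)
builtin = gdp.corr(unemployment)  # method='pearson' is the pandas default

print(f"Manual:   {manual:.4f}")
print(f"Built-in: {builtin:.4f}")
```

The two printed values should agree up to floating-point rounding. With only five data points, the coefficient is very sensitive to any single row, so treat it as an illustration rather than evidence of a real economic relationship.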
Climate change data analysis
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008],
    'Temperature (°C)': [0.7, 0.8, 0.9, 0.9, 1.0, 1.1, 1.2, 1.2, 1.3],
    'CO2 (ppm)': [369, 371, 373, 375, 378, 381, 383, 385, 387]
})
# Display the dataset
print("Climate Change Data:")
print(data)
# Calculate the correlation between Temperature and CO2
correlation = data['Temperature (°C)'].corr(data['CO2 (ppm)'])
print(f"Correlation between Temperature and CO2: {correlation:.2f}")
# Visualize the data
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(data['Year'], data['Temperature (°C)'], marker='o')
plt.xlabel('Year')
plt.ylabel('Temperature (°C)')
plt.title('Temperature Over Time')
plt.subplot(1, 2, 2)
plt.plot(data['Year'], data['CO2 (ppm)'], marker='o', color='red')
plt.xlabel('Year')
plt.ylabel('CO2 (ppm)')
plt.title('CO2 Levels Over Time')
plt.tight_layout()
plt.show()
Step 6: Interpret the correlation matrix
The correlation matrix is a square matrix in which each cell holds the correlation coefficient between two variables. Each coefficient ranges from -1 to 1, and the values are interpreted as follows:
Values close to 1: Strong positive correlation. As one variable increases, the other tends to increase as well.
Values close to -1: Strong negative correlation. As one variable increases, the other tends to decrease.
Values close to 0: Weak or no correlation. There is little to no linear relationship between the variables.
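The interpretation rules above can be sketched in code. The example below builds a correlation matrix from the climate dataset used earlier and turns one coefficient into a plain-language label. The helper describe_strength and its cutoffs (0.7 and 0.3) are assumptions made for illustration, as is the extra "moderate" band; there is no universal threshold for what counts as "close to" 1, -1, or 0.

```python
import pandas as pd

# Climate dataset from the earlier example
data = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008],
    'Temperature (°C)': [0.7, 0.8, 0.9, 0.9, 1.0, 1.1, 1.2, 1.2, 1.3],
    'CO2 (ppm)': [369, 371, 373, 375, 378, 381, 383, 385, 387]
})

# Pairwise Pearson correlations between all numeric columns
correlation_matrix = data.corr()


def describe_strength(r, strong=0.7, weak=0.3):
    """Hypothetical helper: the cutoffs are illustrative assumptions, not a standard."""
    if abs(r) < weak:
        return "weak or no correlation"
    strength = "strong" if abs(r) >= strong else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(correlation_matrix.round(2))

r = correlation_matrix.loc['Temperature (°C)', 'CO2 (ppm)']
label = describe_strength(r)
print(f"Temperature vs CO2: {label}")
```

Every diagonal entry of the matrix is 1, because each variable is perfectly correlated with itself, and the matrix is symmetric, so only the cells on one side of the diagonal carry distinct information.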
Step 7: Extract specific correlations
You can extract specific correlation values between variables by indexing the correlation matrix:
# Example: Extract the correlation between 'variable1' and 'variable2'
corr_value = correlation_matrix.loc['variable1', 'variable2']
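For a self-contained version of this step, the sketch below applies the same .loc lookup to the climate dataset from earlier, with the real column names standing in for 'variable1' and 'variable2'. It also shows one possible way to pull every correlation for a single variable at once; the variable names temp_co2 and co2_corrs are chosen here for illustration.

```python
import pandas as pd

# Climate dataset from the earlier example
data = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008],
    'Temperature (°C)': [0.7, 0.8, 0.9, 0.9, 1.0, 1.1, 1.2, 1.2, 1.3],
    'CO2 (ppm)': [369, 371, 373, 375, 378, 381, 383, 385, 387]
})
correlation_matrix = data.corr()

# A single cell: row label first, then column label
temp_co2 = correlation_matrix.loc['Temperature (°C)', 'CO2 (ppm)']
print(f"Temperature vs CO2: {temp_co2:.2f}")

# The matrix is symmetric, so swapping the labels gives the same value
co2_temp = correlation_matrix.loc['CO2 (ppm)', 'Temperature (°C)']

# All correlations involving one variable, without its self-correlation,
# sorted from strongest positive to strongest negative
co2_corrs = (
    correlation_matrix['CO2 (ppm)']
    .drop('CO2 (ppm)')
    .sort_values(ascending=False)
)
print(co2_corrs)
```

Label-based indexing with .loc keeps the lookup readable and does not depend on column order, which is why it is used here instead of positional indexing.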