linear regression in python


Linear regression is one of the foundational techniques in machine learning and a common starting point for predictive modeling. It models a target variable as a linear function of one or more input features.

simple linear regression

Simple linear regression is a statistical method for modeling the relationship between a single independent variable and a dependent variable. It assumes that the relationship between the two variables can be approximated by a straight line:

y = mx + b

where:

y is the dependent variable (the one you're trying to predict),

x is the independent variable (the input feature),

m is the slope of the line (the change in y with respect to a one-unit change in x),

b is the y-intercept (the value of y when x is zero).
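The definitions above do not say how m and b are chosen. scikit-learn's LinearRegression uses ordinary least squares, which picks the m and b that minimize the sum of squared differences between the actual and predicted y values. With a single feature this has a closed form. The following is a minimal NumPy sketch of that formula, reusing the age/premium data from the example in this section; the variable names are illustrative.

```python
import numpy as np

# Same age/premium data as the scikit-learn example in this section
x = np.array([25, 30, 35, 40, 45, 50, 65], dtype=float)
y = np.array([18000, 32000, 40000, 47000, 55000, 60000, 67000], dtype=float)

# Ordinary least squares closed form for one feature:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   b = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# Apply y = m*x + b to new ages
new_ages = np.array([34, 54, 65, 76], dtype=float)
print(m * new_ages + b)
```

The slope is the covariance of x and y divided by the variance of x, and the intercept forces the line through the point (mean of x, mean of y).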



import pandas as pd
from sklearn import linear_model

# Training data: age and the premium paid at that age
data = pd.DataFrame({"age": [25, 30, 35, 40, 45, 50, 65],
                     "premium": [18000, 32000, 40000, 47000, 55000, 60000, 67000]})

x = data[["age"]]    # 2-D feature matrix with one column
y = data["premium"]  # 1-D target

# New ages to predict. Using a DataFrame with the same column name as the
# training data avoids scikit-learn's feature-name warning, so no warning
# suppression is needed.
new_x = pd.DataFrame({"age": [34, 54, 65, 76]})

# Fit the line y = m*x + b
rg = linear_model.LinearRegression()
rg.fit(x, y)

# Predict premiums for the new ages
# (76 lies outside the training range of 25-65, so that value is an extrapolation)
y_pred = rg.predict(new_x)

# Print each new age next to its predicted premium
results = pd.DataFrame({"age": new_x["age"], "Predicted": y_pred})
print(results)
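After fitting, the learned slope and intercept can be read from the model, and the fit can be scored. The sketch below assumes the same data as above; coef_ and intercept_ are the attributes where scikit-learn stores m and b, and mean_squared_error and r2_score come from sklearn.metrics. Note that these scores are computed on the training data itself, so they describe how well the line fits the seven points rather than how well it predicts unseen ages.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

data = pd.DataFrame({"age": [25, 30, 35, 40, 45, 50, 65],
                     "premium": [18000, 32000, 40000, 47000, 55000, 60000, 67000]})
x = data[["age"]]
y = data["premium"]

rg = LinearRegression()
rg.fit(x, y)

# Learned parameters of y = m*x + b
m = rg.coef_[0]    # slope: one coefficient per feature, here only "age"
b = rg.intercept_  # y-intercept
print(f"m = {m:.2f}, b = {b:.2f}")

# In-sample error: compare the fitted line with the training targets
y_fit = rg.predict(x)
mse = mean_squared_error(y, y_fit)
r2 = r2_score(y, y_fit)
print(f"MSE = {mse:.1f}, R^2 = {r2:.3f}")
```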


multiple linear regression


Multiple linear regression is a statistical method used to model the relationship between multiple independent variables and a single dependent variable.

Let's consider a real-life example: predicting house prices. The dependent variable (y) is the house price, and the independent variables (x1, x2, x3, …) could be features such as:


x1: Square footage of the house

x2: Number of bedrooms

x3: Distance from the city center

x4: Average income in the neighborhood
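Written as an equation, this extends y = mx + b from the single-variable case to one slope per feature:

```latex
y = b + m_1 x_1 + m_2 x_2 + m_3 x_3 + m_4 x_4
```

Here b is the intercept and m1 through m4 are the slopes for the four features. In scikit-learn, the fitted b is stored in intercept_ and the slopes in coef_.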



 

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data (a toy dataset: five houses are far too few to fit four features reliably)
data = pd.DataFrame({
    "SquareFootage": [1500, 2000, 2500, 1800, 2100],
    "Bedrooms": [3, 4, 3, 2, 4],
    "DistanceFromCenter": [10, 5, 12, 8, 15],
    "IncomeInNeighborhood": [50000, 60000, 55000, 45000, 70000],
    "HousePrice": [200000, 250000, 300000, 180000, 280000]
})

# Independent variables (x1 to x4)
X = data[["SquareFootage", "Bedrooms", "DistanceFromCenter", "IncomeInNeighborhood"]]

# Dependent variable (y)
y = data["HousePrice"]

# Create and fit a linear regression model
model = LinearRegression()
model.fit(X, y)

# Describe a new house using the same column names as the training data
new_data = pd.DataFrame({
    "SquareFootage": [2200],
    "Bedrooms": [3],
    "DistanceFromCenter": [8],
    "IncomeInNeighborhood": [65000]
})

# predict() returns an array with one value per row; take the single value
predicted_price = model.predict(new_data)
print("Predicted house price:", predicted_price[0])
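To see what model.predict does for the house-price model above, the sketch below prints the fitted coefficients and rebuilds the prediction by hand as the intercept plus the dot product of the feature values with coef_. It assumes the same toy data; the features list and the manual variable are illustrative names introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    "SquareFootage": [1500, 2000, 2500, 1800, 2100],
    "Bedrooms": [3, 4, 3, 2, 4],
    "DistanceFromCenter": [10, 5, 12, 8, 15],
    "IncomeInNeighborhood": [50000, 60000, 55000, 45000, 70000],
    "HousePrice": [200000, 250000, 300000, 180000, 280000],
})
features = ["SquareFootage", "Bedrooms", "DistanceFromCenter", "IncomeInNeighborhood"]
X = data[features]
y = data["HousePrice"]

# fit() returns the fitted estimator, so the calls can be chained
model = LinearRegression().fit(X, y)

# One coefficient per feature, plus a single intercept
for name, coef in zip(features, model.coef_):
    print(f"{name}: {coef:.4f}")
print(f"intercept: {model.intercept_:.2f}")

new_data = pd.DataFrame({
    "SquareFootage": [2200],
    "Bedrooms": [3],
    "DistanceFromCenter": [8],
    "IncomeInNeighborhood": [65000],
})

# Manual prediction: y = b + m1*x1 + m2*x2 + m3*x3 + m4*x4
manual = model.intercept_ + new_data[features].to_numpy() @ model.coef_
print(manual, model.predict(new_data))
```

Because this toy dataset has five rows and five parameters to estimate (four coefficients plus the intercept), the individual coefficients should not be read as meaningful effects of each feature; with realistic data you would use many more rows than features.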

