In machine learning, a decision tree is a predictive model that maps observations about an item (its features) to a conclusion about its target value. It's called a decision tree because it starts at a single decision point, the root, and branches out through further decisions until it reaches an outcome.
Real-World Example:
Let's apply this to a scenario of predicting whether a customer will subscribe to a new service based on their demographic and behavioral data.
- Starting Point:
  - Features: Age, Income, Subscription History
- Decision Nodes:
  - First node might be based on income: "Is income greater than $50,000?"
- Branches:
  - If income is greater than $50,000, next node might be age: "Is age less than 40?"
  - If income is $50,000 or less, next node might be subscription history: "Has the customer subscribed to a similar service before?"
- Further Decisions:
  - For those with high income and age less than 40, further decisions might include past subscription behavior.
  - For those with lower income, decisions might involve different factors like age or education level.
- Leaf Nodes:
  - Eventually, each path leads to a leaf node that predicts whether the customer is likely to subscribe to the new service based on their unique combination of features.
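The walk-through above can be sketched by hand as nested conditionals, which is roughly what a fitted tree encodes. The thresholds come from the example; the function name `predict_subscription` and every leaf outcome are assumptions made for illustration, since the example does not say what each leaf predicts.

```python
def predict_subscription(income: float, age: int, subscribed_before: bool) -> bool:
    """Walk the example tree by hand (leaf outcomes are illustrative assumptions)."""
    if income > 50_000:                  # root node: "Is income greater than $50,000?"
        if age < 40:                     # next node on the high-income branch
            # Further decision: fall back on past subscription behavior
            return subscribed_before     # assumed leaf outcome
        return True                      # assumed leaf: high income, 40 or older
    # Income is $50,000 or less: ask about subscription history
    return subscribed_before             # assumed leaf outcome


print(predict_subscription(60_000, 30, True))   # True
print(predict_subscription(40_000, 30, False))  # False
```

A library such as scikit-learn learns the thresholds and leaf outcomes from data instead of having them written by hand.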
Code:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
# Hypothetical data (income in thousands, age in years, subscription history as binary)
data = {
    'Income': [55, 35, 75, 45, 60, 30, 80, 25],
    'Age': [25, 30, 40, 35, 27, 32, 45, 28],
    'SubscriptionHistory': [1, 0, 1, 0, 1, 0, 1, 0],  # 1 for subscribed before, 0 for not
    'WillSubscribe': [1, 0, 1, 0, 1, 0, 1, 0]  # Target variable: 1 for yes, 0 for no
}
# Create a DataFrame
df = pd.DataFrame(data)
# Separate features (X) and target variable (y)
X = df[['Income', 'Age', 'SubscriptionHistory']]
y = df['WillSubscribe']
# Split data into training and testing sets (test_size=0.3 holds out 3 of the 8 rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize Decision Tree classifier
clf = DecisionTreeClassifier(random_state=42)
# Fit the classifier to the training data
clf.fit(X_train, y_train)
# Print the learned decision tree rules as text
tree_rules = export_text(clf, feature_names=list(X.columns))
print("Decision Tree Rules:")
print(tree_rules)
# Predict on test data
y_pred = clf.predict(X_test)
# Print predicted values and actual values
print("\nPredicted Values:", y_pred)
print("Actual Values:", y_test.values)
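Comparing predicted and actual values by eye works for a handful of rows; a single score is easier to track. The sketch below repeats the same hypothetical data and split, then adds `accuracy_score` and a prediction for one made-up customer. The names `accuracy`, `new_customer`, and `new_prediction` are assumptions for illustration, and with only three test rows the score is very coarse.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same hypothetical data as above (income in thousands)
df = pd.DataFrame({
    'Income': [55, 35, 75, 45, 60, 30, 80, 25],
    'Age': [25, 30, 40, 35, 27, 32, 45, 28],
    'SubscriptionHistory': [1, 0, 1, 0, 1, 0, 1, 0],
    'WillSubscribe': [1, 0, 1, 0, 1, 0, 1, 0],
})
X = df[['Income', 'Age', 'SubscriptionHistory']]
y = df['WillSubscribe']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Fraction of test rows whose prediction matches the actual label
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Predict for one made-up customer, using the same column names as training
new_customer = pd.DataFrame([[65, 29, 1]], columns=X.columns)
new_prediction = clf.predict(new_customer)[0]
print("Prediction for new customer:", new_prediction)
```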
Example: Skin Health Prediction
# Import necessary libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Define the data manually for simplicity
data = {
    'itching': [1, 1, 1, 0, 0],
    'red_spots': [1, 0, 0, 1, 0],
    'rash': [0, 1, 0, 0, 0],
    'diagnosis': ['Allergic Reaction', 'Contact Dermatitis', 'Dry Skin', 'Insect Bite', 'Normal Skin Condition']
}

# Convert data into a pandas DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[['itching', 'red_spots', 'rash']]  # Features
y = df['diagnosis']  # Target variable

# Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Train the classifier
clf.fit(X, y)

# Function to predict diagnosis based on user input
def predict_diagnosis(itching, red_spots, rash):
    # Build a one-row DataFrame with the training column names so that
    # scikit-learn does not warn about missing feature names
    symptoms = pd.DataFrame([[itching, red_spots, rash]], columns=X.columns)
    prediction = clf.predict(symptoms)
    return prediction[0]

# Main function to interact with the user
def main():
    print("Welcome to the Skin Condition Diagnosis System!")
    print("Please enter your symptoms:")
    itching = int(input("Do you have itching? (0 for no, 1 for yes): "))
    red_spots = int(input("Do you have red spots? (0 for no, 1 for yes): "))
    rash = int(input("Do you have a rash? (0 for no, 1 for yes): "))
    diagnosis = predict_diagnosis(itching, red_spots, rash)
    print("Based on your symptoms:")
    print(f"Itching: {'Yes' if itching == 1 else 'No'}")
    print(f"Red Spots: {'Yes' if red_spots == 1 else 'No'}")
    print(f"Rash: {'Yes' if rash == 1 else 'No'}")
    print(f"The predicted diagnosis is: {diagnosis}")

if __name__ == "__main__":
    main()
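The script above needs keyboard input, so it can be handy to check the fitted tree without it. The sketch below rebuilds the same toy data, prints the learned rules with `export_text`, and predicts on the five training rows. The names `FEATURES`, `train_predictions`, and `result` are assumptions for illustration. Because every row has a distinct symptom combination, the fully grown tree reproduces each training label, which shows memorization of five rows and not diagnostic accuracy.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ['itching', 'red_spots', 'rash']

# Same toy data as the interactive script
df = pd.DataFrame({
    'itching': [1, 1, 1, 0, 0],
    'red_spots': [1, 0, 0, 1, 0],
    'rash': [0, 1, 0, 0, 0],
    'diagnosis': ['Allergic Reaction', 'Contact Dermatitis', 'Dry Skin',
                  'Insect Bite', 'Normal Skin Condition'],
})

clf = DecisionTreeClassifier(random_state=42).fit(df[FEATURES], df['diagnosis'])

# Print the learned splits as indented text rules
print(export_text(clf, feature_names=FEATURES))

# Predict on the rows the tree was trained on and compare side by side
train_predictions = clf.predict(df[FEATURES])
result = df[FEATURES].assign(actual=df['diagnosis'], predicted=train_predictions)
print(result)
```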