Math Explorer: Functions, Derivatives & Probability
Interactive learning with hands-on exercises, visualizations, and quizzes
Complete tutorials and reference notes for data processing, machine learning algorithms, interactive math tools, OpenRefine, Flask development, and advanced ML concepts.
Master data cleaning, transformation, and quality improvement with OpenRefine
Download OpenRefine 3.6.2 from: https://github.com/OpenRefine/OpenRefine/releases/tag/3.6.2
Choose your OS version (Windows, Mac, Linux). Version 3.6.2 includes embedded Java.
1. Extract the downloaded file
2. Run the executable (openrefine.exe on Windows)
3. Open browser and go to: http://127.0.0.1:3333
1. Click "Create Project"
2. Choose "This Computer" → Browse for your CSV file
3. Click "Next" to preview data
4. Verify column headers and data types
5. Click "Create Project" (top right)
• Always preview your data before creating the project
• Check if headers are properly detected
• Verify column separation (comma, tab, semicolon)
• Note any encoding issues with special characters
1. Click on column dropdown → Facet → Text facet
2. In the facet panel (left side), you'll see all unique values
3. Click on (blank) to select only blank rows
4. Click All → Edit rows → Remove all matching rows
5. Close the facet when done
| Function | Purpose | Example |
|---|---|---|
| `contains(value, "text")` | Check if value contains text | `contains(value, "Airport")` |
| `value.match(/pattern/)` | Extract text matching regex | `value.match(/([A-Za-z\s]+Airport)/)` |
| `split(value, delimiter)` | Split text into array | `split(value, ",")` |
| `trim(value)` | Remove leading/trailing spaces | `trim(value)` |
| `if(condition, true, false)` | Conditional logic | `if(contains(value, "Airport"), "Yes", "No")` |
if(contains(value, "Airport"), value.match(/([A-Za-z\s]+Airport)/)[0].trim(), "")
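For readers who want to prototype the same extraction outside OpenRefine, here is a minimal Python sketch of the logic in the expression above. The function name `extract_airport` and the sample strings are hypothetical. Note that Python's `re.search` looks for the pattern anywhere in the string; whether GREL's `match` behaves the same way on a partial match should be checked in OpenRefine itself.

```python
import re

# Same character class as the GREL pattern: letters and whitespace, then "Airport".
AIRPORT_RE = re.compile(r"[A-Za-z\s]+Airport")


def extract_airport(value: str) -> str:
    """Return the trimmed airport name found in value, or "" if there is none."""
    if "Airport" not in value:          # mirrors contains(value, "Airport")
        return ""
    found = AIRPORT_RE.search(value)    # search anywhere in the string
    return found.group(0).strip() if found else ""


print(extract_airport("LHR - Heathrow Airport, London"))
print(extract_airport("No location listed"))
```

Because the character class is greedy, any run of letters and spaces directly before "Airport" is captured, so values with leading free text may need a tighter pattern.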
• Always back up original data before major operations
• Test GREL expressions on small datasets first
• Check for case sensitivity in filters and matching
• Verify row counts after filtering operations
Build data-driven web applications with Python Flask
# Install Flask
pip install Flask

# Optional: Create virtual environment first
python -m venv flask_env
# Windows: flask_env\Scripts\activate
# Mac/Linux: source flask_env/bin/activate
pip install Flask
project_folder/
├── run.py                 # Main application runner
├── flaskapp/
│   ├── __init__.py        # Flask app initialization
│   ├── routes.py          # URL routes and view functions
│   └── templates/
│       └── index.html     # HTML templates
└── data/
    └── dataset.csv        # Data files
""" run.py - Run the Flask app """
from flaskapp import app
if __name__ == '__main__':
    app.run(host='127.0.0.1', port=3001, debug=True)
from flask import Flask

app = Flask(__name__)

# Imported last so that routes.py can import `app` without a circular import
from flaskapp import routes
• Use debug=True for development only (auto-reload on changes); turn it off in production
• Organize code into modules (routes, models, utilities)
• Use templates for all HTML (avoid HTML in Python code)
• Handle errors gracefully with try/except blocks
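The error-handling tip above can be sketched as a small JSON route. This is a single-file illustration under stated assumptions, not the project's actual routes.py: the `/rows/<int:row_id>` URL, the `ROWS` list, and the handler names are all hypothetical, and the in-memory list stands in for data that would normally be loaded from a file.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data standing in for rows loaded from a data file.
ROWS = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]


@app.route("/rows/<int:row_id>")
def get_row(row_id):
    """Return one row as JSON, or a 404 JSON error if the id is unknown."""
    try:
        row = next(r for r in ROWS if r["id"] == row_id)
    except StopIteration:
        return jsonify(error=f"row {row_id} not found"), 404
    return jsonify(row)


@app.errorhandler(500)
def internal_error(exc):
    """Return JSON instead of the default HTML page for unhandled errors."""
    return jsonify(error="internal server error"), 500


if __name__ == "__main__":
    # Exercise the route with Flask's test client instead of starting a server.
    client = app.test_client()
    print(client.get("/rows/1").get_json())
    print(client.get("/rows/99").status_code)
```

Catching the specific exception and returning an explicit status code keeps expected failures (a missing row) separate from genuine bugs, which fall through to the 500 handler.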
Master pattern matching for text processing and data extraction
| Pattern | Description | Example | Matches |
|---|---|---|---|
| `.` | Any single character | `a.c` | abc, axc, a1c |
| `*` | Zero or more of preceding | `ab*c` | ac, abc, abbc |
| `+` | One or more of preceding | `ab+c` | abc, abbc (not ac) |
| `?` | Zero or one of preceding | `ab?c` | ac, abc |
| `^` | Start of string | `^Hello` | Hello world |
| `$` | End of string | `world$` | Hello world |
import re
# Note: inside a character class "|" is a literal pipe, so use [A-Za-z], not [A-Z|a-z]
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
text = "Contact us at support@example.com or admin@site.org"
emails = re.findall(pattern, text)
print(emails) # ['support@example.com', 'admin@site.org']
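The quantifier and anchor rows of the pattern table can be checked directly with Python's `re` module. The sketch below is illustrative only: the candidate strings beyond those listed in the table (such as "Say Hello" and "world peace") are added here as counterexamples.

```python
import re

# (pattern, candidate strings) pairs based on the table rows.
cases = [
    (r"a.c",    ["abc", "axc", "a1c", "ac"]),
    (r"ab*c",   ["ac", "abc", "abbc"]),
    (r"ab+c",   ["ac", "abc", "abbc"]),
    (r"ab?c",   ["ac", "abc", "abbc"]),
    (r"^Hello", ["Hello world", "Say Hello"]),
    (r"world$", ["Hello world", "world peace"]),
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [s for s in candidates if re.search(pattern, s)]


results = {p: matching(p, c) for p, c in cases}
for pattern, hits in results.items():
    print(f"{pattern!r:10} -> {hits}")
```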
Essential Python techniques for data manipulation and analysis
import pandas as pd
import numpy as np
# Load CSV data
df = pd.read_csv('data.csv')
# Basic data exploration
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Data types:\n{df.dtypes}")
print(f"Missing values:\n{df.isnull().sum()}")
# Display first few rows
print(df.head())
# Remove duplicate rows
df_clean = df.drop_duplicates()
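# Remove rows with missing values in a key column
# (chained from df_clean so the de-duplication above is not discarded)
df_clean = df_clean.dropna(subset=['important_column'])
# Alternatively, fill missing values instead of dropping rows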
df['column'] = df['column'].fillna('default_value')
df['numeric'] = df['numeric'].fillna(df['numeric'].mean())
Understanding parallel vs sequential ensemble learning approaches
Classification and Regression Trees for interpretable machine learning
• max_depth: Start with 5-10, tune based on validation
• min_samples_split: 2-20, higher for noisy data
• min_samples_leaf: 1-10, higher prevents overfitting
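To make the effect of min_samples_leaf concrete, here is a minimal pure-Python sketch of a CART-style split search on a single feature using Gini impurity. The helper names `gini` and `best_split` and the toy data are hypothetical; a real library implementation handles many features and is far more optimized.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys, min_samples_leaf=1):
    """Return (threshold, weighted_gini) of the best split on one feature.

    Splits that would leave fewer than min_samples_leaf points on either
    side are skipped. Returns (None, parent_gini) if no split is allowed.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        if i < min_samples_leaf or n - i < min_samples_leaf:
            continue  # one side would be too small
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "b", "b", "b", "b", "b"]
print(best_split(xs, ys, min_samples_leaf=1))  # isolates the single "a" point
print(best_split(xs, ys, min_samples_leaf=2))  # forced to a less pure split
```

With min_samples_leaf=1 the search carves out a one-point leaf, which is exactly the kind of split that tends to memorize noise; raising the value forces coarser, more general splits.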
Sparse signal recovery from underdetermined systems
Adaptive boosting for sequential weak learner combination
Ensemble of decision trees with bootstrap aggregating
• max_features: √p for classification, p/3 for regression
• n_estimators: Start with 100, increase until OOB error stabilizes
• Feature importance: Use for feature selection and interpretation
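The max_features rule of thumb and the idea behind out-of-bag (OOB) error can be sketched in a few lines of standard-library Python. The function names are hypothetical, and rounding √p down is one common convention assumed here rather than something the notes specify.

```python
import math
import random


def default_max_features(p, task):
    """Rule-of-thumb number of features tried per split (at least 1)."""
    if task == "classification":
        return max(1, int(math.sqrt(p)))
    return max(1, p // 3)  # regression


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)
n_rows = 1000
in_bag, oob = bootstrap_indices(n_rows, rng)
# Each row is left out with probability (1 - 1/n)^n, roughly 0.368 for large n.
oob_fraction = len(oob) / n_rows

print(default_max_features(16, "classification"))
print(default_max_features(12, "regression"))
print(round(oob_fraction, 3))
```

Because roughly a third of the rows are left out of each tree's bootstrap sample, those rows act as a built-in validation set, which is why OOB error can be monitored while increasing n_estimators.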
Novelty detection and anomaly identification
Non-parametric regression with local model fitting
• Bandwidth selection: Use cross-validation for optimal h
• Numerical stability: Add small regularization to XᵀWX
• Computational efficiency: Pre-compute distances when possible
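The bandwidth and regularization tips above can be illustrated with a one-dimensional locally weighted linear fit. This is a sketch under assumptions: a Gaussian kernel is assumed (the notes do not name one), and the function name `local_linear_predict` and the `ridge` parameter are hypothetical.

```python
import math


def local_linear_predict(x0, xs, ys, h, ridge=1e-8):
    """Predict y at x0 with a Gaussian-kernel locally weighted line.

    Solves the 2x2 weighted normal equations (X^T W X + ridge*I) b = X^T W y
    for intercept b0 and slope b1. The feature is centred on x0, so the
    prediction at x0 is simply b0.
    """
    # Kernel weights: nearby points count more; h is the bandwidth.
    w = [math.exp(-((x - x0) ** 2) / (2 * h * h)) for x in xs]
    d = [x - x0 for x in xs]  # centred feature
    # Entries of X^T W X, with a small ridge term on the diagonal for stability.
    s0 = sum(w) + ridge
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d)) + ridge
    # Entries of X^T W y.
    t0 = sum(wi * yi for wi, yi in zip(w, ys))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, ys))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det  # intercept b0 via the 2x2 inverse


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
prediction = local_linear_predict(2.5, xs, ys, h=1.0)
print(prediction)  # close to 6.0 for this exactly linear data
```

A small h makes the fit follow local wiggles (low bias, high variance), while a large h approaches an ordinary global line, which is why h is usually chosen by cross-validation.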
Understanding the fundamental tradeoff in machine learning
• Cross-validation: Use for model selection and hyperparameter tuning
• Regularization: Add penalty terms to control model complexity
• Ensemble methods: Combine multiple models to reduce variance
• Data size: More data generally reduces variance
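For reference, the tradeoff is usually stated through the standard decomposition of expected squared error. The notation below is an assumption, since the notes do not define symbols: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Regularization and ensembling act mainly on the variance term, usually at the cost of some added bias, while the noise term cannot be reduced by any choice of model.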
Robust model evaluation and selection methods
from sklearn.model_selection import cross_val_score
# 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
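To show what a 5-fold split does underneath, here is a minimal sketch of contiguous, unshuffled k-fold index generation in plain Python. The function name `k_fold_indices` is hypothetical, and library splitters may additionally shuffle or stratify, so this is an illustration of the idea rather than a description of scikit-learn's internals.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for k contiguous, near-equal folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(f"train={train_idx} test={test_idx}")
```

Every sample appears in exactly one test fold, so each of the k scores is computed on data the corresponding model never saw during training.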
Solutions to frequent problems in data processing and machine learning
Problem: High training accuracy, poor test performance
Solutions: Reduce model complexity, add regularization, use cross-validation, get more data
Problem: Pandas warning about chained assignments
Solution: Use .loc for assignment: df.loc[df['A'] > 5, 'B'] = 'value'
Problem: Flask can't locate HTML templates
Solutions: Ensure templates are in templates/ folder, check file name spelling
Structured areas for expanding knowledge and skills
Neural networks, TensorFlow/PyTorch, CNNs, RNNs, transformers
Matplotlib, seaborn, plotly, dashboard creation, interactive charts
AWS, Azure, Google Cloud, serverless computing