ravindran-dev/SmartSpend

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Smart Expense Tracker with ML-Powered Bill Extraction

Project Overview

This project is an intelligent expense tracking system that uses Machine Learning and OCR to automate expense entry. Users upload a bill or receipt image, and the system extracts the relevant information (amounts, vendor, dates, and line items), then assigns an expense category using a trained ML model.

The system provides a comprehensive dashboard with insights, charts, and analytics to help individuals and businesses make data-driven financial decisions.

Key Features

ML-Powered Bill Extraction

  • Advanced OCR Processing – Extracts text from bill and receipt images with Tesseract
  • Smart Data Extraction – Automatically identifies amounts, vendors, dates, and items
  • Intelligent Categorization – A trained ML model assigns each expense a category along with a confidence score
  • Multi-format Support – Works with PNG, JPEG, TIFF, and other image formats

Intelligent Analytics

  • Interactive Dashboard – Real-time spending visualizations
  • Predictive Insights – ML-driven spending pattern analysis
  • Budget Tracking – Smart alerts and recommendations
  • Custom Reports – Export detailed reports in PDF/CSV formats
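Budget alerts of the kind listed above could be computed along these lines. This is a minimal sketch, assuming expense records are dicts with hypothetical date, category and amount keys and that budgets are monthly limits per category; the README does not document SmartSpend's actual alert logic, and the sample data is made up.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(expenses):
    """Sum spending per (YYYY-MM, category)."""
    totals = defaultdict(float)
    for e in expenses:
        totals[(e["date"].strftime("%Y-%m"), e["category"])] += e["amount"]
    return dict(totals)


def budget_alerts(expenses, budgets, month):
    """Return (category, spent, limit) for every category over its budget in `month`."""
    totals = monthly_totals(expenses)
    alerts = []
    for category, limit in budgets.items():
        spent = round(totals.get((month, category), 0.0), 2)
        if spent > limit:
            alerts.append((category, spent, limit))
    return sorted(alerts)


# Made-up sample records using the hypothetical field names above.
expenses = [
    {"date": date(2025, 10, 1), "category": "Groceries", "amount": 45.67},
    {"date": date(2025, 10, 5), "category": "Groceries", "amount": 80.00},
    {"date": date(2025, 10, 7), "category": "Transportation", "amount": 20.00},
]
print(budget_alerts(expenses, {"Groceries": 100.0, "Transportation": 50.0}, "2025-10"))
# [('Groceries', 125.67, 100.0)]
```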

Advanced Technology

  • Real-time Processing – Instant bill analysis and categorization
  • High Accuracy OCR – Optimized image preprocessing for better text extraction
  • Scalable Architecture – Handles multiple bill uploads efficiently
  • Error Handling – Robust validation and fallback mechanisms

Tech Stack

Frontend

  • React.js 18 with modern hooks
  • Tailwind CSS for responsive design
  • Framer Motion for smooth animations
  • Lucide React for beautiful icons
  • Vite for fast development

Backend & ML

  • Flask with CORS support
  • OpenCV for image preprocessing
  • Tesseract OCR for text extraction
  • scikit-learn for ML categorization
  • NumPy & Pandas for data processing

Machine Learning Pipeline

  • Text Processing: TF-IDF vectorization for bill descriptions
  • Feature Engineering: Amount, date, vendor analysis
  • Classification: Logistic Regression with hybrid features
  • Model Persistence: Joblib for model serialization

Project Structure

SmartSpend/
├── frontend/                 # React application
│   ├── src/
│   │   ├── components/      # UI components
│   │   │   ├── Dashboard.jsx
│   │   │   ├── UploadCard.jsx  # ML bill upload
│   │   │   ├── ExpenseTable.jsx
│   │   │   └── ...
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   └── vite.config.js
│
├── backend/                  # Flask API server
│   ├── app.py               # Main ML API application
│   ├── test_extraction.py   # Testing utilities
│   └── requirements.txt     # Python dependencies
│
├── Expense_model/           # ML model training
│   ├── Expense_Categorization_Model.ipynb
│   ├── expense_model.pkl    # Trained ML model
│   └── exp.csv             # Training dataset
│
├── ML_SETUP.md             # Detailed setup guide
├── setup.bat               # Windows setup script
└── requirements.txt        # Project dependencies

Quick Setup

Automated Setup (Windows)

# Run the setup script
setup.bat

Manual Setup

1️⃣ Install Tesseract OCR

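  • Windows: Download the installer linked from the Tesseract GitHub repository and make sure the tesseract executable is on your PATH
  • macOS: brew install tesseract
  • Linux (Debian/Ubuntu): sudo apt-get install tesseract-ocr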
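To confirm the installation before starting the backend, a short Python check can look for the tesseract executable on PATH and report its version banner. This is a sketch that is not part of the repository; the function name tesseract_version is hypothetical.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if Tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Some Tesseract builds print the version banner to stderr, so fall back to it.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version() or "tesseract not found on PATH")
```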

2️⃣ Backend Setup

cd backend
pip install -r requirements.txt
python app.py  # Starts on http://localhost:5000

3️⃣ Frontend Setup

cd frontend
npm install
npm start  # Starts on http://localhost:5173

4️⃣ Test the System

cd backend
python test_extraction.py

How It Works

1. Upload Bill Image

  • Drag & drop or click to upload bill/receipt
  • Supports all common image formats
  • Real-time upload progress

2. ML Processing Pipeline

Image → OCR Preprocessing → Text Extraction → 
Data Parsing → ML Categorization → Results Display

3. Intelligent Extraction

  • Vendor Detection: Identifies merchant/store name
  • Amount Recognition: Finds total and line item amounts
  • Date Extraction: Parses transaction dates
  • Item Analysis: Lists individual purchased items
  • Smart Categorization: ML model assigns expense category
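The parsing step can be approximated with regular expressions over the raw OCR text. The sketch below is an assumption about how such rules might look (first line as vendor, a line starting with TOTAL as the total, ISO or MM/DD/YYYY dates, "name price" lines as items); the rules in app.py may differ. The returned keys mirror the Example Response in the API Documentation section.

```python
import re
from datetime import datetime

# Illustrative patterns only; app.py's real extraction rules may differ.
PRICE = r"\d+(?:,\d{3})*\.\d{2}"
TOTAL_RE = re.compile(rf"^\s*(?:grand\s+)?total\b.*?({PRICE})", re.IGNORECASE | re.MULTILINE)
AMOUNT_RE = re.compile(PRICE)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")
ITEM_RE = re.compile(rf"^(?P<name>[A-Za-z][A-Za-z .&'-]*?)\s+(?P<price>{PRICE})$")
NOT_ITEMS = ("subtotal", "total", "tax", "cash", "change")


def to_float(text):
    return float(text.replace(",", ""))


def parse_bill_text(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    vendor = lines[0] if lines else None  # heuristic: the merchant name is usually printed first

    match = TOTAL_RE.search(text)
    if match:
        total = to_float(match.group(1))
    else:  # fallback: assume the largest amount on the bill is the total
        amounts = [to_float(a) for a in AMOUNT_RE.findall(text)]
        total = max(amounts) if amounts else None

    dates = []
    for raw in DATE_RE.findall(text):
        fmt = "%Y-%m-%d" if "-" in raw else "%m/%d/%Y"  # MM/DD/YYYY is an assumption
        try:
            dates.append(datetime.strptime(raw, fmt).date().isoformat())
        except ValueError:
            pass  # OCR noise that merely looks like a date

    items = []
    for ln in lines[1:]:
        m = ITEM_RE.match(ln)
        # Skip summary lines such as SUBTOTAL, TAX and TOTAL.
        if m and not m.group("name").lower().startswith(NOT_ITEMS):
            items.append(m.group("name"))
    return {"vendor": vendor, "total_amount": total, "dates": dates, "items": items}
```

OCR output is noisy, so rules like these mainly produce a first draft that the Review & Save step lets the user correct.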

4. Review & Save

  • Review extracted information
  • Make manual corrections if needed
  • Add to expense database with one click

ML Model Details

Training Features

  • Text Features: TF-IDF vectors from bill descriptions
  • Numeric Features: Amount, day of week, month
  • Hybrid Pipeline: Combines text and numeric processing
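Before training or prediction, each bill has to be turned into these inputs. This is a minimal sketch, assuming the description is cleaned by lower-casing it and dropping digits and symbols (an illustrative choice that the notebook may not make); the dictionary keys are hypothetical column names.

```python
import re
from datetime import date


def build_features(description: str, amount, bill_date: date) -> dict:
    """Turn one extracted bill into the text and numeric inputs listed above."""
    # Lower-case and keep letters only, so OCR'd digits and symbols don't become TF-IDF tokens.
    text = " ".join(re.sub(r"[^a-z ]+", " ", description.lower()).split())
    return {
        "description": text,
        "amount": float(amount),
        "day_of_week": bill_date.weekday(),  # 0 = Monday ... 6 = Sunday
        "month": bill_date.month,
    }


print(build_features("Walmart Supercenter #1234", "45.67", date(2025, 10, 1)))
```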

Model Performance

  • Algorithm: Logistic Regression with regularization
  • Feature Processing: StandardScaler + TfidfVectorizer
  • Validation: Cross-validation with 80/20 split
  • Categories: Food, Transportation, Utilities, Shopping, etc.
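A hybrid pipeline matching this description can be assembled with scikit-learn's ColumnTransformer: TfidfVectorizer on the text column, StandardScaler on the numeric columns, and LogisticRegression on the combined features, saved with joblib as listed in the tech stack. The toy rows and column names below are invented; the real training data is Expense_model/exp.csv, and the notebook's hyperparameters may differ. Using the largest predict_proba value as the confidence score is also an assumption.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

NUMERIC = ["amount", "day_of_week", "month"]


def build_pipeline():
    features = ColumnTransformer([
        # A single column name (not a list) hands TfidfVectorizer the 1-D text it expects.
        ("text", TfidfVectorizer(), "description"),
        ("num", StandardScaler(), NUMERIC),
    ])
    # LogisticRegression is L2-regularised by default; C is the inverse regularisation strength.
    return Pipeline([("features", features),
                     ("clf", LogisticRegression(C=1.0, max_iter=1000))])


# Toy rows invented for illustration; the column names are hypothetical, not exp.csv's schema.
df = pd.DataFrame({
    "description": ["walmart milk bread", "grocery store eggs", "supermarket vegetables",
                    "uber ride downtown", "taxi to airport", "metro card top up",
                    "electricity bill", "water utility bill", "internet bill"],
    "amount": [45.67, 23.10, 31.40, 18.20, 40.00, 20.00, 92.00, 35.50, 60.00],
    "day_of_week": [2, 5, 6, 4, 1, 0, 3, 3, 1],
    "month": [10, 9, 9, 10, 9, 10, 10, 9, 9],
    "category": ["Groceries"] * 3 + ["Transportation"] * 3 + ["Utilities"] * 3,
})

X, y = df[["description"] + NUMERIC], df["category"]
# 80/20 hold-out split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = build_pipeline().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# Persist and reload the fitted pipeline, in the spirit of expense_model.pkl.
path = os.path.join(tempfile.mkdtemp(), "expense_model.pkl")
joblib.dump(model, path)
loaded = joblib.load(path)

new_bill = pd.DataFrame([{"description": "uber ride to office", "amount": 15.0,
                          "day_of_week": 0, "month": 10}])
category = loaded.predict(new_bill)[0]
confidence = float(loaded.predict_proba(new_bill).max())
print(category, round(confidence, 2))
```

Wrapping both transformers in one Pipeline means the pickled file carries the fitted vocabulary and scaling statistics, so the prediction code only has to load a single object.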

Continuous Learning

  • Model can be retrained with new data
  • User corrections improve future predictions
  • Regular model updates for better accuracy
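One simple way to let user corrections reach the next retraining run is to append each corrected expense to a CSV training file. The column layout here (description, amount, date, category) and the function names are hypothetical; match them to exp.csv before relying on anything like this.

```python
import csv
import os

# Hypothetical column layout; the actual schema of Expense_model/exp.csv is not documented here.
FIELDS = ["description", "amount", "date", "category"]


def record_correction(csv_path, description, amount, date_iso, corrected_category):
    """Append a user-corrected expense so the next retraining run can learn from it."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"description": description, "amount": f"{amount:.2f}",
                         "date": date_iso, "category": corrected_category})


def load_rows(csv_path):
    """Read the accumulated rows back, e.g. before retraining."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```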

API Documentation

Endpoints

  • POST /api/process-bill - Upload and process bill images
  • POST /api/categorize-expense - Categorize individual expenses
  • GET /api/health - System health check
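The endpoints can be called from any HTTP client. The standard-library sketch below builds a multipart upload for POST /api/process-bill and probes GET /api/health. The form field name "file" and the helper names are assumptions; check app.py for the field the route actually reads.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:5000"  # Flask address from the setup steps


def build_process_bill_request(image_bytes, filename, field="file", base_url=BASE_URL):
    """Build a multipart/form-data POST for /api/process-bill.

    The form field name "file" is an assumption; app.py may expect another name.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/api/process-bill",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def process_bill(path, base_url=BASE_URL):
    """Upload an image file and return the decoded JSON result."""
    with open(path, "rb") as fh:
        req = build_process_bill_request(fh.read(), os.path.basename(path), base_url=base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def check_health(base_url=BASE_URL, timeout=2.0):
    """Return True if GET /api/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```

With the backend running, process_bill("receipt.jpg") would return a dict shaped like the example response.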

Example Response

{
  "success": true,
  "vendor": "Walmart Supercenter",
  "total_amount": 45.67,
  "dates": ["2025-10-01"],
  "category": "Groceries",
  "items": ["Milk", "Bread", "Eggs"],
  "confidence": 0.89
}

Advanced Features

Image Preprocessing

  • Gaussian blur for noise reduction
  • Otsu's thresholding, which picks the binarization threshold automatically from the image histogram
  • Morphological operations for text clarity
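In OpenCV these steps correspond to cv2.GaussianBlur, cv2.threshold with the cv2.THRESH_BINARY + cv2.THRESH_OTSU flags, and cv2.morphologyEx. To show what Otsu's method computes, here is a dependency-free sketch: it scans every threshold over the 0 to 255 grayscale histogram and keeps the one that maximizes the between-class variance. It illustrates the criterion only and does not reproduce app.py.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximises between-class variance.

    Pixels <= t form the background class, pixels > t the foreground class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [255 if p > t else 0 for p in pixels]
```

On a receipt, dark ink lands in the black class and the paper in the white class, which is the kind of clean input Tesseract handles best.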

Error Handling

  • Fallback mechanisms for poor image quality
  • Manual correction interface
  • Confidence scoring for predictions
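A client could use the confidence score to decide when to ask for manual correction: parse the response into a typed object and flag low-confidence or incomplete results. The 0.6 threshold and the "error" key on failed responses are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

REVIEW_THRESHOLD = 0.6  # hypothetical cut-off; tune it against real uploads


@dataclass
class BillResult:
    vendor: Optional[str]
    total_amount: Optional[float]
    category: Optional[str]
    confidence: float
    dates: List[str] = field(default_factory=list)
    items: List[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Send low-confidence or incomplete results to the manual correction form.
        return (self.confidence < REVIEW_THRESHOLD
                or self.total_amount is None
                or not self.vendor)


def parse_response(raw: str) -> BillResult:
    data = json.loads(raw)
    if not data.get("success"):
        # The "error" key is an assumption about the failure payload.
        raise ValueError(data.get("error", "bill processing failed"))
    return BillResult(
        vendor=data.get("vendor"),
        total_amount=data.get("total_amount"),
        category=data.get("category"),
        confidence=float(data.get("confidence", 0.0)),
        dates=data.get("dates", []),
        items=data.get("items", []),
    )
```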

Performance Optimization

  • Async processing for large images
  • Caching for repeated requests
  • Batch processing capabilities
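Caching for repeated requests could key results on a hash of the uploaded image bytes, so re-uploading the same file skips OCR and classification. This is a minimal in-memory LRU sketch; the class name and default size are assumptions, and the README does not say how SmartSpend's cache is implemented.

```python
import hashlib
from collections import OrderedDict


class BillCache:
    """In-memory LRU cache keyed by the SHA-256 digest of the uploaded image bytes."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._store = OrderedDict()

    @staticmethod
    def key(image_bytes):
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_compute(self, image_bytes, compute):
        """Return the cached result for identical bytes, else run compute(image_bytes)."""
        k = self.key(image_bytes)
        if k in self._store:
            self._store.move_to_end(k)  # mark as most recently used
            return self._store[k]
        result = compute(image_bytes)
        self._store[k] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result
```

Only byte-identical uploads hit the cache; a fresh photo of the same receipt hashes differently. With several server worker processes, each process would also keep its own copy.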

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/ravindran-dev/SmartSpend.git
cd SmartSpend

2️⃣ Backend Setup

cd backend
python -m venv venv
source venv/bin/activate   # (Linux/Mac)
venv\Scripts\activate      # (Windows)
pip install -r requirements.txt
python app.py

Backend will start at http://localhost:5000/

3️⃣ Frontend Setup

cd frontend
npm install
npm start

Frontend will start at the URL Vite prints, http://localhost:5173/ by default.

Contribution Guidelines

  • Fork this repo

  • Create a new branch (feature-new)

  • Commit changes (git commit -m "Add new feature")

  • Push to branch (git push origin feature-new)

  • Create a Pull Request

License

This project is licensed under the MIT License – free to use and modify.

Team / Contributors

  • Thulasiram K – Frontend & UI (Team Lead)
  • Ravindran S – ML & Backend
