This project is an expense tracking system that combines Machine Learning and OCR. Users upload bill or receipt images, and the system extracts amounts, vendors, dates, and line items, then categorizes each expense using a trained ML model.
The system provides a comprehensive dashboard with insights, charts, and analytics to help individuals and businesses make data-driven financial decisions.
- Advanced OCR Processing – Extract text from any bill/receipt format
- Smart Data Extraction – Automatically identify amounts, vendors, dates, items
- Intelligent Categorization – ML model categorizes expenses with high accuracy
- Multi-format Support – Works with PNG, JPEG, TIFF, and other image formats
- Interactive Dashboard – Real-time spending visualizations
- Predictive Insights – ML-driven spending pattern analysis
- Budget Tracking – Smart alerts and recommendations
- Custom Reports – Export detailed reports in PDF/CSV formats
- Real-time Processing – Instant bill analysis and categorization
- High Accuracy OCR – Optimized image preprocessing for better text extraction
- Scalable Architecture – Handles multiple bill uploads efficiently
- Error Handling – Robust validation and fallback mechanisms
- React.js 18 with modern hooks
- Tailwind CSS for responsive design
- Framer Motion for smooth animations
- Lucide React for beautiful icons
- Vite for fast development
- Flask with CORS support
- OpenCV for image preprocessing
- Tesseract OCR for text extraction
- scikit-learn for ML categorization
- NumPy & Pandas for data processing
- Text Processing: TF-IDF vectorization for bill descriptions
- Feature Engineering: Amount, date, vendor analysis
- Classification: Logistic Regression with hybrid features
- Model Persistence: Joblib for model serialization
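As an illustration of how these pieces could fit together, the sketch below builds a small scikit-learn pipeline that joins TF-IDF text features with a scaled numeric feature, trains a Logistic Regression classifier, and round-trips the model through Joblib. The column names, toy rows, and file name are assumptions made for the example; the project's actual notebook and exp.csv may be organized differently.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows for illustration only; column names are assumptions,
# not the documented schema of exp.csv.
train = pd.DataFrame({
    "description": ["milk bread eggs", "uber ride airport", "electricity bill",
                    "vegetables and rice", "metro train ticket", "water bill payment"],
    "amount": [12.5, 30.0, 80.0, 22.0, 3.5, 40.0],
    "category": ["Food", "Transportation", "Utilities",
                 "Food", "Transportation", "Utilities"],
})

# Hybrid features: TF-IDF on the text column, standard scaling on the amount.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),  # single column name -> 1-D text input
    ("num", StandardScaler(), ["amount"]),       # list of columns -> 2-D numeric input
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train[["description", "amount"]], train["category"])

# Persist and reload the whole pipeline, preprocessing included.
model_path = os.path.join(tempfile.mkdtemp(), "expense_model_demo.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

new = pd.DataFrame({"description": ["bus ticket"], "amount": [2.0]})
print(restored.predict(new))
```

Saving the full pipeline rather than only the classifier keeps the vectorizer vocabulary and scaler statistics together with the model, so new expenses go through the same transformations at prediction time.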
SmartSpend/
├── frontend/ # React application
│ ├── src/
│ │ ├── components/ # UI components
│ │ │ ├── Dashboard.jsx
│ │ │ ├── UploadCard.jsx # ML bill upload
│ │ │ ├── ExpenseTable.jsx
│ │ │ └── ...
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ └── vite.config.js
│
├── backend/ # Flask API server
│ ├── app.py # Main ML API application
│ ├── test_extraction.py # Testing utilities
│ └── requirements.txt # Python dependencies
│
├── Expense_model/ # ML model training
│ ├── Expense_Categorization_Model.ipynb
│ ├── expense_model.pkl # Trained ML model
│ └── exp.csv # Training dataset
│
├── ML_SETUP.md # Detailed setup guide
├── setup.bat # Windows setup script
└── requirements.txt # Project dependencies
# Run the setup script
setup.bat
- Windows: Download from Tesseract GitHub
- macOS: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
cd backend
pip install -r requirements.txt
python app.py # Starts on http://localhost:5000
cd frontend
npm install
npm start # Starts on http://localhost:5173
cd backend
python test_extraction.py
- Drag & drop or click to upload bill/receipt
- Supports all common image formats
- Real-time upload progress
Image → OCR Preprocessing → Text Extraction →
Data Parsing → ML Categorization → Results Display
- Vendor Detection: Identifies merchant/store name
- Amount Recognition: Finds total and line item amounts
- Date Extraction: Parses transaction dates
- Item Analysis: Lists individual purchased items
- Smart Categorization: ML model assigns expense category
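To make the extraction steps above more concrete, here is a minimal sketch of rule-based parsing over OCR text. The regular expressions, the "first line is the vendor" heuristic, and the sample receipt are all assumptions for illustration; the project's own parsing in app.py may use different rules.

```python
import re
from datetime import datetime

# Hypothetical patterns: a "TOTAL" line, ISO dates, and "<name> <price>" item lines.
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+\.\d{2})", re.IGNORECASE)
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
ITEM_RE = re.compile(r"^(?P<name>.+?)\s+\$?(?P<price>\d+\.\d{2})$")


def parse_receipt_text(text: str) -> dict:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    vendor = lines[0] if lines else None  # heuristic: first non-empty line

    # Use the last "TOTAL" match; the word boundary skips "SUBTOTAL".
    totals = TOTAL_RE.findall(text)
    total_amount = float(totals[-1]) if totals else None

    # Keep only strings that are valid calendar dates.
    dates = []
    for raw in DATE_RE.findall(text):
        try:
            datetime.strptime(raw, "%Y-%m-%d")
            dates.append(raw)
        except ValueError:
            pass

    # Item lines end in a price and are not total or subtotal lines.
    items = []
    for ln in lines[1:]:
        match = ITEM_RE.match(ln)
        if match and "total" not in ln.lower():
            items.append(match.group("name"))

    return {"vendor": vendor, "total_amount": total_amount,
            "dates": dates, "items": items}


sample = """Walmart Supercenter
2025-10-01
Milk 3.49
Bread 2.50
Eggs 4.25
SUBTOTAL 10.24
TOTAL 10.24
"""
parsed = parse_receipt_text(sample)
print(parsed)
```

Rules like these are brittle on noisy OCR output, which is one reason the review and manual-correction step that follows is useful.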
- Review extracted information
- Make manual corrections if needed
- Add to expense database with one click
- Text Features: TF-IDF vectors from bill descriptions
- Numeric Features: Amount, day of week, month
- Hybrid Pipeline: Combines text and numeric processing
- Algorithm: Logistic Regression with regularization
- Feature Processing: StandardScaler + TfidfVectorizer
- Validation: Cross-validation with 80/20 split
- Categories: Food, Transportation, Utilities, Shopping, etc.
- Model can be retrained with new data
- User corrections improve future predictions
- Regular model updates for better accuracy
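The numeric features listed above (amount, day of week, month) can be derived from a parsed expense with the standard library alone. The sketch below shows one possible way to do it; the function name and the dictionary keys are hypothetical.

```python
from datetime import date


def numeric_features(amount: float, iso_date: str) -> dict:
    """Derive the numeric model inputs from an amount and an ISO date string."""
    d = date.fromisoformat(iso_date)
    return {
        "amount": float(amount),
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "month": d.month,            # 1 ... 12
    }


row = numeric_features(45.67, "2025-10-01")
print(row)  # {'amount': 45.67, 'day_of_week': 2, 'month': 10}
```

A row like this would be scaled and combined with the TF-IDF text vector before classification.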
- POST /api/process-bill: Upload and process bill images
- POST /api/categorize-expense: Categorize individual expenses
- GET /api/health: System health check
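A client could call the categorization endpoint along the lines of the sketch below, which uses only the Python standard library. The request body fields ("description", "amount") are assumptions, since the request schema is not documented here; check app.py for the actual field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # backend address from the setup steps


def build_categorize_request(description: str, amount: float) -> urllib.request.Request:
    """Build a POST request for /api/categorize-expense."""
    # Hypothetical payload shape; the real API may expect different keys.
    payload = json.dumps({"description": description, "amount": amount}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/categorize-expense",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_categorize_request("Milk, Bread, Eggs", 45.67)
print(req.get_method(), req.full_url)
```

Calling `send(req)` with the backend running would return the parsed JSON response as a dictionary.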
{
  "success": true,
  "vendor": "Walmart Supercenter",
  "total_amount": 45.67,
  "dates": ["2025-10-01"],
  "category": "Groceries",
  "items": ["Milk", "Bread", "Eggs"],
  "confidence": 0.89
}
- Gaussian blur for noise reduction
- Otsu thresholding for automatic binarization
- Morphological operations for text clarity
- Fallback mechanisms for poor image quality
- Manual correction interface
- Confidence scoring for predictions
- Async processing for large images
- Caching for repeated requests
- Batch processing capabilities
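One way the confidence score and manual correction interface could work together is to route uncertain results to review. The sketch below assumes a response shaped like the example JSON above; the threshold value and the function name are hypothetical, not taken from the project.

```python
REVIEW_THRESHOLD = 0.75  # hypothetical cut-off; tune against real data


def needs_manual_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a processed bill should be checked by the user."""
    # A failed extraction always needs review.
    if not result.get("success"):
        return True
    # Missing key fields suggest the OCR or parsing step fell short.
    if result.get("total_amount") is None or not result.get("vendor"):
        return True
    # Otherwise rely on the model's confidence in the predicted category.
    return float(result.get("confidence", 0.0)) < threshold


example = {
    "success": True,
    "vendor": "Walmart Supercenter",
    "total_amount": 45.67,
    "category": "Groceries",
    "confidence": 0.89,
}
print(needs_manual_review(example))
```

Results flagged this way could be shown in the correction interface before being saved, while confident results go straight to the expense table.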
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
git clone https://github.com/your-username/expense-tracker-ocr-ml.git
cd expense-tracker-ocr-ml
cd backend
python -m venv venv
source venv/bin/activate # (Linux/Mac)
venv\Scripts\activate # (Windows)
pip install -r requirements.txt
python app.py
Backend will start at http://localhost:5000/
cd frontend
npm install
npm start
Frontend will start at http://localhost:5173/ (Vite's default port; it differs if vite.config.js sets another one)
- Fork this repo
- Create a new branch (feature-new)
- Commit changes (git commit -m "Add new feature")
- Push to branch (git push origin feature-new)
- Create a Pull Request
This project is licensed under the MIT License – free to use and modify.
- Thulasiram K – Frontend & UI (Team Lead)
- Ravindran S – ML & Backend