Financial Fraud Detection: TinyBERT and AI Data Generation
- Fine-tuned TinyBERT for financial fraud detection on the FineWeb dataset, using knowledge distillation from BERT to achieve a 7.5x size reduction while maintaining competitive performance.
- Built an AI-assisted data generation pipeline that uses an Agno reasoning agent and OpenAI's GPT-4.1 mini model to create realistic financial fraud examples across multiple categories, which serve as seed samples.
- Implemented a comprehensive evaluation framework with robust metrics and visualizations for comparing teacher and student model performance.
- Developed a production-ready architecture for the entire pipeline, from data preparation to model deployment, accompanied by clear documentation.
- The implementation aims to provide a cost-effective alternative to LLMs for specialized financial fraud detection, particularly in production environments where resource efficiency is critical.
- The code for the project is open-sourced at `https://github.com/Cenrax/fraud-security-experiments`.
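
The knowledge distillation step mentioned above can be illustrated with a minimal sketch of the classic logit-level distillation loss: the student is trained to match the teacher's temperature-softened output distribution while also fitting the true label. This is an illustration only, not the project's actual training code. The function names, the `temperature` and `alpha` defaults, and the two-class logits are assumptions, and TinyBERT's full recipe also distills intermediate layers, which this sketch omits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, stabilized by subtracting the max logit."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # assumed value, not taken from the project
    alpha: float = 0.5,        # assumed weighting between soft and hard terms
) -> float:
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term."""
    eps = 1e-12  # guards against log(0) and division by zero
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions.
    kl = sum(
        t * math.log(t / max(s, eps))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    # Scaling by T^2 keeps the soft-term gradients comparable across temperatures.
    soft_term = kl * temperature ** 2

    # Standard cross-entropy of the student against the true label (T = 1).
    hard_term = -math.log(max(softmax(student_logits)[label], eps))

    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    # Hypothetical logits for the classes [legitimate, fraud]; true label is fraud.
    teacher = [-1.2, 2.3]
    student = [-0.4, 0.9]
    print(f"distillation loss: {distillation_loss(student, teacher, label=1):.4f}")
```

When the student's logits equal the teacher's, the soft term is zero and only the hard-label term remains, so `alpha` controls how strongly the student is pulled toward the teacher versus the labels.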