
Financial Fraud Detection: TinyBERT and AI Data Generation

  • Fine-tuned TinyBERT for financial fraud detection on the FineWeb dataset, using knowledge distillation from BERT to achieve a 7.5x size reduction while maintaining competitive performance.
  • Built an AI-assisted data generation pipeline, using an Agno reasoning agent and OpenAI's GPT-4.1 mini model, to create realistic financial fraud examples across multiple categories for use as seed samples.
  • Implemented a comprehensive evaluation framework with robust metrics and visualizations for comparing teacher and student model performance.
  • Developed a production-ready architecture for the entire pipeline, from data preparation to model deployment, accompanied by clear documentation.
  • The implementation aims to provide a cost-effective alternative to LLMs for specialized financial fraud detection tasks, particularly in production environments where resource efficiency is critical.
  • The code for the project is open-sourced at `https://github.com/Cenrax/fraud-security-experiments`.
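
The knowledge distillation step mentioned above can be illustrated with a minimal sketch. It assumes the classic soft-target formulation (temperature-scaled KL divergence between teacher and student outputs, blended with cross-entropy on the true label); the summary does not state which loss the project uses, so the function names and the `temperature` and `alpha` values here are hypothetical and chosen only for illustration. It is written in plain Python for a single example rather than against any particular training framework.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target loss with a hard-label loss for one example.

    temperature and alpha are illustrative defaults (assumptions),
    not values taken from the project.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    soft_loss = kl * temperature ** 2

    # Standard cross-entropy against the true label at T = 1.
    hard_loss = -log_softmax(student_logits)[label]

    return alpha * soft_loss + (1 - alpha) * hard_loss


# Hypothetical two-class example: index 0 = legitimate, index 1 = fraud.
teacher = [-1.5, 2.5]   # confident teacher prediction of "fraud"
student = [0.2, 0.4]    # undecided student
loss = distillation_loss(student, teacher, label=1)
```

A higher `alpha` weights the teacher's softened predictions more heavily, while a lower value leans on the ground-truth labels; in practice both are tuned on a validation set.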