Features Overview
SumoData Toolbox provides 10 powerful tools organized into 3 categories, plus a killer feature for comprehensive code quality auditing.
NEW: Data Quality Auditor (Killer Feature)
Comprehensive multi-file code analysis for data professionals
The Data Quality Auditor is a game-changing feature that automatically scans your Python and SQL files for issues that could cause production failures, performance problems, or maintainability headaches.
What It Does
- Multi-File Analysis: Select and analyze multiple files simultaneously
- Critical Issue Detection: Missing error handling, type safety problems, data validation gaps
- Performance Warnings: Inefficient loops, memory issues, slow operations
- Best Practice Suggestions: Code style, documentation, maintainability improvements
Why It's Powerful
Unlike generic linters, the Data Quality Auditor understands data engineering and ML workflows:
- Detects pandas anti-patterns (inefficient loops, memory issues)
- Identifies SQL performance problems (missing indexes, cartesian products)
- Catches data validation gaps that cause production crashes
- Provides specific, actionable fixes with line numbers
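To make the cartesian-product case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are invented for illustration. The snippet only shows the kind of query the auditor is described as catching; it is not the auditor's own detection logic.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.5);
    """
)

# Anti-pattern: two tables listed with no join condition.
# Every customer is paired with every order (3 x 3 rows).
cartesian_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c, orders o"
).fetchall()

# Fix: an explicit join condition keeps only matching pairs.
joined_rows = conn.execute(
    "SELECT c.name, o.amount "
    "FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()

print(len(cartesian_rows))  # 9
print(len(joined_rows))     # 3
```

On real tables the unjoined query grows with the product of the row counts, which is why it is worth flagging before it reaches production.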
Example Output
## Data Quality Audit Report
### Critical Issues (3 found)
- Reading CSV without error handling - Line 13
  - Impact: correctness/reliability (will crash on missing file)
  - Fix: wrap pd.read_csv in try/except, validate file existence
### Performance Warnings (3 found)
- Inefficient Python loop (can be vectorized) - Lines 4-9
  - Impact: medium for large lists
  - Fix: use list comprehension or numpy vectorization
### Best Practice Suggestions (6 found)
- Add type hints and docstrings - Lines 1-2
  - Recommendation: annotate parameters/returns
### Summary
- Total issues: 12 (3 Critical, 3 Performance, 6 Best Practice)
- Estimated performance impact: Medium-High
- Recommended priority: High
Use Cases
- Pre-Commit Reviews: Catch issues before they reach production
- Legacy Code Cleanup: Identify technical debt in old codebases
- Team Onboarding: Help new members understand quality standards
- Performance Optimization: Find bottlenecks across multiple files
- Production Readiness: Ensure code meets reliability standards
How to Use
- Enable "Multi-File Mode" in the sidebar
- Click "Run Quality Audit"
- Select files to analyze (Python, SQL, or both)
- Review the comprehensive report with actionable fixes
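As an illustration of acting on a report, the performance warning in the example output (a loop that can be replaced with a list comprehension) and the type-hint suggestion could be applied roughly like this. The function names and data are hypothetical, since the audited file itself is not shown here.

```python
# Before: the style of loop the example report flags as inefficient.
def scale_loop(values, factor):
    result = []
    for value in values:
        result.append(value * factor)
    return result


# After: a list comprehension, plus the type hints and docstring
# that the report's best-practice section recommends.
def scale(values: list[float], factor: float) -> list[float]:
    """Return each value multiplied by factor."""
    return [value * factor for value in values]


readings = [1.5, 2.0, 4.0]
print(scale(readings, 2.0))  # [3.0, 4.0, 8.0]
```

Both versions return the same list; the rewrite is shorter and avoids repeated `append` calls.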
Philosophy
Unlike chatty AI assistants, SumoData Toolbox follows the Swiss Army Knife approach:
- One-Click Actions: Each tool does ONE thing exceptionally well
- Token Efficient: Minimalist prompts that save your API quota
- No Conversation: Direct results without back-and-forth
- Context-Aware: Tools appear based on file type (.py or .sql)
Sumo Pipes (Data Engineering)
Tools for building and optimizing data pipelines.
| Tool | Purpose | Input | Output |
|---|---|---|---|
| SQL Optimizer | Improve query performance | SQL query | Optimized SQL + indexes |
| JSON to DDL | Generate database schema | JSON structure | CREATE TABLE statements |
| Cron Generator | Create schedule expressions | Plain English | Cron expression |
Learn more about Sumo Pipes
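To give a feel for the JSON to DDL input and output shapes listed in the table, here is a small hand-written sketch. The tool itself is AI-driven, so its actual output may differ; the `json_to_ddl` function, the type mapping, and the sample record are all assumptions made for this example.

```python
import json

# Hypothetical mapping from Python types (as parsed from JSON) to SQL column types.
SQL_TYPES = {bool: "BOOLEAN", int: "INTEGER", float: "REAL", str: "TEXT"}


def json_to_ddl(table: str, sample: str) -> str:
    """Build a CREATE TABLE statement from one flat JSON object."""
    record = json.loads(sample)
    columns = []
    for name, value in record.items():
        # type() is used instead of isinstance() so that True/False
        # map to BOOLEAN rather than INTEGER. Unknown types fall back to TEXT.
        sql_type = SQL_TYPES.get(type(value), "TEXT")
        columns.append(f"    {name} {sql_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


ddl = json_to_ddl("users", '{"id": 1, "name": "Ada", "active": true, "score": 9.5}')
print(ddl)
```

Nested objects, arrays, and nulls all fall back to `TEXT` in this sketch; a real schema design would need to decide how to model them.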
Sumo Lens (Data Analysis)
Tools for understanding and improving data analysis code.
| Tool | Purpose | Input | Output |
|---|---|---|---|
| Explain Regex | Decode regex patterns | Regex pattern | Plain English explanation |
| Pandas Cleaner | Improve data cleaning | Pandas code | Optimized cleaning logic |
| SQL Explainer | Understand queries | SQL query | Step-by-step explanation |
Learn more about Sumo Lens
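The kind of plain-English breakdown Explain Regex aims for can be approximated by hand with Python's `re.VERBOSE` flag, which allows comments inside a pattern. The date pattern below is an invented example, not output from the tool.

```python
import re

# Compact form, as it might appear in validation code.
compact = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# The same pattern with each piece explained inline.
explained = re.compile(
    r"""
    ^        # start of the string
    \d{4}    # four-digit year
    -        # literal hyphen
    \d{2}    # two-digit month
    -        # literal hyphen
    \d{2}    # two-digit day
    $        # end of the string
    """,
    re.VERBOSE,
)

for text in ("2024-01-31", "2024-1-31"):
    print(text, bool(compact.match(text)), bool(explained.match(text)))
```

Both patterns accept and reject the same inputs. Note that the pattern checks only the shape of the string, so a value such as `2024-99-99` would still match.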
Sumo Core (Data Science/ML)
Tools for ML development and code quality.
| Tool | Purpose | Input | Output |
|---|---|---|---|
| Generate Docstrings | Add documentation | Python function | Google-style docstring |
| Type Hinting | Add type safety | Python code | Code with type hints |
| ML Boilerplate | Training loop template | Description | Complete training code |
Learn more about Sumo Core
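As a rough picture of what Generate Docstrings and Type Hinting produce together, compare an undocumented function with an annotated version carrying a Google-style docstring. The `normalize` function is a made-up example and the docstring was written by hand, so the tools' actual wording may differ.

```python
# Before: no type hints, no documentation.
def normalize(values, lower=0.0, upper=1.0):
    span = max(values) - min(values)
    return [lower + (v - min(values)) * (upper - lower) / span for v in values]


# After: same behavior, with type hints and a Google-style docstring.
def normalize_documented(
    values: list[float], lower: float = 0.0, upper: float = 1.0
) -> list[float]:
    """Rescale values linearly into the range [lower, upper].

    Args:
        values: Numbers to rescale.
        lower: Value that the smallest input maps to.
        upper: Value that the largest input maps to.

    Returns:
        A new list with each input mapped into [lower, upper].

    Raises:
        ValueError: If values is empty.
        ZeroDivisionError: If all values are equal.
    """
    smallest = min(values)
    span = max(values) - smallest
    return [lower + (v - smallest) * (upper - lower) / span for v in values]


print(normalize_documented([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

The `Raises` section documents failure modes the original function already had; making them visible is often where a docstring pays for itself.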
Quick Comparison
By Speed
- SQL Optimizer (< 2s)
- Cron Generator (< 2s)
- Generate Docstrings (< 3s)
- Explain Regex (< 3s)
- Pandas Cleaner (< 5s)
By Complexity
- Simple: Cron Generator, Explain Regex
- Medium: SQL Optimizer, Docstrings, Type Hints
- Complex: ML Boilerplate, Pandas Cleaner, JSON to DDL
By Use Frequency
- SQL Optimizer
- Generate Docstrings
- Pandas Cleaner
- Type Hinting
- SQL Explainer
Usage Patterns
For Daily Work
- SQL Optimizer before committing queries
- Generate Docstrings during code review
- Explain Regex when reviewing validation logic
For New Projects
- JSON to DDL for schema design
- ML Boilerplate for training setup
- Type Hinting for better code quality
For Learning
- SQL Explainer to understand complex queries
- Explain Regex to learn pattern matching
- Pandas Cleaner to learn best practices
How It Works
graph LR
A[Select Code] --> B[Choose Tool]
B --> C[AI Processing]
C --> D[Results in New Tab]
D --> E[Copy or Insert]
- Select code in your editor
- Choose tool from sidebar or context menu
- Wait for AI processing (2-10 seconds)
- Review results in new editor tab
- Apply by copying or inserting at cursor
Customization
All tools respect your configuration:
- Model Selection: Choose speed vs quality
- Max Code Length: Limit input size
- Timeout: Adjust for slow connections
- Context Menu: Enable/disable right-click access
Feature Matrix
| Feature | Pipes | Lens | Core |
|---|---|---|---|
| SQL Support | โ | โ | โ |
| Python Support | โ | โ | โ |
| Code Generation | โ | โ | โ |
| Code Explanation | โ | โ | โ |
| Optimization | โ | โ | โ |
| Documentation | โ | โ | โ |