Smart Table Selection - Quick Start Guide
Get started with intelligent, usage-based table selection in 5 minutes.
What is Smart Table Selection?
Instead of manually specifying every table to monitor, Baselinr can automatically recommend tables based on:
- Query frequency: How often tables are accessed
- Usage recency: When tables were last queried
- Data freshness: When tables were last updated
- Table characteristics: Size, type, and metadata
Quick Start
Step 1: Basic Configuration
Create or update your config.yaml:
source:
type: snowflake # or postgres, bigquery, redshift, etc.
account: myaccount
database: prod
warehouse: compute_wh
username: baselinr_user
password: ${SNOWFLAKE_PASSWORD}
storage:
connection:
type: postgres
host: localhost
database: baselinr_metadata
smart_selection:
enabled: true
mode: "recommend" # Start with recommend mode
profiling:
tables: [] # Can be empty - smart selection will populate
Step 2: Generate Recommendations
Run the recommend command:
baselinr recommend --config config.yaml --explain
You'll see output like:
📊 Generating smart table recommendations...
Database: prod (snowflake)
Lookback period: 30 days
✅ Analysis complete!
Tables analyzed: 156
Recommended: 23
Excluded: 133
Confidence distribution:
high (0.8+): 15 tables
medium (0.6-0.8): 6 tables
low (<0.6): 2 tables
Top recommendations:
1. analytics.user_events (confidence: 0.95, score: 92.3)
• Heavily used: 1,247 queries (41.6 per day)
• Last queried 2 hours ago
• Updated daily
• Medium table with 5,000,000 rows
Suggested checks: freshness, row_count, completeness, numeric_distribution
2. analytics.revenue_summary (confidence: 0.88, score: 87.1)
• Actively queried: 423 queries (14.1 per day)
• Last queried 1 day ago
• Large table with 12,500,000 rows
Suggested checks: freshness, numeric_distribution
... and 21 more (see recommendations.yaml)
💾 Saved recommendations to: recommendations.yaml
✨ Next steps:
1. Review recommendations in: recommendations.yaml
2. Apply with: baselinr recommend --config config.yaml --apply
3. Or manually add tables to your config file
Step 3: Review Recommendations
Check recommendations.yaml:
metadata:
generated_at: "2025-01-15T10:30:00"
lookback_days: 30
database_type: "snowflake"
summary:
total_tables_analyzed: 156
total_recommended: 23
total_excluded: 133
recommended_tables:
- schema: analytics
table: user_events
confidence: 0.95
reasons:
- "Heavily used: 1,247 queries (41.6 per day)"
- "Last queried 2 hours ago"
- "Updated daily"
suggested_checks:
- freshness
- row_count
- completeness
# ... more recommendations ...
Step 4: Apply Recommendations
When you're ready:
baselinr recommend --config config.yaml --apply
This will:
- Show a summary of changes
- Ask for confirmation
- Backup your config to
config.yaml.backup - Add recommended tables to
profiling.tables
Step 5: Run Profiling
baselinr profile --config config.yaml
Baselinr will now profile all recommended tables!
Customization
Adjust Criteria
Make recommendations more or less strict:
smart_selection:
criteria:
min_query_count: 50 # More strict (default: 10)
min_queries_per_day: 5 # More strict (default: 1)
lookback_days: 60 # Longer period (default: 30)
exclude_patterns:
- "temp_*"
- "*_backup"
- "dev_*"
Focus on Specific Schema
baselinr recommend --config config.yaml --schema analytics
Adjust Scoring Weights
Prioritize different factors:
smart_selection:
criteria:
weights:
query_frequency: 0.6 # Emphasize usage (default: 0.4)
query_recency: 0.2 # Default: 0.25
write_activity: 0.1 # Default: 0.2
table_size: 0.1 # Default: 0.15
Auto Mode (Advanced)
Once comfortable with recommendations, enable auto mode:
smart_selection:
mode: "auto"
auto_apply:
confidence_threshold: 0.85 # Only auto-apply high confidence
max_tables: 50 # Safety limit
Then just run:
baselinr profile --config config.yaml
Baselinr will automatically select and profile high-confidence tables!
Combining with Explicit Configs
Smart selection works alongside explicit table configurations:
profiling:
tables:
# Critical tables (always included)
- schema: core
table: customers
- schema: analytics
table: revenue_daily
smart_selection:
enabled: true
auto_apply:
skip_existing: true # Won't duplicate above tables
Database-Specific Tips
Snowflake
Best experience with full metadata:
source:
type: snowflake
role: ACCOUNTADMIN # Or role with ACCOUNT_USAGE access
Without ACCOUNT_USAGE access, it falls back to INFORMATION_SCHEMA (basic metadata only).
PostgreSQL
Works well with pg_stat_user_tables:
smart_selection:
criteria:
min_query_count: 20 # Scan counts used as proxy
BigQuery
Limited to dataset level:
# Recommend per dataset
baselinr recommend --config config.yaml --schema my_dataset
MySQL / SQLite
Limited query history:
smart_selection:
criteria:
min_rows: 1000 # Focus on size-based selection
auto_apply:
confidence_threshold: 0.6 # Lower threshold
Troubleshooting
No Recommendations
Try more lenient criteria:
smart_selection:
criteria:
min_query_count: 5
min_queries_per_day: 0.5
lookback_days: 60
Too Many Recommendations
Try stricter criteria:
smart_selection:
criteria:
min_query_count: 100
min_queries_per_day: 5
auto_apply:
max_tables: 25
Permission Errors
Snowflake:
-- Grant access to query history
GRANT USAGE ON DATABASE SNOWFLAKE TO ROLE baselinr_role;
GRANT USAGE ON SCHEMA ACCOUNT_USAGE TO ROLE baselinr_role;
PostgreSQL:
-- Should already have access to pg_stat_user_tables
Next Steps
- Review Periodically: Run
baselinr recommend --refreshmonthly - Tune Criteria: Adjust based on your recommendations
- Combine Approaches: Use both smart selection and explicit configs
- Monitor Results: Track profiling coverage and data quality
For more details, see the full Smart Table Selection guide.
Common Patterns
Pattern: Discovery Phase
# 1. Generate recommendations
baselinr recommend --config config.yaml --explain
# 2. Review top 10
head -n 50 recommendations.yaml
# 3. Start with high-confidence only
# Edit config: confidence_threshold: 0.9, max_tables: 10
# 4. Apply
baselinr recommend --config config.yaml --apply
# 5. Profile
baselinr profile --config config.yaml
Pattern: Schema-by-Schema
# Analytics schema
baselinr recommend --config config.yaml --schema analytics --output recs_analytics.yaml
# Core schema
baselinr recommend --config config.yaml --schema core --output recs_core.yaml
# Review separately, then apply
Pattern: Conservative Auto
smart_selection:
mode: "auto"
criteria:
min_query_count: 100
min_queries_per_day: 10
auto_apply:
confidence_threshold: 0.95
max_tables: 20
Then schedule:
# Cron: Daily at 2 AM
0 2 * * * cd /path/to/baselinr && baselinr profile --config config.yaml