BigQuery Insights Agent
Build an AI agent that connects to Google BigQuery, runs complex SQL queries, and generates actionable insights from your data warehouse.
What You'll Build
A data analysis agent that:
- Connects to Google BigQuery via MCP
- Writes and executes SQL queries on your datasets
- Analyzes trends, detects anomalies, and segments customers
- Runs on a schedule to deliver daily or weekly reports
Time to complete: 45-60 minutes
Prerequisites
- A TeamDay account with an organization
- Google Cloud project with BigQuery enabled
- BigQuery datasets with data to analyze
- Basic SQL knowledge
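The curl examples in this guide reference $TEAMDAY_TOKEN and $SPACE_ID. A minimal shell setup might look like this (the values below are placeholders; use your own API token and space ID):
# Placeholders -- substitute your actual TeamDay API token and space ID
export TEAMDAY_TOKEN="your-api-token"
export SPACE_ID="your-space-id"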
Architecture
graph LR
A[User / Mission] --> B[Data Analyst Agent]
B --> C[BigQuery MCP Server]
C --> D[BigQuery API]
D --> E[Your Datasets]
B --> F[Insights & Recommendations]
Step 1: Google Cloud Setup
1.1 Enable BigQuery API
gcloud config set project your-project-id
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com
Or enable via the Console: APIs & Services > Enable APIs > BigQuery API.
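To confirm the APIs are enabled before moving on, you can list the project's enabled services and filter for BigQuery:
# Should print bigquery.googleapis.com and bigquerystorage.googleapis.com once enabled
gcloud services list --enabled | grep bigquery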
1.2 Create a Service Account
# Create service account
gcloud iam service-accounts create teamday-bigquery \
--display-name="TeamDay BigQuery Agent"
SA_EMAIL="teamday-bigquery@your-project-id.iam.gserviceaccount.com"
# Grant BigQuery permissions
gcloud projects add-iam-policy-binding your-project-id \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding your-project-id \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/bigquery.jobUser"
# Download key file
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
--iam-account=$SA_EMAIL
Required roles:
- bigquery.dataViewer -- Read dataset contents
- bigquery.jobUser -- Run queries
- bigquery.dataEditor -- (Optional) Write data back
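Before wiring the key into TeamDay, it can be worth sanity-checking it locally. One way is to authenticate as the service account and list datasets (this switches your active gcloud account; the user email below is a placeholder for your own account):
# Authenticate as the service account using the downloaded key
gcloud auth activate-service-account --key-file=$HOME/teamday-bigquery-key.json
# List datasets the account can see; errors here usually mean missing roles
bq ls --project_id=your-project-id
# Switch back to your own account when done
gcloud config set account your-user@example.com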
1.3 Create Sample Data (Optional)
If you want to follow along with the examples below:
bq mk --dataset --location=US --description="Analytics Data" \
your-project-id:analytics
bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
GENERATE_UUID() as event_id,
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(RAND()*30 AS INT64) DAY) as timestamp,
CONCAT('user_', CAST(CAST(RAND()*1000 AS INT64) AS STRING)) as user_id,
['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"Step 2: TeamDay Setup
2.1 Store GCP Credentials as Space Secrets
Base64-encode your service account key and store it alongside your project ID:
# Base64-encode the service account file
# (on Linux, add -w 0 so the output is a single line; macOS base64 does not wrap by default)
SA_ENCODED=$(base64 < ~/teamday-bigquery-key.json)
# Store secrets on the space
curl -X POST https://cc.teamday.ai/api/v1/spaces/$SPACE_ID/secrets \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"secrets\": {
\"GCP_SERVICE_ACCOUNT_JSON\": \"$SA_ENCODED\",
\"GCP_PROJECT_ID\": \"your-project-id\"
}
}"Secret keys must be UPPER_SNAKE_CASE.
2.2 Create the Data Analyst Agent
curl -X POST https://cc.teamday.ai/api/v1/agents \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "BigQuery Data Analyst",
"role": "Data Analyst",
"model": "claude-sonnet-4-6",
"visibility": "organization",
"tags": ["bigquery", "analytics", "data"],
"systemPrompt": "You are an expert data analyst with deep knowledge of SQL and BigQuery.\n\n## Capabilities\n- Write efficient BigQuery SQL queries with proper partitioning and clustering\n- Analyze query results for trends, patterns, and anomalies\n- Generate clear insights with business context\n- Make data-driven recommendations\n\n## BigQuery Best Practices\n- Select only needed columns (avoid SELECT *)\n- Use LIMIT for exploratory queries\n- Use approximate aggregation functions when precision is not critical\n- Leverage window functions for complex calculations\n- Use WITH clauses for query readability\n\n## Response Format\nFor each analysis:\n1. Explain your approach\n2. Show the SQL query\n3. Summarize the results\n4. List key insights\n5. Provide actionable recommendations"
}'
Response:
{
"success": true,
"id": "kf8x2mNp4qRt",
"name": "BigQuery Data Analyst",
"status": "active",
"chatUrl": "/agents/kf8x2mNp4qRt/chat"
}
Save the agent ID (kf8x2mNp4qRt) -- you will need it in subsequent steps.
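If you are scripting the setup, you can capture the ID directly from the creation response instead of copying it by hand. A rough sketch, assuming jq is installed and the agent payload from above is saved in a local agent.json file:
# Create the agent and keep its ID for later steps (agent.json is the payload from 2.2)
AGENT_ID=$(curl -s -X POST https://cc.teamday.ai/api/v1/agents \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d @agent.json | jq -r '.id')
echo "Created agent: $AGENT_ID"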
Step 3: Configure BigQuery MCP
3.1 Register the MCP Instance
curl -X POST https://cc.teamday.ai/api/v1/mcps \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"mcpType": "bigquery",
"name": "BigQuery Analytics",
"description": "Connection to production BigQuery datasets",
"isActive": true,
"credentials": {
"GCP_PROJECT_ID": {
"value": "your-project-id",
"isSecret": false
},
"GCP_SERVICE_ACCOUNT_JSON": {
"value": "'"$SA_ENCODED"'",
"isSecret": true
}
}
}'
Response:
{
"success": true,
"id": "wR3jLm9vKxYz",
"name": "BigQuery Analytics",
"mcpType": "bigquery"
}
3.2 Attach Agent and MCP to the Space
curl -X PATCH https://cc.teamday.ai/api/v1/spaces/$SPACE_ID \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"addAgents": ["kf8x2mNp4qRt"],
"addMcps": ["wR3jLm9vKxYz"]
}'
3.3 Alternative: .mcp.json in the Space
If you prefer file-based configuration, place a .mcp.json file in the space's root directory:
{
"mcpServers": {
"bigquery": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-bigquery"],
"env": {
"GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
"GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
}
}
}
}
3.4 Verify the Connection
Execute the agent with a simple test query:
curl -X POST https://cc.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"message\": \"List all available datasets and tables in the project.\",
\"spaceId\": \"$SPACE_ID\"
}"Step 4: Usage Examples
4.1 Basic Query
You: How many total events do we have, broken down by type?
Agent: I'll query the user_events table.
SELECT
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users,
ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
GROUP BY event_type
ORDER BY event_count DESC
Results:
| Event Type | Events | Unique Users | Avg Value |
|------------|--------|--------------|-----------|
| page_view | 2,534 | 412 | 49.82 |
| click | 2,489 | 398 | 51.34 |
| signup | 2,501 | 497 | 48.91 |
| purchase | 2,476 | 491 | 50.12 |
Total: 10,000 events across ~500 users over the last 30 days.
4.2 User Behavior Analysis
SELECT
DATE(timestamp) as date,
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users,
ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC
The agent will interpret results and surface insights like:
- Conversion funnel health (view > click > signup > purchase rates)
- Day-over-day trends and growth rate
- Recommendations for top-of-funnel optimization
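As a concrete example of the day-over-day trend calculation, the agent might follow up with a window-function query like this (illustrative; wrapped in bq so you can also test it locally):
# Day-over-day event growth using LAG() -- illustrative follow-up query
bq query --use_legacy_sql=false "
WITH daily AS (
SELECT DATE(timestamp) AS date, COUNT(*) AS events
FROM analytics.user_events
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date
)
SELECT
date,
events,
ROUND((events - LAG(events) OVER (ORDER BY date)) * 100.0
/ NULLIF(LAG(events) OVER (ORDER BY date), 0), 1) AS dod_growth_pct
FROM daily
ORDER BY date
"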
4.3 Cohort Retention Analysis
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
FROM `your-project-id.analytics.user_events`
GROUP BY user_id
),
cohort_sizes AS (
SELECT
cohort_week,
COUNT(DISTINCT user_id) as cohort_size
FROM user_cohorts
GROUP BY cohort_week
),
user_activity AS (
SELECT
user_id,
DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
FROM `your-project-id.analytics.user_events`
GROUP BY user_id, activity_week
)
SELECT
uc.cohort_week,
cs.cohort_size,
DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
COUNT(DISTINCT ua.user_id) as active_users,
ROUND(COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1) as retention_pct
FROM user_cohorts uc
JOIN cohort_sizes cs ON uc.cohort_week = cs.cohort_week
LEFT JOIN user_activity ua ON uc.user_id = ua.user_id
WHERE uc.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 WEEK)
GROUP BY uc.cohort_week, cs.cohort_size, weeks_since_signup
ORDER BY uc.cohort_week, weeks_since_signup
Example output:
| Cohort | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|---|---|---|---|---|---|---|
| Jan 20 | 178 | 100% | 56.2% | 38.8% | 29.2% | 24.7% |
| Jan 27 | 195 | 100% | 58.5% | 41.0% | 31.3% | -- |
| Feb 3 | 203 | 100% | 59.1% | 43.3% | -- | -- |
| Feb 10 | 189 | 100% | 61.4% | -- | -- | -- |
The agent will identify trends (improving Week 1 retention), pinpoint the biggest drop-off window (Week 1 to Week 2), and recommend interventions like onboarding email sequences or re-engagement triggers.
4.4 Revenue & Customer Segmentation
WITH customer_metrics AS (
SELECT
user_id,
COUNT(*) as total_purchases,
SUM(value) as total_revenue,
AVG(value) as avg_order_value,
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) as days_since_last_purchase
FROM `your-project-id.analytics.user_events`
WHERE event_type = 'purchase'
GROUP BY user_id
)
SELECT
CASE
WHEN total_revenue >= 500 THEN 'VIP'
WHEN total_revenue >= 200 THEN 'High Value'
WHEN total_revenue >= 100 THEN 'Medium Value'
ELSE 'Low Value'
END as segment,
COUNT(*) as customers,
ROUND(SUM(total_revenue), 2) as segment_revenue,
ROUND(AVG(total_revenue), 2) as avg_customer_value,
ROUND(AVG(avg_order_value), 2) as avg_order_value,
ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY segment
ORDER BY segment_revenue DESC
Example output:
| Segment | Customers | Revenue | Avg Value | AOV | Days Since |
|---|---|---|---|---|---|
| VIP | 8 | $4,234 | $529.25 | $77.87 | 3.2 |
| High Value | 23 | $5,892 | $256.17 | $80.05 | 5.1 |
| Medium Value | 67 | $8,123 | $121.24 | $80.83 | 8.7 |
| Low Value | 156 | $7,234 | $46.37 | $42.15 | 12.3 |
The agent surfaces the Pareto distribution (top 12% of customers drive 40% of revenue), flags the at-risk low-value segment (12+ days inactive), and recommends VIP retention programs and win-back campaigns.
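A natural follow-up the agent might run is a win-back list of high-value customers who have gone quiet; a rough sketch (the $500 and 7-day thresholds are illustrative):
# Illustrative follow-up: high-value customers with no purchase in the last week
bq query --use_legacy_sql=false "
SELECT
user_id,
ROUND(SUM(value), 2) AS total_revenue,
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) AS days_since_last_purchase
FROM analytics.user_events
WHERE event_type = 'purchase'
GROUP BY user_id
HAVING total_revenue >= 500 AND days_since_last_purchase > 7
ORDER BY total_revenue DESC
"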
4.5 Anomaly Detection
WITH hourly_metrics AS (
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY hour, event_type
),
historical_avg AS (
SELECT
EXTRACT(HOUR FROM hour_ts) as hour_of_day,
event_type,
AVG(hourly_count) as avg_count,
STDDEV(hourly_count) as stddev_count
FROM (
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) as hour_ts,
event_type,
COUNT(*) as hourly_count
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY hour_ts, event_type
)
GROUP BY hour_of_day, event_type
)
SELECT
hm.hour,
hm.event_type,
hm.event_count as actual,
ROUND(ha.avg_count, 1) as expected,
ROUND(
(hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2
) as z_score,
CASE
WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2
THEN 'ANOMALY'
WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
THEN 'WARNING'
ELSE 'NORMAL'
END as status
FROM hourly_metrics hm
JOIN historical_avg ha
ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY z_score DESC
The agent compares current hourly metrics against the trailing 7-day average, flags anything beyond 1.5 standard deviations, and recommends immediate investigation steps for critical anomalies (z-score > 2).
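To run this check on demand rather than waiting for a scheduled report, you can send the same request through the execute endpoint from step 3.4 (the prompt wording is just an example):
curl -X POST https://cc.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"message\": \"Run the hourly anomaly detection query for the last 24 hours and summarize anything with a z-score above 2.\",
\"spaceId\": \"$SPACE_ID\"
}"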
Step 5: Advanced Patterns
5.1 Funnel Analysis
WITH funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_id
)
SELECT
SUM(step_1_view) as viewed,
SUM(step_2_click) as clicked,
SUM(step_3_signup) as signed_up,
SUM(step_4_purchase) as purchased,
ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
as view_to_click_pct,
ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1)
as click_to_signup_pct,
ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1)
as signup_to_purchase_pct,
ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
as overall_conversion_pct
FROM funnel_steps
5.2 Churn Prediction
Identify users likely to churn based on activity patterns:
WITH user_features AS (
SELECT
user_id,
COUNT(*) as total_events,
COUNT(DISTINCT DATE(timestamp)) as active_days,
DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_inactive,
AVG(CASE WHEN event_type = 'purchase' THEN value ELSE 0 END) as avg_purchase_value,
COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
FROM `your-project-id.analytics.user_events`
GROUP BY user_id
)
SELECT
user_id,
CASE
WHEN days_inactive > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
WHEN days_inactive > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
WHEN days_inactive > 3 AND total_events < 5 THEN 'LOW_RISK'
ELSE 'ACTIVE'
END as churn_risk,
days_inactive,
total_events,
purchase_count,
ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_inactive > 3
ORDER BY
CASE
WHEN days_inactive > 14 AND purchase_count > 0 THEN 1
WHEN days_inactive > 7 AND active_days < 3 THEN 2
ELSE 3
END,
days_inactive DESC
Step 6: Schedule with Missions
6.1 Daily Insights Report
curl -X POST https://cc.teamday.ai/api/v1/missions \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Daily BigQuery Insights",
"goal": "Analyze yesterday data from BigQuery: 1) Overall event metrics and trends 2) Revenue summary 3) Any anomalies detected 4) Top recommendations",
"characterId": "kf8x2mNp4qRt",
"spaceId": "'"$SPACE_ID"'",
"schedule": {
"type": "cron",
"value": "0 8 * * *"
}
}'
Note: characterId is the agent's ID in the API.
Response:
{
"success": true,
"id": "nV7bTq2wXpLk",
"title": "Daily BigQuery Insights",
"status": "pending",
"schedule": {
"type": "cron",
"value": "0 8 * * *"
}
}
6.2 Weekly Deep Dive
curl -X POST https://cc.teamday.ai/api/v1/missions \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Weekly Analytics Deep Dive",
"goal": "Comprehensive weekly analysis: 1) Cohort retention trends 2) Revenue by customer segment 3) Funnel conversion rates 4) Churn risk report 5) Key insights and strategic recommendations",
"characterId": "kf8x2mNp4qRt",
"spaceId": "'"$SPACE_ID"'",
"schedule": {
"type": "cron",
"value": "0 9 * * 1"
}
}'
Troubleshooting
Authentication failed (403)
- Verify the service account JSON is valid and base64-encoded correctly
- Confirm the service account has the bigquery.dataViewer and bigquery.jobUser roles (see the check below)
- Check that the BigQuery API is enabled in your GCP project
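One way to confirm the roles actually landed on the service account (adjust the project ID and account name to match yours):
# List IAM roles granted to the teamday-bigquery service account
gcloud projects get-iam-policy your-project-id \
--flatten="bindings[].members" \
--filter="bindings.members:teamday-bigquery" \
--format="table(bindings.role)"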
Query timeout
- Add a LIMIT clause for exploratory queries
- Use partitioned and clustered tables for large datasets
- Break complex queries into smaller CTEs
No data returned
- Verify dataset and table names (they are case-sensitive)
- Check date range filters -- timestamps may be in UTC
- Test with a simple SELECT COUNT(*) against the table first (see the example below)
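For example, a minimal row-count check against the sample dataset (substitute your own table):
# Quick sanity check -- should return a non-zero count if data exists
bq query --use_legacy_sql=false \
"SELECT COUNT(*) AS row_count FROM analytics.user_events"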
Cost concerns
- BigQuery charges $5 per TB scanned -- always select only needed columns
- Use maximumBytesBilled in your query settings to cap cost (see the dry-run example below)
- Partition tables by date and cluster by frequently filtered columns
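A dry run is a cheap way to see how much a query would scan before letting an agent run it, and the bq --maximum_bytes_billed flag enforces a hard cap at execution time (the ~100 MB cap below is illustrative):
# Dry run: reports bytes that would be processed without running the query
bq query --dry_run --use_legacy_sql=false \
"SELECT event_type, COUNT(*) FROM analytics.user_events GROUP BY event_type"
# Hard cap: the job fails instead of billing more than ~100 MB
bq query --maximum_bytes_billed=100000000 --use_legacy_sql=false \
"SELECT event_type, COUNT(*) FROM analytics.user_events GROUP BY event_type"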
Cost Estimation
| Component | Estimate |
|---|---|
| BigQuery queries (typical: 10-100 MB each) | $0.00005-$0.0005 per query |
| BigQuery storage | $0.02 per GB/month |
| TeamDay agent execution (~15K input + ~4K output tokens) | ~$0.18 per analysis |
| Monthly total (daily reports) | ~$6-10/month |
Next Steps
- Analytics Reporter -- Combine with GA4 data
- Code Review Bot -- Analyze engineering metrics
- API Reference -- Full TeamDay API documentation