Deploying Your AI Model to Production: A Practical Guide
You've connected your data, trained your model, and tested it in the chat playground. Now it's time to deploy it to production and integrate it into your application.
This guide walks you through deploying your AI model using Cuadra AI's production-ready API, from getting your API keys to handling real-world scenarios.
Prerequisites
Before deploying, make sure you've completed:
- ✅ Connect Phase - Your datasets are created and files are processed
- ✅ Train Phase - Your model is configured with instructions and datasets attached
- ✅ Testing - You've tested your model in the chat playground and verified it works correctly
Getting Your API Credentials
The first step is getting your API endpoint and authentication key.
Step 1: Navigate to Deploy Section
- Go to your model in the dashboard
- Click on the Deploy tab
- Your API credentials are displayed there
Step 2: Copy Your Credentials
You'll see:
- API Endpoint - Your model's unique API URL
- API Key - Your authentication token (keep this secure!)
Important: Treat your API key like a password. Never commit it to version control or expose it in client-side code.
Step 3: Regenerate if Needed
If you need to regenerate your API key:
- Click "Regenerate API Key"
- Confirm the action
- Update your application with the new key
- The old key will no longer work
Understanding the API
Cuadra AI provides a standard REST API that's easy to integrate into any application.
API Basics
- Protocol: HTTPS
- Format: JSON request/response
- Authentication: API key in Authorization header
- Method: POST for chat requests
Request Structure
{
  "message": "Your user's message here",
  "stream": false,
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      // Optional: JSON schema for structured outputs
    }
  }
}
Response Structure
{
  "response": "The AI's response text",
  "usage": {
    "input_tokens": 150,
    "output_tokens": 200,
    "total_tokens": 350
  }
}
Integration Examples
JavaScript/Node.js
async function callCuadraAPI(userMessage) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: false
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.response;
}

// Usage
const answer = await callCuadraAPI("What are your return policies?");
console.log(answer);
Python
import requests
import os

def call_cuadra_api(user_message):
    url = os.getenv('CUADRA_API_ENDPOINT')
    headers = {
        'Authorization': f"Bearer {os.getenv('CUADRA_API_KEY')}",
        'Content-Type': 'application/json'
    }
    data = {
        'message': user_message,
        'stream': False
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()['response']

# Usage
answer = call_cuadra_api("What are your return policies?")
print(answer)
Streaming Responses
For a better user experience, stream tokens as they're generated:
async function streamCuadraAPI(userMessage, onToken) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: true // Enable streaming
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Note: this assumes each chunk contains whole lines; production code
    // should buffer partial lines across reads.
    const chunk = decoder.decode(value);
    const lines = chunk.split('\n').filter(line => line.trim());

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.token) {
          onToken(data.token); // Callback for each token
        }
      }
    }
  }
}

// Usage
let fullResponse = '';
await streamCuadraAPI("Tell me about your products", (token) => {
  fullResponse += token;
  // Update UI with each token for real-time display
  updateChatUI(fullResponse);
});
Production Best Practices
1. Environment Variables
Never hardcode API keys. Use environment variables:
# .env file
CUADRA_API_ENDPOINT=https://api.cuadra.ai/v1/models/your-model-id
CUADRA_API_KEY=your-api-key-here
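In Node.js, for example, you can load and validate these at startup; this sketch assumes the dotenv package is installed:

// Load variables from .env at startup (assumes the `dotenv` package)
require('dotenv').config();

const endpoint = process.env.CUADRA_API_ENDPOINT;
const apiKey = process.env.CUADRA_API_KEY;

// Fail fast at boot rather than on the first request
if (!endpoint || !apiKey) {
  throw new Error('CUADRA_API_ENDPOINT and CUADRA_API_KEY must be set');
}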
2. Error Handling
Implement robust error handling:
async function callCuadraAPIWithRetry(userMessage, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('YOUR_API_ENDPOINT', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ message: userMessage })
      });

      if (response.status === 429) {
        // Rate limit - wait and retry with exponential backoff
        const waitTime = Math.pow(2, attempt) * 1000;
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Wait before retrying
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }

  // Every attempt hit the rate limit
  throw new Error('Rate limited: retries exhausted');
}
3. Rate Limiting
Respect rate limits and implement client-side throttling (a minimal queue sketch follows this list):
- Check your plan's rate limits
- Implement request queuing if needed
- Use exponential backoff for 429 responses
- Monitor your usage in the dashboard
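As a starting point, here's a minimal promise queue that caps concurrent requests. The limit of 2 is illustrative, not Cuadra's actual limit; check your plan for the real numbers:

// Minimal client-side throttle: run at most `limit` requests at once.
function createThrottle(limit = 2) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task()
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };

  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

// Usage: wrap each API call in the throttle
const throttled = createThrottle(2);
const reply = await throttled(() => callCuadraAPI("What are your return policies?"));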
4. Timeout Handling
Set appropriate timeouts:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30-second timeout

try {
  const response = await fetch('YOUR_API_ENDPOINT', {
    signal: controller.signal,
    // ... other options
  });
  clearTimeout(timeoutId);
} catch (error) {
  if (error.name === 'AbortError') {
    // Handle timeout
  }
}
5. Logging and Monitoring
Track your API usage:
async function callCuadraAPIWithMetrics(userMessage) {
  const startTime = Date.now();

  try {
    // Delegate to the retry wrapper above, which returns the full JSON response
    const response = await callCuadraAPIWithRetry(userMessage);
    const duration = Date.now() - startTime;

    // Log successful request
    logMetrics({
      success: true,
      duration,
      tokens: response.usage.total_tokens
    });

    return response;
  } catch (error) {
    // Log failed request
    logMetrics({
      success: false,
      duration: Date.now() - startTime,
      error: error.message
    });
    throw error;
  }
}
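The logMetrics helper above is left to you; a minimal version might simply emit structured JSON lines for your log aggregator to pick up:

// A minimal logMetrics: one JSON line per request for your log pipeline
function logMetrics(fields) {
  console.log(JSON.stringify({
    event: 'cuadra_api_call',
    timestamp: new Date().toISOString(),
    ...fields
  }));
}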
Structured Outputs
For consistent, parseable responses, use JSON schema:
const response = await fetch('YOUR_API_ENDPOINT', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    message: "What are your return policies?",
    responseFormat: {
      type: "json_schema",
      json_schema: {
        name: "return_policy_response",
        schema: {
          type: "object",
          properties: {
            policy: { type: "string" },
            timeframe: { type: "string" },
            conditions: { type: "array", items: { type: "string" } }
          },
          required: ["policy", "timeframe"]
        }
      }
    }
  })
});
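If the model honors the schema, the response field should contain JSON you can parse directly. The parsing step below is an assumption based on the response structure shown earlier:

const result = await response.json();

// Assumption: with a json_schema responseFormat, `result.response` holds a
// JSON string conforming to your schema
const policy = JSON.parse(result.response);
console.log(policy.timeframe, policy.conditions);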
Security Considerations
API Key Security
- Server-Side Only - Never expose API keys in client-side code; route browser traffic through your own backend (see the proxy sketch after this list)
- Environment Variables - Store keys in environment variables
- Rotation - Rotate keys periodically
- Access Control - Limit who can access API keys
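A browser or mobile app should talk to your own server, which holds the key. Here's a minimal proxy sketch using Express; Express, the /chat route, and port 3000 are illustrative choices, not part of Cuadra's API:

// server.js - keep the Cuadra key on the server; the browser talks to /chat
const express = require('express');
const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
  try {
    const upstream = await fetch(process.env.CUADRA_API_ENDPOINT, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ message: req.body.message, stream: false })
    });
    res.status(upstream.status).json(await upstream.json());
  } catch (err) {
    res.status(502).json({ error: 'Upstream request failed' });
  }
});

app.listen(3000);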
Data Privacy
- Encryption - Use HTTPS for all API calls
- Data Minimization - Only send necessary data
- Compliance - Ensure compliance with GDPR, CCPA, etc.
- Audit Logs - Log API usage for security audits
Monitoring and Analytics
Usage Dashboard
Monitor your deployment in the Cuadra AI dashboard:
- API Calls - Track request volume
- Token Usage - Monitor token consumption
- Costs - View costs per model
- Performance - Track response times
Custom Analytics
Implement your own analytics (a small sketch for the last item follows this list):
- Track user satisfaction
- Monitor response quality
- Measure business metrics (resolution rate, etc.)
- Identify common questions
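For identifying common questions, even a simple in-memory tally can surface the most frequent ones; a sketch (production analytics would persist this somewhere durable):

// Count normalized user questions to find the most common ones
const questionCounts = new Map();

function recordQuestion(question) {
  const key = question.trim().toLowerCase();
  questionCounts.set(key, (questionCounts.get(key) || 0) + 1);
}

function topQuestions(n = 10) {
  return [...questionCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}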
Troubleshooting
Common Issues
Issue: 401 Unauthorized
- Check your API key is correct
- Verify the key hasn't been regenerated
- Ensure the Authorization header format is correct
Issue: 429 Rate Limit
- Check your plan's rate limits
- Implement request throttling
- Use exponential backoff
Issue: Slow Responses
- Check your model configuration
- Review knowledge base size
- Consider using streaming for better UX
- Monitor API performance in dashboard
Issue: Unexpected Responses
- Review your model's system instructions
- Check attached datasets are correct
- Test in chat playground first
- Verify knowledge base is up to date
Scaling Your Deployment
As your usage grows:
- Monitor Usage - Track API calls and costs
- Optimize Configuration - Refine model settings for efficiency
- Upgrade Plan - Consider higher-tier plans for more resources
- Cache Responses - Cache common queries when appropriate (a minimal in-memory sketch follows this list)
- Load Balancing - Distribute requests across multiple endpoints if needed
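For the caching point, a minimal in-memory cache keyed by the exact message might look like this. The 5-minute TTL is illustrative; skip caching entirely where stale answers are unacceptable:

// Minimal in-memory cache with a TTL, keyed by the exact user message.
// Fine for identical repeat questions; real deployments may want LRU eviction.
const cache = new Map();
const TTL_MS = 5 * 60 * 1000; // 5 minutes (illustrative)

async function cachedCall(userMessage) {
  const hit = cache.get(userMessage);
  if (hit && Date.now() - hit.time < TTL_MS) return hit.value;

  const value = await callCuadraAPI(userMessage);
  cache.set(userMessage, { value, time: Date.now() });
  return value;
}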
Next Steps
You're now ready to deploy your AI model to production! Remember:
- Secure Your Keys - Keep API keys safe and never expose them
- Handle Errors - Implement robust error handling
- Monitor Usage - Track performance and costs
- Iterate - Continuously improve based on real-world usage
Start deploying your model and see how easy it is to integrate AI into your application.
Need help with deployment? Check our API documentation or contact support.