Deploying Your AI Model to Production: A Practical Guide

You've connected your data, trained your model, and tested it in the chat playground. Now it's time to deploy it to production and integrate it into your application.

This guide walks you through deploying your AI model using Cuadra AI's production-ready API, from getting your API keys to handling real-world scenarios.

Prerequisites

Before deploying, make sure you've completed:

  • Connect Phase - Your datasets are created and files are processed
  • Train Phase - Your model is configured with instructions and datasets attached
  • Testing - You've tested your model in the chat playground and verified it works correctly

Getting Your API Credentials

The first step is getting your API endpoint and authentication key.

Step 1: Navigate to Deploy Section

  1. Go to your model in the dashboard
  2. Click on the Deploy tab
  3. You'll see your API endpoint and authentication key

Step 2: Copy Your Credentials

You'll see:

  • API Endpoint - Your model's unique API URL
  • API Key - Your authentication token (keep this secure!)

Important: Treat your API key like a password. Never commit it to version control or expose it in client-side code.

Step 3: Regenerate if Needed

If you need to regenerate your API key:

  1. Click "Regenerate API Key"
  2. Confirm the action
  3. Update your application with the new key
  4. The old key will no longer work

Understanding the API

Cuadra AI provides a standard REST API that's easy to integrate into any application.

API Basics

  • Protocol: HTTPS
  • Format: JSON request/response
  • Authentication: API key in Authorization header
  • Method: POST for chat requests

Request Structure

json
{
  "message": "Your user's message here",
  "stream": false,
  "responseFormat": {
    "type": "json_schema",
    "json_schema": {
      // Optional: JSON schema for structured outputs
    }
  }
}

Response Structure

json
{
  "response": "The AI's response text",
  "usage": {
    "input_tokens": 150,
    "output_tokens": 200,
    "total_tokens": 350
  }
}

Integration Examples

JavaScript/Node.js

javascript
async function callCuadraAPI(userMessage) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: false
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.response;
}

// Usage
const answer = await callCuadraAPI("What are your return policies?");
console.log(answer);

Python

python
import requests
import os

def call_cuadra_api(user_message):
    url = os.getenv('CUADRA_API_ENDPOINT')
    headers = {
        'Authorization': f"Bearer {os.getenv('CUADRA_API_KEY')}",
        'Content-Type': 'application/json'
    }
    data = {
        'message': user_message,
        'stream': False
    }
    
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    
    return response.json()['response']

# Usage
answer = call_cuadra_api("What are your return policies?")
print(answer)

Streaming Responses

For a better user experience, stream the response so tokens appear as they're generated:

javascript
async function streamCuadraAPI(userMessage, onToken) {
  const response = await fetch('YOUR_API_ENDPOINT', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      message: userMessage,
      stream: true  // Enable streaming
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';  // Carries a partial line over to the next chunk

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop();  // Keep the last (possibly incomplete) line for the next read

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.token) {
          onToken(data.token);  // Callback for each token
        }
      }
    }
  }
}

// Usage
let fullResponse = '';
await streamCuadraAPI("Tell me about your products", (token) => {
  fullResponse += token;
  // Update UI with each token for real-time display
  updateChatUI(fullResponse);
});

Production Best Practices

1. Environment Variables

Never hardcode API keys. Use environment variables:

bash
# .env file
CUADRA_API_ENDPOINT=https://api.cuadra.ai/v1/models/your-model-id
CUADRA_API_KEY=your-api-key-here
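
If you're on Node.js, a common pattern is to load the .env file with the dotenv package and fail fast when a variable is missing. This is a minimal sketch; the variable names match the .env file above, but dotenv itself is an assumption about your stack:

javascript
// Load variables from .env into process.env (requires: npm install dotenv)
require('dotenv').config();

const { CUADRA_API_ENDPOINT, CUADRA_API_KEY } = process.env;

// Fail fast at startup instead of failing on the first API call
if (!CUADRA_API_ENDPOINT || !CUADRA_API_KEY) {
  throw new Error('Missing CUADRA_API_ENDPOINT or CUADRA_API_KEY environment variable');
}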

2. Error Handling

Implement robust error handling:

javascript
async function callCuadraAPIWithRetry(userMessage, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('YOUR_API_ENDPOINT', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ message: userMessage })
      });

      if (response.status === 429) {
        // Rate limit - wait and retry with exponential backoff
        const waitTime = Math.pow(2, attempt) * 1000;
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      if (!response.ok) {
        throw new Error(`API error: ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Wait before retrying after a network or server error
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }

  // Every attempt hit the rate limit
  throw new Error('Rate limited: max retries exceeded');
}

3. Rate Limiting

Respect rate limits and implement client-side throttling:

  • Check your plan's rate limits
  • Implement request queuing if needed (a simple throttling sketch follows this list)
  • Use exponential backoff for 429 responses
  • Monitor your usage in the dashboard
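
If you need client-side throttling, a small promise-chain queue that runs one request at a time and spaces them out is often enough. This is a sketch, not part of the Cuadra API; the interval is an assumption you should tune to your plan's limits:

javascript
// Minimal client-side throttle: runs queued calls one at a time,
// waiting a fixed interval between requests.
function createThrottledClient(callApi, minIntervalMs = 500) {
  let queue = Promise.resolve();

  return function throttledCall(userMessage) {
    const result = queue.then(() => callApi(userMessage));
    // Chain the delay so the next queued call waits, whether this one succeeds or fails
    queue = result.catch(() => {}).then(
      () => new Promise(resolve => setTimeout(resolve, minIntervalMs))
    );
    return result;
  };
}

// Usage: wrap the callCuadraAPI function from the examples above
const throttledCuadra = createThrottledClient(callCuadraAPI, 500);
const reply = await throttledCuadra("What are your return policies?");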

4. Timeout Handling

Set appropriate timeouts:

javascript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout

try {
  const response = await fetch('YOUR_API_ENDPOINT', {
    signal: controller.signal,
    // ... other options
  });
  clearTimeout(timeoutId);
} catch (error) {
  if (error.name === 'AbortError') {
    // Handle timeout
  }
}

5. Logging and Monitoring

Track your API usage:

javascript
async function callCuadraAPIWithMetrics(userMessage) {
  const startTime = Date.now();

  try {
    // Wrap the retry helper above, which returns the full JSON payload (including usage)
    const response = await callCuadraAPIWithRetry(userMessage);
    const duration = Date.now() - startTime;

    // Log successful request
    logMetrics({
      success: true,
      duration,
      tokens: response.usage.total_tokens
    });

    return response;
  } catch (error) {
    // Log failed request
    logMetrics({
      success: false,
      duration: Date.now() - startTime,
      error: error.message
    });
    throw error;
  }
}

Structured Outputs

For consistent, parseable responses, use JSON schema:

javascript
const response = await fetch('YOUR_API_ENDPOINT', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    message: "What are your return policies?",
    responseFormat: {
      type: "json_schema",
      json_schema: {
        name: "return_policy_response",
        schema: {
          type: "object",
          properties: {
            policy: { type: "string" },
            timeframe: { type: "string" },
            conditions: { type: "array", items: { type: "string" } }
          },
          required: ["policy", "timeframe"]
        }
      }
    }
  })
});
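
How the structured payload comes back can vary: the response field may hold a JSON string rather than an already-parsed object. A defensive way to read it (an assumption to verify against the actual responses from your endpoint):

javascript
const data = await response.json();

// The structured payload may arrive as a JSON string or as an object;
// handle both cases rather than assuming one.
const structured = typeof data.response === 'string'
  ? JSON.parse(data.response)
  : data.response;

console.log(structured.policy, structured.timeframe);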

Security Considerations

API Key Security

  • Server-Side Only - Never expose API keys in client-side code; route requests through your backend (see the proxy sketch after this list)
  • Environment Variables - Store keys in environment variables
  • Rotation - Rotate keys periodically
  • Access Control - Limit who can access API keys
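
One way to keep the key server-side is a thin backend proxy: the browser calls your server, and only the server talks to the Cuadra endpoint. A rough sketch using Express (Express and the /api/chat route are assumptions about your stack, not part of Cuadra AI):

javascript
// Thin proxy: the browser posts to /api/chat, the server holds the API key.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/api/chat', async (req, res) => {
  try {
    const response = await fetch(process.env.CUADRA_API_ENDPOINT, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.CUADRA_API_KEY}`,  // never sent to the browser
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ message: req.body.message, stream: false })
    });
    const data = await response.json();
    res.json({ answer: data.response });
  } catch (error) {
    res.status(502).json({ error: 'Upstream API error' });
  }
});

app.listen(3000);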

Data Privacy

  • Encryption - Use HTTPS for all API calls
  • Data Minimization - Only send necessary data
  • Compliance - Ensure compliance with GDPR, CCPA, etc.
  • Audit Logs - Log API usage for security audits

Monitoring and Analytics

Usage Dashboard

Monitor your deployment in the Cuadra AI dashboard:

  • API Calls - Track request volume
  • Token Usage - Monitor token consumption
  • Costs - View costs per model
  • Performance - Track response times

Custom Analytics

Implement your own analytics (a minimal logging sketch follows the list):

  • Track user satisfaction
  • Monitor response quality
  • Measure business metrics (resolution rate, etc.)
  • Identify common questions
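
For example, you could record each exchange alongside a user rating and analyze it later. The logInteraction helper and its storage are hypothetical; plug in your own database or analytics tool:

javascript
// Record each exchange so you can later measure satisfaction and spot common questions.
async function logInteraction({ question, answer, rating }) {
  const record = {
    question,
    answer,
    rating,  // e.g. a thumbs up/down collected from your UI
    timestamp: new Date().toISOString()
  };
  // Replace with your own store: a database insert, an analytics event, etc.
  console.log(JSON.stringify(record));
}

// Usage after a successful API call
const answer = await callCuadraAPI("What are your return policies?");
await logInteraction({ question: "What are your return policies?", answer, rating: 'up' });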

Troubleshooting

Common Issues

Issue: 401 Unauthorized

  • Check your API key is correct
  • Verify the key hasn't been regenerated
  • Ensure the Authorization header format is correct

Issue: 429 Rate Limit

  • Check your plan's rate limits
  • Implement request throttling
  • Use exponential backoff

Issue: Slow Responses

  • Check your model configuration
  • Review knowledge base size
  • Consider using streaming for better UX
  • Monitor API performance in dashboard

Issue: Unexpected Responses

  • Review your model's system instructions
  • Check attached datasets are correct
  • Test in chat playground first
  • Verify knowledge base is up to date

Scaling Your Deployment

As your usage grows:

  • Monitor Usage - Track API calls and costs
  • Optimize Configuration - Refine model settings for efficiency
  • Upgrade Plan - Consider higher-tier plans for more resources
  • Cache Responses - Cache common queries when appropriate (see the caching sketch after this list)
  • Load Balancing - Distribute requests across multiple endpoints if needed
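
Caching is the easiest of these to sketch. Assuming exact-match repeat questions are common enough to be worth caching (a simplification; real traffic may need query normalization), an in-memory cache with a time-to-live might look like this:

javascript
// Simple in-memory cache for repeated questions, with a time-to-live.
const cache = new Map();
const TTL_MS = 10 * 60 * 1000; // 10 minutes

async function cachedCuadraCall(userMessage) {
  const cached = cache.get(userMessage);
  if (cached && Date.now() - cached.time < TTL_MS) {
    return cached.answer;  // Serve the cached answer without an API call
  }

  const answer = await callCuadraAPI(userMessage);  // Helper from the examples above
  cache.set(userMessage, { answer, time: Date.now() });
  return answer;
}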

Next Steps

You're now ready to deploy your AI model to production! Remember:

  1. Secure Your Keys - Keep API keys safe and never expose them
  2. Handle Errors - Implement robust error handling
  3. Monitor Usage - Track performance and costs
  4. Iterate - Continuously improve based on real-world usage

Start deploying your model and see how easy it is to integrate AI into your application.


Need help with deployment? Check our API documentation or contact support.