Pinch

Real-time API Reference

Authentication

All API requests require authentication using a Bearer token in the Authorization header.

Authorization header
Authorization: Bearer <your-api-token>

Base URL: https://api.startpinch.com

Endpoints

Create Translation Session

Creates a new real-time translation session and returns connection credentials.

POST /api/beta1/session

Request Headers

  • Authorization: Bearer <your-api-token> (required)
  • Content-Type: application/json (required)

Request Body

{
  "source_language": "string",
  "target_language": "string",
  "voice_type": "string"
}

Parameters:

  • source_language (string, required): Source language code hint (e.g., "en-US")
  • target_language (string, required): Target language code (e.g., "es-ES")
  • voice_type (string, optional): Voice to use: "clone", "female", or "male" (default: "clone")

Response

{
  "url": "string",
  "token": "string",
  "room_name": "string"
}

Response Fields:

  • url (string): URL for connecting to the session
  • token (string): JWT used to authenticate to the room
  • room_name (string): Unique room identifier (format: api-<random-id>)

Example Requests

JavaScript (Fetch)

const response = await fetch('https://api.startpinch.com/api/beta1/session', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <your-api-token>',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    source_language: 'en-US',
    target_language: 'es-ES',
    voice_type: 'clone'
  })
});

const session = await response.json();
console.log(session);

Error Responses

{
  "error": {
    "code": "invalid_language",
    "message": "Unsupported target language: xx-XX"
  }
}
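When a request fails, the body carries the error object shown above. Below is a minimal sketch of client-side handling; the createSession wrapper and formatApiError helper are hypothetical, not part of the API:

```javascript
// Turn an API error body into a readable message.
// Assumes the { error: { code, message } } shape shown above.
function formatApiError(body) {
  return `Session failed (${body.error.code}): ${body.error.message}`;
}

// Hypothetical wrapper: checks response.ok before trusting the body.
async function createSession(apiToken, params) {
  const response = await fetch('https://api.startpinch.com/api/beta1/session', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(params)
  });
  if (!response.ok) {
    throw new Error(formatApiError(await response.json()));
  }
  return response.json();
}
```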

Data Messages

Transcript data is published over the data channel and delivered to clients via the DataReceived event.

The system publishes two types of transcripts:

Original Transcript

The recognized speech in the source language:

{
  "type": "original_transcript",
  "text": "Hello, how are you?",
  "timestamp": 1770933604.048,
  "is_final": true,
  "language_detected": "en-US"
}

Translated Transcript

The translated text in the target language:

{
  "type": "translated_transcript",
  "text": "Hola, ¿cómo estás?",
  "timestamp": 1770933604.945,
  "is_final": true,
  "language_detected": "en-US"
}
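Both message types can be handled by one dispatcher that decodes the payload and branches on type. A sketch, assuming payloads arrive as UTF-8 encoded JSON bytes (handleDataMessage and the handler names are hypothetical):

```javascript
// Decode a data-channel payload and route it by transcript type.
// Assumes payloads are UTF-8 encoded JSON, as in the examples above.
function handleDataMessage(payload, handlers) {
  const message = JSON.parse(new TextDecoder().decode(payload));
  switch (message.type) {
    case 'original_transcript':
      handlers.onOriginal(message);
      break;
    case 'translated_transcript':
      handlers.onTranslated(message);
      break;
    default:
      // Ignore unknown types so future message kinds don't break clients.
      break;
  }
}
```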

Interim vs final results

Interim results (is_final: false)

  • Partial transcripts generated as speech is detected
  • Sent only for original_transcript messages; some languages do not support interim results
  • May change as more audio is processed
  • Useful for displaying real-time feedback

Final results (is_final: true)

  • Complete, stable transcripts
  • Represent the end of a speech segment
  • translated_transcript messages are always final

Language detection

  • Transcript messages include language_detected (for example, en-US) so clients can see which language the model identified for that segment.
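A common way to use the two result kinds in a UI is to overwrite interim text and commit final text. A minimal sketch (createCaptionState and applyTranscript are hypothetical helpers; only text and is_final come from the messages above):

```javascript
// Track a live caption: interim text is replaced, final text is appended.
function createCaptionState() {
  return { committed: [], interim: '' };
}

function applyTranscript(state, message) {
  if (message.is_final) {
    // Final results are stable; commit them and clear the interim text.
    state.committed.push(message.text);
    state.interim = '';
  } else {
    // Interim results may change as more audio is processed; overwrite.
    state.interim = message.text;
  }
  return state;
}
```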

Transcript segmentation

The system intelligently segments speech into translatable units. It identifies natural sentence boundaries and waits for speech pauses to finalize segments.

Example:

  [interim] "One has red color and other has yellow so"
  [interim] "One has red colour and other has yellow. So, I really..."
  [final]   "One has red colour and other has yellow."
  [final]   "So, I really love these colours."

Voice Types

Three voice options are available:

  • clone: Your own voice
  • female: Female voice synthesis
  • male: Male voice synthesis

The default voice type is clone.