Notes Transcription API
The Dorsum Transcription API converts audio consultations into clinical notes in real time using WebSocket communication.
Transcribe Endpoint
WebSocket URL:
wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe
Input parameters
Include the following query parameters in the WebSocket URL:
- api_key: Your unique API key
- client_key: Your client-specific key
- input_lang: The language of the audio input (optional, default is 'en')
- output_lang: The language of the clinical notes (optional, default is 'en')
- output_format: The format of the clinical notes (optional, default is 'text')
Example:
wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?api_key=YOUR_API_KEY&client_key=YOUR_CLIENT_KEY&input_lang=en&output_lang=en&output_format=text
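When constructing this URL in code, URLSearchParams handles escaping automatically; a sketch using the placeholder keys from the example above:

```javascript
// Build the transcribe URL from its parts; URLSearchParams escapes values safely.
const params = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  client_key: 'YOUR_CLIENT_KEY',
  input_lang: 'en',
  output_lang: 'en',
  output_format: 'text',
});

const wsUrl =
  'wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?' +
  params.toString();

console.log(wsUrl);
```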
Connection Lifecycle
- Connect: Establish a WebSocket connection to the endpoint URL.
- Audio Transmission: Stream audio data to the WebSocket.
- Commands: Send control commands as needed.
- Receive Notes: Get generated clinical notes from the server.
- Disconnect: Close the WebSocket connection when finished.
Audio Requirements
- Format: Raw PCM audio data
- Sample Rate: 16000 Hz
- Channels: Mono
- Bits per sample: 16-bit
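These requirements fix the raw data rate, which is useful for estimating bandwidth; a quick sanity check (derived values, not part of the API):

```javascript
// 16 kHz, mono, 16-bit PCM => 2 bytes per sample.
const sampleRate = 16000;
const channels = 1;
const bytesPerSample = 2;

// Raw upstream data rate: 32,000 bytes (about 31 KiB) per second.
const bytesPerSecond = sampleRate * channels * bytesPerSample;

// The 16384-sample capture buffer used later in this guide covers
// roughly 0.37 s of audio at a typical 44.1 kHz capture rate.
const bufferSeconds = 16384 / 44100;

console.log(bytesPerSecond, bufferSeconds.toFixed(2));
```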
Implementation
Connecting to WebSocket
const apiKey = 'YOUR_API_KEY';
const clientKey = 'YOUR_CLIENT_KEY';
const inputLang = 'en';
const outputLang = 'en';
const outputFormat = 'text'; // or 'json'
const wsUrl = `wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?api_key=${apiKey}&client_key=${clientKey}&input_lang=${inputLang}&output_lang=${outputLang}&output_format=${outputFormat}`;
const webSocket = new WebSocket(wsUrl);
webSocket.binaryType = 'arraybuffer';
webSocket.onopen = () => {
console.log('WebSocket connected');
};
webSocket.onerror = (error) => {
console.error('WebSocket error:', error);
};
webSocket.onclose = (event) => {
console.log('WebSocket closed:', event.code, event.reason);
};
This code establishes a WebSocket connection to the Dorsum API. The binaryType
is set to 'arraybuffer' to handle binary audio data. Event handlers are set up for connection opening, errors, and closing.
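Production clients typically reconnect after transient network drops. The API documentation does not specify a retry policy, so the schedule below (doubling from 1 s, capped at 30 s) is an arbitrary sketch:

```javascript
// Delay in milliseconds before reconnect attempt n (0-based),
// doubling from 1 s up to a 30 s ceiling.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Example wiring: schedule a fresh connection when the socket closes abnormally.
function scheduleReconnect(attempt, connectFn) {
  setTimeout(connectFn, backoffDelay(attempt));
}
```

In practice this would be called from the onclose handler, incrementing the attempt counter until the connection succeeds again.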
Starting Audio Transmission
async function startAudioTransmission() {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(16384, 1, 1);
processor.onaudioprocess = (e) => {
let inputData = e.inputBuffer.getChannelData(0);
if (!checkIfSilent(inputData)) {
if (audioContext.sampleRate !== 16000) {
inputData = resampleAudio(inputData, audioContext.sampleRate, 16000);
}
if (webSocket.readyState === WebSocket.OPEN) {
webSocket.send(inputData.buffer);
}
}
};
source.connect(processor);
processor.connect(audioContext.destination);
} catch (error) {
console.error('Error accessing microphone:', error);
}
}
This function starts the audio transmission process. It accesses the user's microphone, sets up an AudioContext for processing, and streams the captured audio over the WebSocket. The checkIfSilent and resampleAudio helpers (defined below) skip silent buffers and normalize the sample rate before sending. Note that createScriptProcessor is deprecated in modern browsers in favor of AudioWorklet, but it remains widely supported and is used here for simplicity.
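One caveat: getChannelData returns 32-bit float samples in [-1, 1], while the Audio Requirements above specify 16-bit PCM. If the server expects 16-bit samples, a conversion helper (the name floatTo16BitPCM is hypothetical) could be applied just before webSocket.send:

```javascript
// Convert 32-bit float samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(float32Array) {
  const pcm16 = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])); // clamp out-of-range samples
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm16;
}
```

The send call would then become webSocket.send(floatTo16BitPCM(inputData).buffer).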
Checking for Silence (optional step)
function checkIfSilent(audioBuffer) {
const threshold = 0.01;
return !audioBuffer.some(sample => Math.abs(sample) > threshold);
}
This function checks if the audio buffer is silent. It uses a threshold value to determine if any sample in the buffer is above the noise level. This helps reduce unnecessary data transmission during periods of silence.
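To see the 0.01 threshold in action (the function is repeated here so the snippet is self-contained):

```javascript
function checkIfSilent(audioBuffer) {
  const threshold = 0.01;
  return !audioBuffer.some(sample => Math.abs(sample) > threshold);
}

// Low-level background hiss stays under the threshold and is dropped.
console.log(checkIfSilent(Float32Array.of(0.001, -0.004, 0.002))); // true

// A single audible sample is enough for the buffer to be transmitted.
console.log(checkIfSilent(Float32Array.of(0.001, 0.2, 0.002)));    // false
```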
Resampling Audio
function resampleAudio(audioBuffer, originalSampleRate, targetSampleRate) {
const ratio = targetSampleRate / originalSampleRate;
const newLength = Math.round(audioBuffer.length * ratio);
const result = new Float32Array(newLength);
for (let i = 0; i < newLength; i++) {
const index = Math.floor(i / ratio);
result[i] = audioBuffer[index];
}
return result;
}
This function resamples the audio data to the required 16 kHz sample rate if needed. It uses a simple nearest-neighbor method: each output sample copies the closest input sample. While not as accurate as interpolating resamplers, it's efficient for real-time processing.
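If smoother output is needed, a variant that linearly interpolates between the two nearest input samples is only slightly more work (a sketch, not required by the API):

```javascript
// Resample by linearly interpolating between the two nearest source samples.
function resampleLinear(audioBuffer, originalSampleRate, targetSampleRate) {
  const ratio = targetSampleRate / originalSampleRate;
  const newLength = Math.round(audioBuffer.length * ratio);
  const result = new Float32Array(newLength);
  for (let i = 0; i < newLength; i++) {
    const pos = i / ratio;               // fractional position in the source
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, audioBuffer.length - 1);
    const frac = pos - i0;
    result[i] = audioBuffer[i0] * (1 - frac) + audioBuffer[i1] * frac;
  }
  return result;
}
```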
Sending Commands
function sendCommand(cmd) {
if (webSocket && webSocket.readyState === WebSocket.OPEN) {
webSocket.send(JSON.stringify({ command: cmd }));
}
}
// Usage examples:
// To pause transcription
sendCommand('pause');
// To resume transcription
sendCommand('resume');
// To stop transcription and generate notes
sendCommand('transcribe');
This function sends control commands to the API. It's used to pause, resume, or stop the transcription process and generate notes.
Receiving Messages
webSocket.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.ping) {
webSocket.send(JSON.stringify({ pong: true }));
} else if (message.clinical_notes) {
console.log('Clinical notes:', message.clinical_notes);
} else if (message.transcription) {
console.log('Transcription:', message.transcription);
} else if (message.message) {
console.log('message:', message.message); // example: "Transcript is too short."
}
};
This event handler processes incoming messages from the server. It handles four types of messages:
- Ping messages, which are responded to with a pong to keep the connection alive.
- Clinical notes, which are logged to the console.
- Interim transcription text, also logged to the console.
- Informational messages, such as "Transcript is too short."
Example clinical notes response
{
"code": "201",
"clinical_notes": "some clinical notes..."
}
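Because the server may also send payloads the handler does not anticipate, a defensive parser (parseServerMessage is a hypothetical helper, not part of the API) keeps onmessage from throwing on malformed data:

```javascript
// Classify a raw WebSocket payload; returns { type, payload } or null.
function parseServerMessage(data) {
  let message;
  try {
    message = JSON.parse(data);
  } catch {
    return null; // not JSON; ignore
  }
  if (message.ping) return { type: 'ping', payload: true };
  if (message.clinical_notes) return { type: 'clinical_notes', payload: message.clinical_notes };
  if (message.transcription) return { type: 'transcription', payload: message.transcription };
  if (message.message) return { type: 'info', payload: message.message };
  return null; // recognized JSON but no known field
}
```

The onmessage handler can then switch on the returned type and reply with a pong only for 'ping' results.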
Language Support
The API supports transcription in multiple languages, including:
- English (en)
- Many other languages upon request
Note: Generating clinical notes in non-English languages may require assistance from a medical professional fluent in the target language.