Notes Transcription API
The Dorsum Transcription API converts audio consultations into clinical notes in real time using WebSocket communication.
Transcribe Endpoint
WebSocket URL:
wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe
Input parameters
Include the following query parameters in the WebSocket URL:
- api_key: Your unique API key
- client_key: Your client-specific key
- input_lang: The language of the audio input (optional, default is 'en')
- output_lang: The language of the clinical notes (optional, default is 'en')
- output_format: The format of the clinical notes (optional, default is 'text')
Example:
wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?api_key=YOUR_API_KEY&client_key=YOUR_CLIENT_KEY&input_lang=en&output_lang=en&output_format=text
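When constructing this URL in code, URLSearchParams handles escaping automatically; a sketch using the placeholder keys from the example above:

```javascript
// Build the transcribe URL from its parts; URLSearchParams escapes values safely.
const params = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  client_key: 'YOUR_CLIENT_KEY',
  input_lang: 'en',
  output_lang: 'en',
  output_format: 'text',
});

const wsUrl =
  'wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?' +
  params.toString();

console.log(wsUrl);
```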
Connection Lifecycle
- Connect: Establish a WebSocket connection to the endpoint URL.
- Audio Transmission: Stream audio data to the WebSocket.
- Commands: Send control commands as needed.
- Receive Notes: Get generated clinical notes from the server.
- Disconnect: Close the WebSocket connection when finished.
Audio Requirements
- Format: Raw PCM audio data
- Sample Rate: 16000 Hz
- Channels: Mono
- Bits per sample: 16-bit
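These requirements fix the raw data rate, which is useful for estimating bandwidth; a quick sanity check (derived values, not part of the API):

```javascript
// 16 kHz, mono, 16-bit PCM => 2 bytes per sample.
const sampleRate = 16000;
const channels = 1;
const bytesPerSample = 2;

// Raw upstream data rate: 32,000 bytes (about 31 KiB) per second.
const bytesPerSecond = sampleRate * channels * bytesPerSample;

// The 16384-sample capture buffer used later in this guide covers
// roughly 0.37 s of audio at a typical 44.1 kHz capture rate.
const bufferSeconds = 16384 / 44100;

console.log(bytesPerSecond, bufferSeconds.toFixed(2));
```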
Implementation
Connecting to WebSocket
const apiKey = 'YOUR_API_KEY';
const clientKey = 'YOUR_CLIENT_KEY';
const inputLang = 'en';
const outputLang = 'en';
const outputFormat = 'text'; // or 'json'
const wsUrl = `wss://staging-transcriber.azurewebsites.net/api/medconnect/transcribe?api_key=${apiKey}&client_key=${clientKey}&input_lang=${inputLang}&output_lang=${outputLang}&output_format=${outputFormat}`;
const webSocket = new WebSocket(wsUrl);
webSocket.binaryType = 'arraybuffer';
webSocket.onopen = () => {
console.log('WebSocket connected');
};
webSocket.onerror = (error) => {
console.error('WebSocket error:', error);
};
webSocket.onclose = (event) => {
console.log('WebSocket closed:', event.code, event.reason);
};
This code establishes a WebSocket connection to the Dorsum API. The binaryType
is set to 'arraybuffer' to handle binary audio data. Event handlers are set up for connection opening, errors, and closing.
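Production clients typically reconnect after transient network drops. The API documentation does not specify a retry policy, so the schedule below (doubling from 1 s, capped at 30 s) is an arbitrary sketch:

```javascript
// Delay in milliseconds before reconnect attempt n (0-based),
// doubling from 1 s up to a 30 s ceiling.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Example wiring: schedule a fresh connection when the socket closes abnormally.
function scheduleReconnect(attempt, connectFn) {
  setTimeout(connectFn, backoffDelay(attempt));
}
```

In practice this would be called from the onclose handler, incrementing the attempt counter until the connection succeeds again.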
Starting Audio Transmission
async function startAudioTransmission() {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(16384, 1, 1);
processor.onaudioprocess = (e) => {
let inputData = e.inputBuffer.getChannelData(0);
if (!checkIfSilent(inputData)) {
if (audioContext.sampleRate !== 16000) {
inputData = resampleAudio(inputData, audioContext.sampleRate, 16000);
}
if (webSocket.readyState === WebSocket.OPEN) {
webSocket.send(inputData.buffer);
}
}
};
source.connect(processor);
processor.connect(audioContext.destination);
} catch (error) {
console.error('Error accessing microphone:', error);
}
}
This function starts the audio transmission process. It accesses the user's microphone, sets up an AudioContext for processing, and streams the captured audio over the WebSocket. The checkIfSilent and resampleAudio helpers (defined below) skip silent buffers and normalize the sample rate before sending. Note that createScriptProcessor is deprecated in modern browsers in favor of AudioWorklet, but it remains widely supported and is used here for simplicity.
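One caveat: getChannelData returns 32-bit float samples in [-1, 1], while the Audio Requirements above specify 16-bit PCM. If the server expects 16-bit samples, a conversion helper (the name floatTo16BitPCM is hypothetical) could be applied just before webSocket.send:

```javascript
// Convert 32-bit float samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(float32Array) {
  const pcm16 = new Int16Array(float32Array.length);
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])); // clamp out-of-range samples
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm16;
}
```

The send call would then become webSocket.send(floatTo16BitPCM(inputData).buffer).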
Checking for Silence (optional step)
function checkIfSilent(audioBuffer) {
const threshold = 0.01;
return !audioBuffer.some(sample => Math.abs(sample) > threshold);
}
This function checks if the audio buffer is silent. It uses a threshold value to determine if any sample in the buffer is above the noise level. This helps reduce unnecessary data transmission during periods of silence.
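To see the 0.01 threshold in action (the function is repeated here so the snippet is self-contained):

```javascript
function checkIfSilent(audioBuffer) {
  const threshold = 0.01;
  return !audioBuffer.some(sample => Math.abs(sample) > threshold);
}

// Low-level background hiss stays under the threshold and is dropped.
console.log(checkIfSilent(Float32Array.of(0.001, -0.004, 0.002))); // true

// A single audible sample is enough for the buffer to be transmitted.
console.log(checkIfSilent(Float32Array.of(0.001, 0.2, 0.002)));    // false
```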
Resampling Audio
function resampleAudio(audioBuffer, originalSampleRate, targetSampleRate) {
const ratio = targetSampleRate / originalSampleRate;
const newLength = Math.round(audioBuffer.length * ratio);
const result = new Float32Array(newLength);
for (let i = 0; i < newLength; i++) {
const index = Math.floor(i / ratio);
result[i] = audioBuffer[index];
}
return result;
}
This function resamples the audio data to the required 16 kHz sample rate if needed. It uses a simple nearest-neighbor method: each output sample copies the closest input sample. While not as accurate as interpolating resamplers, it's efficient for real-time processing.
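If smoother output is needed, a variant that linearly interpolates between the two nearest input samples is only slightly more work (a sketch, not required by the API):

```javascript
// Resample by linearly interpolating between the two nearest source samples.
function resampleLinear(audioBuffer, originalSampleRate, targetSampleRate) {
  const ratio = targetSampleRate / originalSampleRate;
  const newLength = Math.round(audioBuffer.length * ratio);
  const result = new Float32Array(newLength);
  for (let i = 0; i < newLength; i++) {
    const pos = i / ratio;               // fractional position in the source
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, audioBuffer.length - 1);
    const frac = pos - i0;
    result[i] = audioBuffer[i0] * (1 - frac) + audioBuffer[i1] * frac;
  }
  return result;
}
```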
Sending Commands
function sendCommand(cmd) {
if (webSocket && webSocket.readyState === WebSocket.OPEN) {
webSocket.send(JSON.stringify({ command: cmd }));
}
}
// Usage examples:
// To pause transcription
sendCommand('pause');
// To resume transcription
sendCommand('resume');
// To stop transcription and generate notes
sendCommand('transcribe');
This function sends control commands to the API. It's used to pause, resume, or stop the transcription process and generate notes.
Receiving Messages
webSocket.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.ping) {
webSocket.send(JSON.stringify({ pong: true }));
} else if (message.clinical_notes) {
console.log('Clinical notes:', message.clinical_notes);
} else if (message.transcription) {
console.log('Transcription:', message.transcription);
} else if (message.message) {
console.log('message:', message.message); // example: "Transcript is too short."
}
};
This event handler processes incoming messages from the server. It handles four types of messages:
- Ping messages, which are responded to with a pong to keep the connection alive.
- Clinical notes, which are logged to the console.
- Interim transcription text, also logged to the console.
- Informational messages, such as "Transcript is too short."
Example clinical notes response
{
"code": "201",
"clinical_notes": "some clinical notes..."
}
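Because the server may also send payloads the handler does not anticipate, a defensive parser (parseServerMessage is a hypothetical helper, not part of the API) keeps onmessage from throwing on malformed data:

```javascript
// Classify a raw WebSocket payload; returns { type, payload } or null.
function parseServerMessage(data) {
  let message;
  try {
    message = JSON.parse(data);
  } catch {
    return null; // not JSON; ignore
  }
  if (message.ping) return { type: 'ping', payload: true };
  if (message.clinical_notes) return { type: 'clinical_notes', payload: message.clinical_notes };
  if (message.transcription) return { type: 'transcription', payload: message.transcription };
  if (message.message) return { type: 'info', payload: message.message };
  return null; // recognized JSON but no known field
}
```

The onmessage handler can then switch on the returned type and reply with a pong only for 'ping' results.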
Language Support
The API supports transcription in multiple languages, including:
- English (en)
- Many other languages upon request
Note: Generating clinical notes in non-English languages may require assistance from a medical professional fluent in the target language.