Posts

Conversing via Local Microphone and Speaker using Realtime API

Several code samples using the Realtime API provided by OpenAI and Azure are available online. However, Python code is available only on Azure's GitHub, and it assumes an audio file as input. I therefore modified the code to accept real-time audio input from the local microphone in Python; the modified version is available on GitHub. Since the code is simple and concise, it should be easy to integrate into other projects. The original code is based on low_level_sample.py, and a detailed explanation is available in this article, which you can refer to.

About the Modifications

This article explains how to modify a Python application that processes audio so that it accepts input from the local microphone and plays the audio data returned by the Realtime API through the local speaker. The implementation mainly uses the pyaudio library. The modifications consist of the following two points:
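As a rough illustration of the microphone-capture side, the sketch below reads PCM16 chunks with pyaudio and wraps each one in an `input_audio_buffer.append` event for the Realtime API. The sample rate, chunk size, and the `send` callback are assumptions for illustration, not code from the article's repository.

```python
import base64
import json

# The Realtime API expects 16-bit mono PCM; 24 kHz and the chunk size
# below are assumed values, not taken from the modified sample.
RATE = 24000
CHUNK = 1024
CHANNELS = 1

def chunk_to_append_event(pcm_bytes: bytes) -> str:
    """Wrap one raw PCM16 chunk in an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def capture_loop(send):
    """Read microphone chunks via pyaudio and hand each event to `send`
    (a hypothetical callback that writes to the websocket connection)."""
    import pyaudio  # imported here so the pure helper above stays importable
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS,
                     rate=RATE, input=True, frames_per_buffer=CHUNK)
    try:
        while True:
            send(chunk_to_append_event(stream.read(CHUNK)))
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Playback of the returned audio is the mirror image: open a pyaudio output stream and `write()` each decoded chunk to it.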

Comparing Prompt Caching: OpenAI, Anthropic, and Gemini

In recent years, the rapid development of large language models (LLMs) has led to significant increases in context window sizes. A context window is the amount of information a model can process at one time, and innovations such as Retrieval-Augmented Generation (RAG) and video and image inputs have expanded the context lengths that LLMs can put to use. This evolution is aimed at handling more complex tasks and a wider range of information. In response, major providers have introduced "prompt caching" for efficient prompt management. Prompt caching stores previously used prompts and their results for reuse, avoiding repeated processing of the same input, which leads to faster processing times and cost savings. In this article, we compare the prompt caching features of three key LLM providers, OpenAI, Anthropic, and Gemini, focusing on their specifications and differences.

Models Supporting Prompt Caching
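One concrete difference worth previewing: OpenAI caches long, repeated prompt prefixes automatically, while Anthropic requires the cacheable prefix to be marked explicitly with a `cache_control` field. The sketch below builds an Anthropic-style request body as plain dicts (field names follow Anthropic's documented prompt-caching format; the model name and system prompt are placeholders, and the actual SDK call is omitted):

```python
# Placeholder long system prompt; caching only pays off above a minimum
# prefix length, so real prompts would be substantial documents.
LONG_SYSTEM_PROMPT = "You are a support assistant for product X. " * 200

def build_cached_request(user_message: str) -> dict:
    """Build a Messages-API-shaped request body whose system prompt is
    marked as a reusable cache prefix."""
    return {
        "model": "claude-3-5-sonnet-20241022",  # example model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Marks this block as the prefix Anthropic should cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

With OpenAI, by contrast, no field like this exists: simply keeping the shared prefix byte-identical across requests is enough to get cache hits.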

OpenAI Realtime API Python Code: Understanding the Low-Level Sample Code for Azure's Realtime Audio Python Code

Introduction

The "gpt-4o-realtime-preview" model has been released. In addition to text and audio input/output, it also supports calling custom functions via function calling. As of October 2, 2024, there are issues such as 403 errors, and the API does not appear to be usable; this article will be updated once it becomes available. OpenAI provides a JavaScript code sample on its website, and Azure has also published a Python code sample on GitHub. In this article, we analyze Azure's sample code, low_level_sample.py, to understand how it works.

Libraries

The required libraries are: python-dotenv, soundfile, numpy, and scipy.

Code Explanation

main Function

In the main function, the dotenv file is first loaded to retrieve the API key and endpoint:

load_dotenv()

Next, it checks the arguments. This file is executed
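The configuration step above can be sketched as follows. The variable names `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` are assumptions for illustration; match them to the names actually used in your `.env` file.

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
    load_dotenv()  # reads KEY=VALUE pairs from a local .env file
except ImportError:
    pass  # fall back to variables already set in the environment

def get_config() -> tuple[str, str]:
    """Fetch the endpoint and API key, failing fast if either is missing.
    The variable names here are assumed, not taken from the sample."""
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "")
    api_key = os.environ.get("AZURE_OPENAI_API_KEY", "")
    if not endpoint or not api_key:
        raise RuntimeError(
            "Set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY "
            "in the environment or in a .env file"
        )
    return endpoint, api_key
```

Failing fast here keeps the later websocket setup from producing a confusing authentication error when a variable is simply missing.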