GigaChat 2.0 has brought Russian artificial intelligence to a new level

      The model works with audio, video, text, and images and analyzes data from the Internet in real time.

       Sber has introduced GigaChat 2.0, an updated version of its neural network platform. It is no longer just a chatbot but a full-fledged multimodal assistant that understands speech, images, video, and large volumes of text and returns up-to-date, verified data with links to sources. GigaChat 2.0 is also built into the voice assistants of Sber smart speakers, and MAX from VK was among the first digital platforms to integrate it.

       What has changed in GigaChat 2.0

       The main innovation is multimodal support. The model now recognizes speech in audio files, understands images, works with video links (including YouTube), and analyzes documents of up to 200 pages. For example, a user can upload a lease agreement and get an analysis based on current Russian law, transcribe an audio recording of medical recommendations, or get the gist of a video tutorial.

       Audio processing has improved significantly. The model perceives audio directly, without first converting it to text. It can identify the main points, answer questions about the content, and recognize spoken language, accents, music, and background sounds. Files are limited to 60 minutes or 30 megabytes, and in practice there are still restrictions on audio formats and sizes.

       Working with up-to-date data in real time is another key feature. GigaChat 2.0 can now search the Internet, filter the results, pick out the essentials, and provide links to trusted sources. This avoids reliance on the outdated data the model was trained on and reduces the risk of so-called "hallucinations", that is, errors in the output.

       Two versions for different tasks

       The lineup has two versions: GigaChat 2 Pro, for everyday tasks such as writing texts or quick reference lookups, and GigaChat 2 Max, for complex professional queries. The Max model has already taken first place among AI models in the MERA benchmark for the Russian language and competes confidently with foreign counterparts such as GPT-4 and LLaMA 70B.

       Music, pictures, and videos

       GigaChat 2.0 has learned to generate music and songs from a text prompt. The maximum track length is now three minutes, and a track can be created in one minute. The model can even generate songs in foreign languages, such as Chinese.

       Image handling has also reached a new level. The model can analyze the contents of a photo, recognize text in it, advise on clothing styles, solve problems, or explain what is on a receipt.

       As for video, GigaChat 2.0 processes links: the model analyzes the audio track, summarizes the gist, answers questions, or highlights key points, including for videos in English or other languages.

       Smart speakers and live dialogue

       For the first time in Russia, all of Sber's smart speakers have been switched to a large language model. This lets the speaker hold a live dialogue with the user in plain language or in a given role. The speaker now keeps the thread of a conversation 10 times longer than before, explains complex things in simple words, and can respond in the persona of, for example, a movie star.

       The model supports 18 communication settings, including voice selection, form of address (informal or formal "you"), and communication style. You can give several commands in one message, and the speaker will work out for itself when to set an alarm, turn on music, or look up information.

       GigaChat 2.0 on the MAX platform from VK

       One of the first new partners was the MAX platform from VK, a domestic counterpart to WeChat that combines a messenger, mini-apps, chatbots, and a payment service. Users can use GigaChat 2.0 to write texts, transcribe audio, summarize videos and articles, and get help with professional and everyday questions.

       GigaChat 2.0 is an important step in the development of Russian AI services. With its integration into smart speakers and platforms and its expanded functionality, it has become a full-fledged universal assistant that understands text, sound, video, and images and can not only respond but also analyze, advise, and even create music.
