c't Working with AI - c't-Redaktion

c't Working with AI (eBook)

Increase your Productivity with Artificial Intelligence
eBook Download: EPUB
2024 | 1st edition
164 pages
Heise Zeitschriften Verlag
978-3-95788-399-5 (ISBN)
System requirements
€12.99 incl. VAT
  • Download available immediately
The special issue c't KI-Praxis provides tests and practical instructions for working with chatbots. It explains why language models make mistakes and how these can be minimised. That helps not only when you send questions and tasks to one of the chatbots offered online. If you do not want to use the cloud services, or are not allowed to for data protection reasons, you can also set up your own language AI. The c't editorial team explains where to find a suitable language model, how to host it locally and which service providers can host it for you. The fact that generative AI is becoming ever more productive harbours both opportunities and risks. Suitable rules for the use of AI in schools, training and at work help to exploit the opportunities and minimise the risks.

c't magazin für computertechnik is the most widely subscribed computer magazine in Europe. For over 40 years, c't has been characterised by thematically diverse, technically sound and editorially independent journalism. Over 80 experts regularly report on current developments in the IT sector and scrutinise the latest hardware and software in the c't test lab. Online, c't offers computer professionals and discerning users a comprehensive collection of tips and tricks for PC use, samples from magazine articles, extensive services and information about the magazine (e.g. searching the magazine archive). A dedicated forum rounds off this service.

Operating free language AIs without the cloud


LLaMA, Alpaca, Vicuna: You can find heaps of medium-sized language models with all their parameters online. They also run on your own computer.

Pina Merkert

Picture: Moritz Reichartz

Do the work for me, AI: ‘Write me a friendly email to support asking where my package of lubricant has gone.’ Some tasks for AI language models are not so easy to hand over to a cloud service. Companies can even run into legal problems if they let an AI read text containing personal data outside the EU. But that doesn't mean you have to do without language AI. Open source models, i.e. neural networks whose parameters are publicly available, can also run on your own hardware without the cloud.

The quality of the responses comes close to that of GPT-3.5, the language model behind the free version of ChatGPT. In addition, the free models offer many more possibilities, because there are dozens of variants that can be fine-tuned on different data sets, i.e. retrained, and because you have full control over the prompt. With your own prompt, you can elicit cynical or funny responses from an AI chat partner in no time at all, specify different language styles or provide up-to-date context information, for example from your own database, before the chat begins (see the sketch below).
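To make this concrete, here is a minimal sketch of such a hand-built prompt in Python. The template and every name in it are illustrative only; the exact layout a model expects depends on how it was fine-tuned (an Alpaca-style instruction format is assumed here).

# A minimal sketch of a hand-built prompt for a local chat model.
# The Alpaca-style instruction layout below is an assumption; the
# template a model actually expects depends on its fine-tuning.

def build_prompt(style: str, context: str, question: str) -> str:
    """Fix the tone and inject fresh context before the chat."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that completes the request.\n\n"
        "### Instruction:\n"
        f"Answer in a {style} tone. Use only the following context:\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "### Response:\n"
    )

prompt = build_prompt(
    style="cynical",
    context="Order #4711 shipped on Monday and is stuck in customs.",
    question="Where is my package?",
)
print(prompt)  # feed this string to the locally running model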

Live on the bleeding edge

Pina Merkert

It's great that you can now play around with gigantic language models on your own computer. However, the models are all still very new and the software for running them is in constant flux. The GGML format has already gone through three versions this year, and it can easily happen that the data format of a downloaded model no longer matches the latest version of llama.cpp.

This software is opening itself up to other architectures (not least in order to support LLaMA-1 and -2 at the same time) by switching to the GGML variant GGUF. For the Falcon models, which are not based on LLaMA, there has long been a fork called ggllm.cpp, and it is not yet clear whether GGUF will make it obsolete.

These are just examples of the fast-moving nature of the entire software infrastructure. The fact that libraries work with new CUDA versions is to a certain extent a matter of luck. New Torch versions can have an impact on libraries that do not use Torch at all. Wrappers and helpers are often written for a specific version and then no longer maintained because developers have switched to a different base model.

At the moment, you can neither rely on stable APIs nor on long-term support. Libraries and frontends usually only work together if the developers have worked with precisely this combination. However, information on what works together is difficult to find and almost never documented.

During our research, we wasted days trying to compile Torch for CUDA 12.2, to compile llama.cpp with the Visual Studio compiler and to use the Python binding to llama.cpp with its low-level API for a browser-based chat. Most of the time something worked, but never everything, and we eventually stopped because we did not want to force readers to patch code themselves just to get software running that is not yet ready for reasonably stress-free use.

Feel free to try out the models mentioned here. It's fun to experiment and you'll get a feel for where the language model AI community is heading. However, we do not recommend investing too much time in individual experiments. For 99 percent of users, it is worth waiting until the software has matured and is no longer somewhere between a technical preview and an alpha version.

Below, we give you a short tour through the forest of freely available language models and explain how to install llama.cpp, a command-line program that makes them easy to use.

Woolly creatures everywhere


Unlike OpenAI and Google, the AI research department of Meta (Facebook) published its LLaMA models with all parameters in February 2023. Like GPT and Bard, LLaMA builds enormously large language models from transformers [1] pre-trained on gigantic amounts of text. Models with 7, 13, 33 and 65 billion parameters appeared simultaneously. Meta actually only wanted to make the parameters available to other researchers, but just one day after the release they could be downloaded from several places on the Internet.

No PyTorch, no training

Apart from Google, pretty much all research groups use the PyTorch framework to structure and train their models. LLaMA and all post-trained variants such as Alpaca and Guanaco are Torch models. Unfortunately, the installation of Torch is not trivial because you have to install different libraries in advance depending on the hardware. For example, it is particularly difficult to achieve full hardware acceleration with PyTorch on Apple's M1 and M2 chips.
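This hardware dependency shows up in the very first lines of a typical Torch script, which has to probe which accelerator backend is actually available. A minimal sketch:

import torch

# Pick the fastest available backend: CUDA on NVIDIA GPUs, the Metal
# backend (MPS) on Apple's M1/M2 chips, otherwise the plain CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Running on: {device}")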

In addition, neural networks can only be trained if the parameters are stored in a numerical format that can also represent very small changes to the values. This is necessary because training proceeds in many tiny steps. Anyone who has trained networks themselves knows that training quickly becomes unstable if the learning rate is set too high, because the resulting steps are too large.
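A toy example makes the effect visible: gradient descent on the parabola f(x) = x² converges for small learning rates, but as soon as the rate exceeds the stability limit, every step overshoots the minimum and the values explode.

# Gradient descent on f(x) = x^2, whose gradient is 2x. For this
# function, any learning rate above 1.0 makes the updates diverge.
def descend(lr: float, steps: int = 10, x: float = 1.0) -> float:
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(descend(lr=0.1))  # about 0.107: converging towards the minimum
print(descend(lr=1.1))  # about 6.19 and growing every step: unstable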

If you estimate how much memory 7, 13, 40 or 65 billion parameters occupy when each number is 16 bits long (training is now usually done in bfloat16 format instead of float32), you will find that the graphics memory, and often even the entire main memory of the PC, overflows in no time at all. In practice, this means that most models cannot be trained or fine-tuned on a PC (Parameter-Efficient Fine-Tuning, PEFT, is possible, but that trick leaves the original parameters untouched and only trains a small number of additional ones).
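The estimate itself is simple arithmetic: bits per parameter divided by eight gives bytes, multiplied by the parameter count. A short sketch (raw parameters only; activations and runtime overhead come on top):

# Back-of-the-envelope memory footprint of the raw parameters alone.
GIB = 2**30

for billions in (7, 13, 40, 65):
    params = billions * 10**9
    for bits in (32, 16, 4):  # float32, bfloat16, 4-bit quantized
        gib = params * bits / 8 / GIB
        print(f"{billions}B parameters at {bits:2d} bit: {gib:6.1f} GiB")

Even the smallest model with 7 billion parameters needs around 13 GiB in bfloat16, more than most graphics cards offer.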

The sum of these difficulties is also the reason why we had previously only run the ‘small’ Falcon model with 7 billion parameters in Google Colab, i.e. in the cloud (see following article).

For inferencing, i.e. using an already trained model, the software does not have to be able to represent small steps. If you multiply the neuron activations by numbers significantly greater than 1, you can even continue calculating with integer weights. Both CPUs and GPUs process integers in fewer clock cycles, and if the integers are just a few bits long, more of them fit next to each other in the registers of the vector units. Converting a model so that it calculates with short integers is called quantization [2]. The price is that quantized models can no longer be trained, but in return they run much faster on less complex hardware and consume far less memory.
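To illustrate the principle, here is a naive symmetric 8-bit quantization in a few lines of NumPy. Real schemes such as GGML's 4-bit formats are more refined and quantize small blocks of weights with individual scale factors, but the basic idea is the same:

import numpy as np

# Naive symmetric 8-bit quantization of a weight matrix.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127            # map the largest weight to 127
q = np.round(weights / scale).astype(np.int8)  # short integers: 1 byte each

dequantized = q.astype(np.float32) * scale     # approximate reconstruction
print("maximum rounding error:", np.abs(weights - dequantized).max())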

The developers of llama.cpp describe the program as a playground for the development of the GGML library. GGML is both a quantization algorithm and a data format for packing the parameters of AI models together with meta-information about the architecture into a file. When quantizing, the need for such a data format quickly arises because the same neural network can handle data types of different lengths in different places and large models generate a lot of meta-information that has to be shared with the parameters anyway.
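If you would rather drive such a quantized model from a script than from the command line, the Python binding to llama.cpp mentioned in the box above provides a compact high-level API. A minimal sketch, assuming a quantized model file has already been downloaded (the path is a placeholder):

# pip install llama-cpp-python
from llama_cpp import Llama

# Load a GGML/GGUF file quantized for llama.cpp; the path is a placeholder.
llm = Llama(model_path="./models/llama-7b-q4_0.gguf")

# One completion; max_tokens caps the answer, stop ends it at the next "Q:".
output = llm("Q: Name three South American camelids. A:",
             max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])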

Pre-training such large models costs gigawatt hours of electricity and requires well-equipped data centers. Most research groups therefore lack the money to train their own models from scratch with randomly initialized parameters. With LLaMA, however, a trained model was suddenly available that even small teams could retrain (fine-tune) on various data sets in a matter of days.

A group from Stanford University took a particularly brazen approach. They used the OpenAI API to have ChatGPT first generate thousands of questions and then answer them. Using the data set collected in this way, they retrained the smallest LLaMA model with 7 billion parameters. The result was a much smaller AI than ChatGPT (1/25 of the memory requirement), but it knew almost as much and could write almost as well. Because of the woolly relationship with LLaMA, they called the new model Alpaca.
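A much-simplified sketch of that idea with the 2023-era OpenAI Python API: one call invents a task, a second call answers it, and the pairs are collected as training data. Model choice and prompts are purely illustrative; the Stanford pipeline was considerably more elaborate.

import json
import openai  # pip install openai; expects an API key in OPENAI_API_KEY

def ask(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

dataset = []
for _ in range(3):  # the real data set contained tens of thousands of examples
    task = ask("Invent one short, self-contained task for an AI assistant.")
    dataset.append({"instruction": task, "output": ask(task)})

print(json.dumps(dataset, indent=2))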

More models, more data sets


Research groups and open source projects began to fine-tune LLaMA on their own data sets immediately after its release. However, data sets such as Orcas, OpenAssistant, Falcon RefinedWeb and ShareGPT are usually too small on their own to improve the large models in all respects. This is why many models are trained with several of these data sets in parallel. Worth mentioning are Vicuna, Uni-Tian-Yan, Guanaco and Godzilla2-70B, which is based on Guanaco.

In addition, there are models that are fine-tuned on mostly small data sets in languages other than English, or on source code instead of natural language. These models usually only get better at the task trained during fine-tuning and worse in all other conversations. Nevertheless, they can be worth a look. Hugging Face has established itself as the platform for sharing such models; thousands of variants are now available there for download.

There are also some models, such as Cerebras-GPT, GPT-J, OpenChatKit and Falcon, which are not based on LLaMA but were trained from scratch under the Apache 2 license. These are particularly interesting for companies that have problems with the license of the LLaMA models. It has not yet been legally clarified whether...

Publication date (per publisher) 24.1.2024
Language English
Subject area Mathematics / Computer Science
ISBN-10 3-95788-399-7 / 3957883997
ISBN-13 978-3-95788-399-5 / 9783957883995
EPUB (watermarked)
Size: 35.9 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalised for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly suitable for fiction and non-fiction. The reflowable text adapts dynamically to the display and font size, which also makes EPUB well suited to mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need the free Adobe Digital Editions software.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need a free app to do so.

Buying eBooks from abroad
For tax law reasons, we can only sell eBooks within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.
