In this blog post, we will show you how to use LLaMa (Meta's AI) models from Ruby in your applications and projects.
LLaMa is Meta's AI model that was "accidentally" leaked via torrent and is now available to everyone. llama.cpp is a project that provides a plain C/C++ implementation with optional 4-bit quantization for faster, lower-memory inference, optimized for desktop CPUs. Many cool projects have been built on top of llama.cpp, such as GPT4All.
A C/C++ library can be used to create bindings for other languages, and luckily, a Ruby binding already exists: yoshoku/llama_cpp.rb! This means we can now use LLaMa-based models like Alpaca or Vicuna directly from Ruby (and from Node.js too, via hlhr202/llama-node).
The instructions on how to run it were a little cryptic, and I couldn't find a straightforward step-by-step tutorial on how to use it. The process, however, is quite simple once you know the steps.
First of all, you need to have llama_cpp.rb in your project:
$ bundle add llama_cpp
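This appends the gem to your Gemfile and installs it. The resulting entry should look roughly like the line below; the exact version constraint Bundler pins will depend on the latest release at the time, so treat it as illustrative:

gem "llama_cpp", "~> 0.1" # version constraint is an example; Bundler picks the current release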
Then you need to download a model; the easiest way is to get one from Hugging Face. Hugging Face models are stored in Git repositories that you can simply clone, but keep in mind that the model files are very large (several GBs), so you should use Git LFS (Large File Storage).
To download ggml-vicuna-7b-4bit, for example, you can run:
$ git lfs install
$ git clone https://huggingface.co/chharlesonfire/ggml-vicuna-7b-4bit ./models/ggml-vicuna-7b-4bit
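Depending on your connection this can take a while. Once the clone finishes, you can check that the quantized model file (roughly 4 GB for this 7B 4-bit model) is in place:

$ ls -lh ./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin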
The model is then ready to use in your code. Here is an experiment I'm running in Jambots to integrate it:
#!/usr/bin/env ruby

require "bundler/setup" # set up the Bundler load path before requiring gems
require "llama_cpp"

# Path to the quantized Vicuna model downloaded from Hugging Face
model_path = "./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin"

# Few-shot prompt that sets up the assistant persona before the user's message
prompt = <<~HEREDOC
  Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User’s requests immediately and with precision.
  User: Hello, Bob.
  Bob: Hello. How may I help you today?
  User: Please tell me the largest city in Europe.
  Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
  User:
HEREDOC

if ARGV.empty?
  puts "Message required"
  exit(1)
end

message = ARGV[0]

# Build the client: n_threads controls CPU parallelism, seed makes the output reproducible
client = LLaMACpp::Client.new(model_path: model_path, n_threads: 4, seed: 12)

# Append the user's message to the prompt and generate a completion
output = client.completions("#{prompt} #{message}")

puts output
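Assuming you saved the script as something like bot.rb (the file name is just an example), you can run it and pass your message as the first argument:

$ bundle exec ruby bot.rb "What is the best thing about Ruby?"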
It runs a bit slowly on my computer, and llama.cpp's arguments take some getting used to, but at least we can now use this ecosystem from Ruby and Node.
Enjoy tinkering with it!