ruby embedded into c++

Tutorial, Guide, Howto

Simon neoneye Strandgaard

Revision History
Revision 1.87.2.11, 2003/06/08 19:59:05 [cvs-changelog]

Overview

Important

This document is a work in progress and still at an early stage. Your suggestions are welcome [forum].

In this document I will describe how to embed the ruby interpreter into c++ [WhyEmbedRuby?, EmbedRuby] and provide you with a skeleton you can easily build upon.

We want to share some of our classes between c++ and ruby, so that no difference can be felt. This task is non-trivial, and therefore the basic concepts need to be explained. In this text we will accomplish the following:

c++ classes accessible from ruby.
ruby classes accessible from c++.
translate exceptions between ruby and c++.
let SWIG do all the hard work for us.

I am assuming that the reader has some experience with writing ruby extensions and an understanding of UML diagrams. No previous experience with SWIG is required. This code should work with G++3, GNU-make, Ruby-1.8.0. For the [Combining Everything] section you might need GNU-autoconf and SWIG-1.3.18.

Please share your experiences, this might help others! Comments, suggestions, reviews, bugs, bugfixes, code: all kinds of contributions are welcome :-)

Download

I said that I would provide you with a skeleton which you can freely build upon; no credit is necessary.

Here it is [rubyembed-0.2.tar.gz]. I admit it looks overwhelming: there is not much functionality, and it's spread out over many different files! Be sure to read the [Combining Everything] section of this tutorial.

There is also a simplified version where all code is in one file (less than 500 lines of code). Certain vital things have been left out. [main.cpp, test.rb, Makefile]. You may have to adjust the Makefile to your own environment.

Don't Use Wizard Code You Don't Understand [wizards?]. There are a few gotchas in this code, therefore I have some explaining to do!

An Example

Let's start out with a real-world example (a typical behavior in a text editor). This sequence diagram shows the two most vital embedding operations.

c++ calling ruby.
ruby calling c++.

Figure 1. Do some calls + returns

The following c++ code is for the left-hand side (USER). Observe that the View class is undeclared; this is covered in the next section.

class ViewQT : public View {
public:
    void repaint() {
        cout << "repaint!" << endl;
    }
};

int main() {
    ViewQT v;
    v.insert();
    return 0;
}

The following ruby code is for the right-hand side (RUBY). Observe that the Redirect class is undeclared; this is covered in the next section.

class RubyView < Redirect
    def insert
        repaint
    end
end

Executing the above c++ code should output “repaint!” to stdout.

Calling in both directions (c++ <-> ruby) is important, but in order to do this some glue code is necessary.

A Wormhole Between Two Worlds

Connecting those two different worlds is what this tutorial is all about! The following diagram illustrates the necessary classes (template method pattern). Do not get too scared :-)

Figure 2. Class diagram

ViewQT is a c++ class which implements the repaint function. It's responsible for rendering itself as a QT widget.

RubyView is a ruby class which implements the insert function. It's responsible for inserting text into the buffer. When the text has been inserted, it invokes repaint.

View serves as a base class for all c++ frontends. self holds one instance of the RubyView class.

Redirect serves as a base class for the ruby backend. With the help of this class we are able to call the overridden virtual function repaint in ViewQT. This class is memory-managed by ruby. Here we have a SWIG candidate!

Troubleshooting

If trouble strikes you might try these options:

Do a quick browse through this tutorial plus the supplied code (the tar.gz file).
Check out the projects mentioned in the "Resources" section.
Search with Google.
If you cannot find anything about it, then post your question on the [comp.lang.ruby] newsgroup.

Embedding Concepts

This section covers the concepts you will need to know if you plan to do embedding.

Init/Cleanup

In order to use ruby, we must first do proper initialization, and when done we may want ruby to clean up after itself.

The usual startup procedure:

ruby_init(); initializes the interpreter.
ruby_init_loadpath(); initializes the $: variable ($LOAD_PATH). It is initialized with the content of the RUBYLIB environment variable, plus the location of site_ruby and the current directory. This path is searched whenever you invoke either require or load. ruby_init_loadpath is therefore necessary if you plan to load any modules.
ruby_script(name); assigns the name of this script to the $0 variable. This can be useful if your script depends on your embedded application.
```
if $0 != "embed"
    puts "WARNING: this script is supposed "\
        "to be run only from 'embed'"
end
```
Sometimes you will see this script name used in backtraces. If you leave out ruby_script then the $0 variable will be "false".

Less important: there exist a couple of extra initialization functions (ruby_options). Are there others?

Defining our own environment:

Making the c++ Redirect class visible to ruby. This is what SWIG can help us with, see [Combining Everything].
Load the ruby script, which defines a RubyView class that inherits from the aforementioned Redirect class.
Create an instance in c++ of the RubyView class.

We are up and running. Here are our typical activities:

Wrap every rb_funcall() in an rb_protect() so that we can deal with exception handling ourselves.
Once in a while do some garbage collection. If we do GC frequently, we can detect memory bugs early.

Finally, tear down everything we built:

Kill child processes.
ruby_finalize(); cleans up (garbage collection) and shuts down the interpreter.

Stay Alive - Use Protection

Warning

If an error occurs inside ruby without being encapsulated inside an rb_protect(), then ruby will call exit(), which terminates your program.

Warning

If a hard error occurs inside ruby, like a segmentation fault, then a SIGABRT signal will be raised. You might find it necessary to install your own signal handler for this [deal with signals]. I have done a little research about this, which you can see in the [Combining Everything] section.

So how can we protect ourselves against this?

VALUE rb_protect(VALUE (*proc)(VALUE), VALUE arg, int *error);

An example of such encapsulation could be the following:

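// Wrapper with the signature rb_protect expects.
VALUE require_wrap(VALUE arg) {
    return rb_require("test");
}

VALUE require_protect() {
    int error = 0;
    VALUE result = rb_protect(
        require_wrap, 0, &error);
    // A bare "throw;" with no active exception would call
    // std::terminate, so throw a real exception (needs <stdexcept>).
    if(error)
        throw std::runtime_error("ruby error in require");
    return result;
}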

This will protect you from bad things:

abort, exit or raise occurring in the ruby script.
Malformed ruby code.
A required file that cannot be found.

error tells us whether the call succeeded. In the next section I will discuss how to deal with failures.
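The wrapper above hard-codes the file name, because the callback only receives a single VALUE. A common way around this is to bundle the real arguments in a struct and pass its address through that one parameter. The sketch below shows only this packing technique. Value and fake_protect are stand-ins invented here so that it compiles without ruby.h; fake_protect imitates the error flag with a c++ try/catch, which is not how rb_protect works internally. RequireArgs is a hypothetical name as well.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-ins so the sketch compiles without ruby.h. In real code these
// would be VALUE and rb_protect.
typedef std::uintptr_t Value;

Value fake_protect(Value (*proc)(Value), Value arg, int *error) {
    try {
        *error = 0;
        return proc(arg);
    } catch (...) {
        *error = 1;          // failure is reported through *error
        return 0;
    }
}

// All arguments for one call, bundled so they fit through one Value.
struct RequireArgs {
    std::string filename;
};

// Callback with the one-argument signature the protect function expects.
Value require_wrap(Value arg) {
    RequireArgs *args = reinterpret_cast<RequireArgs *>(arg);
    if (args->filename.empty())
        throw std::invalid_argument("empty filename");  // simulated failure
    return static_cast<Value>(args->filename.size());   // dummy result
}

Value require_protect(const std::string &filename) {
    RequireArgs args = { filename };
    int error = 0;
    Value result = fake_protect(
        require_wrap, reinterpret_cast<Value>(&args), &error);
    if (error)
        throw std::runtime_error("require failed: " + filename);
    return result;
}
```

The struct lives on the caller's stack, which is fine here because the callback has finished by the time the protect call returns.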

Exception Handling

The error value set by rb_protect is zero if everything is OK. Otherwise something is wrong and needs to be dealt with!

The following code will translate ruby exceptions into c++ exceptions.

void ThrowOnError(int error) {
    if(error == 0)
        return;

    VALUE lasterr = rb_gv_get("$!");

    // class
    VALUE klass = rb_class_path(CLASS_OF(lasterr));
    clog << "class = " << RSTRING(klass)->ptr << endl; 

    // message
    VALUE message = rb_obj_as_string(lasterr);
    clog << "message = " << RSTRING(message)->ptr << endl;

    // backtrace
    if(!NIL_P(ruby_errinfo)) {
        std::ostringstream o;
        VALUE ary = rb_funcall(
            ruby_errinfo, rb_intern("backtrace"), 0);
        int c;
        for (c=0; c<RARRAY(ary)->len; c++) {
            o << "\tfrom " << 
                RSTRING(RARRAY(ary)->ptr[c])->ptr << 
                "\n";
        }
        clog << "backtrace = " << o.str() << endl;
    }
    throw runtime_error("ruby_error");
}
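ThrowOnError logs the class, message and backtrace, but then throws a generic runtime_error, so the caller never sees those details. One possible refinement is an exception type that carries them along. This is only a sketch, assuming C++11; RubyError and its members are hypothetical names that do not appear in the tar.gz.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type carrying what ThrowOnError extracts.
class RubyError : public std::runtime_error {
public:
    RubyError(const std::string &klass,
              const std::string &message,
              const std::vector<std::string> &backtrace)
        : std::runtime_error(klass + ": " + message),
          klass_(klass), backtrace_(backtrace) {}

    const std::string &klass() const { return klass_; }
    const std::vector<std::string> &backtrace() const { return backtrace_; }

    // Formats the backtrace the same way ThrowOnError logs it.
    std::string formatted_backtrace() const {
        std::string s;
        for (const std::string &line : backtrace_)
            s += "\tfrom " + line + "\n";
        return s;
    }
private:
    std::string klass_;
    std::vector<std::string> backtrace_;
};
```

ThrowOnError could then collect the strings it currently writes to clog and finish with a throw of RubyError instead of runtime_error("ruby_error"); existing catch blocks for std::exception would keep working.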

The other way around, translating c++ exceptions into ruby exceptions, is also possible [discussion]. Do you want backtraces? [unix, windows].

#define RUBY_TRY \
    extern VALUE ruby_errinfo; \
    ruby_errinfo = Qnil; \
    try

#define RUBY_CATCH \
    catch(const std::exception &e) { \
        std::ostringstream o; \
        o << "c++error: " << e.what(); \
        ruby_errinfo = rb_exc_new2( \
            rb_eRuntimeError, o.str().c_str()); \
    } catch(...) { \
        ruby_errinfo = rb_exc_new2( \
            rb_eRuntimeError, "c++error: Unknown error"); \
    } \
    if(!NIL_P(ruby_errinfo)) { \
        rb_exc_raise(ruby_errinfo); \
    }

Memory management

Many answers regarding memory management can be found here [GC + MM].

ruby uses a garbage collection technique called mark-and-sweep, see [RubyGarden, IBM, GC-FAQ].

Figure 3. Sometimes death needs to clean up

If you are holding ruby instances within c++, you must tell ruby's garbage collector that you are busy using them. Otherwise GC will destroy them, because they seem to be unused!

There are 2 ways to tell GC that a variable is busy. You can choose either to export or not to export your variable.

Exported variables

Variables which have names and which are fully shared between c++ and ruby.

Well covered in [Programming Ruby/Extending]. Especially take a look at the following functions:

rb_define_variable(name, object)
rb_define_class_variable(class, name, object)

Non-exported variables

Unnamed variables which are managed by ruby. They cannot be accessed from .rb files, because they are nameless (thus non-exported).

rb_gc_register_address(VALUE *var) Tells ruby that we want to use var, by adding the variable to the list of busy objects. Objects contained in the busy list will get marked during GC and are therefore spared from mass destruction. The rb_global_variable function is an alias.

rb_gc_unregister_address(VALUE *var) Tells ruby that var is no longer in use, by removing the variable from the list of busy objects. Later at some point we can start GC. When GC is done with its mark phase, this variable will be left unmarked. Finally, during GC's sweep phase, this variable and its unmarked children will get destroyed.

Speed issues

The rb_gc_register_address function calls ALLOC every time and is thus very slow. Luckily we have a faster alternative: keeping track of all our objects in an array (or hash). This reduces the number of allocations, which gives us speed.

class Objects {
private:
    VALUE objects;
public:
    Objects() {
        objects = rb_ary_new();
        rb_gc_register_address(&objects);
    }
    ~Objects() {
        // dispose array and flush all elements
        rb_gc_unregister_address(&objects);
        /*      
        mass destruction. GC can no longer
        mark the elements in the Array and 
        therefore they will all get swept.
        */
    }
    void Register(VALUE object) {
        rb_ary_push(objects, object);
    }
    void Unregister(VALUE object) {
        rb_ary_delete(objects, object);
    }
};

Speed is the only difference in behavior between Register and rb_gc_register_address.

Array.delete is slow (linear time) compared to Hash.delete (constant time). With many elements, a Hash can give you better performance. Both VIM and mod_ruby use hashes.

Besides globals there exist locals. Here is an example:

Figure 4. Assist GC during the mark phase

Comments to this class diagram:

The Zoo class owns one instance of Cats. The Cats class owns one instance of Puma.
The Cats class is c++ code. The other classes are ruby code.
This is not an example of embedding!

Because Cats is owned by the ruby class (Zoo), we say that Cats is managed by ruby. It's ruby that fully controls when creation and destruction happen.

The mark function must be supplied during initialization of Cats.

VALUE cats_alloc(VALUE klass) {
    return Data_Wrap_Struct(
        klass, 
        cats_mark, 
        cats_free, 
        new Cats()
    );
}

rb_gc_mark() tells ruby which objects are active. Finally ruby will destroy all non-active objects.

Here is a brief text [RubyDoc], on usage of Data_Wrap_Struct.

Destruction

Warning

Watch out for your destructors: they will not necessarily get invoked when your object goes out of scope. Your objects will get destroyed when GC kicks in.

This has several consequences:

Avoid reference counting. Destructors which unref the object will not work correctly, because of the possibly delayed destruction.

Instead of having a dtor, consider using a close function, see [dtors-discussion, c++ ruby comparison].

Consider outputting a warning in your dtor if the close function was never invoked.

Remaining questions:

How to compensate for this dtor-delay? [dtor-suggestions].
Are finalizers able to help us? [finalizers].

Questions + answers

1.	What do I do when I want to free something - just stop marking?
	Yes, exactly. You either have to call the `unregister` function which matches the one you used for declaring the variable, or you have to stop invoking `rb_gc_mark`.
2.	What happens if GC runs between [objects = rb_ary_new()] and rb_gc_register_address(&objects); ? Or can this never happen?
	Yes, AFAIK this can happen, but only in special cases (TODO which cases?). And No, you are in control of when GC should run.
3.	`ALLOC` versus `new`/`delete`, what is the difference?
	[answer].

My question to you.. can I improve this section?

Multi-Threading

Threads in ruby can be used as usual.

Thread.new do
    loop do
        puts "."
        sleep 0.5
    end
end

The only thing you should care about is this: ruby uses the libc setitimer() function plus the SIGVTALRM signal to schedule threads. Messing around with these will get you in trouble.

ruby is not thread-safe. If you embed ruby into a multi-threaded application, then you will need to protect all access to ruby with a semaphore. Warning: insufficient semaphore protection can be hard to debug.
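One way to make the protection hard to forget is to funnel every call into the interpreter through a single gatekeeper. The following is a sketch only, assuming C++11 (std::mutex is newer than the G++3 mentioned in the overview); InterpreterLock and with_interpreter are hypothetical names.

```cpp
#include <mutex>

// Hypothetical gatekeeper: every call into the interpreter goes
// through with_interpreter(), so only one thread at a time touches ruby.
class InterpreterLock {
public:
    template <typename Func>
    auto with_interpreter(Func func) -> decltype(func()) {
        std::lock_guard<std::mutex> guard(mutex_);
        // In real code func would be an rb_protect-wrapped call.
        return func();
    }
private:
    std::mutex mutex_;
};
```

A caller would write something like lock.with_interpreter([&] { view.insert(); }). Be aware that a plain std::mutex deadlocks if ruby calls back into c++ (as repaint does) and that callback tries to take the same lock again on the same thread; in that case std::recursive_mutex might be the better choice.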

Hints

Useful hints: what to do and what not to do.

Do not use Data_Make_Struct for c++ objects; it allocates a plain C struct, so no constructor is run!
Always encapsulate your ruby code in an rb_protect() wrapper.
ruby instances owned by c++: tell GC about such relations. Otherwise a segfault will (most likely) occur.
Frequent execution of GC can reveal problems early.
Do NOT use ruby_run(); it calls exit() and does not return to your program.

Resources

Getting started in a rush? My best advice is to look at the ruby source itself, plus seek inspiration in projects which have successfully embedded ruby.

Primary
The Pragmatic Programmer's Guide - Extending Ruby	Long tutorial on how to combine C and Ruby. Basic concepts, sharing data, wrapping structures, extconf.rb usage, embedding, the ruby API.
The Ruby Interpreter	`README.EXT` describes how to make your own extensions. Most of the interesting functions are implemented in `eval.c` (rb_protect, backtrace, rb_funcall).
VIM (Vi IMproved)	`if_ruby.c` reveals how ruby has been embedded into VIM. This is a nice implementation which is easy to grasp.
gimp-ruby	gimp plugin which embeds ruby. See `embed/rubymod.c` for a fairly good implementation. Watch out: this software is GPL.
mod_ruby	apache plugin which embeds ruby. This is a bit complicated, so you should take a look at VIM before looking at this. Supports different levels of safety.
SWIG (Simplified Wrapper and Interface Generator)	If you have a c/c++ library which you want to use in a scripting language (ruby, python..) then SWIG will create the necessary wrapper almost automatically.

Projects which are less educational (secondary).

Secondary
ruby++	A nice c++ wrapper around ruby. It's pretty educational to see how things are done behind the scenes.
exerb [windows]	Bundles your ruby code with the ruby interpreter into a single windows `.EXE` file. Compression is possible. `exerb.cpp` is the interesting part here: how to do setup, execute, teardown.

If you can recommend other resources about embedding I would be glad to add them.

Combining Everything

Problem: The source code for the Redirect class quickly becomes messy and confusing to maintain. Solution: relax with SWIG.

I have made a skeleton project [rubyembed-0.2.tar.gz] which you can use for free! It's almost the same code as we know it from [Simple Wrapper]; here it's just split up into several files. I must admit that this looks overwhelming for so little functionality, but hey - we are now holding tremendous power in our hands.

Comments on the tar.gz file:

It's scalable. You can easily add new classes & functions. This is actually the main purpose of this project.
It's easy. See redirect.cpp for how things work with & without SWIG. Imagine that you should add some functions or rename stuff. Without SWIG you would have to write a bunch of code. SWIG can automatically create the necessary code, so with SWIG you don't have to write anything :-)
It's hidden. In library.h observe that ruby.h is not included and thus not polluting your namespace. How did I manage to hide ruby completely? Well, I used the pimpl idiom [gotw].
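For readers who have not met the pimpl idiom before, here is a rough single-file sketch of how the split could look, assuming C++11. The division into library.h and library.cpp follows the file names above, but the members shown are illustrative guesses and not a copy of the tar.gz; the ruby call is replaced by a direct callback.

```cpp
#include <memory>

// ---- what would live in library.h: no interpreter headers needed ----
class View {
public:
    View();
    virtual ~View();            // defined where Impl is complete
    void insert();
    virtual void repaint() = 0;
private:
    class Impl;                 // only declared here
    std::unique_ptr<Impl> impl_;
};

// ---- what would live in library.cpp: ruby.h would be included here ----
class View::Impl {
public:
    explicit Impl(View *parent) : parent_(parent) {}
    void insert() {
        // The real code would call into ruby here; this stand-in
        // calls straight back into the frontend.
        parent_->repaint();
    }
private:
    View *parent_;
};

View::View() : impl_(new Impl(this)) {}
View::~View() = default;
void View::insert() { impl_->insert(); }
```

Code that includes only the header part sees a pointer to an incomplete type, so none of the interpreter's macros and typedefs leak into it. The price is one extra allocation and one extra indirection per View.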

Don't Use Wizard Code You Don't Understand [wizards?]. There are a few gotchas in the code, so I have some explaining to do. Let's have a look.

Cheating = Success

I think usage of global variables is nasty, but that's just me. In order to create an instance of RubyView, we must use such dirty tricks.

Impl(View *parent) : parent(parent) {
    // create instance
    Redirect::SetView(parent);
    self = RUBY_CPP::New("RubyView");

    // tell GC
    objects->Register(self);
}

Creating an instance. How does this code work?

Figure 5. Transferring arguments (dirty)

In a perfect world, we would have passed the parent variable to RubyView.new, which would then pass it on to the Redirect constructor. But the world is unfortunately not perfect, and therefore it's sometimes necessary to use global variables.
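Stripped of everything ruby-related, the trick boils down to parking the argument in a static slot right before the object is created. The sketch below shows that hand-off in isolation, assuming C++11. Redirect::SetView is the name used above; pending_view_ and the view() accessor are hypothetical, and the real Redirect may store things differently.

```cpp
class View {};   // stand-in for the frontend base class

class Redirect {
public:
    // Step 1: the creator parks the argument in a static slot.
    static void SetView(View *view) { pending_view_ = view; }

    // Step 2: the constructor (invoked indirectly, without arguments)
    // picks the argument up and clears the slot again.
    Redirect() : view_(pending_view_) { pending_view_ = nullptr; }

    View *view() const { return view_; }
private:
    static View *pending_view_;
    View *view_;
};

View *Redirect::pending_view_ = nullptr;
```

Clearing the slot in the constructor keeps a stale pointer from leaking into the next instance. The hand-off is still not safe if two threads create instances at the same time, which is one more reason to serialize all access to ruby.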

Why is it necessary to transfer arguments in such an awkward way? todo:

Minor Issues

Less important but nice to know.

Observe that no namespaces are used in the redirect.h file. Why? Because we don't want to make it too hard to get SWIG's wrapper (redirect_wrap.cpp) to play with redirect.h. I'm just assuming that this could result in problems; I haven't tried using a namespace.

I want to illustrate that it's possible either to use SWIG or to do everything manually. For this purpose I have introduced the EMBEDRUBY_SWIG define. It's only used in a few places and should be easy to remove :-)

> grep EMBEDRUBY_SWIG *
Makefile.am:test_CXXFLAGS = -DEMBEDRUBY_SWIG=0
Makefile.am:testswig_CXXFLAGS = -DEMBEDRUBY_SWIG=1
library.cpp:#ifdef EMBEDRUBY_SWIG
library.cpp:    const bool use_swig = EMBEDRUBY_SWIG;
redirect.cpp:#if EMBEDRUBY_SWIG != 0
>

I don't like having signal handling in the main.cpp file. This thing was supposed to be encapsulated in the library. Maybe I will make an attempt to fix it someday.
I couldn't resist; I have made a SIGABRT-to-exception wrapper, see [signals_branch]. It's a nice hack which uses setjmp/longjmp. I still need to fix a few issues before it's really usable. todo: On SIGABRT we want a coredump and then continue execution? todo: redirect to other SIGABRT handlers if this is possible?
In the test.rb file you can see a place saying Embed::Redirect. This Embed module annoys me (cosmetic detail) and I would like to get rid of it. Observe that SWIG encapsulates the Redirect class in a module named Embed. I haven't yet decided whether this is good or bad (I think it's mostly bad). SWIG generates an initialization function named Init_%module(). Imagine that you have many of these initialization functions, each with its own module name. Problem: in the ruby code this results in many module names.
I must find out if SWIG has an option for disabling this module-namespace thing. I asked this question [avoiding the module name] and I think Lyle is working on adding this feature to SWIG.

What Now

You might want to extend/adjust this code further for your own requirements. I will try to cover how to add new features.

1.	Adding a virtual function, how?
	First look at how the `repaint()` function is done. You will need to add your `new_function()` to the `Redirect` class in both `redirect.h` and `redirect.i`.
2.	Adding a ruby-function, how?
	First look at how the `insert()` function is done. You will need to add your `new_function()` to the `RubyView` class located in `test.rb`. Next you must add a function to the `Impl` class which wraps your call from c++ into ruby. Finally you must add a function prototype to the `View` class which just passes the call on to the `Impl` class.
3.	Adding new classes?
	You may have observed that this code only provides a wormhole for the `View` class and that `Redirect` is just a helper class. It is spread out over several files: `redirect.i`, `redirect.h`, `redirect.cpp`, `library.h`, `library.cpp`. Note: can this huge number of files be reduced? Yes, `redirect.cpp` can be joined with `library.cpp`. If we consider using SWIG then the remaining files cannot be joined. todo

Standing On The Shoulder Of Giants

Most people think that SWIG can be used ONLY for ruby extensions. But if you are clever you can actually use it for embedding.

Figure 6. The C++ giant looks tired

I hope you have made it this far without too many bumps on the road. Now enjoy life :-)

Over and out - Simon Strandgaard.

Appendix - Capture Output From Ruby

Redirecting ruby's output elsewhere can be useful: perhaps to a logfile with a timestamp attached, perhaps to a GUI status box, or perhaps you just want ruby to stay silent and not interfere with the console output of your application!

There are 2 ways to accomplish this. The first solution could be to run ruby in a child process and use memory-mapping between parent and child [Quarantine for untrusted applications]. The second solution could be to incorporate ruby into the same process as the main application. The second approach is what this section is about.

Complete separation of ruby's output from c++ is not trivial. I hope that this area will improve in the future. I proposed that this area should be simplified [RCR for child execution] but nobody liked the idea. Maybe I will re-raise the issue when there are people to support me :-)

All of ruby's output functions (puts, p, print, printf) pass their output through the IO.write(text) function. By overriding this function we can gain full control of ruby's output.

class CaptureOutput < IO
    def initialize
        super(2)  
    end
    def write(text)
        # send text to logfile
    end
end

todo: I'm confused about initialization of the IO class ('2' is stderr). I don't know the "right way" to do it... yet.

How is this CaptureOutput class supposed to be used? Well.. a ruby example could look like this:

def capture
    raise unless block_given?
    dout, serr, sout = $defout, $stderr, $stdout
    buf = CaptureOutput.new
    begin
        $defout = buf
        $stderr = buf
        $stdout = buf
        yield
    ensure
        $defout, $stderr, $stdout = dout, serr, sout
    end
end

capture {
    print "42"
}
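On the c++ side, whatever receives the captured text has to cope with fragments: print "42" arrives without a trailing newline, while puts may deliver a line in several pieces. How write forwards its text to c++ is not shown here, so take the following only as a sketch of the receiving end; OutputSink and its members are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical c++ receiver for text forwarded from CaptureOutput#write.
class OutputSink {
public:
    // Accepts arbitrary fragments; complete lines are moved to lines().
    void write(const std::string &text) {
        pending_ += text;
        std::string::size_type pos;
        while ((pos = pending_.find('\n')) != std::string::npos) {
            lines_.push_back(pending_.substr(0, pos));
            pending_.erase(0, pos + 1);
        }
    }

    // Emits whatever is left, e.g. after print "42" with no newline.
    void flush() {
        if (!pending_.empty()) {
            lines_.push_back(pending_);
            pending_.clear();
        }
    }

    const std::vector<std::string> &lines() const { return lines_; }
private:
    std::string pending_;
    std::vector<std::string> lines_;
};
```

Working on whole lines makes it straightforward to attach a timestamp per line or to append each line to a GUI status box. flush() would typically be called when the capture block ends.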

todo: I have some trouble capturing output from the system call. It seems to be non-trivial to do IO with child processes?

todo: I'm searching for resources [Capture Output, More Capture, talk, overloading all output methods, more overloading, more].

Of course this concept has to be tweaked a bit before it's really usable. todo: I'm working on this [sandbox.rb, test_sandbox.rb].

TODO

Debugging techniques - Can I debug this c++/ruby code step by step? Breakpoints in the ruby code? How to use ElectricFence/Valgrind.

Backport ideas from the tar.gz file to the simple example.

Redirection of stderr, stdout. An insecure sandbox; if people want to break loose they can. Pipe is a unix thing; how to do it on windows?

How to invoke GC frequently?

Add some more text to the "Combining Everything" section.

Load the ruby DLL/SO library only when needed.

Windows build issues (IMPORT NT=1). I have no makefile for windows yet.

SAFE, ruby has different levels of paranoia.

longjmp issues, if such exist? I think I read a posting on this subject.

Do some more analysis on ruby's multithreading (fork). Maybe run the interpreter in its own process? How to kill zombie processes.

Integrating ruby into an existing multithreaded application.

Benchmark section: measure responsiveness and memory usage. Discuss how to improve responsiveness.

Unit tests; we want to be sure it really works.

If something is missing, not covered sufficiently ... Then please post the issue on this [forum] and I will look at it.

NEWS

27-april-2003: Rewrote/rearranged the [Overview] section. I hesitated too much, found that extra parameter for xsltproc which enables CSS stylesheets!
17-april-2003: Version 0.2 is out [rubyembed-0.2.tar.gz, Changelog].
15-april-2003: Inheritance from a SWIG class is now possible! Thanks to "Steve Hart" for pointing that out. See the change here [diff 1.3/1.4]; still I would like to know what exactly the difference is. I asked this question on comp.lang.ruby, see [question], and I got lots of answers. I'm looking forward to SWIG version 1.3.20, where this problem will hopefully have been fixed. Thanks ruby fellows.
04-april-2003: Version 0.1 is out [rubyembed-0.1.tar.gz].
