Optimise your ROS snap – Part 4

gbeuzeboc

on 21 April 2023

Tags: robotics , ROS , ROS 2 , Snap , snapcraft



Welcome to Part 4 of our “optimise your ROS snap” blog series. Make sure to check Part 3 first. This fourth part explains what dynamic library caching is. We will present how to use it to optimise ROS snaps, along with the points to be careful about. Finally, we will apply it to our Gazebo snap and measure the performance impact.

Snaps are immutable. This means that every time we launch a snap, it executes exactly the same instructions and strategies. A Linux system, by contrast, is meant to evolve over time, so it relies on mechanisms that support this evolution and modularity. While such mechanisms bring reliability to a system, they can also slow down our processes during launch.

Dynamic library caching with ld-cache

Here we are addressing a more advanced optimisation topic. Dynamic library caching for snaps has been discussed and explored in the forum. We are going to summarise what it is, apply it to our ROS snap and measure the results.

When our program loads a dynamic library, it must first find it. The first place to look is rpath: a list of library directories stored directly in the binary at build time. If the library is not found through rpath, the loader looks in each of the directories listed in the LD_LIBRARY_PATH environment variable. In the case at hand, Gazebo uses 148 libraries, and each of them may have to be searched for in up to 17 different paths on LD_LIBRARY_PATH. The third mechanism is runpath, which is similar to rpath but is searched after LD_LIBRARY_PATH. Finally, the last mechanism is the cache file located at /etc/ld.so.cache. The cache is essentially a lookup table of dynamic library filenames and their known locations.
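To get a feel for the cost, the worst case can be sketched as a quick calculation: every library probed in every LD_LIBRARY_PATH directory. The helper names below (count_dirs, worst_case_lookups) are hypothetical and only illustrate the arithmetic. The real loader stops at the first match, so the actual number of probes is lower.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Count the non-empty entries of a colon-separated path list.
count_dirs() {
  local IFS=':'
  local -a dirs
  local n=0 d
  read -ra dirs <<< "$1"
  for d in "${dirs[@]}"; do
    if [ -n "$d" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Upper bound on directory probes: libraries x search directories.
worst_case_lookups() {
  echo $(( $1 * $2 ))
}

# With the figures quoted above: 148 libraries, 17 directories.
worst_case_lookups 148 17
```

To see which mechanism a given binary relies on, readelf -d <binary> | grep -E 'RPATH|RUNPATH' lists the search paths embedded in it, and running a program with LD_DEBUG=libs makes the loader print each path it tries.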

rpath can be modified, but the binary must be writable, which is not the case in a snap. rpath is also considered deprecated in favour of runpath, which similarly cannot be modified because our binaries are not writable. The idea, then, is to fill the cache with every library available in our snap and to set LD_LIBRARY_PATH to an empty string. This way, we avoid wasting time searching the library paths at every launch.

Optimisation

The first problem is that we cannot modify /etc/ld.so.cache in our snap, since our snap is immutable. To overcome this, we will use a layout to bind /etc/ld.so.cache to a file under $SNAP_DATA, a directory writable by our snap.

We declare the layout in our snapcraft.yaml as follows:

layout:
  /etc/ld.so.cache:
    bind-file: $SNAP_DATA/etc/ld.so.cache

Now we will need two scripts. One to build our cache, and another to check that the cache is valid. We will store both scripts inside the snap/local directory.

The build-cache.sh script is the following:

#!/bin/bash -e
# Since this will be called by a hook, this script won’t have our application LD_LIBRARY_PATH
LD_LIBRARY_PATH="/snap/gazebo/current/opt/ros/snap/lib:/snap/gazebo/current/opt/ros/foxy/opt/yaml_cpp_vendor/lib:/snap/gazebo/current/opt/ros/foxy/lib/x86_64-linux-gnu:/snap/gazebo/current/opt/ros/foxy/lib:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/gazebo/current/lib:/snap/gazebo/current/usr/lib:/snap/gazebo/current/lib/x86_64-linux-gnu:/snap/gazebo/current/usr/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/usr/lib:/snap/gazebo/current/kf5/lib:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu/dri:/var/lib/snapd/lib/gl:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu/pulseaudio"

# run ldconfig on our LD_LIBRARY_PATH lib dirs
IFS=':' read -ra PATHS <<< "$LD_LIBRARY_PATH"
mkdir -p "$SNAP_DATA/etc"
ldconfig -v -X -C "$SNAP_USER_DATA/snap-ld.so.cache" -f "$SNAP_DATA/etc/ld.so.conf" "${PATHS[@]}"
# replace the generated ld.so.cache with the one pointed by the bind
cat "$SNAP_USER_DATA/snap-ld.so.cache" > "$SNAP_DATA/etc/ld.so.cache"
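The LD_LIBRARY_PATH above lists /var/lib/snapd/lib/gl twice. This is presumably harmless, but as a minimal sketch, the list could be deduplicated before being handed to ldconfig. The unique_dirs function below is a hypothetical helper and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print each directory of a colon-separated list once,
# one per line, keeping the first occurrence and skipping empty entries.
unique_dirs() {
  local IFS=':'
  local -a dirs
  local seen=":" d
  read -ra dirs <<< "$1"
  for d in "${dirs[@]}"; do
    if [ -z "$d" ]; then continue; fi
    case "$seen" in
      *":$d:"*) ;;                             # already printed, skip
      *) seen="$seen$d:"; printf '%s\n' "$d" ;;
    esac
  done
}
```

In build-cache.sh, the PATHS array could then be filled with mapfile -t PATHS < <(unique_dirs "$LD_LIBRARY_PATH") instead of the read -ra line, leaving the ldconfig call unchanged.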

The check-cache.sh script makes sure that all our dependencies are properly found. The cache is built at the install and update steps (via hooks), but since the content sharing snap contains a lot of libraries and can change independently, something could break. If something is indeed broken, we simply launch Gazebo using the old method. Gazebo is based on a plugin system, so most of the libraries loaded at runtime are not dynamically linked. This means that the plugins are unknown at build time and are searched for on the fly at every run. Additionally, everything in Gazebo is launched from a Ruby script that selects which library to load depending on the given command. For that reason, we decided to check the file $SNAP/opt/ros/snap/lib/libignition-gazebo3-gui.so, since it is the most likely to change due to the Qt content sharing snap.

The check-cache.sh script is the following:

#!/bin/sh -e
# save the original LD_LIBRARY_PATH, and unset it to check the cache
ORIGINAL_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=""
BINARY_TO_TEST="$SNAP/opt/ros/snap/lib/libignition-gazebo3-gui.so"
if [ -z "$BINARY_TO_TEST" ]; then
  echo "BINARY_TO_TEST unset, can't check the dynamic linker cache for correctness"
else
  if ldd "$BINARY_TO_TEST" | grep -q "=> not found"; then
    # We cannot regenerate the cache because we must be root.
    # So we use the LD_LIBRARY_PATH until the next hook is triggered
    export LD_LIBRARY_PATH="$ORIGINAL_LD_LIBRARY_PATH"
  fi
fi
# execute the next command in the chain
exec "$@"
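The core of this check is whether ldd reports any unresolved dependency. As a sketch, that test can be isolated in small functions that read ldd output on standard input, which makes them easy to try on canned text. The names has_missing_libs and missing_libs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Succeed (exit status 0) when the ldd output read from standard input
# contains at least one unresolved library.
has_missing_libs() {
  grep -q "=> not found"
}

# Print the name of each unresolved library, one per line.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
```

For example, ldd "$BINARY_TO_TEST" | missing_libs prints the libraries that the cache (plus the current LD_LIBRARY_PATH) could not resolve, which helps when working out why the fallback was triggered.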

As mentioned, /etc/ld.so.cache can only be modified by root. Here we decided simply not to use dynamic library caching when ld.so.cache is incomplete. We could certainly think about adding an app entry that runs as root to regenerate the cache.

The build-cache script is called at install or update, while the check-cache script simply runs in the command-chain of the snap application. Thus, we add an ld-cache part to our snapcraft.yaml for the hooks and add the corresponding command-chain to our Gazebo app.

ld-cache:
  after: [kde-neon-extension]
  plugin: nil
  source: snap/local
  override-build: |
    KDE_CONTENT_SNAP=$(echo $SNAPCRAFT_CMAKE_ARGS | sed -n 's/.*\/snap\/\(.*\)-sdk.*/\1/p')
    mkdir $SNAPCRAFT_PART_INSTALL/bin
    cp *.sh $SNAPCRAFT_PART_INSTALL/bin
    mkdir -p $SNAPCRAFT_PART_INSTALL/snap/hooks
    # post refresh hook triggered at install and update
    ln -s ../../bin/hook.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/post-refresh
    ln -s ../../bin/hook.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/connect-plug-$KDE_CONTENT_SNAP
    ln -s ../../bin/hook.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/disconnect-plug-$KDE_CONTENT_SNAP

And for the command chain:

gz:
+ command-chain: [bin/check-cache.sh]
  command: usr/bin/ruby $SNAP/opt/ros/snap/bin/ign
  plugs: [network, network-bind, home]
  extensions: [kde-neon, ros2-foxy]

Rebuilding the snap and launching it, we stumble upon an unexpected error:

terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
[gazebo.gz-1] what(): failed to initialize rcl init options: failed to find shared library 'rmw_fastrtps_cpp', at /tmp/binarydeb/ros-foxy-rmw-implementation-1.0.3/src/functions.cpp:73, at /tmp/binarydeb/ros-foxy-rcl-1.1.14/src/rcl/init_

The problem is that, for now, rcl-init from ROS 2 only looks in LD_LIBRARY_PATH for libraries and does not follow the standard library loading mechanism. The issue has been reported and is still open.

This means that Gazebo itself is fine, but ROS 2 libraries still need an LD_LIBRARY_PATH pointing to /snap/gazebo/current/opt/ros/foxy/lib, which will surely reduce the benefit of our optimisation.

We then have to modify our check-cache.sh script:

- export LD_LIBRARY_PATH=""
+ export LD_LIBRARY_PATH="/snap/gazebo/current/opt/ros/foxy/lib"

Results

Now we can build our snap and run it.

If we want to have a look at the content of our generated ld.so.cache, we can print it with:

ldconfig -p -C /var/snap/gazebo/current/etc/ld.so.cache

The results of this optimisation are the following:

Gazebo snap                         | Cold start (s) | Hot start (s) | RTF  | .snap size | Installed snap size
Release                             | 6.06           | 2.72          | 4.39 | 232 M      | 758 M
Cleanup content sharing duplicates  | 6.03           | 2.76          | 4.29 | 119 M      | 427 M
Ld-cache                            | 6.07           | 2.74          | 3.96 | 119 M      | 427 M
Ld-cache with empty LD_LIBRARY_PATH | 6.03           | 2.39          | NA   | 119 M      | 427 M

Due to the rcl-init limitation, we see no benefit from dynamic library caching. When we try to launch Gazebo with an empty LD_LIBRARY_PATH (it crashes), we see a small hot-start improvement of roughly 350 ms (2.74 s to 2.39 s), but we cannot really know whether it comes from the library caching or simply from the fact that not everything could be loaded.

Unfortunately, dynamic library caching cannot be recommended for ROS 2 at the moment. This optimisation might be interesting for other projects, but the cost of maintaining such custom scripts might be too high compared to the small benefit. It won’t be applied to the Gazebo snap.
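For reference, the hot-start difference between the last two rows of the table can be worked out as follows. This is only arithmetic on the published figures (2.74 and 2.39, assumed to be seconds); the function name startup_gain is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the gain between two timings given in seconds,
# as milliseconds and as a percentage of the first timing.
startup_gain() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    diff = before - after
    printf "%.0f ms (%.1f%%)\n", diff * 1000, diff / before * 100
  }'
}

# Ld-cache hot start versus Ld-cache with empty LD_LIBRARY_PATH.
startup_gain 2.74 2.39
```

The gain is a little under 13% of the hot start, and it was measured on a launch that does not complete, so it should be read as an upper bound rather than as an achievable result.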

Conclusion

This optimisation was not one of the simplest we have seen so far. A ROS snap might rely on many different dynamic library mechanisms (dynamic library linking, ROS plugins, Gazebo plugins), which makes the optimisation tricky. For a more classic C++ application relying only on dynamic library linking, the benefits could be greater. Even if this can’t be applied to our ROS snap, at least we explored an interesting topic about dynamic libraries.

Continue reading Part 5 of this series.
