Getting Vagrant VMware Plugin Installed on OS X El Capitan and Homebrew

I finally decided to bite the bullet this morning and install VMware Fusion 8 and the corresponding Vagrant plugin.

Both were paid upgrades (€41.42 + $39 from Fusion 7), which stung a bit given that I had bought the previous versions only a couple of months earlier. Yet the word on the street was that the new version would be much more stable and less CPU-hungry than the previous generation. So what the heck – maybe I’d again be able to get more than a couple of hours of productive work done without draining the battery.

The purchase process itself was painless, as was installing VMware itself. But when I started installing the plugin, an ugly yak raised its hairy head.

The first thing I tried was to just use the old plugin with the new VMware version. Would it perhaps work?

➜  ~ vagrant suspend
This provider only works with VMware Fusion 5.x, 6.x, or 7.x. You have
Fusion '8.0.1'. Please install the proper version of VMware
Fusion and try again.

That sounds like a resounding “No”.

So onwards: I bought a license for the plugin as well and tried to install it.

➜  ~ vagrant plugin install vagrant-vmware-fusion
Installing the 'vagrant-vmware-fusion' plugin. This can take a few minutes...
Bundler, the underlying system Vagrant uses to install plugins,
reported an error. The error is shown below. These errors are usually
caused by misconfigured plugin installations or transient network
issues. The error from Bundler is:

An error occurred while installing hitimes (1.2.3), and Bundler cannot continue.
Make sure that `gem install hitimes -v '1.2.3'` succeeds before bundling.

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

    /opt/vagrant/embedded/bin/ruby extconf.rb
creating Makefile

make "DESTDIR="


Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.

No biggie – a little Googling confirmed what I suspected: I didn’t actually need to run this as root, I just needed to accept the EULA of the latest Xcode version before running the install process again. I popped open Xcode, YOLOed the agreement, and was back in the terminal.

➜  ~ vagrant plugin install vagrant-vmware-fusion
Installing the 'vagrant-vmware-fusion' plugin. This can take a few minutes...
Bundler, the underlying system Vagrant uses to install plugins,
reported an error. The error is shown below. These errors are usually
caused by misconfigured plugin installations or transient network
issues. The error from Bundler is:

An error occurred while installing eventmachine (1.0.8), and Bundler cannot continue.
Make sure that `gem install eventmachine -v '1.0.8'` succeeds before bundling.

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

[LOTS OF CRUFT REMOVED FOR BREVITY]

make "DESTDIR="
compiling binder.cpp
warning: unknown warning option '-Werror=unused-command-line-argument-hard-error-in-future'; did you mean '-Werror=unused-command-line-argument'? [-Wunknown-warning-option]
In file included from binder.cpp:20:
./project.h:116:10: fatal error: 'openssl/ssl.h' file not found
#include <openssl/ssl.h>
         ^
1 warning and 1 error generated.
make: *** [binder.o] Error 1


Gem files will remain installed in /Users/jarkko/.vagrant.d/gems/gems/eventmachine-1.0.8 for inspection.
Results logged to /Users/jarkko/.vagrant.d/gems/gems/eventmachine-1.0.8/ext/gem_make.out

Again, the reason is clear: I need to build the gem against the proper OpenSSL headers – in this case Homebrew’s, with -I/usr/local/opt/openssl/include. Thus:

➜  ~  gem install eventmachine -v '1.0.8' -- --with-cppflags=-I/usr/local/opt/openssl/include
Fetching: eventmachine-1.0.8.gem (100%)
Building native extensions with: '--with-cppflags=-I/usr/local/opt/openssl/include'
This could take a while...
Successfully installed eventmachine-1.0.8
Parsing documentation for eventmachine-1.0.8
Installing ri documentation for eventmachine-1.0.8
Done installing documentation for eventmachine after 6 seconds
1 gem installed

Perfect. So I tried to install the Vagrant plugin again – and got the same error. Crap.

It turns out that Vagrant uses its own gem folder, so it doesn’t pick up what’s installed in your primary gem directory. The question was: how do I tell Vagrant to use the correct cpp flags in its build process?

Fortunately I didn’t have to figure that out, because the end of the error message above gave me enough of a pointer towards a solution: if I could just install the eventmachine gem by hand to the correct location with the proper cpp flags, I should be fine. But how?

I dug into gem -h, and the defaults listed at the end gave the correct option away: with the --install-dir option I could install the gem wherever I wanted. Thus:

~  gem install eventmachine -v '1.0.8' --install-dir /Users/jarkko/.vagrant.d/gems  -- --with-cppflags=-I/usr/local/opt/openssl/include
Fetching: eventmachine-1.0.8.gem (100%)
Building native extensions with: '--with-cppflags=-I/usr/local/opt/openssl/include'
This could take a while...
Successfully installed eventmachine-1.0.8
Parsing documentation for eventmachine-1.0.8
Installing ri documentation for eventmachine-1.0.8
Done installing documentation for eventmachine after 5 seconds
1 gem installed

Et voilà. Now one more try:

➜    vagrant plugin install vagrant-vmware-fusion
Installed the plugin 'vagrant-vmware-fusion (4.0.2)'!

…and we’re off to the races!

…well, at least almost.

➜  ~ vagrant up
Bringing machine 'default' up with 'vmware_fusion' provider...
==> default: Verifying vmnet devices are healthy...
==> default: Preparing network adapters...
==> default: Starting the VMware VM...
An error occurred while executing `vmrun`, a utility for controlling
VMware machines. The command and output are below:

Command: ["start", "/Users/jarkko/vmware/955a09a5-f7f6-451e-b565-22f41c8fced0/packer-ubuntu-14.04-amd64.vmx", "nogui", {:notify=>[:stdout, :stderr], :timeout=>45}]

Stdout: 2015-10-26T11:10:45.943| ServiceImpl_Opener: PID 84443
Error: The operation was canceled

Stderr:

Oh well, that wasn’t very helpful. So I turned to the proven trick #1: I killed the VMware app on OS X – and its menubar daemon, just to be certain – and sure enough, that did it. The Vagrant VM is now back on track.

How Do I Know Whether My Rails App Is Thread-safe or Not?

Photo by Joseph Francis, used under a Creative Commons license.

In January Heroku started promoting Puma as the preferred web server for Rails apps deployed on its hugely successful platform. Puma – as a threaded app server – can better use the scarce resources available for an app running on Heroku.

This is obviously good for the client, since they can now serve more concurrent users with a single Dyno. However, it’s also good for Heroku itself, since small apps (probably the vast majority of apps deployed on Heroku) will now consume far fewer resources on its servers.

The recommendation comes with a caveat, however. Your app needs to be thread-safe. The problem with this is that there is no simple way to say with absolute certainty whether an app as a whole is thread-safe. We can get close, however.

Let’s have a look at how.

For the purpose of this issue, an app can be split into three parts:

  1. The app code itself.
  2. Rails the framework.
  3. Any 3rd party gems used by the app.

All three of these need to be thread-safe. Rails and its included gems have been declared thread-safe since version 2.2, i.e. since 2008. This alone, however, does not automatically make your app as a whole thread-safe. Your own app code and all the gems you use need to be thread-safe as well.

What is and what isn’t thread-safe in Ruby?

So when is your app code not thread-safe? Simply put, when you share mutable state between threads in your app.

But what does this even mean?

None of the core data structures in Ruby (except for Queue) are thread-safe. The structures are mutable, and when shared between threads, there are no guarantees that the threads won’t overwrite each other’s changes. Fortunately, such sharing rarely happens in Rails apps.

Any code that comprises more than a single operation (meaning a single Ruby operation implemented in C) is not thread-safe. The classic example of this is the += operator, which is in fact two operations combined, = and +. Thus, the final value of the shared variable in the following code is indeterminate:

@n = 0
threads = 3.times.map do
  Thread.start { 100.times { @n += 1 } }
end
threads.each(&:join) # wait for all the threads before reading @n
@n # may be less than 300 when increments are lost

However, neither of the two things above alone makes code thread-unsafe. Code only becomes thread-unsafe when they are combined with shared data. Let’s get back to that in a minute, but first…

Aside: But what about GIL?

More informed readers might object at this point that MRI Ruby uses a GIL, a.k.a. Global Interpreter Lock.

The general wisdom on the street is that GIL is bad because it does not let your threads run in parallel (true, in a sense), but good, because it makes your code thread-safe.

Unfortunately, the GIL does not make your code thread-safe. It only guarantees that two threads can’t run Ruby code at the same time, which is how it inhibits parallelism. Threads can still be paused and resumed at any given point, however, which means that they absolutely can clobber each other’s data.

GIL does accidentally make some operations (such as Array#<<) atomic. However, there are two issues with this:

  • It only applies to cases where what you’re doing is truly a single Ruby operation. When what you’re doing is multiple operations, context switches can and will happen, and you won’t be happy.
  • It only applies to MRI. JRuby and Rubinius support true parallelism and thus don’t use a GIL. I wouldn’t count on the GIL being there forever in MRI either, so relying on it to guarantee your code’s thread-safety is irresponsible at best.

Go read Jesse Storimer’s Nobody understands the GIL (also parts 2 and 3) for much more detail about it (than you can probably even stomach). But for the love of the flying spaghetti monster, don’t count on it making your app thread-safe.

Thread-safety in Rails the framework

A bit of history:

Rails and its dependencies were declared thread-safe already in version 2.2, in 2008. At that point, however, the consensus was that so many third party libraries were not thread-safe that the whole request in Rails was enclosed within a giant mutex lock. This meant that while a request was being processed, no other thread in the same process could be running.

In order to take advantage of threaded execution, you had to declare in your app configuration that you really wanted to ditch the lock:

config.threadsafe!

However, en route to Rails 4 Aaron Tenderlove Patterson demonstrated that what config.threadsafe! did was

  • effectively irrelevant in multi-process environments (such as Unicorn), where a single process never processed multiple requests concurrently.
  • absolutely necessary every time you used a threaded server such as Puma or Passenger Enterprise.

What this meant was that there was no reason not to have the thread-safe option always on. And that is exactly what was done for Rails 4 in 2012.

Key takeaway: Rails and its dependencies are thread-safe. You don’t have to do anything to “turn that feature on”.

Making your app code thread-safe

Good news: since Rails uses the shared-nothing architecture, Rails apps lend themselves very well to thread-safety. In general, Rails creates a new controller object for every HTTP request, and everything else flows from there. This isolates most objects in a Rails app from other requests.

As noted above, built-in Ruby data structures (save for Queue) are not thread-safe. This does not, however, matter unless you actually share them between threads. Because of the way Rails is architected, this almost never happens in a Rails app.

There are, however, some patterns that can come bite you in the ass when you want to switch to a threaded app server.

Global variables

Global variables are, well, global. This means they are shared between threads. If you weren’t already convinced you should avoid global variables, here’s another reason to never touch them. If you really want to share something globally across an app, you are more than likely better served by a constant (but see below) anyway.
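
As a minimal sketch of the problem – the counts here are purely illustrative – a global counter mutated from several threads can lose updates, because += is a read followed by a separate write:

$counter = 0

threads = 10.times.map do
  Thread.new { 1_000.times { $counter += 1 } }
end
threads.each(&:join)

# On a truly parallel Ruby (JRuby, Rubinius) this will often print less
# than 10000; even on MRI there is no guarantee it won't.
puts $counter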

Class variables

For the purpose of a discussion about threads, class variables are not much different from global variables. They are shared across threads just the same way.

The problem isn’t so much about using class variables, but about mutating them. And if you are not going to mutate a class variable, in many cases a constant is again a better choice.

Class instance variables

But maybe you’ve read that you should always use class instance variables instead of class variables in Ruby. Well, maybe you should, but they are just as problematic for threaded programs as class variables.

It’s worth pointing out that both class variables and class instance variables can also be set by class methods. This isn’t such an issue in your own code, but you can easily fall into this trap when calling other APIs. Here’s an example from Pratik Naik where the app developer strays into thread-unsafe territory just by calling Rails class methods:

class HomeController < ApplicationController
  before_filter :set_site

  def index
  end

  private

  def set_site
    @site = Site.find_by_subdomain(request.subdomains.first)
    if @site.layout?
      self.class.layout(@site.layout_name)
    else
      self.class.layout('default_lay')
    end
  end
end

In this case, calling the layout method causes Rails to set the class instance variable @_layout on the controller class. If two concurrent requests (served by two threads) hit this code simultaneously, they might end up in a race condition and overwrite each other’s layouts.

In this case, the correct way to set the layout is to use a symbol with the layout call:

class HomeController < ApplicationController
  before_filter :set_site
  layout :site_layout

  def index
  end

  private

  def set_site
    @site = Site.find_by_subdomain(request.subdomains.first)
  end

  def site_layout
    if @site.layout?
      @site.layout_name
    else
      'default_lay'
    end
  end
end

However, this is beside the point. The point is, you might end up using class variables and class instance variables by accident, thus making your app thread-unsafe.

Memoization

Memoization is a technique where you lazily set a variable if it is not already set. It is commonly used when computing the value is at least moderately expensive and the result is used several times within a request.

A common case would be to set the current user in a controller:

class SekritController < ApplicationController
  before_filter :set_user

  private

  def set_user
    @current_user ||= User.find(session[:user_id])
  end
end

Memoization by itself is not a thread safety issue. However, it can easily become one for a couple of reasons:

  • It is often used to store data in class variables or class instance variables (see the previous points).
  • The ||= operator is in fact two operations, so there is a potential context switch happening in the middle of it, causing a race condition between threads (see the sketch below).
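
To see why the second point bites, here’s roughly what ||= expands to (a simplified sketch; the exact semantics differ slightly in edge cases):

# @current_user ||= User.find(session[:user_id]) behaves like:
@current_user || (@current_user = User.find(session[:user_id]))

# A thread can be paused between the read (@current_user) and the
# write (@current_user = ...), so two threads may both see nil, both
# run the query, and the second one overwrites the first.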

It would be easy to dismiss memoization as the cause of the problem, and tell people just to avoid class variables and class instance variables. However, the issue is more complex than that.

In this issue, Evan Phoenix squashes a really tricky race condition bug in the Rails codebase caused by calling super in a memoization function. So even though you would only be using instance variables, you might end up with race conditions with memoization.

What’s a developer to do, then?

  • Make sure memoization actually makes sense and makes a difference in your case. In many cases Rails caches the result anyway, so you may not be saving many resources, if any, with your memoization method.
  • Don’t memoize to class variables or class instance variables. If you need to memoize something on the class level, use thread local variables (Thread.current[:baz]) instead. Be aware, though, that it is still kind of a global variable. So while it’s thread-safe, it still might not be good coding practice.
def set_expensive_var
  Thread.current[:expensive_var] ||= MyModel.find(session[:goo_id])
end
  • If you absolutely think you must share the result across threads, use a mutex to synchronize the memoizing part of your code. Keep in mind, though, that you’re kinda breaking the shared-nothing model of Rails with that. It’s kind of a half-assed sharing method anyway, since it only works across threads, not across processes.

    Also keep in mind that a mutex only saves you from race conditions inside itself. It doesn’t help you a whole lot with class variables unless you put the lock around the whole controller action, which was exactly what we wanted to avoid in the first place.

class GooController < ApplicationController
  @@lock = Mutex.new
  before_filter :set_expensive_var

  private

  def set_expensive_var
    @@lock.synchronize do
      @@stupid_class_var ||= Foo.bar(params[:subdomain])
    end
  end
end
  • Use different instance variable names when you use inheritance and super in memoization methods.
class Foo
  def env_config
    @env_config ||= {foo: 'foo', bar: 'bar'}
  end
end

class Bar < Foo
  def env_config
    @bar_env_config ||= super.merge({foo: 'baz'})
  end
end

Constants

Yes, constants. You didn’t believe constants are really constant in Ruby, did you? Well, they kinda are:

irb(main):008:0> CON
=> [1]
irb(main):009:0> CON = [1,2]
(irb):9: warning: already initialized constant CON

So you do get a warning when trying to reassign a constant, but the reassignment still goes through. That’s not the real problem, though. The real issue is that the constancy of constants only applies to the object reference, not the referenced object. And if the referenced object can be mutated, you have a problem.

Yeah, you remember right. All the core data structures in Ruby are mutable.

irb(main):010:0> CON
=> [1, 2]
irb(main):011:0> CON << 3
=> [1, 2, 3]
irb(main):012:0> CON
=> [1, 2, 3]

Of course, you should never, ever do this. And few will. There’s a catch, however. Since Ruby assignment copies references, not objects, you might end up mutating a constant by accident.

irb(main):010:0> CON
=> [1, 2]
irb(main):011:0> arr = CON
=> [1, 2]
irb(main):012:0> arr << 3
=> [1, 2, 3]
irb(main):013:0> CON
=> [1, 2, 3]

If you want to be sure that your constants are never mutated, you can freeze them upon creation:

irb(main):001:0> CON = [1,2,3].freeze
=> [1, 2, 3]
irb(main):002:0> CON << 4
RuntimeError: can't modify frozen Array
  from (irb):2
  from /Users/jarkko/.rbenv/versions/2.1.2/bin/irb:11:in `<main>'

Keep in mind, though, that freeze is shallow. It only applies to the actual Array object in this case, not its items.
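
For example, freezing an array of arrays doesn’t freeze the inner arrays, so they can still be mutated:

irb(main):001:0> CON = [[1], [2]].freeze
=> [[1], [2]]
irb(main):002:0> CON << [3]
RuntimeError: can't modify frozen Array
irb(main):003:0> CON[0] << 42
=> [1, 42]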

Environment variables

ENV is really just a hash-like construct referenced by a constant. Thus, everything that applies to constants above also applies to it.

ENV['version'] = "1.2" # Don't do this

Making sure 3rd party code is thread-safe

If you want your app to be thread-safe, all the third-party code it uses also needs to be thread-safe in the context of your app.

The first thing you should probably do with any gem is to read through its documentation and Google whether it is deemed thread-safe. That said, even if it is, there’s no escaping double-checking yourself. Yes, by reading through the source code.

As a general rule, all that I wrote above about making your own code thread-safe applies here as well. However…

With 3rd party gems and Rails plugins, context matters.

If the third-party code you use is just a library that your own code calls, you’re fairly safe (provided you use it in a thread-safe way yourself). It can be thread-unsafe just the same way Array is, but if you don’t share its structures between threads, you’re more or less fine.

However, many Rails plugins actually extend or modify the Rails classes, in which case all bets are off. In this case, you need to scrutinize the library code much, much more thoroughly.

So how do you know which type of the two above a gem or plugin is? Well, you don’t. Until you read the code, that is. But you are reading the code anyway, aren’t you?

What smells to look for in third party code?

Everything we mentioned above regarding your own code applies.

  • Class variables (@@foo)
  • Class instance variables (@bar, trickier to find since they look the same as any old ivar)
  • Constants, ENV variables, and any variables through which they can be accidentally mutated.
  • Memoization, especially when one of the two above points are involved
  • Creation of new threads (Thread.new, Thread.start). These obviously aren’t smells by themselves. However, the risks mentioned above only materialize when data is shared across threads, so you should at least be familiar with the cases in which the library spawns new threads.

Again, context matters. Nothing above alone makes code thread-unsafe. Even sharing data with them doesn’t. But modifying that data does. So pay close attention to whether the libs provide methods that can be used to modify shared data.
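
A crude but useful starting point is to unpack a gem and grep its source for the smells listed above. This is only a sketch – somegem and its version are placeholders for whatever you’re auditing:

$ gem unpack somegem                         # unpacks to ./somegem-1.2.3
$ grep -rn '@@' somegem-1.2.3/lib            # class variables
$ grep -rnE 'Thread\.(new|start)' somegem-1.2.3/lib
$ grep -rn '||=' somegem-1.2.3/lib           # memoization sites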

The final bad news

No matter how thoroughly you read through the code in your application and the gems it uses, you cannot be 100% sure that the whole is thread-safe. Heck, even running and profiling the code in a test environment might not reveal lingering thread safety issues.

This is because many race conditions only appear under serious, concurrent load. That’s why you should both try to squash them from the code and keep a close eye on your production environment on a continual basis. Your app being perfectly thread-safe today does not guarantee the same is true a couple of sprints later.

Recap

To make a Rails app thread-safe, you have to make sure the code is thread-safe on three different levels:

  • Rails framework and its dependencies.
  • Your app code.
  • Any third party code you use.

The first one of these is handled for you, unless you do stupid shit with it (like the memoization example above). The rest is your responsibility.

The main thing to keep in mind is to never mutate data that is shared across threads. Most often this happens through class variables, class instance variables, or by accidentally mutating objects that are referenced by a constant.

There are, however, some pretty esoteric ways an app can end up thread-unsafe, so be prepared to track down and fix the last remaining threading issues while running in production.

Have fun!

Acknowledgments: Thanks to James Tucker, Evan Phoenix, and the whole Bear Metal gang for providing feedback for the drafts of this article.

Rails Garbage Collection: Tuning Approaches

MRI maintainers have put a tremendous amount of work into improving the garbage collector in Ruby 2.0 through 2.2. The engine has thus gained a lot more horsepower. However, it’s still not trivial to get the most out of it. In this post we’re going to gain a better understanding of how and what to tune for.

Photo by Steve Arnold, used under a Creative Commons license.

Koichi Sasada (_ko1, Ruby MRI maintainer) famously mentioned in a presentation (slide 89):

Try GC parameters

  • There is no silver bullet
    • No one answer for all applications
    • You should not believe other applications settings easily
  • Try and try and try!

This is true in theory but a whole lot harder to pull off in practice due to three primary problems:

  • Interpreter GC semantics and configuration change over time.
  • One GC config isn’t optimal for all app runtime contexts: tests, requests, background jobs, rake tasks, etc.
  • During the lifetime and development cycles of a project, it’s very likely that existing GC settings are invalidated quickly.

An evolving Garbage Collector

The garbage collector has changed frequently in recent MRI Ruby releases. The changes have also broken many existing assumptions and the environment variables that tune the GC. Compare GC.stat on Ruby 2.1:

{ :count=>7, :heap_used=>66, :heap_length=>66, :heap_increment=>0,
  :heap_live_slot=>26397, :heap_free_slot=>507, :heap_final_slot=>0,
  :heap_swept_slot=>10698, :heap_eden_page_length=>66, :heap_tomb_page_length=>0,
  :total_allocated_object=>75494, :total_freed_object=>49097,
  :malloc_increase=>465840, :malloc_limit=>16777216, :minor_gc_count=>5,
  :major_gc_count=>2, :remembered_shady_object=>175,
  :remembered_shady_object_limit=>322, :old_object=>9109, :old_object_limit=>15116,
  :oldmalloc_increase=>1136080, :oldmalloc_limit=>16777216 }

…with Ruby 2.2:

{ :count=>6, :heap_allocated_pages=>74, :heap_sorted_length=>75,
  :heap_allocatable_pages=>0, :heap_available_slots=>30162, :heap_live_slots=>29729,
  :heap_free_slots=>433, :heap_final_slots=>0, :heap_marked_slots=>14752,
  :heap_swept_slots=>11520, :heap_eden_pages=>74, :heap_tomb_pages=>0,
  :total_allocated_pages=>74, :total_freed_pages=>0,
  :total_allocated_objects=>76976, :total_freed_objects=>47247,
  :malloc_increase_bytes=>449520, :malloc_increase_bytes_limit=>16777216,
  :minor_gc_count=>4, :major_gc_count=>2, :remembered_wb_unprotected_objects=>169,
  :remembered_wb_unprotected_objects_limit=>278, :old_objects=>9337,
  :old_objects_limit=>10806, :oldmalloc_increase_bytes=>1147760,
  :oldmalloc_increase_bytes_limit=>16777216 }

In Ruby 2.2 we can see a lot more to introspect and tune, but this also comes with a steep learning curve which is (and should be) out of scope for most developers.

One codebase, different roles

A modern Rails application is typically used day to day in different contexts:

  • Running tests
  • rake tasks
  • database migrations
  • background jobs

They all start pretty much the same way with the VM compiling code to instruction sequences. Different roles affect the Ruby heap and the garbage collector in very different ways, however.

This job typically runs for 13 minutes, triggers 133 GC cycles and allocates a metric ton of objects. Allocations are very bursty and in batches.

class CartCleanupJob < ActiveJob::Base
  queue_as :default

  def perform(*args)
    Cart.cleanup(Time.now)
  end
end

This controller action allocates 24,555 objects. Allocator throughput isn’t very variable.

class CartsController < ApplicationController
  def show
    @cart = Cart.find(params[:id])
  end
end

Our test case contributes 175 objects to the heap. Test cases generally are very variable and bursty in allocation patterns.

def test_add_to_cart
  cart = carts(:empty)
  cart << products(:butter)
  assert_equal 1, cart.items.size
end

The default GC behavior isn’t optimal for all of these execution paths within the same project and neither is throwing a single set of RUBY_GC_* environment variables at it.

We’d like to refer to processing in these different contexts as “units of work”.

Fast development cycles

During the lifetime and development cycle of a project, it’s very likely that garbage collector settings that were valid yesterday aren’t optimal anymore after the next two sprints. Changes to your Gemfile, rolling out new features, and bumping the Ruby interpreter all affect the garbage collector.

source 'https://rubygems.org'

ruby '2.2.0' # Invalidates most existing RUBY_GC_* variables

gem 'mail' # slots galore

Process lifecycle events

Let’s have a look at a few events that are important during the lifetime of a process. They help the tuner to gain valuable insights into how well the garbage collector is working and how to further optimize it. They all hint at how the heap changes and what triggered a GC cycle.

How many mutations happened for example while

  • processing a request
  • between booting the app and processing a request
  • during the lifetime of the application?

When it booted

When the application is ready to start doing work. For a Rails application, this is typically when the app has been fully loaded in production: ready to serve requests, ready to accept background work, etc. All source files have been loaded and most resources acquired.

When processing started

At the start of a unit of work. Typically the start of an HTTP request, when a background job has been popped off a queue, the start of a test case or any other type of processing that is the primary purpose of running the process.

When processing ended

At the end of a unit of work. Typically the end of a HTTP request, when a background job has finished processing, the end of a test case or any other type of processing that is the primary purpose of running the process.

When it terminated

Triggered when the application terminates.

Knowing when and why GC happens

Tracking GC cycles interleaved with the aforementioned application events yields insights into why a particular GC cycle happens. The progression from BOOTED to TERMINATED, and everything in between, is important because mutations that happen during the fifth HTTP request of a new Rails process also contribute to a GC cycle during request number eight.
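
As a minimal sketch of this kind of bookkeeping, you can snapshot GC.stat around a unit of work and diff the counters. Here process_unit_of_work is a hypothetical stand-in for handling one request, job or test case:

before = GC.stat

process_unit_of_work

after = GC.stat
puts "minor GCs: #{after[:minor_gc_count] - before[:minor_gc_count]}"
puts "major GCs: #{after[:major_gc_count] - before[:major_gc_count]}"
puts "allocated: #{after[:total_allocated_objects] - before[:total_allocated_objects]} objects"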

On tuning

Primarily the garbage collector exposes tuning variables in these three categories:

  • Heap slot values: where Ruby objects live
  • Malloc limits: off heap storage for large strings, arrays and other structures
  • Growth factors: by how much to grow slots, malloc limits etc.

Tuning GC parameters is generally a tradeoff between tuning for speed (thus using more memory) and tuning for low memory usage while giving up speed. We think it’s possible to infer a reasonable set of defaults from observing the application at runtime that is conservative with memory, yet maintains reasonable throughput.
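
To make the tradeoff concrete, here’s a hypothetical illustration using real RUBY_GC_* variables – the values are made up for contrast, not recommendations:

# Tuned for speed: start with a big heap and grow it aggressively,
# so GC cycles are rare (at the cost of memory).
$ RUBY_GC_HEAP_INIT_SLOTS=1000000 RUBY_GC_HEAP_GROWTH_FACTOR=1.8 ruby script/bench.rb

# Tuned for memory: start small and grow slowly, trading
# throughput for a smaller resident set.
$ RUBY_GC_HEAP_INIT_SLOTS=10000 RUBY_GC_HEAP_GROWTH_FACTOR=1.1 ruby script/bench.rb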

A solution

We’ve been working for a few weeks on a product, TuneMyGC, that attempts to do just that. Our goals and objectives are:

  • A repeatable and systematic tuning process that respects fast development cycles
  • It should have awareness of runtime profiles being different for HTTP requests, background job processing etc.
  • It should support current mainline Ruby versions without developers having to keep up to date with changes
  • Deliver reasonable memory footprints with better runtime performance
  • Provide better insights into GC characteristics both for app owners and possibly also ruby-core

Here’s an example of Discourse being automatically tuned for better 99th percentile throughput. Response times in milliseconds, 200 requests:

Controller    GC defaults    Tuned GC
categories    227            160
home          163            113
topic         55             40
user          92             76

GC defaults:

$ RUBY_GC_TUNE=1 RUBY_GC_TOKEN=a5a672761b25265ec62a1140e21fc81f ruby script/bench.rb -m -i 200

Raw GC stats from Discourse’s bench.rb script:

GC STATS:
count: 106
heap_allocated_pages: 2447
heap_sorted_length: 2455
heap_allocatable_pages: 95
heap_available_slots: 997407
heap_live_slots: 464541
heap_free_slots: 532866
heap_final_slots: 0
heap_marked_slots: 464530
heap_swept_slots: 532876
heap_eden_pages: 2352
heap_tomb_pages: 95
total_allocated_pages: 2447
total_freed_pages: 0
total_allocated_objects: 27169276
total_freed_objects: 26704735
malloc_increase_bytes: 4352
malloc_increase_bytes_limit: 16777216
minor_gc_count: 91
major_gc_count: 15
remembered_wb_unprotected_objects: 11669
remembered_wb_unprotected_objects_limit: 23338
old_objects: 435435
old_objects_limit: 870870
oldmalloc_increase_bytes: 4736
oldmalloc_increase_bytes_limit: 30286118

TuneMyGC recommendations

$ RUBY_GC_TUNE=1 RUBY_GC_TOKEN=a5a672761b25265ec62a1140e21fc81f RUBY_GC_HEAP_INIT_SLOTS=997339 RUBY_GC_HEAP_FREE_SLOTS=626600 RUBY_GC_HEAP_GROWTH_FACTOR=1.03 RUBY_GC_HEAP_GROWTH_MAX_SLOTS=88792 RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=2.4 RUBY_GC_MALLOC_LIMIT=34393793 RUBY_GC_MALLOC_LIMIT_MAX=41272552 RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.32 RUBY_GC_OLDMALLOC_LIMIT=39339204 RUBY_GC_OLDMALLOC_LIMIT_MAX=47207045 RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR=1.2 ruby script/bench.rb -m -i 200

Raw GC stats from Discourse’s bench.rb script:

GC STATS:
count: 44
heap_allocated_pages: 2893
heap_sorted_length: 2953
heap_allocatable_pages: 161
heap_available_slots: 1179182
heap_live_slots: 460935
heap_free_slots: 718247
heap_final_slots: 0
heap_marked_slots: 460925
heap_swept_slots: 718277
heap_eden_pages: 2732
heap_tomb_pages: 161
total_allocated_pages: 2893
total_freed_pages: 0
total_allocated_objects: 27167493
total_freed_objects: 26706558
malloc_increase_bytes: 4352
malloc_increase_bytes_limit: 34393793
minor_gc_count: 34
major_gc_count: 10
remembered_wb_unprotected_objects: 11659
remembered_wb_unprotected_objects_limit: 27981
old_objects: 431838
old_objects_limit: 1036411
oldmalloc_increase_bytes: 4736
oldmalloc_increase_bytes_limit: 39339204

We can see a couple of interesting points here:

  • There is much less GC activity – only 44 rounds instead of 106.
  • Slot buffers are still decent for high throughput. There are 718247 free slots (heap_free_slots) out of 1179182 available slots (heap_available_slots), which is 64% of the current live objects (heap_live_slots). This value, however, is slightly skewed because the Discourse benchmark script forces a major GC before dumping these stats – there are about as many swept slots as free slots (heap_swept_slots).
  • Malloc limits (malloc_increase_bytes_limit and oldmalloc_increase_bytes_limit) and growth factors (old_objects_limit and remembered_wb_unprotected_objects_limit) are in line with actual app usage. The TuneMyGC service considers when limits and growth factors are bumped during the app lifecycle and attempts to raise limits via environment variables slightly higher to prevent excessive GC activity.

Now it’s your turn.

Feel free to take your Rails app for a spin too!

1. Add to your Gemfile.

gem 'tunemygc'

2. Register your Rails application.

$ bundle exec tunemygc -r [email protected]
      Application registered. Use RUBY_GC_TOKEN=08de9e8822c847244b31290cedfc1d51 in your environment.

3. Boot your app. We’ll recommend an optimal GC configuration when the process ends.

$ RUBY_GC_TOKEN=08de9e8822c847244b31290cedfc1d51 RUBY_GC_TUNE=1 bundle exec rails s

It’s Not About You

This is a slightly expanded version of a talk I gave in Rails Girls Helsinki in February 2015.

Photo by Sean MacEntee, used under a Creative Commons license.

I used to suffer from terrible stage fright. I was super nervous every time I presented. I forgot stuff I was supposed to say on stage. I never vomited before a talk, though, I’ll give you that.

It got better over time through lots of practice, but I still get all sweaty and shaky before getting on stage.

Then I recently stumbled upon an article by Kathy Sierra called Presentation Skills Considered Harmful. In it she tells about having had similar problems, and how all the tutorials told her what you should do, and how, to give a good presentation. You, you, you.

Then she realized that a presentation is really a UX – just a user experience. You present ideas – hopefully good ones – to your audience. What does that make you, the presenter? A UI. You are just a user interface. You yourself don’t matter that much. All that matters is that your ideas touch your audience.

Is that bad? No, it’s great. It’s a huge relief. What matters is not you but what you have to say, the meat of your talk.

And that brings us nicely to the topic of this article.

It’s not about you.

This is maybe the most important thing I’ve learned during the past decade1. It’s not only profound; it spans your entire life.

Writing

These days, everyone and their cat has a blog.

Most blogs are about the author, which is totally fine — if the author writes for themselves or their relatives. But when you think about the most helpful blogs – the ones you go back to all the time – they really aren’t about the author. They are about helping the reader, either by informing them or by teaching them new things.

But Jarkko, I hear you say. You just started this article with a story about yourself.

That is correct. Storytelling gets kind of an exception.

Except not really. Because storytelling isn’t really about the storyteller, either. You know, we didn’t always know how to write. Telling stories was the only way to pass information to younger generations. Thus, our brains are quite literally wired to react to storytelling. We’re evolutionarily built to learn better from stories.

Thus, stories are not so much about you, the teller, either. Stories are about the listener/reader, and how they relate to the protagonist of the narrative.

And in the end, unless you’re writing fiction, stories are just a tool as well. A powerful one, yes, but just a tool to bring home a lesson to the reader.

Because writing is not about you.

The Cincinnati Enquirer learned this the hard way recently. After they laid off their whole copy desk, they were shocked to find out that readers were outraged by the deteriorating quality of the paper’s articles. John E. McIntyre describes the issue vividly:

The reader doesn’t care how hard you worked, what pressures you are under, or how good your intentions are. The reader sees the product, online or in print; if the product looks sloppy and substandard, the reader will form, and likely express, a low opinion of it. And the reader is under no obligation whatsoever to be kind.

What the Enquirer forgot was that their writing is really not about them.

But you don’t wanna hear me babble about writing, let’s get to business.

Business ideas

Are you still looking for that killer business idea?

Stop.

Ideas are all about you. The story goes that to get into business, you should – apparently through divine intervention or something – come up with a dazzling idea.

Ideas are also dangerous because once you get one, it makes you a victim of confirmation bias. You’re going to start retrofitting the world to your idea, which is totally backwards.

Instead, find an audience you can relate to and sell to. Then find out about their pains, their problems, and ways to help them make more money. Then solve that pain and you’ll do well.

Because – to quote Peter Drucker – the purpose of business is to create and keep a customer. It’s that customer, not you, who is going to decide the fate of your business.

Because successful business isn’t about you.

Marketing

You know what’s special about Apple ads? They almost never list features or specs. Instead, they show what their users can do with the products. Shoot video. FaceTime with their relatives on the other side of the globe. Throw a DJ gig with just an iPad or two.

My friend Amy Hoy has this formula for great sales copy called PDF, for Pain–Dream–Fix:

  1. Start with describing the pain your users are having, with crisp, clear words, so that they will go all ahhh, they understand me.
  2. Then flip it around. “Imagine a world where you wouldn’t have to manually scan your taxi receipts. Instead, they would magically appear in your accounting system.”
  3. And fix. “We got you covered. Uberreceipt 9000 will hook up Uber with your accounting and you will never again touch another taxi receipt!”

Now that is how you make people pay attention.

Because great marketing and sales copy is not about you, either.

Finally: Product

Last, and indeed least, we get to the actual product, software in our case.

It – in and of itself – is not all that important. Because having a great product is not about you, or even the product itself. It’s about solving a customer’s pain, or a job they need to tackle.

Because people don’t buy a quarter-inch drill, they buy a quarter-inch hole in the wall.

By now, you already know the pains of your audience2. Now it’s time to solve them. From previous points, you should already have a nice roadmap to a product people like.

Extra credit: Make your users kick ass

For an extra credit, let’s see how we could transform people from liking your product to loving and raving about it.

We started this talk with Kathy Sierra, and we’re going to end it with her as well.

Kathy’s big idea is that the main purpose of software is to make its users kick ass. What she means by this is that software – or any product, really – should help its users get better at what they do, not just at using the product3.

Screw gamification.

Final Cut Pro should not make its users better at using Final Cut Pro. It should make them better film editors.

WODConnect should not make its users better at using the app. It should make them stronger and faster.

This should happen through everything related to your product. The product itself, its marketing, its manuals, its customer support, everything.

Because your success is not about you or your product.

It’s about the users.

It’s about empathy.


Discuss on Hacker News.


  1. Well, let’s just say it’s a tie with learning about the growth mindset.

  2. If not, go back to the Business Ideas part above. Do not – and I can’t stress this enough – start building a product before you are sure it solves a problem people have, know they have, and are willing to pay for.

  3. All this is also the subject of Kathy’s upcoming book, Badass: Making Users Awesome

The Tremendous Scale of AWS and the Hidden Benefit of the Cloud

Photo by Susanne Nilsson, used under a Creative Commons license.

Finland, 2013. Vantaa, the fourth-largest municipality in Finland, buys a new web form for welfare applications from CGI (née Logica, née WM-Data) for a whopping €1.9 million. The story doesn’t end there, though. A month later it turns out that Helsinki has bought the exact same form from CGI as well, for €1.85 million.

Now, you can argue about what a fair price is for a single web form, especially one that has to be integrated into an existing information system. What is clear, though, is that it is not almost two million euros – twice.

“How on earth was that possible,” I hear you ask. Surely someone would have offered to build the form for, say, a million a pop. Heck, even Finnish public procurement law mandates competitive bidding for such projects.

Vendor lock-in. CGI was administering the information system on which the form was to be built. And since they held the keys, they could pretty much charge whatever the municipalities could potentially pay for the form.

Now hold that thought.


Over at High Scalability, Todd Hoff writes about James Hamilton’s talk at the AWS re:Invent conference last November. It reveals how gigantic the scale of Amazon Web Services really is:

Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).

This also means that AWS is leaps and bounds above its competitors when it comes to capacity:

All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner)

This of course gives AWS a huge advantage over its competitors. It can run larger datacenters both close to and far from each other; it can get sweetheart deals and custom-made components from Intel for its servers, just like Apple does with laptops and desktops. And it can afford to design its own network gear, the one field where progress hasn’t followed Moore’s law. The only other companies doing the same are internet giants like Google and Facebook, but they’re not in the same business as AWS1.

All this is leading to a situation where AWS is becoming the IBM of the 21st century, for better or for worse. Just like no one ever got fired for buying IBM in the ’80s, few will likely get fired for choosing AWS in the years to come. This will be a tough, tough nut to crack for Amazon’s competitors.

So far the situation doesn’t seem to have slowed down Amazon’s rate of innovation, and perhaps they have learned the lessons of Big Blue. Only the future will tell.

From a customer’s perspective, a cloud platform like AWS brings lots and lots of benefits – well listed in the article above – but of course also downsides. Computing power is still much cheaper when bought in physical servers. You can rent a monster Xeon server with basically unlimited bandwidth for less than €100/month. AWS or platforms built on it such as Heroku can’t compete with that on price. So if you’re very constrained on cash and have the sysadmin chops to operate the server, you will get a better deal.

Of course we’re comparing apples and oranges here. You won’t get similar redundancy and flexibility with physical servers as you can with AWS for any money – except when you do. The second group for which using a commercial cloud platform doesn’t make sense is those whose scale merits a cloud platform of their own. Open source software for such platforms – such as Docker and Flynn – is slowly getting to a point where you can rent your own servers and basically build your own AWS on them2. Of course, this will demand a lot more knowledge from your operations team, especially if you want to attain redundancy and high availability similar to what AWS Availability Zones give you.

There is – however – one hidden benefit of going with a commercial cloud platform such as AWS, that you might not have thought about: going with AWS will lessen your vendor lock-in a lot. Of course you can still shoot yourself in the foot by handing off the intellectual property rights of the software to your vendor or some other braindead move. But given that you won’t, hosting is another huge lock-in mechanism large IT houses use to screw their clients. It not only effectively locks the client to the vendor, but it also massively slows down any modifications made by other projects that need to integrate with the existing system, since everything needs to be handled through the vendor. They can, and will, block any progress you could make yourself.

With AWS, you can skip all of that. You are not tied to a particular vendor to develop, operate, and extend your system. While running apps on PaaS platforms requires some specific knowledge, it is widely available and standard. If you want to take your systems to another provider, you can. If you want to build your own cloud platform, you can do it and move your system over bit by bit.

It is thus no wonder that large IT consultancies are racing to build their own platforms to tick all the necessary buzzword checkboxes. However, I would be very wary of their offerings. I’m fairly certain the savings they get from better server utilization through virtualization are not passed on to the customer. And even if some of them are, the lock-in is still there. They have absolutely no incentive to make their platforms compatible with existing ones – quite the contrary. Lock-in is on their side. It is not on yours. Beware.


  1. Apart from Google Compute Engine, but even it doesn’t provide a generic cloud platform similar to the one AWS does.

  2. If you’re at such a state, we can help, by the way. The same goes for building a server environment on AWS, of course.

Of Course You May Eat Your Own Food Here

Are you with or against your customers?

Do your signs read like those “Private property. Keep out!” signs? “No eating of own food”. “Food may not be taken out of the restaurant!” Don’t you have more important things to care about, like, whether your customers love you?

I recently saw this sign on the outside wall of a small roadside restaurant1:

It reads: “You may of course eat your own food here as well.”

So, why should you be more like Tiinan Tupa and less like ABC?

First of all, very few people will probably abuse it. Even if someone eats their own packed lunch there, they will probably reciprocate and buy something as a favor to the place.

Second, it is the ultimate purple cow. It is something unexpected, something remarkable people will pay attention to. People are so accustomed to passive-aggressive notes forbidding this and that that a sign being actually nice will be noticed and talked about.

Lastly, and most importantly, it’s just plain being nice. So stop just using glorious and empty words about how you care about your customers and actually walk the walk.

For these three reasons, being ridiculously accommodating with your customers will probably also be good for you in the long term. So stop picking up nickels in front of bulldozers. Make sure people give you money because they love you, not because they have to.


  1. Coincidentally, I found it on Vaihtoehto ABC:lle, a site and an app dedicated to finding alternatives to the ubiquitous, generic and boring ABC gas station chain that seems to have infiltrated the whole country in just a couple of years. ABC specifically forbids eating food bought from its own supermarket in the cafeteria of the same gas station.

Help! My Rails App Is Melting Under the Launch Day Load

Photo by Pascal

It was to be our day. The Finnish championship relay in orienteering was about to be held close to us, in a terrain type we knew inside and out. We had a great team, with both young guns and old champions. My friend Samuli had been fourth in the individual middle distance championships the day before. My only job was to handle the first leg securely and pass the baton to the more experienced and tougher guys who would then take care of our success. And I failed miserably.

My legs were like dough from the beginning. I was supposed to be in good shape, but I couldn’t keep up with anyone. I was supposed to take it easy and orienteer cleanly, but I ran like a headless chicken, making one mistake after another. Although I wouldn’t learn the term until years later, this was my crash course in ego depletion.


The day before the relay we organized the middle distance Finnish championships in my old hometown Parainen. For obvious reasons, I was the de facto webmaster of our club pages, which also hosted the result service. The site was running on OpenACS, a Tcl-based system I had a year or so of work experience with. I was supposed to know it.

After the race was over, I headed back to my friend’s place, opened up my laptop… only to find out that the local orienteering forums were ablaze with complaints about our results page being down. Crap.

After hours of hunting down the issue, getting help on the OpenACS IRC channel, and serving the results from a static page in the meantime, I finally managed to fix it. The app wasn’t running enough server processes to keep up with the load. And the most embarrassing thing was that the load wasn’t even that high – from high dozens to hundreds of simultaneous users. I headed to bed with my head spinning, hoping to scramble up my self-confidence for the next day’s race (with well-known results).

What does this have to do with Ruby or Rails? Nothing, really. And yet everything. The point is that most of us have a similar story to share. It’s much more common to have a meltdown story with a reasonably low number of users than actually have a slashdotting/hackernewsing/daring fireball hit your app. If you aren’t old enough to have gone through something like this, you probably will. But you don’t have to.


During the dozen or so years since the aforementioned episode, I’ve gone through some serious loads. Some of them we have handled badly, but most – including the Facebook terms of service vote campaign – with at least reasonable grace. This series of articles about Rails performance builds upon those war stories.

We have already posted a couple of articles to start off the series.

  1. Rails Garbage Collection: Naive Defaults
  2. Does Rails Scale?
  3. Rails Garbage Collection: Age Matters

This article will serve as kind of a belated intro to the series, introducing our high level principles regarding the subject but without going more to the details.

The Bear Metal ironclad rules of Rails performance

Scalability is not the same as performance

As I already noted in Does Rails Scale?, it’s worth pointing out that performance is not the same thing as scalability. They are related, for sure. But you can perform similarly poorly from your first to your millionth user and still be “scalable”. There is also the difference that performance is right here and now. If your app scales well, you can just throw more hardware (virtual or real) at the problem and solve it that way.

The good news is that Rails scales quite well out of the box for the vast majority of real-world needs. You probably won’t be the next Twitter or even Basecamp. In your dreams and VC prospectus maybe, but let’s be honest, the odds are stacked against you. So don’t sweat about that too much.

Meanwhile, you do want your app to perform well enough for your initial wave of users.

Perceived performance is what matters

There are basically three different layers of performance for any web app: the server level (I’m bundling everything from the data store to the frontend server here), browser rendering and the performance perceived by the user. The two latter ones are very close to each other but not exactly the same. You can tweak the perceived performance with tricks on the UI level, something that often isn’t even considered performance.

The most important lesson here is that perceived performance is what matters. It makes no difference what the synthetic performance tests on your server say if the UI of the app feels sluggish. There is no panacea here, but make no mistake: perceived performance is what counts when the chickens come home to roost.

Start with quick, large, and obvious wins

The fact is that even a modest number of users can make your app painfully slow. The good news is that you can probably fix that just by hitting the low-hanging fruit. You won’t believe how many Rails apps reveal that they’re running in development mode even in production by – when an error occurs – leaking the whole error trace to the end user.

Other examples of issues that are fairly easy to spot and fix are N+1 query issues with ActiveRecord associations, missing database indexes, and running a single, non-concurrent app server instance, where any longer-running action will block the whole app from other users.
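
For instance, here’s what a typical N+1 fix looks like with ActiveRecord’s eager loading (Post and Comment are hypothetical models):

# N+1: one query for the posts, plus one query per post for its comments
posts = Post.limit(10)
posts.each { |post| puts post.comments.size }

# Eager-loaded: all the comments are fetched in one additional query
posts = Post.includes(:comments).limit(10)
posts.each { |post| puts post.comments.size }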

YAGNI

Once you have squashed all the low-hanging fruit with your metal-reinforced bat, relax. Tuning app performance shouldn’t be your top priority at the moment – unless it is, but in that case you will know for sure. What you should be focusing on is how to get paying customers and how you can make them kick ass. If you have clear performance issues, by all means fix them. However…

Don’t assume, measure

You probably don’t have any idea how many users your app needs to support from the get-go. That’s fine. Reality will teach you. As long as you don’t royally fuck up the shared-nothing (and 12-factor, if you’re on a cloud platform such as Heroku) architecture, you should be able to react to issues quickly.

That said, you probably do want to do some baseline load testing if you’re opening your app up to a much larger private group or to the public. The good news is that it is very cheap to spin up a virtual server instance for a couple of hours and hit your app hard with it. Heck, you can handle the baseline from your laptop if needed. With that you should be able to get through the initial, frightening launch.
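
A baseline doesn’t require fancy tooling, either; plain ApacheBench from a spare machine will do (the URL and numbers here are made up):

➜  ~ ab -n 1000 -c 50 http://staging.example.com/

That fires 1 000 requests at a concurrency of 50 and reports requests per second along with latency percentiles.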

Once your app is up and running under load from real users, the tuning work starts for real. Only now can you measure where the real hot paths and bottlenecks in your app are, based on real usage data rather than assumptions. At this point you have a plethora of tools at your disposal, from the good old request log analyzer to commercial offerings such as Skylight and New Relic.

On the frontend, most browsers nowadays ship with tools for optimizing end-user performance, from Chrome’s and Safari’s built-in developer tools to Firebug for Firefox.

Wrap-up

In this introductory article on building performant Rails (or any other, for that matter) web apps, we took a look at five basic rules of performance optimization:

  1. Scalability is not the same as performance.
  2. Perceived performance is what matters.
  3. Start with the low-hanging fruit.
  4. YAGNI.
  5. Don’t assume, measure.

We will get (much) deeper into the details of Rails performance optimization in later articles. At that point we’ll enter territory where one size no longer fits all. Whatever your particular performance problem is, though, keep the five rules above at the top of your mind.


How to Install Ruby 2.2.0 on OS X Yosemite With Homebrew and Rbenv

When trying to install Ruby 2.2.0 on Yosemite with rbenv and Homebrew, I got a weird error:

➜  ~ ✗ rbenv install 2.2.0
Downloading ruby-2.2.0.tar.gz...
-> http://dqw8nmjcqpjn7.cloudfront.net/7671e394abfb5d262fbcd3b27a71bf78737c7e9347fa21c39e58b0bb9c4840fc
Installing ruby-2.2.0...

BUILD FAILED (OS X 10.10.1 using ruby-build 20141225)

Inspect or clean up the working tree at /var/folders/s_/_zh3skrd0qz2933nns5y425r0000gn/T/ruby-build.20150106151200.81466
Results logged to /var/folders/s_/_zh3skrd0qz2933nns5y425r0000gn/T/ruby-build.20150106151200.81466.log

Last 10 log lines:
make[2]: *** Waiting for unfinished jobs....
compiling raddrinfo.c
compiling ifaddr.c
make[1]: *** [ext/openssl/all] Error 2
make[1]: *** Waiting for unfinished jobs....
linking shared-object zlib.bundle
linking shared-object socket.bundle
linking shared-object date_core.bundle
linking shared-object ripper.bundle
make: *** [build-ext] Error 2

When I opened up the log file mentioned in the output above, I could see the actual cause of the error:

ossl_ssl.c:125:5: error: use of undeclared identifier 'TLSv1_2_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_2),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_2_method
^
ossl_ssl.c:126:5: error: use of undeclared identifier 'TLSv1_2_server_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_2_server),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_2_server_method
^
ossl_ssl.c:127:5: error: use of undeclared identifier 'TLSv1_2_client_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_2_client),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_2_client_method
^
ossl_ssl.c:131:5: error: use of undeclared identifier 'TLSv1_1_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_1),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_1_method
^
ossl_ssl.c:132:5: error: use of undeclared identifier 'TLSv1_1_server_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_1_server),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_1_server_method
^
ossl_ssl.c:133:5: error: use of undeclared identifier 'TLSv1_1_client_method'
    OSSL_SSL_METHOD_ENTRY(TLSv1_1_client),
    ^
ossl_ssl.c:119:69: note: expanded from macro 'OSSL_SSL_METHOD_ENTRY'
#define OSSL_SSL_METHOD_ENTRY(name) { #name, (SSL_METHOD *(*)(void))name##_method }
                                                                    ^
<scratch space>:148:1: note: expanded from here
TLSv1_1_client_method
^
ossl_ssl.c:210:21: error: invalid application of 'sizeof' to an incomplete type 'const struct <anonymous struct at ossl_ssl.c:115:14> []'
    for (i = 0; i < numberof(ossl_ssl_method_tab); i++) {
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ossl_ssl.c:19:35: note: expanded from macro 'numberof'
#define numberof(ary) (int)(sizeof(ary)/sizeof((ary)[0]))
                                  ^~~~~
ossl_ssl.c:1127:13: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
            if (rc = SSL_shutdown(ssl))
                ~~~^~~~~~~~~~~~~~~~~~~
ossl_ssl.c:1127:13: note: place parentheses around the assignment to silence this warning
            if (rc = SSL_shutdown(ssl))
                   ^
                (                     )
ossl_ssl.c:1127:13: note: use '==' to turn this assignment into an equality comparison
            if (rc = SSL_shutdown(ssl))
                   ^
                   ==
ossl_ssl.c:2194:23: error: invalid application of 'sizeof' to an incomplete type 'const struct <anonymous struct at ossl_ssl.c:115:14> []'
    ary = rb_ary_new2(numberof(ossl_ssl_method_tab));
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ossl_ssl.c:19:35: note: expanded from macro 'numberof'
#define numberof(ary) (int)(sizeof(ary)/sizeof((ary)[0]))
                                  ^~~~~
ossl_ssl.c:2195:21: error: invalid application of 'sizeof' to an incomplete type 'const struct <anonymous struct at ossl_ssl.c:115:14> []'
    for (i = 0; i < numberof(ossl_ssl_method_tab); i++) {
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ossl_ssl.c:19:35: note: expanded from macro 'numberof'
#define numberof(ary) (int)(sizeof(ary)/sizeof((ary)[0]))
                                  ^~~~~
1 warning and 9 errors generated.
make[2]: *** [ossl_ssl.o] Error 1
make[2]: *** Waiting for unfinished jobs....
compiling raddrinfo.c
compiling ifaddr.c
make[1]: *** [ext/openssl/all] Error 2
make[1]: *** Waiting for unfinished jobs....
linking shared-object zlib.bundle
linking shared-object socket.bundle
linking shared-object date_core.bundle
linking shared-object ripper.bundle
make: *** [build-ext] Error 2

The weird thing here is that I did not get the usual ‘Missing the OpenSSL lib?’ warning. The lib was found, but somehow the headers were fucked up. The problem also did not occur with older rbenv-installed Rubies.

Thanks to Tarmo I found the solution here.

What I had to do was this:

➜  ~  brew update
➜  ~  brew install openssl
➜  ~  /usr/local/opt/openssl/bin/c_rehash

Now make sure that your new binary is in your PATH before the system one.

➜  ~  ln -s /usr/local/opt/openssl/bin/openssl /usr/local/bin/openssl
➜  ~ which openssl
/usr/bin/openssl
➜  ~ echo $PATH
/usr/local/heroku/bin:/Users/jarkko/.rbenv/shims:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin

That’s no good. Let’s fix our PATH. I’m using zsh, so for me it’s set in ~/.zshrc. Which file you need depends on the shell you’re using (for bash it would be ~/.bashrc or ~/.bash_profile, but see the caveat here).

➜  ~ vim ~/.zshrc
# Change the line that sets PATH so that /usr/local/bin
# comes BEFORE /usr/bin. For me, it looks like this:
# export PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

Open up a new terminal window and check that the PATH is correct:

➜  ~  echo $PATH
/usr/local/heroku/bin:/Users/jarkko/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
➜  ~  which openssl
/usr/local/bin/openssl

Better. Now let’s make sure that Homebrew-installed libs link against the newer OpenSSL.

➜  ~  brew unlink openssl
➜  ~  brew link --overwrite --force openssl
➜  ~  openssl version -a

OpenSSL 1.0.1j 15 Oct 2014
built on: Sun Dec  7 02:14:31 GMT 2014
platform: darwin64-x86_64-cc
options:  bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) idea(int) blowfish(idx) 
compiler: clang -fPIC -fno-common -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/local/etc/openssl"

Splendid.

After that, Ruby 2.2.0 installed cleanly without any specific parameters needed:

➜  ~  rbenv install 2.2.0
Downloading ruby-2.2.0.tar.gz...
-> http://dqw8nmjcqpjn7.cloudfront.net/7671e394abfb5d262fbcd3b27a71bf78737c7e9347fa21c39e58b0bb9c4840fc
Installing ruby-2.2.0...
Installed ruby-2.2.0 to /Users/jarkko/.rbenv/versions/2.2.0

[UPDATE 1, Jan 7] The original version of this post told you to rm /usr/bin/openssl, based on the link above. As James Tucker pointed out, this is a horrible idea. I fixed the article so that we now fix the $PATH instead.


Rails Garbage Collection: Age Matters

In a previous post in the Rails Performance series we stated that the default garbage collection settings for Ruby on Rails applications are not optimal. In this post we’ll explore the basics of object age in RGenGC, Ruby 2.1’s new restricted generational garbage collector.

As a prerequisite of this and subsequent posts, basic understanding of a mark and sweep1 collector is assumed.

A somewhat simplified mark and sweep cycle goes like this:

  1. A mark and sweep collector traverses the object graph.
  2. It checks which objects are in use (referenced) and which ones are not. This is called object marking, a.k.a. the MARK PHASE.
  3. All unused objects are freed, making their memory available. This is called sweeping, a.k.a. the SWEEP PHASE.
  4. Nothing changes for used objects.

A GC cycle prior to Ruby 2.1 works exactly like that. A typical Rails app boots with around 300 000 live objects, all of which need to be scanned during the MARK phase. That usually yields a much smaller set to SWEEP.
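
To make the two phases concrete, here is a toy mark and sweep pass in plain Ruby – purely conceptual, and nothing like the real C implementation:

# Node stands in for a Ruby object: a name, its outgoing references,
# and a mark bit.
Node = Struct.new(:name, :refs, :marked)

# MARK PHASE: recursively flag everything reachable from the root.
def mark(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# SWEEP PHASE: drop everything unmarked, reset marks for the next cycle.
def sweep(heap)
  heap.select!(&:marked)
  heap.each { |node| node.marked = false }
end

root = Node.new("root", [], false)
kept = Node.new("kept", [], false)
lost = Node.new("lost", [], false)   # nothing references this one
root.refs << kept

heap = [root, kept, lost]
mark(root)
sweep(heap)
heap.map(&:name)  # => ["root", "kept"]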

A large percentage of the graph is going to be traversed over and over again but will never be reclaimed. This is not only CPU intensive during GC cycles, but also incurs memory overhead for accounting and anticipation for future growth.

Old and young objects

What generally makes an object old?

  • All new objects are considered to be young.
  • Old objects have survived at least one GC cycle (major or minor). The collector thus reasons that the object will stick around and not become garbage quickly.

The idea behind the new generational garbage collector is this:

MOST OBJECTS DIE YOUNG.

To take advantage of this fact, the new GC classifies objects on the Ruby heap as either OLD or YOUNG. This segregation allows the garbage collector to work with two distinct generations, of which the OLD generation is much less likely to yield significant memory recovery.

For a typical Rails request, some examples of old and new objects would be:

  • Old: compiled routes, templates, ActiveRecord connections, cached DB column info, classes, modules, etc.
  • New: short-lived strings within a partial, a string column value from an ActiveRecord result, a coerced DateTime instance, etc.

Young objects are more likely to reference old objects than old objects are to reference young ones. Old objects also frequently reference other old objects.

  u = User.first
  #<User id: 1, email: "[email protected]", encrypted_password: "blahblah...", reset_password_token: nil, reset_password_sent_at: nil, remember_created_at: nil, sign_in_count: 2, current_sign_in_at: "2014-10-31 11:52:30", last_sign_in_at: "2014-10-29 10:04:01", current_sign_in_ip: "127.0.0.1", last_sign_in_ip: "127.0.0.1", created_at: "2014-10-29 10:04:01", updated_at: "2014-11-30 14:07:15", provider: nil, uid: nil, first_name: "dfdsfds", last_name: "dfdsfds", confirmation_token: nil, confirmed_at: "2014-10-30 10:11:42", confirmation_sent_at: nil, unconfirmed_email: nil, onboarded_at: nil>

Notice how the transient attribute keys reference the long-lived column type objects here:

{"id"=>
   #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Integer:0x007fbe756d1d30>,
  "email"=>
   #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Identity:0x007fbe756d1718>,
  "encrypted_password"=>
   #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Identity:0x007fbe756d1718>,
  "reset_password_token"=>
   #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Identity:0x007fbe756d1718>,
  "reset_password_sent_at"=>
   #<ActiveRecord::AttributeMethods::TimeZoneConversion::Type:0x007fbe741f63c0
    @column=
     #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Timestamp:0x007fbe756d0e58>>,

Age segregation is also just a classification – old and young objects aren’t stored in distinct memory spaces; they’re just conceptual buckets. The generation of an object refers to the number of GC cycles it has survived:

irb(main):009:0> ObjectSpace.dump([])
=> "{\"address\":\"0x007fd24c007668\", \"type\":\"ARRAY\", \"class\":\"0x007fd24b872038\", \"length\":0, \"file\":\"(irb)\", \"line\":9, \"method\":\"irb_binding\", \"generation\":8, \"flags\":{\"wb_protected\":true}}\n"
irb(main):010:0> GC.count
=> 8

What the heck is major and minor GC?

You may have heard of, read about, or noticed in GC.stat output the terms “minor” and “major” GC.

irb(main):003:0> pp GC.stat
{:count=>32,
 :heap_used=>1181,
 :heap_length=>1233,
 :heap_increment=>50,
 :heap_live_slot=>325148,
 :heap_free_slot=>156231,
 :heap_final_slot=>0,
 :heap_swept_slot=>163121,
 :heap_eden_page_length=>1171,
 :heap_tomb_page_length=>10,
 :total_allocated_object=>2050551,
 :total_freed_object=>1725403,
 :malloc_increase=>1462784,
 :malloc_limit=>24750208,
 :minor_gc_count=>26,
 :major_gc_count=>6,
 :remembered_shady_object=>3877,
 :remembered_shady_object_limit=>4042,
 :old_object=>304270,
 :old_object_limit=>259974,
 :oldmalloc_increase=>23639792,
 :oldmalloc_limit=>24159190}

Minor GC (or “partial marking”): This cycle only traverses the young generation and is very fast. Based on the hypothesis that most objects die young, this cycle is the most effective at reclaiming a large amount of memory in proportion to the number of objects traversed.

It runs quite often – 26 times for the GC.stat dump of the booted Rails app above.

Major GC: Triggered by out-of-memory conditions, when the Ruby heap space needs to be expanded (not the OS’s OOM killer! :-)). Both old and young objects are traversed, so it’s significantly slower than a minor GC round. Generally a significant increase in old objects triggers a major GC. Every major GC cycle an object survives bumps its current generation.

It runs much less frequently – six times for the stats dump above.
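
You can trigger both kinds by hand and watch the counters move – a quick irb sketch, assuming Ruby 2.1 or newer:

before = GC.stat.values_at(:minor_gc_count, :major_gc_count)
GC.start(full_mark: false)  # minor GC: marks only the young generation
GC.start(full_mark: true)   # major GC: marks old and young alike
after  = GC.stat.values_at(:minor_gc_count, :major_gc_count)
# before and after might be e.g. [26, 6] and [27, 7]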

The following diagram represents a minor GC cycle (MARK phase completed, SWEEP still pending) that identifies and promotes some objects to old.

A subsequent minor GC cycle (MARK phase completed, SWEEP still pending) ignores old objects during the mark phase.

Most of the reclaiming effort is thus focused on the young generation (new objects). Generally, about 95% of objects are dead by their first GC. The current generation of an object is the number of major GC cycles it has survived.
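
You can watch a single object grow old with ObjectSpace – a hypothetical irb session; the exact promotion threshold varies by Ruby version:

require 'objspace'

str = "will I grow old?"
puts ObjectSpace.dump(str)  # no "old" flag in the output yet
3.times { GC.start }        # let the string survive a few GC cycles
puts ObjectSpace.dump(str)  # flags now include "old":true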

RGenGC

At a very high level, CRuby 2.1’s collector has the following properties:

  • High throughput - it can sustain a high rate of allocations / collections due to faster minor GC cycles and very rare major GC cycles.
  • GC pauses are still long (“stop the world”) for major GC cycles.
  • Generational collectors have much shorter mark cycles as they traverse only the young generation, most of the time.

This is a marked improvement over the previous CRuby GC and serves as a base for implementing other advanced features going forward. Ruby 2.2 supports incremental GC and object ages beyond just old and young. A major GC cycle in 2.1 still runs in a “stop the world” manner, whereas the more involved incremental implementation in Ruby 2.2 interleaves short steps of the mark and sweep cycles with other VM operations.

Object references

In the simple example below we create a String array with three elements.

irb(main):001:0> require 'objspace'
=> true
irb(main):002:0> ObjectSpace.trace_object_allocations_start
=> nil
irb(main):003:0> ary = %w(a b c)
=> ["a", "b", "c"]

Very much like a river flowing downstream, the array has knowledge of (a reference to) each of its String elements. The strings, by contrast, have no awareness of (or references back to) the array container.

irb(main):004:0> ObjectSpace.dump(ary)
=> "{\"address\":\"0x007fd24b890fd8\", \"type\":\"ARRAY\", \"class\":\"0x007fd24b872038\", \"length\":3, \"embedded\":true, \"references\":[\"0x007fd24b891050\", \"0x007fd24b891028\", \"0x007fd24b891000\"], \"file\":\"(irb)\", \"line\":3, \"method\":\"irb_binding\", \"generation\":7, \"flags\":{\"wb_protected\":true}}\n"
irb(main):004:0> ObjectSpace.reachable_objects_from(ary)
=> [Array, "a", "b", "c"]
irb(main):006:0> ObjectSpace.reachable_objects_from(ary[1])
=> [String]
irb(main):007:0> ObjectSpace.dump(ary[1])
=> "{\"address\":\"0x007fd24b891028\", \"type\":\"STRING\", \"class\":\"0x007fd24b829658\", \"embedded\":true, \"bytesize\":1, \"value\":\"b\", \"encoding\":\"UTF-8\", \"file\":\"(irb)\", \"line\":3, \"method\":\"irb_binding\", \"generation\":7, \"flags\":{\"wb_protected\":true, \"old\":true, \"marked\":true}}\n"

We stated earlier that:

Young objects are more likely to reference old objects than old objects are to reference young ones. Old objects also frequently reference other old objects.

However, it is possible for old objects to reference new objects. What happens when they do?

Old objects with references to new objects are recorded in a “remembered set”. The remembered set is a container of references from old objects to new objects, and acts as a shortcut that spares the collector from scanning the whole heap to find such references.
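
Conceptually – this is an illustration of the mechanism, not observable API behavior – the write barrier catches moments like this:

# CACHE is long-lived and will be promoted to the old generation.
CACHE = []
3.times { GC.start }

# An old object now references a brand-new string. The write barrier
# records CACHE in the remembered set, so the next minor GC can find
# the young string without rescanning the whole old generation.
CACHE << "fresh data"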

Implications for Rails

As our friend Ezra used to say, “no code is faster than no code.” The same applies to automatic memory management. Every object allocation also carries a variable recycling cost. Allocation itself is generally low-overhead, as it happens only once – except when there are no free object slots left on the Ruby heap and a major GC is triggered as a result.

A major drawback of this limited OLD vs. YOUNG segregation is that many transient objects are in fact promoted to old during large units of work such as a Rails request. These long-lived objects eventually become unexpected “memory leaks”. Such transient objects can conceptually be classified as “medium lifetime”: they need to stick around for the duration of a request, but no longer. There is, however, a high probability that a minor GC runs during the request, promoting young objects to old and effectively extending their lifetime well beyond the end of the request. This can only be undone by a major GC, which runs infrequently and sweeps both old and young objects.
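
A hypothetical Rack middleware sketch makes this visible; the class and its logging are made up for illustration, but GC.stat[:old_object] is the real counter shown in the dump above:

# Logs how many objects graduated to the old generation while a
# request was being served.
class OldObjectGrowth
  def initialize(app)
    @app = app
  end

  def call(env)
    before   = GC.stat[:old_object]
    response = @app.call(env)
    grew     = GC.stat[:old_object] - before
    warn "old object count grew by #{grew}" if grew > 0
    response
  end
end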

Each generation can be tweaked individually, with the older generation being particularly important for balancing total process memory use against keeping the transient (young) object set per request minimal – and against promoting objects from the young to the old generation too quickly.

In our next post we will explore how you’d approach tuning the Ruby GC for Rails applications, balancing tradeoffs of speed and memory. Leave your email address below and we’ll let you know as soon as it’s posted.


  1. See the Ruby Hacking Guide’s GC chapter for further context and nitty-gritty details. I’d recommend scanning the content below the first few headings, until turned off by C.


Does Rails Scale?

Photo by Soctech

Back when Rails was still not mainstream, a common dismissal by developers using other – more established – technologies was that Rails is cool and stuff, but it will never scale1. While the question isn’t (compared to Rails’ success) as common these days, it still appears in one form or another every once in a while.

Last week on the Ruby on Rails Facebook group, someone asked this question:

Can Rails stand up to making a social platform like FB with millions of users using it at the same time?

If so what are the pro’s and the cons?

So in other words, can Rails scale a lot?

Just as is customary for a Facebook group, the question got a lot of clueless answers. There were a couple of gems like this:

Tony if you want to build some thing like FB, you need to learn deeper mountable engine and SOLID anti pattern.

The worst, however, are answers from people who don’t know that they don’t know shit, but insist on giving advice that is only bound to either confuse the original poster or lead them astray – and probably both:

Twitter is not a good example. They stopped using Rails because it couldn’t handle millions of request per second. They began using Scala.

This is of course mostly BS with a hint of truth in it, but we’ll get back to that in a bit.

The issue with the question itself is that it’s the wrong question to ask, and this has nothing to do with Ruby or Rails per se.

Why is it the wrong question? Let’s have a look.

Sure, Ruby is slow in raw performance. It has gotten a lot faster during the past decade, but it is still a highly dynamic interpreted scripting language. Its main shtick has always been programmer happiness, and its primary way to attain that goal has definitely not been to return from that test run as fast as possible. The same goes for Rails.

That said, there are two reasons bare Ruby performance doesn’t matter that much. First, it’s only a tiny part of the performance the user perceives. Rails has gone out of its way to automatically make the frontend of your web app performant. This includes frontend caching, the asset pipeline, and more opinionated things like Turbolinks. You can of course screw all that up, but you would be amazed how much actual end-user performance you’d lose if you wrote the same app from scratch – not to mention the time you’d waste building it.

Second, and most important for this discussion: scaling is not the same thing as performance. Rails has always been built on the shared-nothing architecture, where in theory the only thing you need to do to scale out your app is to throw more hardware at it – the app should scale linearly. Of course there are limits to this, but they are by no means specific to Rails or Ruby.

Scaling and performance are two separate things. They are related as terms, but not strictly connected. Your app can be very fast for a couple of users but awful for a thousand (it didn’t scale). Or it can scale at O(1) to a million users while loading a page takes 10 seconds even for a single concurrent user (it scales but doesn’t perform).

As stated above, a traditional CRUD-style Rails app can be made to scale very well just by adding app server instances, cores, and finally physical app servers. This is what is meant by scaling out2. Most often the limiting factor is not Rails but the datastore, which is still usually the one shared component in the equation. Scaling the database out is harder than scaling out the app servers, but nevertheless possible. That is way outside the scope of this article, however.

Is this the right tool for the job?

It’s clear in hindsight that a Rails app wasn’t the right tool for what Twitter became – a juggernaut where millions of people were basically realtime chatting with the whole world.

That doesn’t mean that Rails wasn’t a valid choice for the original app. Maybe it wasn’t the best option even then from a technical perspective, but it sure made developing Twitter a whole lot faster in its initial stages. You know, Twitter wasn’t the only contender on the microblogging stage in the late noughties. We Finns fondly remember Jaiku. Then there was that other San Francisco startup using Django whose name I can’t even remember anymore.

Anyway, the point is that reaching a scale where you have to think harder about scalability is a very, very nice problem to have. Either you have built a real business and are making money hand over fist, or you are playing – and winning – the eyeball lotto and have VCs knocking on your door (or, more realistically, have taken on several millions already). The vast majority of businesses never reach this stage.

More likely you will just fail at the hockey-stick game (the VC option), or perhaps build a sustainable business (the old-fashioned “people pay me for helping them kick ass” kind). In either case, you won’t have to worry about scaling to millions of concurrent users.

Even in the very profitable, high-scale SaaS market there are hordes of examples of apps running on Rails. Kissmetrics runs its frontend on Rails, as does GitHub, not to mention Groupon, LivingSocial3, and many others.

However, at a certain scale you have to go for a more modular architecture – SOA, if I may. You can use a message queue for message passing, a NoSQL database for non-relational and ephemeral data, Node.js for realtime features, and so on. A good tool for every particular sub-task of your app.

That said, you need to keep in mind what I said above. It is pretty unlikely you will ever reach a state where you really need to scale. Thus, thinking too much about the architecture at the initial stage is a form of premature optimization. As long as you don’t do anything extra stupid, you can probably get away with a simple Rails app. Splitting your app up into lots of components early on makes several things harder and more expensive:

  • Complexity of development.
  • Operating and deploying a bunch of different apps.
  • Keeping track that all apps are up and running.
  • Hunting bugs.
  • Making changes in a lean development environment where things change rapidly.
  • Cognitive cost of understanding and learning how the app works. This is especially true when you’re expanding your team.

This doesn’t mean that you shouldn’t do the split at some point. There might come a time when the balance of the points above tips and a monorail app becomes a burden. But then again, there might not. So do what makes sense now, not what makes sense in your imaginary future.

Of course Rails alone won’t scale to a gazillion users for an app it was never really meant for in the first place. Neither is it supposed to. However, it is amazing how far you can get with it, in much the same way that old, boring PostgreSQL still beats the shit out of its more “modern” competitors in most common use cases4.

Questions you should be asking

When making a technology decision, instead of “Does it scale?”, here’s what you should be asking:

  • What is the right tool for each of the jobs in my app?
  • How far can I likely get away with a single Rails app?
  • Will we ever really reach the scale we claim in our investor prospectus? No need to lie to yourself here.
  • What is more important: getting the app up and running and in front of real users fast, or making it scalable in an imaginary future that may never come?

Only after answering those are you equipped to make a decision.

P.S. Reached the point where optimizing Rails and Ruby performance does make a difference? We’re writing a series of articles about just that. Pop your details in the form ☟ down there and we’ll keep you posted.


  1. Another good one I heard at EuroOSCON 2005 was that the only thing good about Rails is its marketing.

  2. Versus scaling up, which means making the single core or thread faster.

  3. OK, the last two might not pass the profitable bit.

  4. There are obviously special cases where a single Rails app doesn’t cut it even from the beginning. E.g. computationally intensive apps such as Kissmetrics or Skylight.io obviously won’t run their stats aggregation processes on Rails.
