Using Go to Execute Commands on Hundreds of Servers with SSH


In this article we’ll write a simple Go program (about 100 lines) that executes commands over SSH on hundreds of servers, and does so quite efficiently. It is built on go.crypto/ssh, an implementation of the SSH protocol by the authors of Go.

A more “advanced” version of the program written in this article is available on GitHub: GoSSHa (Go SSH agent).

Introduction

The company I work for has more than one server, and to work with them efficiently we wrote the libpssh library on top of libssh2. It was written in C with libevent many years ago and still copes with its tasks; the one problem is that it is quite complicated to maintain. Go, the language from Google, is gaining popularity, including within our company. So I decided to rewrite libpssh in Go and fix some of its drawbacks, simplifying both the code and its maintenance along the way.

To get started we need the Go compiler (it can be downloaded from golang.org) and a working hg command, so that “go get” can download go.crypto/ssh.

Let’s Get to Work

Let’s create a “main.go” file in some directory, preferably an empty one. We’ll write the “skeleton” of our program first and then implement the missing functions over the course of the article.

package main

import (
    "code.google.com/p/go.crypto/ssh"
    // ...
)

// ...

func main() {
    cmd := os.Args[1] // the first argument is the command to run on all servers
    hosts := os.Args[2:] // the remaining arguments are the list of servers
    results := make(chan string, 10) // results are written to a buffered channel of strings
    timeout := time.After(5 * time.Second) // a value arrives on the timeout channel after 5 seconds

    // initialize the configuration structure for the ssh package;
    // the makeKeyring() function will be written later
    config := &ssh.ClientConfig{
        User: os.Getenv("LOGNAME"),
        Auth: []ssh.ClientAuth{makeKeyring()},
    }

    // run one goroutine (a lightweight alternative to an OS thread) per server;
    // the executeCmd() function will be written later
    for _, hostname := range hosts {
        go func(hostname string) {
            results <- executeCmd(cmd, hostname, config)
        }(hostname)
    }

    // collect results from all the servers or print "Timed out",
    // if the total execution time has expired
    for i := 0; i < len(hosts); i++ {
        select {
        case res := <-results:
            fmt.Print(res)
        case <-timeout:
            fmt.Println("Timed out!")
            return
        }
    }
}

Apart from the fact that we still need to write the makeKeyring() and executeCmd() functions, our program is ready! Thanks to “Go magic” we connect to all servers in parallel and execute the given command on each of them. We finish within 5 seconds at the latest, having printed the results from every server that managed to respond in time. Such a simple overall timeout for all parallel operations is possible thanks to channels and the select statement, which waits on several channel operations at once: as soon as at least one of the case statements can proceed, its code block is executed.

Data Structures Initialization for go.crypto/ssh

We haven’t written makeKeyring() and executeCmd() yet, but there’s nothing particularly interesting in them. We’ll authenticate with SSH keys and assume that they are located in .ssh/id_rsa or .ssh/id_dsa:

type SignerContainer struct {
    signers []ssh.Signer
}

func (t *SignerContainer) Key(i int) (key ssh.PublicKey, err error) {
    if i >= len(t.signers) {
        return
    }
    key = t.signers[i].PublicKey()
    return
}

func (t *SignerContainer) Sign(i int, rand io.Reader, data []byte) (sig []byte, err error) {
    if i >= len(t.signers) {
        return
    }
    sig, err = t.signers[i].Sign(rand, data)
    return
}

func makeSigner(keyname string) (signer ssh.Signer, err error) {
    fp, err := os.Open(keyname)
    if err != nil {
        return
    }
    defer fp.Close()

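    // propagate read and parse errors, so that the caller
    // never receives a nil signer together with a nil error
    buf, err := ioutil.ReadAll(fp)
    if err != nil {
        return
    }
    signer, err = ssh.ParsePrivateKey(buf)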
    return
}

func makeKeyring() ssh.ClientAuth {
    signers := []ssh.Signer{}
    keys := []string{os.Getenv("HOME") + "/.ssh/id_rsa", os.Getenv("HOME") + "/.ssh/id_dsa"}

    for _, keyname := range keys {
        signer, err := makeSigner(keyname)
        if err == nil {
            signers = append(signers, signer)
        }
    }

    return ssh.ClientAuthKeyring(&SignerContainer{signers})
}

As you can see, we return the ssh.ClientAuth interface, which provides the methods needed to authenticate on a server. For brevity there’s almost no error handling; in production the code would be about one and a half times longer.

The code that executes a command on a server is also quite trivial (error handling is omitted):

func executeCmd(cmd, hostname string, config *ssh.ClientConfig) string {
    conn, _ := ssh.Dial("tcp", hostname+":22", config)
    session, _ := conn.NewSession()
    defer session.Close()

    var stdoutBuf bytes.Buffer
    session.Stdout = &stdoutBuf
    session.Run(cmd)

    return hostname + ": " + stdoutBuf.String()
}

For brevity and simplicity we always use the current user name to authenticate on the servers, and always connect to the default port 22.
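
If some servers listen on a non-standard port, the hard-coded ":22" could be replaced by a small helper along these lines. This is only a sketch: addrWithDefaultPort() is a hypothetical function, not part of the article’s program, and it relies solely on net.SplitHostPort() and net.JoinHostPort() from the standard library.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// addrWithDefaultPort (hypothetical helper) returns host unchanged when it
// already has the form "host:port"; otherwise it appends defaultPort.
func addrWithDefaultPort(host, defaultPort string) string {
	if _, _, err := net.SplitHostPort(host); err == nil {
		return host // a port was given explicitly
	}
	// Accept a bracketed IPv6 literal without a port, such as "[::1]";
	// JoinHostPort adds the brackets back where they are needed.
	host = strings.TrimPrefix(host, "[")
	host = strings.TrimSuffix(host, "]")
	return net.JoinHostPort(host, defaultPort)
}

func main() {
	for _, h := range []string{"srv1", "srv2:2222", "::1"} {
		fmt.Println(addrWithDefaultPort(h, "22"))
	}
}
```

In executeCmd(), the dial address would then be addrWithDefaultPort(hostname, "22") instead of hostname+":22".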

Our program is ready! Here is the complete source code:

package main

import (
    "bytes"
    "code.google.com/p/go.crypto/ssh"
    "fmt"
    "io"
    "io/ioutil"
    "os"
    "time"
)

type SignerContainer struct {
    signers []ssh.Signer
}

func (t *SignerContainer) Key(i int) (key ssh.PublicKey, err error) {
    if i >= len(t.signers) {
        return
    }
    key = t.signers[i].PublicKey()
    return
}

func (t *SignerContainer) Sign(i int, rand io.Reader, data []byte) (sig []byte, err error) {
    if i >= len(t.signers) {
        return
    }
    sig, err = t.signers[i].Sign(rand, data)
    return
}

func makeSigner(keyname string) (signer ssh.Signer, err error) {
    fp, err := os.Open(keyname)
    if err != nil {
        return
    }
    defer fp.Close()

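    // propagate read and parse errors, so that the caller
    // never receives a nil signer together with a nil error
    buf, err := ioutil.ReadAll(fp)
    if err != nil {
        return
    }
    signer, err = ssh.ParsePrivateKey(buf)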
    return
}

func makeKeyring() ssh.ClientAuth {
    signers := []ssh.Signer{}
    keys := []string{os.Getenv("HOME") + "/.ssh/id_rsa", os.Getenv("HOME") + "/.ssh/id_dsa"}

    for _, keyname := range keys {
        signer, err := makeSigner(keyname)
        if err == nil {
            signers = append(signers, signer)
        }
    }

    return ssh.ClientAuthKeyring(&SignerContainer{signers})
}

func executeCmd(cmd, hostname string, config *ssh.ClientConfig) string {
    conn, _ := ssh.Dial("tcp", hostname+":22", config)
    session, _ := conn.NewSession()
    defer session.Close()

    var stdoutBuf bytes.Buffer
    session.Stdout = &stdoutBuf
    session.Run(cmd)

    return hostname + ": " + stdoutBuf.String()
}

func main() {
    cmd := os.Args[1]
    hosts := os.Args[2:]

    results := make(chan string, 10)
    timeout := time.After(5 * time.Second)
    config := &ssh.ClientConfig{
        User: os.Getenv("LOGNAME"),
        Auth: []ssh.ClientAuth{makeKeyring()},
    }

    for _, hostname := range hosts {
        go func(hostname string) {
            results <- executeCmd(cmd, hostname, config)
        }(hostname)
    }

    for i := 0; i < len(hosts); i++ {
        select {
        case res := <-results:
            fmt.Print(res)
        case <-timeout:
            fmt.Println("Timed out!")
            return
        }
    }
}

Let’s run the application now:

$ vim main.go # write the program :)
$ go get # download all dependencies
$ time go run main.go 'hostname -f; sleep 4.7' localhost srv1 srv2
localhost: localhost
srv1: srv1
Timed out!

real	0m5.543s

It works! Since the command itself sleeps for 4.7 seconds, the localhost, srv1 and srv2 servers had only 0.3 seconds left for everything else. srv2 was too slow and didn’t make it. Including the “on the fly” compilation of the source code, the whole run took 5.5 seconds, 5 of which are our timeout for command execution.

Summary

The article is short, but we have created quite a useful application. It can be used in production. A more advanced version of the application has been deployed to our production environment and has shown great results.

Comments

  1. I built something similar called Overcast using Node.js and native SSH. Beyond running commands and script files on multiple servers, it includes API support for a number of different cloud providers. I haven’t tried using it on hundreds of servers though, I’d be curious to see how it performs — probably not as well as your Go example.
  2. Wow, it looks very interesting, I’ll definitely take a look into it tonight. Btw, you should write an article about its core functionality.

    # Spin up a new Ubuntu 14.04 instance on DigitalOcean:
    $ overcast digitalocean create db-01
    
    # Spin up a new Ubuntu 14.04 instance on Linode:
    $ overcast linode create db-02
    

    digitalocean,linode — are these just an alias to ip?

  3. Thanks! Maybe I will write an article. If I understand your question correctly, in this case db-01 and db-02 are instances that have corresponding IP addresses, which were assigned by the hosting provider (DigitalOcean and Linode in those examples) during the creation process.
  4. It seems this code is now broken — I assume changes have been made to the SSH API since the creation of this.
  5. Yup, code is broken. This is what I get:

    $ go run test.go 'date' host1 host2
    # command-line-arguments
    ./test.go:29: cannot assign *ssh.Signature to sig (type []byte) in multiple assignment
    ./test.go:45: undefined: ssh.ClientAuth
    ./test.go:56: undefined: ssh.ClientAuthKeyring
    ./test.go:79: undefined: ssh.ClientAuth

    My go version is 1.4rc2

  6. The go.crypto/ssh package has been changed. Try changing all the imports of code.google.com/p/go.crypto/ssh to code.google.com/p/gosshold/ssh.

Ropes — Fast Strings

Most of us work with strings one way or another. There’s no way to avoid them: when writing code, you’re doomed to concatenate strings every day, split them into parts and access certain characters by index. We are used to the fact that strings are fixed-length arrays of characters, which leads to certain limitations when working with them. For instance, we cannot quickly concatenate two strings. To do this, we first need to allocate the required amount of memory and then copy the data from both strings into it.