InfluxDB in IoT world: Hosting and scaling on AWS (Part 2)

In the previous part we took a bird's-eye view of InfluxDB, its core features and some of the reasons to embrace the database in the wake of the IoT data onslaught.
In this part, we're going to see how easy it is to install and start using InfluxDB on AWS, how to scale it, and how fast it is on different types of AWS instances.

Hosting on Amazon Web Services

AWS doesn't offer out-of-the-box support for InfluxDB, so we'll need to install it manually. Let's begin by firing up a new EC2 instance, say m4.large (as suggested in the InfluxDB installation doc), with the Amazon Linux AMI. Let's also create a security group that allows incoming TCP traffic on ports 8086 (the HTTP API) and 8088 (the RPC service used for backup and restore). Next, let's install the thing. SSH into the server and execute the following:

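# Bring the system packages up to date, then reboot.
sudo yum update
sudo reboot
# Reconnect over SSH once the instance is back up, then register the InfluxData repository.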
cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL
baseurl = https://repos.influxdata.com/rhel/7Server/x86_64/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF
sudo yum install influxdb

Next, I'd highly recommend setting up authentication and HTTPS for a real system, as described in [the official documentation](https://docs.influxdata.com/influxdb/v1.3/administration/security/).

Now start the service by running sudo /etc/init.d/influxdb start, then check that it's up with the influx command. You should be connected to the local InfluxDB and see something similar to the output below.

$ influx
Connected to http://localhost:8086 version x.x.x
InfluxDB shell version: x.x.x
>
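
The same check can be scripted against the HTTP API. Below is a minimal sketch, assuming the default port 8086 and the InfluxDB 1.x /ping endpoint, which answers with 204 No Content when the server is up. The function name influx_is_up is made up for illustration.

```python
import urllib.request


def influx_is_up(base_url="http://localhost:8086", timeout=2.0):
    """Return True if GET <base_url>/ping answers with HTTP 204."""
    try:
        with urllib.request.urlopen(base_url + "/ping", timeout=timeout) as resp:
            return resp.status == 204
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this also covers
        # refused connections, timeouts and HTTP error statuses.
        return False
```

Calling influx_is_up() on the instance itself should return True once the service has started. From another machine, pass the instance address instead, and remember that the security group has to allow port 8086.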

Scalability

So far in this series InfluxDB has been a knight atop a white stallion. There's a twist, though: the OSS version supports neither replication nor sharding. You're pretty much stuck with a single installation (a pretty powerful one, though, as we saw in Part 1). High availability is achievable using influxdb-relay, but that's one more layer and one more piece of infrastructure to manage.
If you need out-of-the-box sharding, service availability, monitoring and other enterprise features (I'd just call them production-ready features, but that's fuel for another post), then your company has to go with the commercial InfluxEnterprise or the fully managed SaaS offering.
With the free version, you can fall back on half-century-old methods:

  • Custom sharding
  • Scale-up (use faster hardware)

Custom sharding is difficult and costly to develop, and it requires a good understanding of the domain and the ability to predict how it will change (almost to the level of an oracle). Still, it can win in the long run.
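
To make the idea concrete, here is a minimal sketch of application-level sharding: every point is routed to one of several independent InfluxDB instances by hashing a shard key. The node URLs and the choice of a sensor ID as the shard key are assumptions for illustration, not something InfluxDB provides.

```python
import hashlib

# Hypothetical, independent single-node InfluxDB installations.
NODES = [
    "http://influx-0.internal:8086",
    "http://influx-1.internal:8086",
    "http://influx-2.internal:8086",
]


def node_for(shard_key, nodes=NODES):
    """Pick a node deterministically from the shard key (e.g. a sensor ID)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer. The built-in hash() is
    # avoided because Python randomizes it per process for strings.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def group_by_node(shard_keys, nodes=NODES):
    """Group shard keys by target node so writes can be batched per node."""
    groups = {}
    for key in shard_keys:
        groups.setdefault(node_for(key, nodes), []).append(key)
    return groups
```

Even this toy version shows where the cost comes from: modulo hashing remaps most keys when the number of nodes changes, and any query that spans several shard keys has to fan out to every node and merge the results in the application.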
The problem with scaling up is that eventually you're greeted by the law of diminishing returns. That is, to get a machine that is twice as fast you may need to pay, say, 10 times more, and the price keeps climbing steeply from there.
Also, vertical scaling is not appropriate for all technologies. Depending on a system's specific bottlenecks, scaling up may be almost impossible. For example, disk I/O is not an easy thing to upgrade.

As for InfluxDB, vertical scalability is pretty feasible, as we're going to see next.

Scale-up

Let's look at some data on how InfluxDB scales up.
Below is a comparison of execution times on different AWS instance types for two kinds of load. I'll call them:

  • Analytical read:
    SELECT stddev("value") FROM "measurements" WHERE "type" = 'PM25' AND time > 'xxx' AND time < 'yyy' GROUP BY time(30d) fill(none)
  • Write:
    Writing ~4M multi-value data entries from another node in the same AWS network, using the bulk_load_influx tool from the InfluxDB benchmark project (see Part 1 for more details).
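
For readers who haven't seen a multi-value entry on the wire, the sketch below formats one as InfluxDB line protocol: a measurement, a tag set, several fields and a timestamp on a single line. The measurement, the type tag and the value field come from the query above, while the sensor tag, the temperature field and the batch size are invented for illustration. This is not how bulk_load_influx generates its data, and only float fields are handled.

```python
def _escape(text):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as a line-protocol line (float fields only)."""
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        ",{}={}".format(_escape(k), _escape(v)) for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        "{}={}".format(_escape(k), float(v)) for k, v in sorted(fields.items())
    )
    return "{}{} {} {}".format(name, tag_part, field_part, int(timestamp_ns))


def batches(lines, size=5000):
    """Yield newline-joined request bodies of at most `size` lines each."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start:start + size])


print(to_line(
    "measurements",
    {"type": "PM25", "sensor": "s-1"},      # "sensor" is a hypothetical tag
    {"value": 12.5, "temperature": 21},      # "temperature" is a hypothetical field
    1500000000000000000,
))
```

Each batch body would then be POSTed to the /write endpoint of the HTTP API, with the target database passed as the db query parameter.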

The execution time results:

| AWS instance type | vCPUs | RAM (approx.) | Analytical read execution time | Write execution time |
|---|---|---|---|---|
| m4.large | 2 | 8 GB | 157 seconds | 130 seconds |
| m4.2xlarge | 8 | 32 GB | 130 seconds | 34 seconds |
| m4.4xlarge | 16 | 64 GB | 119 seconds | 19 seconds |
| r4.large | 2 | 15 GB | 154 seconds | 125 seconds |
| r4.2xlarge | 8 | 61 GB | 132 seconds | 34 seconds |
| c4.2xlarge | 8 | 15 GB | 120 seconds | 33 seconds |
| c4.8xlarge | 36 | 60 GB | 122 seconds | 13 seconds |

This is by no means a thorough benchmark. However, it is a fair approximation of Airly's intended load for the DB.

How can the results be interpreted? It looks like write operations are CPU-intensive. Here is what the top command showed on c4.2xlarge during the write test (note the 783% CPU usage, i.e. nearly all 8 vCPUs busy).

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2686 influxdb  20   0 1748m 1.1g  54m S 783.8  7.3   2:58.15 influxd

Thus, write load throughput scales up nicely, especially with the number of CPUs.
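
To put a number on "scales up nicely", the sketch below computes the speedup and the per-vCPU scaling efficiency for the m4 rows of the table, taking m4.large as the baseline. The timings are copied from the table; the efficiency metric (speedup divided by the vCPU ratio) is just one convenient yardstick, not part of the original measurements.

```python
# (instance, vCPUs, read seconds, write seconds), copied from the table above.
M4_RESULTS = [
    ("m4.large", 2, 157, 130),
    ("m4.2xlarge", 8, 130, 34),
    ("m4.4xlarge", 16, 119, 19),
]


def scaling(results):
    """Return {instance: (read speedup, write speedup, write efficiency)} relative to the first row."""
    _, base_cpus, base_read, base_write = results[0]
    out = {}
    for name, cpus, read_s, write_s in results:
        read_speedup = base_read / read_s
        write_speedup = base_write / write_s
        # An efficiency of 1.0 would mean write time shrinks in proportion to vCPUs.
        efficiency = write_speedup / (cpus / base_cpus)
        out[name] = (round(read_speedup, 2), round(write_speedup, 2), round(efficiency, 2))
    return out


for name, (read_x, write_x, eff) in scaling(M4_RESULTS).items():
    print("{:<11} read x{:.2f}  write x{:.2f}  write efficiency {:.2f}".format(
        name, read_x, write_x, eff))
```

Going from 2 to 16 vCPUs cuts the write time by a factor of about 6.8, while the read time improves only by a factor of about 1.3.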

Read load is probably more I/O-intensive (disk and memory), which would explain why the read query execution time barely decreases on larger instances. This holds provided that the working set fits into RAM.

To summarize, CPU power is very important for write throughput. RAM size matters if most of your working set can fit into RAM; in that case, high-memory instance types (r4, m4, etc.) are recommended. InfluxDB can fit a lot of data in 64 GB of RAM.

Coming up next...

In the next part we're going to plot some graphs with Grafana using data from InfluxDB!
