Scaling data services

Every day, Apility.io’s scraping services analyze almost a hundred blacklists and blocking lists distributed across the Internet. Counting IP addresses, emails, and domains, the number of active resources in our database fluctuates between 3 and 4 million. Although that volume is easily manageable with today’s technology, the numbers skyrocket when we look at the transactions carried out in the system.

Our database contains several million active records of IP addresses, domains, and emails. But this is just the tip of the iceberg, because every day we process more than a million transactions against it. A resource such as an IP address can enter and exit a blocking list many times, and it can sit on several different lists at once, which gives an idea of the magnitude of the information we handle.

Live and History resource data

All these transactions do not actually operate on a single database, but on two with different characteristics:

  • Database with ‘live’ data: It serves customers ‘fresh’ data as quickly as possible and stores the result of the latest updates from all the blacklists and blocking lists we track. It is replicated to several read-only databases distributed around the world, and these replicas are what our customers actually query when they call the API.
  • Database with historical values: It stores as much data as possible while keeping response times below one second. It records every change made by every transaction on the database with ‘live’ data, forming a huge log of the changes made over time.

We have developed a custom process that synchronizes these two databases, which are built on completely different technologies: when a change is made to the database with live data, it is automatically reflected in the historical database. In addition, a process periodically checks that there are no inconsistencies between the two.
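
A minimal sketch of this idea in Python, under loud assumptions: the two in-memory classes below are hypothetical stand-ins for the live and historical databases, and the function names are invented for illustration. It is not our actual synchronization process, only one way to picture "mirror every change, then periodically compare".

```python
from dataclasses import dataclass, field


@dataclass
class LiveStore:
    """Hypothetical stand-in for the 'live' database: current lists per resource."""
    current: dict = field(default_factory=dict)


@dataclass
class HistoryStore:
    """Hypothetical stand-in for the historical database: an append-only log."""
    log: list = field(default_factory=list)


def apply_change(live, history, resource, command, lists, timestamp):
    """Apply one change to the live store and mirror it into the history log."""
    state = set(live.current.get(resource, set()))
    if command == "add":
        state |= set(lists)
    elif command == "rem":
        state -= set(lists)
    else:
        raise ValueError(f"unknown command: {command}")
    live.current[resource] = state
    history.log.append({
        "resource": resource,
        "command": command,
        "change": sorted(lists),
        "after": sorted(state),
        "timestamp": timestamp,
    })


def find_inconsistencies(live, history):
    """Return resources whose last logged state differs from the live state."""
    last_logged = {}
    for entry in history.log:  # insertion order, so the latest entry wins
        last_logged[entry["resource"]] = set(entry["after"])
    resources = set(live.current) | set(last_logged)
    return sorted(
        r for r in resources
        if live.current.get(r, set()) != last_logged.get(r, set())
    )
```

In this sketch the periodic check simply compares the last logged state of each resource with its live state; a real checker would also have to cope with changes that arrive while the comparison is running.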

What good is this for me?

The first benefit that comes to mind is forensic analysis. Knowing how a resource such as an IP address or a domain has behaved over time can help cybersecurity experts find the root causes of problems and incidents. For experts who wish to know the historical activity of these resources in our databases, we now offer complete access to all the history we hold.

The resources available are:

  • IP addresses
  • Domains
  • Emails

They can be filtered by resource and by time. You can test this feature from our search engine right away:

Apility.io Search Engine Resource History

In this example, the IP address is clean now, but it has a very recent history of abuse. Somebody has tried to SSH into servers on the Internet without permission, and the attempts have been reported to Blocklist.de. If this IP address belongs to you… I think you need to have a look at that machine!

How the new API calls work

The new endpoints available in the API documentation are:

GET https://api.apility.net/metadata/changes/ip/<IP>
GET https://api.apility.net/metadata/changes/domain/<DOMAIN>
GET https://api.apility.net/metadata/changes/email/<EMAIL>

To run the same search as in the example above:

curl -i -H "X-Auth-Token: YOUR_API_KEY" -X GET "https://api.apility.net/metadata/changes/ip/AAA.BBB.CCC.DDD"
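
The same call can be sketched in Python with only the standard library. The function names here are hypothetical, and the choice to percent-encode the resource while leaving "@" and ":" untouched is our assumption about how a resource should be placed in the path, not something the API documentation states.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apility.net/metadata/changes"


def build_history_request(kind, resource, api_key):
    """Build (but do not send) a GET request for a resource's history."""
    if kind not in ("ip", "domain", "email"):
        raise ValueError(f"unsupported resource type: {kind}")
    # Assumption: "@" and ":" are valid in a URL path segment, so leave them as-is.
    segment = urllib.parse.quote(resource, safe="@:")
    url = f"{API_BASE}/{kind}/{segment}"
    return urllib.request.Request(
        url, headers={"X-Auth-Token": api_key}, method="GET"
    )


def fetch_history(kind, resource, api_key, timeout=10):
    """Send the request and decode the JSON body of the response."""
    request = build_history_request(kind, resource, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_history("ip", "AAA.BBB.CCC.DDD", "YOUR_API_KEY") with a real address and key would perform the request shown in the curl command.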

AAA.BBB.CCC.DDD stands in for the IP address of the example (anonymized in this blog post for obvious reasons). The response is:

{
    "changes_ip": [
        {
            "blacklist_change": "FAIL2BAN-ALL,FAIL2BAN-SSH",
            "blacklists": "",
            "timestamp": 1520187309162,
            "command": "rem",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-ALL",
            "blacklists": "FAIL2BAN-ALL,FAIL2BAN-SSH",
            "timestamp": 1520108359925,
            "command": "add",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-ALL",
            "blacklists": "FAIL2BAN-SSH",
            "timestamp": 1520104871843,
            "command": "rem",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH",
            "blacklists": "FAIL2BAN-ALL,FAIL2BAN-SSH",
            "timestamp": 1520072166966,
            "command": "add",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH",
            "blacklists": "FAIL2BAN-ALL",
            "timestamp": 1520068580059,
            "command": "rem",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "blacklists": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "timestamp": 1519946120669,
            "command": "add",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "blacklists": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "timestamp": 1519946118892,
            "command": "add",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "blacklists": "",
            "timestamp": 1519661715742,
            "command": "rem",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH",
            "blacklists": "FAIL2BAN-SSH,FAIL2BAN-ALL",
            "timestamp": 1519647423138,
            "command": "add",
            "ip": "AAA.BBB.CCC.DDD"
        },
        {
            "blacklist_change": "FAIL2BAN-SSH",
            "blacklists": "FAIL2BAN-ALL",
            "timestamp": 1519643865538,
            "command": "rem",
            "ip": "AAA.BBB.CCC.DDD"
        }
    ]
}

The changes_ip JSON object contains a list of transaction_ip objects. Each transaction_ip object will return:

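  • timestamp: The Unix time when the transaction was recorded. Note that the values in the sample response above have 13 digits, which corresponds to milliseconds rather than seconds.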
  • command: Type of transaction in the database: "add" to the blacklist or "rem" (remove) from the blacklist.
  • ip: IP address affected by the transaction.
  • blacklist_change: Blacklists added or removed by the transaction.
  • blacklists: Blacklists that still contain the resource after the command has been applied; an empty string means none.
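
As a rough illustration of how these fields can be consumed, the sketch below turns a response into a timeline ordered from oldest to newest. The helper names are hypothetical, the sample holds two entries copied from the response above, and the conversion assumes the 13-digit timestamps are milliseconds.

```python
from datetime import datetime, timezone

# Two entries copied from the sample response (newest first, as returned).
sample = {
    "changes_ip": [
        {"blacklist_change": "FAIL2BAN-ALL,FAIL2BAN-SSH", "blacklists": "",
         "timestamp": 1520187309162, "command": "rem", "ip": "AAA.BBB.CCC.DDD"},
        {"blacklist_change": "FAIL2BAN-ALL",
         "blacklists": "FAIL2BAN-ALL,FAIL2BAN-SSH",
         "timestamp": 1520108359925, "command": "add", "ip": "AAA.BBB.CCC.DDD"},
    ]
}


def split_lists(value):
    """Turn a comma-separated field into a set; an empty string means no lists."""
    return {name for name in value.split(",") if name}


def to_datetime(timestamp_ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def timeline(response, key="changes_ip"):
    """Return (when, command, changed, remaining) tuples, oldest first."""
    rows = [
        (
            to_datetime(entry["timestamp"]),
            entry["command"],
            split_lists(entry["blacklist_change"]),
            split_lists(entry["blacklists"]),
        )
        for entry in response[key]
    ]
    return sorted(rows, key=lambda row: row[0])


for when, command, changed, remaining in timeline(sample):
    print(when.isoformat(), command, sorted(changed), "->", sorted(remaining))
```

Reading the rows from oldest to newest makes it easy to see when the resource last became clean: it is the most recent "rem" row whose remaining set is empty.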

Because there may be a large amount of data available, it is possible to restrict queries by providing:

  • The Unix time in seconds from which the query starts.
  • The number of items to be returned per page.
  • The page number.

Calling this API always returns items from the most recent to the oldest, so the Unix time marks the freshest transaction that can be returned. If none of these parameters is provided, the API call returns the history starting from the current date, with a maximum of 10 items.
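
The ordering and paging rules can be modeled locally as follows. This is only our reading of the behavior described above, not the server's code: the function and argument names are hypothetical (the real query parameter names are in the API documentation), the start bound is assumed to be inclusive, pages are assumed to be numbered from 1, and the start time is compared in the same unit as the stored timestamps.

```python
def page_of_changes(changes, start=None, per_page=10, page=1):
    """Model of the described paging: newest first, bounded by a start time."""
    if per_page < 1 or page < 1:
        raise ValueError("per_page and page must be positive")
    newest_first = sorted(changes, key=lambda c: c["timestamp"], reverse=True)
    if start is not None:
        # Keep only transactions recorded at or before the start time.
        newest_first = [c for c in newest_first if c["timestamp"] <= start]
    begin = (page - 1) * per_page
    return newest_first[begin:begin + per_page]


# Hypothetical records with small timestamps to keep the example readable.
records = [{"timestamp": t, "command": "add"} for t in (10, 40, 20, 30, 50)]

first_page = page_of_changes(records, per_page=2)
second_page = page_of_changes(records, per_page=2, page=2)
bounded = page_of_changes(records, start=30, per_page=10)
```

With no arguments at all, the function returns at most the 10 newest records, which mirrors the default described in the text.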

Every call made to the API counts as a new hit against the user’s quota. You can read the full API details in the Resource History API documentation.

What’s next?

To use this service, you need to register on the platform and obtain an API key. You can use it even with a free account, so all you have to do to start using the service is register now!