Technical

Put Your API Users on the Map with Gloo Gateway, ClickHouse, and Grafana

Seeing where (in terms of geographic location) you are receiving your requests from is invaluable to understanding the behavior and distribution of your users.

With this information, you can identify right away if:

  • You are slower at serving requests in certain geographical locations.
  • You are getting requests from locations where your service shouldn’t be available.
  • Your WAF is not working properly.

And many more!

Now it’s much easier to determine whether you should look into using CDNs, support multi-regions, or fine-tuning your GeoIP-based WAF policies.

The possibilities are endless, but all come with the requirement of having the aforementioned geo-information available. Without that, it can be challenging to grep these kind of insights out of raw log lines.

Explore how you can easily attain that data in Gloo Gateway with the help of popular open-source projects like ClickHouse and Grafana.

Getting Started

First, make sure you have have followed the steps in our Gloo Mesh Gateway workshop right until the very end.

Note: you can achieve the same functionalities by making sure you have a ClickHouse database with IP addresses, and use that instead of gloo_api_logs.

Then, let’s take a look at the database and its content that we will use to map IP addresses to geo coordinates!

select
    *
from
    url(
        'https://raw.githubusercontent.com/sapics/ip-location-db/master/dbip-city/dbip-city-ipv4.csv.gz',
        'CSV',
        '
            ip_range_start IPv4, 
            ip_range_end IPv4, 
            country_code Nullable(String), 
            state1 Nullable(String), 
            state2 Nullable(String), 
            city Nullable(String), 
            postcode Nullable(String), 
            latitude Float64, 
            longitude Float64, 
            timezone Nullable(String)
        '
    )
limit
    20;

You should see a similar result after you click Run.

see where IP ranges start and end, what's the corresponding country code and state information, longitude and latitude, etc

You can now see where IP ranges start and end, what’s the corresponding country code and state information, longitude and latitude, etc. Basically all the information we need is here.

Then, let’s create a new table with this GeoIP data!

create table geoip_url(
    ip_range_start IPv4, 
    ip_range_end IPv4, 
    country_code Nullable(String), 
    state1 Nullable(String), 
    state2 Nullable(String), 
    city Nullable(String), 
    postcode Nullable(String), 
    latitude Float64, 
    longitude Float64, 
    timezone Nullable(String)
) engine=URL('https://raw.githubusercontent.com/sapics/ip-location-db/master/dbip-city/dbip-city-ipv4.csv.gz', 'CSV');

Now you should have ~3M records inserted!

select count() from geoip_url;

record count

Formatting the Data

In ClickHouse, there’s a structure called dictionary that maps keys to attributes. These can be also updated dynamically, making them a perfect fit for our use-case.

Because ClickHouse’s ip_trie dictionary expects the data to be in CIDR notion, we have to convert the data we have, and do some preparation. Please see this page for more information!

create table geoip (
   cidr String,
   latitude Float64,
   longitude Float64,
   country_code String
) 
engine = MergeTree() 
order by cidr;

Then ingest the rows!

insert into 
    geoip
with 
    bitXor(ip_range_start, ip_range_end) as xor,
    if(xor != 0, ceil(log2(xor)), 0) as unmatched,
    32 - unmatched as cidr_suffix,
    toIPv4(bitAnd(bitNot(pow(2, unmatched) - 1), ip_range_start)::UInt64) as cidr_address
select
    concat(toString(cidr_address),'/',toString(cidr_suffix)) as cidr,
    latitude,
    longitude,
    country_code    
from
    geoip_url;

You should have receive an OK. response.

Now, let’s finally create our dictionary that we mentioned!

create dictionary ip_trie (
   cidr String,
   latitude Float64,
   longitude Float64,
   country_code String
) 
primary key cidr
source(clickhouse(table 'geoip' USER 'default' PASSWORD 'password'))
layout(ip_trie)
lifetime(3600);

If you run this query, you should have data formatted in CIDR notion.

select * from ip_trie limit 20;

data formatted in CIDR notion

After this, we can start to perform IP lookups. Let’s try it!

select dictGet('ip_trie', ('country_code'), tuple('108.48.2.1'::IPv4))

You should see the corresponding country code coming up.

country code look up

Now, if you have followed all the previous labs, you should have logs with various IP addresses in gloo_api_logs table.

If you run this query, you should see the client IPs for your most recent API queries:

SELECT
  CAST(LogAttributes['client_address'] AS IPv4) as client_address,
  count(*) AS Count
FROM
  gloo_api_logs
GROUP BY
  client_address
ORDER BY
  Count Desc;

Here’s what I have in my case:

client IP

It’s possible that you only see an internal IP address here due to having proxies in front of your gateway.

To get the actual IP addresses, you have to specify the number of trusted proxies before your gateway pods.

To trust two proxies (that is common, when you are using a cloud provider’s load balancer in front of your gateway pods), here’s a patch that you can apply to your gateway:

# gateway-patch.yaml
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: '{"gatewayTopology" : { "numTrustedProxies": 2 } }'

Then you can apply it via:

kubectl --context ${CLUSTER1} -n istio-gateways patch deployment $(kubectl --context ${CLUSTER1} -n istio-gateways get deploy -l istio=ingressgateway -o jsonpath='{.items[0].metadata.name}') --patch-file gateway-patch.yaml

Now, you can send requests again via the Gateway to populate the proper IP address to the access logs!

That Will Not Scale

Alright, now let’s plot all of our access logs over a GeoMap!

Hold on, let’s take a step back.

If you have a fairly popular service, visualizing all the requests can get quite expensive!

We can, however, use the built-in geohashEncode() function to hash the coordinates, aggregate them, and log10() normalize the data to show density in a scalable manner!

with coords as (
    select
        CAST(LogAttributes['client_address'] AS IPv4) as client_address,
        dictGet(
            'ip_trie',
            ('latitude', 'longitude'),
            tuple(client_address)
        ) as coords,
        coords.1 as latitude,
        coords.2 as longitude,
        geohashEncode(longitude,latitude,4) as hash
    from 
        gloo_api_logs
    where
        longitude != 0 
        and latitude != 0
) 
select 
    hash, 
    count() as heat, 
    round(log10(heat),2) as adj_heat 
from 
    coords 
group by 
    hash;

The final part of the query basically counts the number of records for each geohash (to get the “heat” or density of data points in that area), and calculates an adjusted heat value using the logarithm of the heat value. This adjusted heat value is used to create a heat map where areas with more data points have higher values, making it a perfect format to visualise with Grafana.

You should have a similar output like this:

heat value

Now, we can use this to power a Grafana dashboard to plot all the data over an actual map!

Grafana dashboard

Try Gloo Gateway Today

As enterprises embrace cloud native technologies and seek modern solutions for API management, Solo.io’s Gloo Gateway emerges as a valuable choice. With its superior developer experience, streamlined operations, improved performance, and cost-saving advantages, Gloo revolutionizes the API management landscape.

The success stories that F100 financial services and telecom companies can demonstrate, coupled with the comprehensive feature set of Gloo, make it a worthy investment and a key driver of business growth in the cloud native era.

Shoutout to Zach Naimon for his blog post that was extensively used for this blog post.