Redis tech tip #1

To delete members from your Redis geospatial collection, you can use ZREM (under the hood, the data is stored in a sorted set)

Add a few (Indian) cities (note that GEOADD expects longitude before latitude)

GEOADD IndianCities 77.2060241 28.6108127 NewDelhi

GEOADD IndianCities 77.5945627 12.9715987 Bangalore

GEOADD IndianCities 80.2707184 13.0826802 Chennai

The weather in Chennai is humid – not my kind.. let’s remove it

ZREM IndianCities Chennai
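
Since the members live in a plain sorted set, regular sorted-set commands work on them too – for example, listing what’s left (assuming the commands above were run as-is):

ZRANGE IndianCities 0 -1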

You can also delete the entire collection using DEL IndianCities (usual stuff…)

Cheers!


NATS & Kafka: random notes

This is not NATS vs Kafka by any means – just jotting down some info. NATS recently joined the CNCF (which hosts projects like Kubernetes, Prometheus etc. – look at the dominance of Golang here!) and that’s when it caught my attention

I have been struck by its simplicity and performance (yet to test drive it fully – but the numbers are out there)

  • Kafka runs on JVM (Scala to be specific), NATS is written in Go
  • Protocol – Kafka is binary over TCP as opposed to NATS being simple text (also over TCP)
  • Messaging patterns – Both support pub-sub and queues, but NATS supports request-reply as well (sync and async)
  • For (horizontally) scalable processing,
    • NATS has the concept of a queue (with a unique name, of course): all the subscribers hooked onto the same queue end up being part of the same queue group, and only one of the (potentially multiple) subscribers gets each message. Multiple such queue groups would each receive the same set of messages. This makes it a hybrid of pub-sub (one-to-many) and queue (point-to-point) – see the sketch after this list
    • Same thing is supported in Kafka via consumer groups​ which can pull data from one or more topics
  • Stream processing – NATS does not support stream processing as a first class feature like Kafka does with Kafka Streams
  • Kafka clients use a poll-based technique in order to extract messages as opposed to NATS where the server itself routes messages to clients (maintains an interest-graph internally)
  • NATS can be pretty strict in the sense that it has the ability to cut off consumers who are not keeping pace with the rate of production, as well as clients who don’t respond to heartbeat requests.
    • The consumer liveness check is executed by Kafka as well. From what I recall, this is done/initiated from the client itself, and there are complex situations that can arise from this (e.g. when you’re in a message processing loop and don’t poll). There are a bunch of configuration parameters/knobs to tune this behavior (on the client side)
  • Delivery semantics – NATS supports at-most-once (and at-least-once with NATS Streaming), as opposed to Kafka, which also supports exactly-once (tough!)
  • NATS doesn’t seem to have a notion of partitioning/sharding messages like Kafka does
  • No external dependency in the case of NATS; Kafka requires ZooKeeper
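
To make the queue-group behavior concrete, here is a minimal sketch using the Java NATS client (assuming the jnats 2.x API); the subject and queue names are made up for illustration:

import io.nats.client.Connection;
import io.nats.client.Dispatcher;
import io.nats.client.Nats;

import java.nio.charset.StandardCharsets;
import java.time.Duration;

public class QueueGroupDemo {
    public static void main(String[] args) throws Exception {
        // connect to a locally running NATS server
        Connection nc = Nats.connect("nats://localhost:4222");

        // two subscribers in the SAME queue group ("workers") –
        // each message on the subject is delivered to only ONE of them
        Dispatcher d1 = nc.createDispatcher(msg -> System.out.println(
                "worker-1 got: " + new String(msg.getData(), StandardCharsets.UTF_8)));
        d1.subscribe("cpu.metrics", "workers");

        Dispatcher d2 = nc.createDispatcher(msg -> System.out.println(
                "worker-2 got: " + new String(msg.getData(), StandardCharsets.UTF_8)));
        d2.subscribe("cpu.metrics", "workers");

        // a plain subscriber (no queue group) gets EVERY message –
        // this is what makes NATS a pub-sub/queue hybrid
        Dispatcher all = nc.createDispatcher(msg -> System.out.println(
                "auditor got: " + new String(msg.getData(), StandardCharsets.UTF_8)));
        all.subscribe("cpu.metrics");

        for (int i = 0; i < 5; i++) {
            nc.publish("cpu.metrics", ("load-" + i).getBytes(StandardCharsets.UTF_8));
        }
        nc.flush(Duration.ofSeconds(1));
        Thread.sleep(500);
        nc.close();
    }
}

The rough Kafka equivalent is to start multiple consumers with the same group.id – the partitions of the topic get divided amongst them.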

… work in progress…

NATS – quick start with Docker

Here is a hello-world example (on GitHub) for getting started with NATS – a distributed messaging server written in Go

It consists of a few components, all managed via Docker Compose (see the README for the details)

Cheers!


Session @ Oracle Code San Francisco 2017

Here are the details for my talk @ Oracle Code (a track at JavaOne 2017) – Streaming Solutions for Real time problems (Stream Processing solutions using Apache Kafka, Kafka Streams and Redis)

Code (Github) – https://github.com/abhirockzz/accs-ehcs-stream-processing

Video

Slides

Cheers!


Debezium test drive

Debezium is an open source, distributed change data capture system built on top of Apache Kafka. I tried it out, and the project is available on GitHub

Setup

Details are in the README. It uses the Debezium tutorial as a background, but the setup is very simple since it uses Docker Compose – one command (docker-compose up --build) is all it takes to get going!

Components

  • MySQL – the database
  • Debezium – the CDC platform, which tails the MySQL binlog and pushes change events to Kafka using its MySQL Kafka Connect connector
  • Consumer – it’s a Java EE application which consumes the (DB change) events from the Kafka topic, deserializes and parses them, and logs them (a minimal sketch of such a consumer follows)
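
This is not the exact consumer from the repo, but a minimal sketch of what consuming Debezium change events looks like with the plain Kafka consumer API (the topic name assumes the <server>.<database>.<table> naming convention from the Debezium tutorial):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ChangeEventLogger {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "change-event-logger");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Debezium names topics <server>.<database>.<table>
            consumer.subscribe(Collections.singletonList("dbserver1.inventory.customers"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // each value is a JSON change event with "before" and "after" sections
                    System.out.println(record.value());
                }
            }
        }
    }
}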

Cheers!

Kafka & Websocket

For those who are interested in an example of Kafka working with the (Java EE) WebSocket API, please check out this blog. There is an associated GitHub project as well

Cheers!


Kafka producer and partitions

There are only a few possible ways in which the partition gets picked when using the Kafka Producer API

  • Just specify it in the ProducerRecord itself
  • If the key is not null, (by default) Kafka will hash the key and calculate the partition from it
  • If the key is null, (by default) Kafka will round-robin between all the partitions (to load balance the data)
  • Otherwise, plug in a custom Partitioner (see the sketch below)
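
Here is a minimal sketch of the first and last options; the HotKeyPartitioner class, topic and key names are made up for illustration:

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

// a hypothetical Partitioner that pins a specific "hot" key to partition 0
public class HotKeyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("hot-key".equals(key)) {
            return 0; // pin this key to a dedicated partition
        }
        // mask off the sign bit so the modulo result is never negative
        return key == null ? 0 : (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

Register it with props.put("partitioner.class", HotKeyPartitioner.class.getName()), or bypass the partitioning logic entirely by passing an explicit partition in the record, e.g. new ProducerRecord<>("cpu-metrics-topic", 2, "machine-1", "42").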

If interested, you can also check out the Kafka Partitioning blog

Cheers!


Kafka Streams state stores…

This blog explores some common aspects of state stores in Kafka Streams…

Default state store

By default, Kafka Streams uses RocksDB as its state store

In-memory or persistent ?

Whether the state store is in-memory or persistent is configurable. RocksDB can work in both modes, and you can toggle this using the Stores factory API.

Once the StateStoreSupplier (api) is created it can be used in the (high level) Kafka Streams DSL API as well as the (low level) Processor API
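
For example, with the 0.10.2-era Stores.create builder (the store name and serdes here are just for illustration):

import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

// persistent (RocksDB-backed) store
StateStoreSupplier persistentStore = Stores.create("cpu-averages")
        .withStringKeys()
        .withLongValues()
        .persistent()
        .build();

// the in-memory variant differs only in this one builder call
StateStoreSupplier inMemoryStore = Stores.create("cpu-averages-mem")
        .withStringKeys()
        .withLongValues()
        .inMemory()
        .build();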

Persistent storage medium

In the case of persistent stores, RocksDB (the default) flushes the state store contents to the file system, at a location which can be specified via StreamsConfig.STATE_DIR_CONFIG (this is an optional parameter)

config.put(StreamsConfig.STATE_DIR_CONFIG, "my-state-store");

Notes

  • this state store is managed by Kafka Streams internally
  • it is also replicated to a Kafka topic (for fault tolerance and elasticity) – this is a log-compacted topic which is nothing but a changelog of the local state store contents (this is the default behavior; it is configurable using the enableLogging method and can be turned off using disableLogging)
  • it is possible to query these state stores using the interactive queries feature (a minimal sketch follows)
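
A minimal sketch of an interactive query against a running KafkaStreams instance (the store name matches the earlier example):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// 'streams' is a running KafkaStreams instance
ReadOnlyKeyValueStore<String, Long> view =
        streams.store("cpu-averages", QueryableStoreTypes.<String, Long>keyValueStore());
Long average = view.get("machine-1"); // read-only access to local state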

DSL vs Processor API

There are differences in the way DSL and Processor APIs handle state stores

Read-only vs writable stores

The DSL API restricts access to a read-only view of the state store (via ReadOnlyKeyValueStore), whereas the Processor API gives you access to a writable view which allows you to mutate the state store
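
For instance, inside a Processor you can grab a writable handle to a store (a sketch; the store name is carried over from the earlier examples):

import org.apache.kafka.streams.state.KeyValueStore;

// inside Processor.init(ProcessorContext context):
KeyValueStore<String, Long> store =
        (KeyValueStore<String, Long>) context.getStateStore("cpu-averages");
store.put("machine-1", 42L); // mutation is possible via the Processor API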

Custom state stores

As of now (0.10.2) it’s not possible to plug in your custom state store implementation when using the DSL API. This means that with the DSL API, you are limited to the RocksDB based state store implementation. With the Processor API, it’s possible to configure a custom implementation (more on this below)

Custom state stores with the Processor API

As mentioned earlier, the Processor API gives you the freedom to use your own state store in a streams application. You can wrap your custom state store on top of the Kafka Streams API itself – by implementing the required interfaces like StateStore, StateStoreSupplier etc. More details here. Doing this will allow you to query the state store using standard Kafka Streams APIs

But …..

A custom state implementation might already have a query feature.. So does it still make sense to wrap it with the Kafka Streams interfaces/APIs ?

The answer is yes.. if…

you want to leverage the fault tolerance and elasticity which come for free (if configured using the aforementioned parameters) with the Kafka Streams API – this is due to the changelog topic which is created in Kafka (corresponding to each state store). A custom state store implementation which is not based on the Streams API:

  • will not interact with the Kafka broker to backup the state stores, as a result of which
  • task re-distribution during scale-in/scale-out will not be possible since the participating application nodes will not be able to synchronize the latest changes made to other local stores due to the lack of a global checkpoint

Note that this problem can be circumvented with the help of an inherently distributed state store implementation which all the stream processing application instances can access

Other stuff

Cheers!


Docker-ized Kafka Streams applications

Here is another example of a Kafka Streams based application… this time, it’s about running it in Docker containers – spawn more containers to distribute the processing load. More details in the README


Cheers!


Kafka Streams based application

A Kafka Streams sample application is available on GitHub… This is a microservice (packaged in the form of an Uber JAR) which uses the Kafka Streams Processor (low level) API to calculate the cumulative moving average of the CPU metrics of each machine in a distributed manner

  • A producer application continuously emits CPU usage metrics into a Kafka topic (cpu-metrics-topic) and consumer application (instances) do the computation
  • Consumers can be horizontally scaled – the processing work is distributed amongst many nodes and the process is elastic and flexible thanks to Kafka Streams (and the fact that it leverages Kafka for fault tolerance etc.)
  • Each instance has its own (local) state for the calculated average. A custom REST API has been built (using the Jersey JAX-RS implementation) to tap into this state and provide a unified view of the entire system (moving averages of CPU usage of all machines)
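
For reference, a cumulative moving average can be maintained incrementally, without keeping all the samples around – a sketch of the update step (not the exact code from the repo):

// cma(n+1) = cma(n) + (x - cma(n)) / (n + 1)
// where 'cma' is the running average over 'n' samples and 'x' is the new sample
static double updateAverage(double cma, long n, double x) {
    return cma + (x - cma) / (n + 1);
}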

….. more in the project README

Cheers!
