The post-office & the postman

If we were to talk about the old messaging system, with its post office, postman, and mailbox: each component had its own function, and we can look for those functions when visualizing how the components would interact in a computerized version.

Simple scenario:
  1. Mail is added to a mailbox.
  2. The postman arrives, picks up mail from the mailboxes in his area, and takes it to the post office.
  3. The post office organizes mail by area.
    1. A postman takes the mail for his area and "distributes it to the mailboxes".
    2. A person can go to the post office and pick up his own mail "in case of failure, or if he wishes for early delivery".

Mapping to a computerized version:
  • Scenario: the Observer design pattern, which can use a push or a pull model to inform those who are registered for an event about its occurrence.
  • Components:
    • Post office = Message broker
    • Post-office box = Message storage / validity period
    • Mailbox = Topic/Queue
    • Postman !!! Where's the postman?
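The push-vs-pull distinction in the Observer mapping above can be sketched in a few lines. This is a minimal illustration, not code from any library; all class and method names here are made up for the example.

```python
class Subject:
    """Keeps a list of registered observers and the latest event."""

    def __init__(self):
        self._observers = []
        self._last_event = None

    def register(self, observer):
        self._observers.append(observer)

    def publish(self, event):
        self._last_event = event
        for obs in self._observers:
            # Push model: the subject hands the event to each observer.
            obs.notify(event)

    def pull(self):
        # Pull model: an observer asks for the latest event itself.
        return self._last_event


class Observer:
    def __init__(self):
        self.received = []

    def notify(self, event):
        self.received.append(event)


subject = Subject()
observer = Observer()
subject.register(observer)

subject.publish("mail arrived")   # push: observer is notified immediately
print(observer.received)          # ['mail arrived']
print(subject.pull())             # pull: observer fetches on its own schedule
```

In the post-office metaphor, the postman is the push half of this pattern, while going to the post office yourself is the pull half.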
Apache Kafka acts as a message broker that decouples message processing from the publisher; it can also buffer unprocessed messages. It follows the pull model ("consumers pull data"). Below are Kafka's capabilities:
  • Publish/subscribe: acts as a messaging system (topic/queue).
  • Store streams: acts as a storage system that keeps data for a configurable retention period.
  • Process streams: acts like the Decorator design pattern.
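To make the pull model concrete, here is a toy, in-memory sketch of the idea: a topic is an append-only log, and each consumer pulls new messages from its own offset. This illustrates the concept only; it is not Kafka's actual API.

```python
class Topic:
    """A topic modeled as an append-only log of messages."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)


class Consumer:
    """Each consumer remembers how far it has read (its offset)."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        # Pull model: the consumer asks for new messages; the broker
        # never pushes anything to it. Messages stay in the log, so
        # other consumers can read them at their own pace.
        new = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return new


orders = Topic()
orders.append("order-1")
orders.append("order-2")

c = Consumer(orders)
print(c.poll())   # ['order-1', 'order-2']
orders.append("order-3")
print(c.poll())   # ['order-3'], only the unread part is pulled
```

Because the log is retained rather than deleted on delivery, a slow or restarted consumer can simply resume pulling from its last offset, which is the storage capability described above.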
Apache Flume acts as the postman, delivering events from one point to another; it follows the push model. Below are Flume's capabilities:
  • Delivers data from one point (a "source") to another (a "sink"); it can be configured to push data to a port.
  • Modifies or drops events in the flow ("acting as a decorator").
  • Can store data in partitions ( /dataDir/year=%y/ ).
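The source-interceptor-sink flow can be sketched as follows. This is a hypothetical illustration of the shape of a Flume flow under the assumption of a year-based partition path like the one above; real Flume is driven by configuration files, and every function name here is invented for the example.

```python
import datetime


def interceptor(event):
    """Modify or drop events in the flow: drop empty events,
    upper-case the rest (the "decorator" role)."""
    if not event:
        return None          # event is dropped
    return event.upper()     # event is modified


def partition_path(base, when):
    # Mirrors a partition pattern such as /dataDir/year=%y/
    return f"{base}/year={when.year}"


def run_flow(events, when):
    """Push model: the source drives events through the interceptor
    and into a sink, grouped by partition path."""
    store = {}
    for event in events:
        event = interceptor(event)
        if event is not None:
            store.setdefault(partition_path("/dataDir", when), []).append(event)
    return store


result = run_flow(["hello", "", "world"], datetime.date(2016, 5, 1))
print(result)   # {'/dataDir/year=2016': ['HELLO', 'WORLD']}
```

Note the contrast with the Kafka description above: here the flow pushes each event onward as soon as it arrives, rather than waiting for a consumer to ask.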
