Monday, August 22, 2016

GSoC Work Submission Summary

Hi
This post is about the work summary pertaining to my GSoC 2016 project "Message Queues based Archiver".

The project involved creating archiving interfaces for GNU Mailman that are based on message queues.
The initial stage involved investigating how archiving happens in Mailman, and how it could be tweaked with minimum disturbance to accommodate message queue based interfaces. The challenging part was fitting a "one interface, multiple archives" pattern into a design built for "one interface, one archive". This involved changing the naming policy of the interface instances in Mailman core.
After investigating the APIs offered by various message queuing backends, I found RabbitMQ to be the ideal choice for the use case. Though initially a simple message queue based interface (MQArchiver) was developed, it was later extended with a publisher-subscriber based interface named PubMQArchiver, which feeds mails to a common channel, from where they may be picked up by a variety of web applications.
After incorporating different message patterns (such as direct queueing and publisher-subscriber), the interface was extended to support pluggable backends: one can write a new Backend class, derived from a Base backend class, that makes the calls of another message queueing API. This leaves our design open to accommodating APIs other than RabbitMQ as well.
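To give a rough idea of what "pluggable backends" means here, below is a minimal Python sketch. It is not the code from the repository: the names BaseBackend, InMemoryBackend, BACKENDS and get_backend are made up for illustration, and the in-memory backend only stands in for a real broker such as RabbitMQ.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Hypothetical base class: every backend must know how to publish."""

    def __init__(self, host="localhost", exchange="pubsub"):
        self.host = host
        self.exchange = exchange

    @abstractmethod
    def publish(self, list_id, message):
        """Send one message for the given mailing list."""


class InMemoryBackend(BaseBackend):
    """Stand-in backend that keeps messages in a dict instead of a broker."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.sent = {}

    def publish(self, list_id, message):
        self.sent.setdefault(list_id, []).append(message)


# Registry of available backends; a RabbitMQ backend would be added here.
BACKENDS = {"memory": InMemoryBackend}


def get_backend(name, **params):
    """Look up a backend class by its configured name and instantiate it."""
    try:
        backend_class = BACKENDS[name]
    except KeyError:
        raise ValueError("unknown backend: %s" % name)
    return backend_class(**params)
```

With something like this, the archiver only ever calls publish(), and supporting another message queueing API comes down to writing one more subclass and registering it.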
The archiver interfaces are configurable: one can set the message queueing server host, the exchange names for common channels, the backend, and backend-specific parameters. In case of failures (missing configuration or parameters), the archiver falls back on a default configuration (the RabbitMQ backend with the server host at 'localhost' and the exchange name 'pubsub').
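The fallback behaviour can be sketched roughly as follows. This is only an illustration using Python's standard configparser: the section name "mqarchiver", the option names and the function load_settings are assumptions, not the actual configuration keys of the archiver. Only the default values (RabbitMQ, 'localhost', 'pubsub') come from the description above, and filling in defaults key by key is my simplification.

```python
import configparser

# Defaults as described in the post: RabbitMQ on localhost, exchange 'pubsub'.
DEFAULTS = {"backend": "rabbitmq", "host": "localhost", "exchange": "pubsub"}


def load_settings(ini_text, section="mqarchiver"):
    """Return archiver settings, filling in defaults for anything missing.

    The section and option names are hypothetical.
    """
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(ini_text)
    except configparser.Error:
        # Unparseable configuration: use the defaults as they are.
        return dict(DEFAULTS)
    settings = dict(DEFAULTS)
    if parser.has_section(section):
        for key in DEFAULTS:
            value = parser.get(section, key, fallback="").strip()
            if value:
                settings[key] = value
    return settings
```

For example, a file that only sets the host keeps the default backend and exchange name, and an empty or broken file yields the defaults unchanged.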
A feature to delete archiver entries in the database that can no longer be mapped to their archiver instances (called "Stale Archivers") was added to Postorius (the GUI).


Below are the links to the repositories with my work:

1. I worked on developing archiving interfaces for GNU Mailman. These are located at:
https://gitlab.com/anirudhdahiya9/MailmanMQArchiver

2. My work involved changes to the GUI for deletion of "Stale Archivers", i.e. archivers whose database entries cannot be mapped to active archiver instances (called "System Archivers").
https://gitlab.com/anirudhdahiya9/postorius/tree/stale-archivers

3. This needed slight changes to mailmanclient as well, so that requests can be mapped between Postorius and the REST API.
https://gitlab.com/anirudhdahiya9/mailmanclient/tree/stale-archivers

4. My project required slightly changing the core mailman archiving procedure.
https://gitlab.com/anirudhdahiya9/mailman/tree/MQArchiver

I must mention that all this was possible only with the constant guidance of my project mentor Florian Fuchs, and of Barry Warsaw, among other contributors at GNU Mailman.

Thanks
Anirudh Dahiya

Tuesday, July 26, 2016

A long due post : Status Update

Hello reader,
It's been a fun-filled month. After passing the mid evaluations (yay!), I started to fix the tests and complete the tiny bits left in the archiver interface. This was followed by cleaning up the code and catching exceptions that I had missed.
Once this was done, Florian and I had a meeting to decide what we wanted to do next. The MVP was complete and working fine (pending a bit of decision making on the core side), but what next?
We had two ideas, both almost equally compelling for me: either extend our message queue system to include a publisher-subscriber message pattern (go google that, it's a fun concept), or create an archiver-side implementation to receive messages from the interface.
Florian left the decision up to me, and finally we selected the pub-sub implementation as the next task.
Having a task at hand gets you working, excited! Thus I started, and found out that it required quite some remodelling on the Mailman core side, particularly to change how the ListArchiver and ListArchiverSet models interact. Eventually I got stuck. I approached Florian, who was on a work trip at that time and thus wasn't available. This gave me even more time to think about it on my own, and I started to come up with a solution. But then Florian got back, and I discovered that I had the wrong idea about what was expected to be implemented.
Once that was clarified, I figured out it was a fairly simple task given what we had already made, and it was completed well in time.
Now, tests have been written, and it's working fine. But wait for it, there's still that pending decision about changes on the core side to be finalized. And I'd love to test the waters for the archiver-side implementation.
Thanks

Friday, June 17, 2016

Gsoc2016: Pre Mid Evaluation

Hello Readers,
It's been some 20 days of intense discussions, coding, deadlocks, discussions again, confusion, and a moment of discovery. BUT it has been fun altogether.

As I mentioned in my previous blog post, the integration part was done, or so I thought. And I felt things were moving fast, thanks to our preparation and design discussions during the community bonding period. WELL, not so fast! *silly me* :p

SOME TECHNICAL STUFF - CAN SKIP
//////////////////////////////
When I actually implemented the message queue part, I noticed that the emails were being archived at only one of the archives using my interface. Umm, big problem. I went codebase hunting again, searching through modules, which called other modules, and so on. Finally, I made a small discovery. There was a Python dictionary being built, with the key being the interface name. What this meant was that the interface object was being instantiated only once.
Another issue was how to let my interface object know which archive it needs to send mail to.
This led me to intense discussions with Florian, and to asking for advice on the developer mailing list. I suggested to him that we could add a field in the cfg file where the archivers are registered, check for this field in the IListArchiverSet interface, and change the dict key for the instance accordingly. After some toying around with the Mailman core codebase, I got this working by changing the name. But Florian pointed out that we should go for a more generic design in the interface. This led us to change the name for all the interface instances, irrespective of the name. DONE!!! The major hurdle was overcome.
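A tiny Python sketch shows why keying by interface name was a problem. It is not the Mailman core code: the Archiver class, the "archive-a"/"archive-b" targets and the "name:target" key scheme are all made up for illustration. Here the second entry silently replaces the first; either way only one instance per interface name survives.

```python
class Archiver:
    """Hypothetical archiver instance bound to one target archive."""

    def __init__(self, interface_name, target):
        self.interface_name = interface_name
        self.target = target


# Two archives that are both served by the same interface.
configured = [
    ("mqarchiver", "archive-a"),
    ("mqarchiver", "archive-b"),
]

# Keyed by interface name only: the second entry replaces the first.
by_interface = {}
for interface_name, target in configured:
    by_interface[interface_name] = Archiver(interface_name, target)

# Keyed by a per-instance name (made-up scheme): both instances survive.
by_instance = {}
for interface_name, target in configured:
    key = "%s:%s" % (interface_name, target)
    by_instance[key] = Archiver(interface_name, target)

print(len(by_interface), len(by_instance))
```

So mails reach only one archive in the first case, which matches what I was seeing.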
//////////////////////////////

In the above section I mentioned a grave issue I faced, and how we (Florian and I) came up with a solution. We still need to discuss this with Barry, but the final solution seems to be somewhere around this.
After this, I restructured my interface code into better functions, and was done with the archiver interface.
Next was the time for writing tests. Florian had been pushing for this since the beginning, and I had read about tox and unittest, and had even planned which tests would need to be written. But when I actually set out to write them, I faced difficulties with how I was going to deal with the RabbitMQ API calls.
Now "unittest.mock" came to our rescue, and I read a tutorial about it. CONFUSING it was.
I read more tutorials on mock and tried writing the tests, but was facing a block, like a writer's block. I approached people in my college for help on this, and a senior did try to help, but he was in the US for an internship, so our communication over chat wasn't very effective. 3 days were spent without writing a line of code, only reading tutorials and trying to think how to approach it. I felt as if this was becoming a mental issue for me, and that I was overthinking. Finally I approached Florian with the issue, and he agreed that our case was quite complicated for mock, and that mock in itself was not very straightforward. Right after chatting with Florian, no longer feeling quite so sorry for myself, I tried a few things with mock, and used tips from 2 tutorials I had read over and over again. While writing tests for the first time in my life, I realized how they go a long way in shaping the code structure, and help us modularize things well. FINALLY IT WORKED OUT!!
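For anyone facing the same block, here is roughly what the idea boils down to, as a minimal sketch rather than my actual tests. QueueArchiver is a made-up class, and I am assuming the archiver receives a channel object with a basic_publish method (the name mirrors pika's channel API, but the mock would accept any call).

```python
import unittest
from unittest import mock


class QueueArchiver:
    """Hypothetical archiver that publishes through an injected channel."""

    def __init__(self, channel, exchange="pubsub"):
        self.channel = channel
        self.exchange = exchange

    def archive_message(self, list_id, text):
        # In real code this would hit the broker; in tests it hits the mock.
        self.channel.basic_publish(
            exchange=self.exchange, routing_key=list_id, body=text)


class TestQueueArchiver(unittest.TestCase):
    def test_message_is_published(self):
        # The mock replaces the broker channel, so no RabbitMQ server is needed.
        channel = mock.Mock()
        QueueArchiver(channel).archive_message("dev.example.org", "hello")
        channel.basic_publish.assert_called_once_with(
            exchange="pubsub", routing_key="dev.example.org", body="hello")
```

Passing the channel in from outside is exactly the kind of restructuring that writing tests pushes you towards.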
The next moves were made quickly, with me preparing the tox environment file, running the nose2 tests, and cleaning up the code to meet pep8 standards. The job was done, and I feel I've covered well the goal I had set for the mid term evaluation.
I must say that this journey has caused confusion and frustration, but it has always rewarded me with new learnings and joy.
I hope to pass the evaluations, and get the most out of this summer.
Thank You

Friday, May 27, 2016

Let's get it started!

Hello readers!
So the community bonding period has finally ended. It was a fun experience discussing ideas and designs with Florian, my mentor. We had numerous meetings with intense brainstorming, especially regarding the design part.

First let me share one of the design boards I made over these meetings - https://docs.google.com/drawings/d/1b79uUHsDd1WHZQ9sZsVLTcra0eW0CA6Qc1h3ySpRY0E/edit?usp=sharing

OK so let me give you some context. I am building a message queue based archiving interface. As I have been investigating various backends for the message queue part, implementing it is not really a challenge for me anymore. In fact I had developed prototypes for some message queue backends (RabbitMQ and ZeroMQ) while preparing my proposal and researching the project. During the community bonding period, I revisited RabbitMQ and found that it fits the needs perfectly, and is fairly simple to use and integrate. Thus in our first meeting, Florian and I decided to go with RabbitMQ as our primary backend (ZeroMQ will be looked into once we are done with this).


Now the challenging part that remains is understanding the finer details of how the archiver interface is loaded, what functions it has to perform, and how it can be made to function under varying configurations. The configurations can be saved in ini-style files, and these may be loaded and used on the fly. BASICALLY, HOW WE ARE GOING TO INTEGRATE IT WITH THE CURRENT MAILMAN SERVER.

To tackle this, I spent days going over the codebase multiple times, trying to work out how the message queue part should interact with it. One suggestion by Florian that proved very fruitful was to go about it from a testing point of view, i.e. think of all the tests our interface should be subject to, and what the input and expected output of each should be. This led me to research the unittest module and tox, and to read the pre-existing tests for the current archiving modules, which gave me a firmer understanding of the interface design.

Next, we actually discussed the design. I had some things in mind, but over IRC it gets difficult to express those ideas. This led me to make diagrams explaining my ideas, which also allowed Florian to ask about his doubts and provide suggestions. After two or maybe three meetings of intense discussions, we settled on a simple architectural design (calling the interface once for each archiver) that is practical and clean. I still have another solution in mind (using a publish model), which has one flaw, and I shall try my hand at it once the simple design is implemented.

The coding period has already begun, and as of writing this, I have made the interface Python package, installed it in my virtualenv, and it shows up in Postorius. It also has some slightly dirty code for the message queue part, but as I mentioned, that isn't a problem.
Hoping to rock this project!
Have an awesome Summer!

Tuesday, May 3, 2016

The Proposal - Explained

Hi
Now let's get to what I am going to work on over the summer. As mentioned in the last post, this one is going to elaborate my understanding and ideas for the project.
I applied for the project Message Queue based Email Archiver under the org GNU Mailman. GNU Mailman is an open source org which develops and maintains Mailman, the mailing list management system. Mailing lists are used in more places than one might think at first. They're probably being used in the institution and the organisation(s) you're enrolled in, in the promotional schemes of big corporates and so on. You just might be a member of one of those without even realising it (think of some spam that you receive!). What Mailman does is manage these mailing lists, their moderators, owners, banned lists and so on. It also maintains a server to which all the mails to registered mailing lists are directed, validates each email, puts it on hold for moderation by list admins if required, calculates the recipients and delivers the mails into their inboxes. Wait, another thing, it also has an option to archive the mails it sends out to list subscribers, and that's where my project steps in :) .

Right now, the archiving is done something like this -



[Figure: Plugin.png]
WAIT! If you're freaked out by the complexity, worry not. I'll explain in a broader sense.
The mailman server and the archive servers are currently separate, i.e. decoupled, which is a good thing, keeping separate things separate as we can.
A mail can be archived by different archives, each using its own archive-specific method (like a POST request, or a mail (inside a mail :D) to the archive server).
Despite such a neat design, we have some issues here. Firstly, the POST request method currently isn't secure enough, so the archive server and the Mailman server need to be in the same subnet (or even on the same machine). Secondly, what if our archive server decides to go off to sleep for a teeny weeny lil second? BOOM. Message lost, an error code received, no retryability (right now). Enter message patterns. These beautiful architectural solutions provide some fascinating ways in which machines can pass messages among themselves. They offer the option of a queue of messages between our Mailman server and the archive server, where messages can be stored for the archive server to consume as and when it can (in an asynchronous manner). Another thing, we can attach as many archive servers, or even other web apps, to these queues without bothering our Mailman server. Also, the question of whether our mail was received by the archive can now be handled by our message queuing system, saving our Mailman server quite some headache. These are just some of the possibilities that message patterns bring in.
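To make the "archive server goes to sleep" case concrete, here is a toy Python sketch using only the standard library queue module. It is not a real broker and not part of the project: the function deliver, the (attempts, message) tuples and the max_attempts limit are assumptions made purely for illustration.

```python
import queue


def deliver(pending, archive, max_attempts=3):
    """Drain the queue, re-queueing any message the archive fails to store.

    Each queue item is an (attempts_so_far, message) tuple.
    """
    stored = []
    while not pending.empty():
        attempts, message = pending.get()
        try:
            archive(message)
            stored.append(message)
        except ConnectionError:
            # Archive unavailable: keep the message for a later attempt.
            if attempts + 1 < max_attempts:
                pending.put((attempts + 1, message))
    return stored
```

The point is that the sender only ever puts messages on the queue; whether the archive is awake at that moment no longer decides whether the message survives.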

My project involves implementing an interface to accommodate such message queue based systems in our current system, and subsequently implementing message patterns in various backends. The choice of backends depends on what features each one provides. I investigated some backends during my study of the project, and found RabbitMQ quite good for the job. ZeroMQ and Redis were also found to be good possible candidates. A short note I made on these in my proposal is as follows -
 "
  • RabbitMQ - It also offers flexible and customizable routing options between publisher and message queues. This could be helpful to publish to a select subset (based on mailing list) of the subscribers listening to our publisher. Also, its reliability feature suits the requirements in our context, as it provides both acknowledgements and personalised queues for each subscriber.
  • ZeroMQ - Unlike others, it can run without a dedicated message broker. It basically provides sockets which can be configured to customize message patterns, and can thus be used to suit specific personal needs. Though highly customizable, it might require explicit implementation of a few features.
  • Redis - Though it started with some initial confusion about the lack of any reliable way to transfer messages after I approached the redis community for solutions, I subsequently found quite a nice solution in this blog post. It offers features like persistence to disk and message reference queues per subscriber, although no approach was found to provide retryability in case of failure. Thus, it is a lightweight broker system which can be used to provide reliability.
 "


Following was the timeline I proposed after much deliberations with myself, mentors, and some seniors who had worked on such projects before.

Timeline :


Till 10 May
Find out more about Message Patterns and understand Mailman and Archiver Interface code architecture. Get familiar with ‘tox’, the testing system.
10 May - 22 May
Find out more about the various backends available and corresponding message patterns supported. Discuss and finalise various architectural designs for messaging system with the mentors.
23 May - June 1
Implement plugin for IArchiver supporting multiple backends. Test the plugin with dummy messaging systems.
June 2 - June 12
Implement a message queue system for a viable backend. Finish with the minimum viable product.
June 13 - June 19
Write unit tests for the mvp. Ask for community review of the product. Catch up with any leftover tasks
June 20 - June 27
Code submitted for mid term evaluation. Write basic documentation for mvp. Reconsider the architecture for message queues and plugin.
June 28 -July 8
Improve upon the message queue system based on the evaluation. Research additional backends and message patterns viable for the job.
July 9  - July 25
Discuss and implement more message queuing patterns suitable. Implement support for suitable alternative backends. Attempt to incorporate list events into the messaging system.
July 26 - August 1
Refactoring Week. Seek community review for existing product and design architecture. Write integration tests.
Aug 2 - August 13
Complete any leftover work. Write documentation. Improve upon reviews by mentors. Attempt to incorporate extensible metrics (as mentioned above). Write any tests left (system testing). Complete any unachieved milestone.
August 14 - August 20
Tidy up code. Complete documentation. Perform rigorous testing upon system. Submit for final submission.

If I am able to complete the intended tasks as part of the project, I intend to incorporate events such as list creation, moderator/owner assignment and other list events as part of a larger more general archiving system. One approach I feel would be good is to implement a handler/system of handlers which direct such list events to the archive servers.

 I hope I have well explained the objectives for this project. Please feel free to comment here or contact me by email if you have any suggestions/doubts.
Cheers

Monday, May 2, 2016

GSoC - Applied and Selected. Yay!

Hello Readers
The past month has been full of activities: the GSoC application, assignments, exams, and desperate waiting for results (GSoC). I'd like to dwell on my application for GSoC, the proposal, waiting for results, and the RESULT.

As mid-March approached and the GSoC proposal deadline neared (25th March), I came to a realization, a horrific one: I HAD NOT EVEN SELECTED THE PROJECT I WAS GOING TO APPLY FOR. Most applicants, even my friends, had finalized their projects and had had lengthy discussions with the mentors. I felt it was the end of my GSoC journey, and felt that yes, it was fun contributing to GNU Mailman, BUT my dreams of GSoC were going to be shattered.
As one last measure to save the sinking ship, I went through the project list once more, and found the Message Queue based Email Archiver. Now this was completely new to me; I had mostly worked in the Mailman core area, not the email archiving part. But with no options left, I went through the whole Mailman architecture, and understood where and how archiving is done and what the project could mean. Only after this did I contact the project mentor, Florian (a wonderful person, as I would find out). I did so keeping in mind the little time left, and to make my conversation with the mentor more fruitful, since discussing a project with little background isn't really an effective way to work.

Approaching Florian made me feel more confident about still having a chance at GSoC. He seemed interested in my application, and liked that I had done some homework before approaching him. Now it was time for more intense action: the next 4-5 days were spent understanding what message queues are, what backends are available and what they offer. This involved going through PyCon talks, approaching seniors (thanks to Nehal and Anhad) who had some experience with this, and reading articles and documentation.
Now with just 3 days left for the proposal deadline, I became more anxious, my proposal hadn't even been drafted yet! I also needed to get it reviewed by mentors and seniors who had been through the drill.
One thing I learnt - writing a proposal requires a very clear idea of the project and the existing architecture (that is, when you genuinely want to write a good proposal). Thus while drafting mine, I had to go over a lot of things again, and ensure I really had a good idea of the things involved. It took me a whole 2 days to complete the draft, and now it was time for reviews.
OOPS!! The timeline I had proposed was crappy! I also hadn't clearly mentioned the minimum viable product, the deliverables of the project. Mentors and seniors pointed out that the timeline I had developed was not practical. It required a whole lot of thinking and asking for advice from seniors to get that straight. I also added what I thought the deliverables should be. Finally, with a thumbs up from the mentors, I submitted the proposal and hoped for the best. (That night after the proposal deadline, my friends and I decided to go for a walk, which turned into a long one, longer than what was comfortable, crap! We walked for around 15 kms.)

Acads resumed, I struggled with assignments, and thought about the results, FOR A MONTH!! That was tiresome. I felt it had indeed been fun to apply for GSoC, and I had worked hard. Even if I wasn't going to be selected, I would not have had any regrets. It had certainly given me enough experience to crack it the next year, and exposed me to the beautiful world of the open source community.

Result day. I was excited and a bit nervous the whole day. My whole group of friends had applied, and we were all anxiously watching the GSoC IRC channel for any updates. Finally, result time. SELECTED!!! YAY. It was a dream come true. Another friend had also cracked it. I was on cloud nine. Now, as is the customary celebration at my college for any achievement or birthday, I was given GPL. Don't ask for the full form, it just means you are gonna receive the spanking of your life from your friends, followed by a juice treat at the canteen.

I would like to take this opportunity to thank my friends who kept me motivated (Motwani, Battan, Chenoo and Sanket), the seniors (esp. Bhavesh Goyal, Ghaisas, Nehal and Anhad) who helped me out at various points of this journey, and the people at GNU Mailman who accommodated my doubts and discussions and guided me.

Next post, I am going to share the particulars of the proposal I wrote, till then here's the final draft -
https://docs.google.com/document/d/1-ElRY-7IF4IlTqK_h6JAVCOFgJsant_dMTjGJKgxBV4/edit?usp=sharing

Sunday, March 27, 2016

The Journey So Far

Hello Readers!
Welcome to the first post!
I could go on and introduce myself with the boring details, but let's just go with the flow.
This post is about how it got started, wait, started for GSoC! I was approaching the second year of my college with not much to show by way of achievement, other than being just another decent student, excited now and then by simple experiments in the world of problem solving and creating my own stuff.
Though I had wanted to start contributing to open source for a long time, it was only in January that I could really begin, seeing many of my friends searching for organizations for GSoC. GNU Mailman was the first one I could begin with, it being a familiar concept (my college uses mailing lists for courses, not sure how it goes elsewhere) and matching my skill set well, and it turned out really well with them. The mentors were all nice and supportive, once even patiently dictating to me instructions on how to push successfully.
 And ah yes, that first Merge Request gave me a shocker too. First, I was using their bundler package to build and fix (OK, stupid me), and secondly I was ignorant about the system of upstream and origin and main branch and stuff. Another hurdle I faced was with Gitlab CI: with me using my college internet, which has a proxy, everything tends to get difficult.... It took me 2 weeks of head scratching, and losing my OS (Ubuntu 15.04, now on 14.04), to get things straight.
 But I learnt a lesson, and a very valuable one - STICK WITH AN ORG. GIVE IT YOUR FULL DEVOTION. I saw many friends who switched org after org whenever they faced any hurdle, eventually getting lost. And I am worried this might lead people to never come back to the open source community, or at least to feel uncomfortable about it. Believe me, it's a wonderful world out here, a break away from those regular assignments, and a feeling of seeing your contribution being used by people, a lot of them at times.
Cheers