Sunday, September 06, 2009

Web Scalability & Performance: Real Life Lessons

Following is a presentation that I made at TechWeekend in Pune on 5th September. About sixty hard-core technical geeks were present at the sessions. Feel free to share.
You can reach me on Twitter @mukulneetika .


Monday, October 06, 2008

5 Hacks for Startup Hiring in India

Here are some thoughts on hiring for a Startup in India. My experience with hiring in India for the last fifteen years, in one word, has been – “Awesome”! In Pune I have met some of the best programmers and designers in the world, and I work with many of them. These are some of the most hardworking, smart and knowledgeable individuals, who love to crank code (read an interesting post here, under the “People” section). I love working with the great guys at Komli!

Hiring in India is different from hiring in other parts of the world. The following thoughts are written with an employer in mind, especially a startup employer. They are in random order, and based on personal experience. Please don’t equate my consistent use of ‘he/him’ with a gender bias.

1. “Offer acceptance is not equal to JOINING”

This is something you learn the hard way. It is very difficult to believe that a candidate talks so nicely, and accepts your job offer, only to NOT show up on the joining date. This is a shocker, which takes several days to recover from. If the candidate is good he calls up/e-mails you a few days in advance, telling you that he cannot join. Many will not inform you, and simply won’t show up on the joining date.

My recent experience – “a senior managerial candidate, who was relocating from the USA to India, accepted the offer after several negotiations that went on for weeks. He was very happy and I was very happy that we had a deal. On the day he was supposed to land in Pune, and join a few hours after landing, he didn’t show up. I patiently waited until the evening, and the next morning. I emailed him, and found out that instead of flying into Pune, he had landed in Bangalore and joined another large company the previous day. How nice.”

Accept the fact – a hire is only a “probability” until the day he shows up. This probability increases as the date of joining comes near. An offer acceptance, whether by e-mail or in hard copy, is still only a probability of joining.

The way I would handle this is – a) don’t count on a hire until he joins, b) always plan for backups – no hiring is complete until the last guy joins, c) keep calling the candidate every few days to find out if he is going to join – if he tells you that he is not joining, it’s better to know that early on rather than on the last date, and d) if the numbers are large – over-hire, to compensate for the probability.
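
To put a number on point (d), here is a minimal sketch of the over-hiring arithmetic. It assumes every candidate joins independently with the same probability, which is a simplification; the function names and the 70% figure are illustrative assumptions of mine, not measured data.

```python
import math


def offers_needed(target_joins: int, join_probability: float) -> int:
    """Offers to make so that the expected number of joiners meets the target."""
    if not 0 < join_probability <= 1:
        raise ValueError("join_probability must be in (0, 1]")
    return math.ceil(target_joins / join_probability)


def chance_of_meeting_target(offers: int, target_joins: int, join_probability: float) -> float:
    """Probability that at least target_joins offers turn into joiners.

    Simple binomial model: each candidate joins independently (an assumption).
    """
    p = join_probability
    return sum(
        math.comb(offers, k) * p**k * (1 - p) ** (offers - k)
        for k in range(target_joins, offers + 1)
    )


if __name__ == "__main__":
    # Hypothetical: we need 10 people and guess that 70% of acceptances join.
    offers = offers_needed(10, 0.7)
    print(offers)  # 15
    print(round(chance_of_meeting_target(offers, 10, 0.7), 2))
```

Meeting the target only on average still leaves a real chance of falling short, which is why the backups in point (b) matter even when you over-hire.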

2. “The Resume”

I have found that many resumes have inaccurate information in them. You can actually build a “probability-framework” on what percentage of a resume is true – based on some of the key parameters of the resume – such as skill-set (Java, PHP, RoR, ASP.NET, C/C++, C#, etc). Try that, it works.

The way I handle this is – talk to the candidate, find out what he has done, and correlate that with his resume. Most of the time they match.

Another interesting parameter is the “keyword density”. My personal experience has been that the higher the keyword density, the more likely it is that the candidate is bogus. You cannot learn all of “C, C++, Java, PHP, MySQL, Oracle, OLTP, Apache, Tomcat, RoR” in 2 years :-)
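
One rough way to measure this is sketched below. I am assuming that “keyword density” means distinct skills claimed per year of experience; that definition and the function name are my own illustration, and any cut-off you pick would need to be calibrated against your own hiring history.

```python
def keyword_density(listed_skills, years_experience):
    """Distinct skills claimed per year of experience (an assumed definition)."""
    if years_experience <= 0:
        raise ValueError("years_experience must be positive")
    # Normalise case and whitespace so "Java" and " java " count once.
    distinct = {skill.strip().lower() for skill in listed_skills if skill.strip()}
    return len(distinct) / years_experience


if __name__ == "__main__":
    skills = ["C", "C++", "Java", "PHP", "MySQL", "Oracle", "OLTP", "Apache", "Tomcat", "RoR"]
    print(keyword_density(skills, 2))  # 5.0 skills per year
```

A high number is only a prompt to probe deeper in the conversation with the candidate, not a verdict on its own.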

3. “The Sourcing”

Sourcing of resumes has a major impact on the success rate. I think it is very important to assess the success rate of each source of resumes – direct, referral, recruiters, newspapers, online portals, etc. You may be surprised to know that there may be a difference of 10x in the conversion rates between these sources – so you should focus on the source that has the highest conversion rate.
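
A minimal sketch of that comparison follows. The counts are made-up numbers purely for illustration, and “conversion rate” is assumed here to mean hires divided by resumes received from a source.

```python
def conversion_rates(funnel):
    """Map each source to hires / resumes received, best source first.

    funnel: {source_name: (resumes_received, hires)}
    """
    rates = {
        source: (hires / resumes if resumes else 0.0)
        for source, (resumes, hires) in funnel.items()
    }
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical counts, not real data.
    funnel = {
        "online portal": (200, 4),
        "recruiter": (50, 3),
        "referral": (20, 4),
    }
    for source, rate in conversion_rates(funnel).items():
        print(f"{source}: {rate:.0%}")
```

Even with only a handful of hires per source, tracking the counts this way makes it obvious where the next hour of sourcing effort should go.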

For startups referrals work the best. Keep your employees happy, so that they find more friends who want to become happy!

4. “The Interview”

A few things off the top of my head:
Do initial screenings before you go too deep into technical discussion. If the candidate is not good, find that out in the first ten minutes of discussion, so that you save time on both sides. One important thing in my mind is – ask questions about the most recent problem that your company is facing, and find out whether he can solve that problem or not. Even if a guy can solve the most complex algorithm problems, or do the most optimal data structure design – can he solve your current (or past two-three) problems? Make sure you factor that into the overall decision. Don’t compare the candidate to yourself – “he is not like me; I can do it better than him”. It is very difficult to find a guy better than yourself, don’t try that :-)

5. “The Timing”

Try to keep “good” interviews at the top of the day, during mornings. You are in the office at 9AM; if the candidate doesn’t show up, or doesn’t pick up the phone – that does very bad things to your day. It’s a difficult thing to do, but I try and keep most interviews in the later part of the day.

Well, that’s all I have for now. There are many more things, but I wanted to keep it simple.

Got any more ideas? Send me a message on Facebook or Twitter.


Wednesday, September 17, 2008

PubMatic interview with Amar Goel

PubMatic interview with Amar Goel, CEO PubMatic/Komli:
Amar talks about ad optimization, how PubMatic benefits publishers, and how the ad-price index shows how ad prices are trending:


Thursday, April 03, 2008

Komli to represent eBay India for all their ad sales worldwide

Komli and eBay India have entered into an exclusive partnership whereby Komli will represent eBay India for all their ad sales worldwide. In addition, Komli's ad network optimization technology PubMatic will optimize eBay's unsold ad space for maximization of revenue.

This is very exciting news for a couple of reasons:
1. A global internet giant has chosen to partner with an Indian startup for its superior understanding of online advertising and online advertising technology,
2. This bodes well for the growth of online advertising in India -- large portals, which in the past have not looked at online advertising as a key revenue driver, are starting to do that now.

For details see official news release at - http://www.komli.com/news/ebaypress.php .


Sunday, March 30, 2008

Iframe ad-tag vs. Script ad-tag: Online advertising tag type comparison

This is a list that I have discussed many times with friends; however, I never found these points in a single place, so here you go ...

Differences between iframe tag and script tag:
  1. Iframe tag does not delay the loading of the web-page elements: Iframes usually load in parallel. For example, if you have several elements in a page like images, CSS, JavaScript files and HTML tags, and you have the ad-tag embedded in the page as an iframe, the iframe loads in parallel and does not make your page load slower. So, if you want your page to load faster, use iframe tags.
  2. Script tag does not change the “referrer” property of your ad-tag: If your ad-tag is served from inside an iframe, the ad-network that serves the ad will see a referrer different from your page URL/domain. If you use a script tag, the referrer remains your page URL, and therefore your domain name. Ad-networks that require an ad to be served from the same domain it was created for will therefore not serve ads through iframe tags. Most ad-networks, however, allow you to set a “site-alias” that names a different domain from which the ad may be served. Read more about the referrer property here.
  3. Script tag works better for ad-networks that do contextual analysis of the content of the page: if you use iframe tags, the ad-network cannot look outside the iframe, so it cannot do on-the-fly contextual analysis of the page contents and may serve irrelevant ads. Read more about contextual analysis here.
  4. If there is more than one ad from the same ad-network, and you are using iframe tags, these ads may not be able to communicate amongst themselves since the scope of the JavaScript variables is within an iframe. Therefore if an ad-tag sets a JavaScript variable, which the other ad-tag on the same page is expected to read, this will break if you use iframe tags.
  5. Since JavaScript variables have their scope only within that iframe, they don’t contaminate the namespace of the JavaScript variables of your web-page, neither do they get affected by the JavaScript variables of your web-page.
  6. Iframe tags are easier to include inside a web-page, since you can save an ad-tag in a file and load it as an iframe into your web page. This also allows parallel loading of the ad-tag iframe. For example, your web-page could be:

<html>
<head>
<script type="" …>
</script>
</head>
<body>
<iframe src="ad-tag.html"></iframe>
</body>
</html>

More questions? Drop me an email.

Update: For #3 "Script tag works better for ad-networks that do contextual analysis", Google AdSense mentions this in their help section under "Why aren't my ads relevant?". Read on:
The AdSense code was placed within an IFRAME.
Our targeting technology is not optimized to serve ads within a separate IFRAME. If you placed the AdSense code in a separate IFRAME, your site may display less targeted ads or public service ads. For better results, please implement our ad code directly into the source of your webpage. Once you make these changes, relevant ads may not appear immediately. Until we are able to re-crawl your site, which may take up to 48 hours or more, your page may continue to display untargeted or public service ads.


Thursday, March 20, 2008

Disk storage - where are we headed?

Some insightful articles and some of my own thoughts on the trends in data storage:

THE BACKGROUND:
Disk capacities are going up and costs are going down; however, the effective transfer bandwidth (ETB) per byte of capacity has come down tremendously. Despite capacities and transfer rates increasing by factors of 10,000 and 100 respectively, typical drive ETB has actually decreased by a factor of 100. As Jim Gray said "Disks have become tapes." (Link to source).

Consider, for example, a 10 TB database. Ten years ago, this database would have occupied two thousand 5 GB drives - a common size at the time. With a 3 MB/second transfer rate, the aggregate bandwidth of these 2,000 drives would have been 6 GB/second, enabling the entire database to be scanned in about 30 minutes. Today, only about 20 higher-capacity drives would be needed to hold this same database. Those 20 drives would have an aggregate bandwidth of 1.2 GB/second, increasing the time required to scan the entire database to 150 minutes - an increase of two hours.
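
The arithmetic behind that example is easy to check. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a per-drive rate of 60 MB/second for the newer drives, which is simply 1.2 GB/second divided by 20; the function name is mine.

```python
TB = 1e12  # bytes, decimal units (an assumption)
MB = 1e6   # bytes


def scan_minutes(db_bytes, drives, mb_per_sec_per_drive):
    """Minutes to scan the whole database when all drives stream in parallel."""
    aggregate_bytes_per_sec = drives * mb_per_sec_per_drive * MB
    return db_bytes / aggregate_bytes_per_sec / 60


if __name__ == "__main__":
    then = scan_minutes(10 * TB, drives=2000, mb_per_sec_per_drive=3)
    now = scan_minutes(10 * TB, drives=20, mb_per_sec_per_drive=60)
    print(f"then: {then:.0f} min, now: {now:.0f} min")
```

With these assumptions the two scans come out at roughly 28 and 139 minutes, in line with the rounded figures of 30 and 150 minutes quoted above.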

DISKS ARE BECOMING A SEQUENTIAL ACCESS DEVICE RATHER THAN A RANDOM ACCESS DEVICE
Jim Gray points out - We have to convert from random disk access to sequential access patterns. Disks will give you 200 accesses per second, so if you read a few kilobytes in each access, you're in the megabyte-per-second realm, and it will take a year to read a 20-terabyte disk. If you go to sequential access of larger chunks of the disk, you will get 500 times more bandwidth—you can read or write the disk in a day. So programmers have to start thinking of the disk as a sequential device rather than a random access device.
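
Here is a back-of-the-envelope check of those numbers. The quote says only "a few kilobytes" per access, so the 4 KB figure below is my assumption, and the sequential rate is taken as 500 times the random rate, as stated in the quote.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_read(disk_bytes, bytes_per_second):
    """Days needed to read the whole disk at a constant bandwidth."""
    return disk_bytes / bytes_per_second / SECONDS_PER_DAY


DISK_BYTES = 20e12            # 20-terabyte disk (decimal units assumed)
ACCESSES_PER_SECOND = 200     # from the quote
BYTES_PER_ACCESS = 4 * 1024   # "a few kilobytes": 4 KB is an assumption

random_bandwidth = ACCESSES_PER_SECOND * BYTES_PER_ACCESS  # under 1 MB/second
sequential_bandwidth = 500 * random_bandwidth              # "500 times more bandwidth"

random_days = days_to_read(DISK_BYTES, random_bandwidth)
sequential_days = days_to_read(DISK_BYTES, sequential_bandwidth)

if __name__ == "__main__":
    print(f"random access: about {random_days:.0f} days")
    print(f"sequential access: about {sequential_days:.2f} days")
```

Under these assumptions random access takes the better part of a year, while sequential access finishes in well under a day, which matches the orders of magnitude in the quote.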

Tom White later says - "MapReduce is a programming model for processing vast amounts of data. One of the reasons that it works so well is because it exploits a sweet spot of modern disk drive technology trends. In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data)." Read more here.

My take is that SSDs are going to take a while to become an economically viable alternative to disks. Flash disks cost approximately $10/GB, and good flash drives at OEM prices cost about $60/GB or more (source here). Compare this with the cost of disk, which is about $0.20/GB. So, at the $60/GB end, we are looking at about a 300x price difference. I think it's going to take a while before SSDs become a reality for storing terabytes of data. Until that time, we will have to keep disks 50-70% empty to enhance striping performance. So, if we were to use 50% empty disks, the cost of disks doubles for storing the same amount of data.
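
The cost comparison can be sketched as follows, using only the per-GB prices quoted above. The function name is mine, and "fill fraction" is my shorthand for the share of each disk that actually holds data.

```python
def cost_per_stored_gb(price_per_gb, fill_fraction):
    """Effective cost of each GB actually stored when disks are kept partly empty."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    return price_per_gb / fill_fraction


DISK_PRICE = 0.20        # $/GB, from the text
FLASH_PRICE = 10.0       # $/GB, from the text
FLASH_OEM_PRICE = 60.0   # $/GB, from the text

if __name__ == "__main__":
    print(f"OEM flash vs disk: {FLASH_OEM_PRICE / DISK_PRICE:.0f}x")
    print(f"flash vs disk: {FLASH_PRICE / DISK_PRICE:.0f}x")
    # Disks kept 50% empty: every stored GB costs twice the list price.
    print(f"disk at 50% full: ${cost_per_stored_gb(DISK_PRICE, 0.5):.2f}/GB")
```

Even after doubling the disk cost to leave half of each drive empty, disk stays far cheaper per stored GB than either flash price quoted here.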


Wednesday, February 20, 2008

Skype All-Hands: Works really well

I did a "Skype All Hands" this morning. Surprisingly it worked much better than a face to face all-hands, or a teleconference all-hands. To be specific, the problem that I mostly face is that people don't talk; they don't ask questions during such all-hands meetings. In a face to face all-hands meeting, it takes a while before the first guy asks a question, and then the second guy, and many questions come towards the end of the meeting. A Skype all-hands, on the other hand, turned out to be very interactive: people asked many questions, and they really participated in the meeting. It seems like engineers like typing much more than talking. Well, I love this. As an added advantage, you already have the meeting minutes (cut-and-paste from the IM log), and you can do this across the oceans.


Tuesday, January 29, 2008

PubMatic Enables Ad Optimization Across Every Ad Network

Palo Alto, Calif. - (January 28, 2008) - PubMatic (www.pubmatic.com), the first and largest ad optimization platform for Web publishers worldwide, today announced the ability to optimize online ads across any and every ad network. Now Web publishers using PubMatic can eliminate the headache of testing and deciding which ad network and layout will maximize their revenues, because PubMatic does it for them.

Currently in beta, PubMatic serves more than 2,000 publishers and more ad networks than any other ad inventory optimization platform.

"PubMatic immediately doubled our ad revenues by recommending the optimal ad network for each and every visit to WinCustomize.com," said Michael Crassweller, Web Site Manager, StarDock. "Since Wincustomize.com serves up nearly 4 million ads per day, PubMatic's ad network optimization has made a big difference to our bottom line."

The PubMatic public beta is open to all Web publishers, regardless of geography or company size. Signing up is simple and free: publishers can visit www.pubmatic.com/signup to get started in minutes.


Friday, January 18, 2008

8 hacks for finding Startup office space

I started looking for a new office for Komli Engineering at Pune, India about a week back. Here I describe my journey and the final selection.

Here are some of the key points when finding an office of about 2000 sqft in Pune in Aundh/Baner area:
  1. Rates have gone up like crazy – average rate is Rs. 50/sqft., unfurnished
  2. Most office spaces have only 2 restrooms, which is too few for a 2000 sqft space. So most spaces can pretty much be rejected on that ground
  3. There are a large number of residential properties that people are converting into commercial properties for offices and showrooms, and charging Rs. 50 per-sqft!
  4. The problem with these residential-turned-commercial properties is that – a) families are living in the same building, b) kids are playing in open spaces and c) parking is mostly an issue.
  5. There are independent bungalows available at very cheap rates, about 1/3 the rental cost. These places are great – they are peaceful, have lots of space, lots of parking and so on. BUT you would probably never get broadband in those places
  6. Your office space must be located not more than 3 minutes from 5 places that must sell wada-pav, hot samosas, cut-chai, tandoori chicken and Pizza Hut – else you are doomed, because most employees in a startup are not married, and they need to eat (when they are not writing code)
  7. The other most important things when you are renting an office space are – 1) the place is good, 2) broadband is feasible and 3) parking space is available. Broadband is the most unexpected thing to have to check. You can find the best place at the least cost, but no broadband – that will totally put you off. The second most difficult thing to find is parking space for 4 cars
  8. I looked at some of the coolest places, such as a nice place next to McDonalds in Aundh, a really cool office with an all-glass façade – but not good enough for Komli!

I finally decided on a really nice place above “Kobe Sizzlers” in Aundh. Awesome place, lots of space, central location, 2 balconies and lots of eateries around. And the best part is – you can get Sizzlers on Demand.

Wanna join us? Check out our open positions at http://www.komli.com/careers/ .


Thursday, October 04, 2007

PubMatic Engineering: Slide show

Sunday, September 30, 2007

PubMatic selected by TechCrunch as a Top 40 Startup in the World

PubMatic, a product of Komli, was selected by TechCrunch as a Top 40 Startup in the World. Nearly 750 startups from around the world applied for this honor, and PubMatic was lucky enough to be selected! This was announced at the TechCrunch40 conference in San Francisco, CA, a conference built to showcase these 40 top startups.

We are hiring! If you dream in Java, think in PHP, and talk in <xml> over IM, you should talk to us.

In addition, as part of our presentation at the conference, we announced that PubMatic has been released into a global beta available for all publishers around the world! During our alpha over 500 publishers from around the world have been using PubMatic and seeing some amazing results. See news about PubMatic here.

Online advertising is growing at a very fast pace, and the number of variables affecting the performance of an online ad has been growing at an even faster pace. Komli is devising methods for maximizing the yield of online advertising using advanced algorithms running over large-scale systems. We are also developing a decision support system for data analytics, analysis of real-time data such as user behavior and web analytics, server scalability to support 100,000,000 requests per day (to start with), and much more cool stuff.
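
To get a feel for what 100,000,000 requests per day means for the servers, here is a small sketch that converts it to requests per second. The peak-to-average factor of 3 is purely my assumption for illustration; real traffic shapes vary.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day):
    """Average requests per second if traffic were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day, peak_to_average):
    """Requests per second at peak, given an assumed peak-to-average ratio."""
    return average_rps(requests_per_day) * peak_to_average


if __name__ == "__main__":
    daily = 100_000_000
    print(f"average: {average_rps(daily):.0f} requests/second")
    # Hypothetical peak factor of 3; measure your own traffic before sizing.
    print(f"peak (3x assumed): {peak_rps(daily, 3):.0f} requests/second")
```

The average works out to a little under 1,200 requests per second, and capacity has to be planned for the peak rather than the average.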


The last time I posted about Komli, we had just moved into our new office and were still building the product. Since then a lot has changed: we wrote a bunch of code, did a beta, were selected as a Top 40 startup in the world, grew the team to 8 people, and have been having a lot of fun.


The beta release was amazing: we had close to 400 customers using PubMatic, while a small team of very enthusiastic world-class programmers wrote code and managed escalations at the same time.



While we hacked code in Java, PHP, AJAX and C 12 hours a day, and listened to rock and the latest Bollywood tunes of Bhool Bhulaiyaa, the team continued to have a sense of humor. This is a sketch that one of us drew on the whiteboard, while he was designing a new DB schema for user authentication.

And, did I mention, we never miss a chance to have fun ...



