Years ago I started an online marketing company with some friends, and while there I put together a side project that ended up being 1% of Google. At the time Google indexed 8 billion pages, and 75 million of those pages were generated from about 5 dynamic HTML pages. Essentially it was like keyword stuffing 75 million keywords into Google. In retrospect it is a funny story, and it went as you would have expected: insane amounts of traffic and affiliate marketing revenue, then a ban.
The first question is, why would you be such an asshole to the internet? The short answer is misplaced ambition. I was young, saddled with college loan debt, and the pressure to drive revenue outweighed the risks of doing evil things to Google.
To set up this story, our company had 250ish websites in various industries with various levels of SEO that drove affiliate traffic, and our employee count was 10ish. Most of our sites leveraged pure white hat SEO, and a few sites had some black hat techniques mixed in. Our bread and butter was white hat SEO, but those sites took time to cultivate. For that reason I usually mixed in some minor spam to kick start traffic. The minor spam drew some positive attention within the company, and then the pressure was on to do more of it. That’s how this idea started. Part of the thinking was that if a little black hat SEO went a little way, why not pursue every black hat technique imaginable and see how far that could go. Usually you don’t do this for fear of being banned from Google. I minimized the risk by setting up a new subnet within our colo. By doing so I could get banned and not negatively impact the rest of our business.
The 6 facets to making this project a success:
- The product
The product needed to be something that had good affiliate commissions and didn’t have much competition.
- The industry
I needed an industry that could support millions of keywords.
- Content
I needed the ability to introduce a lot of new content to Google.
- A network effect
I needed to create a network of sites such that the sites could add value by linking to one another. Basically my own link farm.
- On page SEO
The on page SEO needed to work in such a way that I could optimize millions of pages with little effort.
- Off page SEO
I needed the ability to create lots of links from sites I did not control.
Here is how those areas were addressed:
- Content
The content problem was solved by hiring local college students from UCSB to create content. That summer we hired a dozen or so students to work part time. We built a CMS so students could work from home, and within a few weeks we had thousands of pages of unique articles. The CMS worked by consuming a list of thousands of keywords for either the product or the geo location. Product keywords included things like “black mold” and “siding”, and geo location keywords were cities, counties and states. Students could log into the CMS, find a keyword to write about, then publish their article. To a large extent this process managed itself: we seeded the CMS with keywords, gave lots of people the ability to add content, then tracked operational aspects like production efficiency and payroll expenses.
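To give a rough idea of how product and geo keywords turn a handful of dynamic pages into millions of URLs, here is a minimal sketch. The keyword lists, the slug format, and the URL pattern are my assumptions for illustration, not the original CMS code.

```python
from itertools import product

# Hypothetical seed lists; the real CMS consumed thousands of each.
product_keywords = ["black mold removal", "vinyl siding", "roof repair"]
geo_keywords = ["Santa Barbara CA", "Ventura County CA", "California"]

def make_page_slugs(products, geos):
    """Cross product keywords with geo keywords to get one landing page per combination."""
    for prod, geo in product(products, geos):
        keyword = f"{prod} {geo}"
        slug = keyword.lower().replace(" ", "-")
        yield keyword, f"/contractors/{slug}.html"

for keyword, url in make_page_slugs(product_keywords, geo_keywords):
    print(keyword, "->", url)
```

A few thousand product terms crossed with tens of thousands of cities, counties and states is how a handful of dynamic page templates balloons into tens of millions of indexable URLs.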
- A network effect
I wanted the benefits of a link farm without the cost of being caught running a link farm. To do this, 26 websites were created. Multiple IPs were used on various subnets to give the appearance that the sites were independently owned. The webservers were set up to share IP addresses, meaning one webserver could host several sites even though those sites appeared to be on different IPs. Cloud computing makes this kind of setup commonplace today, but circa 2004 it was a novel approach. Interlinking between the websites was done in such a way that loops were difficult to recognize: given the graph of 26 websites, you would have to traverse 6 or more websites to make a full loop. I wanted to make it difficult for Google to determine that this set of sites constituted a link farm.
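The interlinking idea is easier to picture as a small graph sketch. The ring layout and the link offsets below are my own illustrative assumptions; any linking pattern whose shortest directed loop is six or more hops would have the same effect.

```python
from collections import deque

NUM_SITES = 26
OFFSETS = (1, 5)  # illustrative: with 26 sites these offsets make the shortest loop 6 hops

def build_links(n=NUM_SITES, offsets=OFFSETS):
    """Arrange the sites in a ring; each site links to a couple of sites further around it."""
    return {i: [(i + o) % n for o in offsets] for i in range(n)}

def shortest_cycle_through(links, start):
    """BFS for the fewest hops needed to leave `start` and get back to it."""
    dist = {nbr: 1 for nbr in links[start]}
    queue = deque(links[start])
    while queue:
        node = queue.popleft()
        if node == start:
            return dist[node]
        for nbr in links[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None  # no path back to start

links = build_links()
print(min(shortest_cycle_through(links, site) for site in links))  # prints 6
```

Every site still benefits from every other site's links, but nothing as obvious as A links to B and B links back to A ever appears.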
- On page SEO
The on page SEO was not complicated. Each site was basically 5 dynamic pages, so there wasn’t much work needed. HTML attributes, H1 tags, title tags, etc. were set similarly throughout the site. At times there were keyword density issues that needed to be corrected, and that was handled by training the content writers. On page content was also shadowed such that the content was visible to search engines but hidden from users. Keep in mind that this was done at a time when search engines were not well equipped to deal with spam. Shadowing content freed up real estate on the page for good call-to-action splash pages.
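A rough sketch of what “optimize millions of pages with little effort” means in practice: every page shares one template, so tuning a single title or H1 pattern re-optimizes the whole site. The field names and tag patterns here are assumptions for illustration, not the original templates.

```python
def render_head(product_kw, geo_kw):
    """Fill the handful of on-page SEO slots from the page's keyword pair."""
    keyword = f"{product_kw} in {geo_kw}"
    return {
        "title": f"{keyword.title()} | Free Contractor Quotes",
        "h1": keyword.title(),
        "meta_description": (
            f"Get free quotes from local {product_kw} contractors serving {geo_kw}."
        ),
    }

print(render_head("black mold removal", "Santa Barbara CA")["title"])
```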
- Off page SEO
It’s here that this project took a turn to the dark side. I hired contractors from India to create a tool to automatically generate links. For brevity, let me just say that when a website links to another website, it tells search engines that the site being linked to is important. Point being, inbound links are good. Traditionally you get links by contacting another website owner and setting up an arrangement for a backlink; good websites used to sell these links. With millions of pages that needed links, I didn’t have that kind of time. So I hired contractors to create a tool to do this for me.
This tool worked by searching the internet for forms that allowed links, then adding those links. The tool was written in .NET and had a UI to interface with it. It required a set of keywords to use, parameters to let it spider Google for forms, and configuration parameters. It was also multi-threaded, so it could add links very quickly; in an hour it could easily create 1000+ backlinks. On each site the tool found I would create 26 backlinks, one for each of our sites. This was ok because each of our sites was seen by a search engine as a unique site. The tool also had the ability to limit the links per site. I found some examples where the tool would replace a wiki page with a link, which was bad. This resulted in people calling our ISP to complain, so some changes were made to mitigate the damage. This tool isn’t something I’m proud of, but it had to be part of the experiment: Google had to be convinced to spider millions of pages, and deep links were the best way to do that.
What happened when the site went live?
The development of this project took six months to complete, and the sites were let loose in August 2004. To monitor the performance of the websites we created tools to measure Googlebot activity and SERP rankings. The tools were comprehensive. One side would consume web logs to measure activity by site, by page and by keyword. Another side would scrape SERP results so we could see how our pages ranked for specific keywords over time. As a guy who likes data, seeing the sites grow up was the most interesting part. There was a lot of data to digest. For each site there were millions of pages. Each page had various keywords associated with it; the page had to get indexed, climb SERP rankings over time and ultimately drive affiliate clicks and revenue. The sites started to take off immediately, and within a few weeks revenue was in the thousands. Every day I would run the linking tool to add new links. Every day Google would pick up more pages. Every day revenue went up a little.
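The log-consuming side of the monitoring is simple to picture. Here is a minimal sketch assuming standard Apache combined-format access logs; the log file name and the exact fields we tracked are assumptions, not the original tooling.

```python
import re
from collections import Counter

# Combined log format: ip - - [time] "GET /path HTTP/1.1" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"')

def googlebot_hits_by_page(log_lines):
    """Count Googlebot requests per URL so we can watch which pages are getting crawled."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("path")] += 1
    return hits

with open("access.log") as f:  # hypothetical log file path
    for path, count in googlebot_hits_by_page(f).most_common(10):
        print(count, path)
```

Rolling counts like these up by site and joining them against the keyword behind each URL is enough to chart indexing and crawl activity over time.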
It took a few months for the sites to reach 1 million pages indexed, and that is when I started to worry. The growth was exponential and there was no way to slow it down. The traffic from the bots alone was several thousand a month. Our hosting provider said we were getting more traffic than Sony Music. I stopped checking growth after our tools found 75 million pages in Google. About that same time, Google announced it had indexed 8 billion pages, so this project represented roughly 1% of Google. Technically that is true, but it’s also a misleading stat: these were long tail keywords that were not commonly searched, so being 1% of Google’s SERP database did not translate into 1% of the traffic. Regardless, if you were looking for anything related to home contracting, our sites would come up. To that extent, this project was a success. There were times when nearly the entire first page would be links to our sites. A search engine was not smart enough to care, but a person could easily see something was fishy. 75 million keywords was a lot, and the success of this experiment surely contributed to its demise.
How did the experiment end?
In March 2005, the Google banhammer finally came down. This came as no surprise. Google eliminated the 26 websites from their index; our other ~250 sites were spared. Our advertiser, which was owned by IAC, got banned as well. Eventually our advertiser called us to say my company was mentioned in a board meeting and our contract would be cancelled. They were unusually nice considering the circumstances. Fortunately for them, they were returned to the index within a few days. The cost to implement this experiment was about $100k plus a little bit of pride. Revenue more than made up for the cost, but the project did end in failure: a lot of effort went into sites that were no longer viable. Success in SEO is trying to find that fine line between white hat and black hat SEO. This experiment pushed that line, and it was interesting to witness.