Today, data analysis is part of every SEO professional’s daily work. Data sets are getting broader, and more data is being collected and analyzed than ever. Many SEOs now consider business intelligence (BI) tools an integral part of their optimization work.
So how does Business Intelligence help SEO?
We need to understand how a data analysis workflow typically evolves. When you’re working with a small website, analysis can be managed in a simple spreadsheet, using formulas and pivot tables over metrics such as visits, page views and bounce rates. As the site gets more traffic, or as more comparison sites are added to the analysis, the amount of data grows and eventually pushes past the limits of today’s spreadsheets (an XLSX worksheet in Excel is currently limited to 1,048,576 rows, or about 1 million). Well before that point, spreadsheets take longer to open and calculations slow to a crawl. If your computer does not have enough memory, opening an enormous spreadsheet can crash your machine.
The next logical step is to move the data into a database. Databases solve the row limit problem immediately and let you scale up to storing millions of records. Data aggregation is also reasonably fast. A decent SATA drive lets you scan hundreds of thousands of rows per second, so if your data set is in the millions of records, you can expect aggregation results within a few seconds. One caveat is that you will have to learn SQL to get the most benefit from a database. Another is that as calculation complexity increases, response time can sometimes increase exponentially.
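As a rough illustration of the kind of SQL aggregation described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample numbers are made up for the example, and the database is created in memory only to keep the snippet self-contained; a real setup would more likely use a file-based or server database.

```python
import sqlite3

# Hypothetical sample data: (page, visit_date, visits, bounces)
ROWS = [
    ("/home", "2024-01-01", 120, 30),
    ("/home", "2024-01-02", 80, 10),
    ("/blog", "2024-01-01", 50, 25),
    ("/blog", "2024-01-02", 50, 15),
]


def aggregate_by_page(rows):
    """Load rows into a throwaway SQLite database and aggregate them per page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE page_stats "
            "(page TEXT, visit_date TEXT, visits INTEGER, bounces INTEGER)"
        )
        conn.executemany("INSERT INTO page_stats VALUES (?, ?, ?, ?)", rows)
        # The aggregation runs at request time, over the raw rows.
        cursor = conn.execute(
            """
            SELECT page,
                   SUM(visits) AS total_visits,
                   1.0 * SUM(bounces) / SUM(visits) AS bounce_rate
            FROM page_stats
            GROUP BY page
            ORDER BY total_visits DESC, page
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


result = aggregate_by_page(ROWS)
for page, total_visits, bounce_rate in result:
    print(f"{page}: {total_visits} visits, bounce rate {bounce_rate:.1%}")
```

Every call to aggregate_by_page scans the stored rows again, which is why response time depends on how many rows there are and how complex the query is.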
Another logical next step is to follow the OLAP (online analytical processing) model by building data cubes. Cubes are basically pre-calculated, aggregated result sets. Instead of being performed at request time, aggregation is pre-calculated on a set schedule and the results are stored in cubes. Retrieval is extremely fast, but this model is not very flexible, and the data is only as fresh as the last scheduled calculation. Once a cube is created, it’s fixed and cannot be modified easily. If you want to dissect and aggregate the data differently, a new cube has to be designed and implemented, which takes time and effort. On the positive side, OLAP is a good, scalable solution as long as the cubes can answer all the questions users may ask and latency is not an issue. OLAP is not a new concept, and it has been very popular in the BI space for decades. For the SEO industry, OLAP is a valuable BI tool when designed correctly.
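The cube idea can be sketched in a few lines of Python. This is only a toy model under assumed data: the fact rows, the choice of (page, country) as the cube's dimensions and the function names are all hypothetical, and a real OLAP product does far more than this.

```python
from collections import defaultdict

# Hypothetical fact rows: (page, country, date, visits)
FACTS = [
    ("/home", "US", "2024-01-01", 100),
    ("/home", "DE", "2024-01-01", 20),
    ("/home", "US", "2024-01-02", 80),
    ("/blog", "US", "2024-01-01", 50),
]


def build_cube(facts):
    """Pre-calculate total visits for a fixed set of dimensions: (page, country)."""
    cube = defaultdict(int)
    for page, country, _date, visits in facts:
        cube[(page, country)] += visits
    return dict(cube)


# In practice this step would run on a schedule (for example nightly),
# not at request time.
CUBE = build_cube(FACTS)


def visits_for(page, country):
    """Retrieval is a single dictionary lookup; no raw rows are scanned."""
    return CUBE.get((page, country), 0)


print(visits_for("/home", "US"))
```

Note that the date was dropped when the cube was built. A question such as "visits per day" cannot be answered from CUBE at all; it would need a new cube with date as one of its dimensions, which is the inflexibility described above.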
What does this mean for SEO?
SEOs are always hungry for data and want to access it as soon as it’s available. With social networks becoming more prominent in search, even more data will need to be analyzed. OLAP is great for data analysis, but the added latency prevents users from accessing data soon enough. One solution is in-memory OLAP. Because the data is held in memory, aggregations no longer have to be prepared offline; they can be calculated on demand, with results returned in less than a second. This solves the flexibility issue, since you can slice and dice the data any way you want, and the latency issue, since you see aggregation results immediately.
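To make the contrast with pre-built cubes concrete, here is a small Python sketch of on-demand aggregation over rows held in memory. The sample rows, field names and the slice_and_dice helper are invented for the illustration; commercial in-memory OLAP engines use much more sophisticated storage and indexing.

```python
from collections import defaultdict

# Hypothetical raw rows, kept in memory rather than pre-aggregated.
FACTS = [
    {"page": "/home", "country": "US", "date": "2024-01-01", "visits": 100},
    {"page": "/home", "country": "DE", "date": "2024-01-01", "visits": 20},
    {"page": "/home", "country": "US", "date": "2024-01-02", "visits": 80},
    {"page": "/blog", "country": "US", "date": "2024-01-01", "visits": 50},
]


def slice_and_dice(facts, group_by, measure="visits", where=None):
    """Aggregate a measure on demand, grouped by any combination of dimensions."""
    totals = defaultdict(int)
    for row in facts:
        # Optional filter, e.g. restrict the analysis to one country.
        if where is not None and not where(row):
            continue
        key = tuple(row[dim] for dim in group_by)
        totals[key] += row[measure]
    return dict(totals)


# The grouping is chosen at request time, so new questions need no new cube.
by_country = slice_and_dice(FACTS, ["country"])
us_by_day = slice_and_dice(FACTS, ["date"], where=lambda r: r["country"] == "US")
by_page_and_day = slice_and_dice(FACTS, ["page", "date"])

print(by_country)
print(us_by_day)
```

The trade-off is that all the raw rows have to fit in memory, and every question is answered by scanning them at request time.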
In-memory OLAP has gained popularity over the past few years. If you search Google for “in-memory OLAP” or “in-memory analytics,” you will find quite a few big software vendors offering their own solutions. Some notable ones are Oracle, SAS and Google. Some companies offer desktop products with visualization capabilities, while others provide service-based solutions that give you scalability on a distributed platform. A desktop product is easier to set up than a service-based product and gives you instant gratification with attractive visualizations. The only limit is the amount of RAM in your system. RAM is cheap, so I recommend installing as much as you can on your computer; more RAM means that more data can be analyzed. Service-based solutions let you set up a cluster of machines to store your analytics data. They usually rely on a highly optimized storage strategy that reduces storage requirements and improves retrieval speed.
To sum all of this up, in-memory OLAP is the new, sexy way of doing data analysis. If you like visualization, go with a desktop product. If you are very hands-on and enjoy writing queries, a service-based solution will give you more flexibility in the long run. In-memory OLAP is a modest update to traditional OLAP. As computing power becomes cheaper and more plentiful, we may see something even more interesting in the future.
Don’t you agree that BI is an exciting field? What do you think of BI solutions for SEO based on your experience?