Finding User Geo Location and Performing Large-Scale Data Analysis

Weblogs capture an array of data about users, and companies use that data for many purposes involving user behavior, such as how much time users spend on which kinds of pages, which pages are most popular, and how users traverse pages. A typical weblog entry contains information such as the time of day, the IP address of the user, the page served, and the status of the page served.
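As a rough illustration of pulling those fields out of a log line, here is a minimal Python sketch. It assumes the logs use the Common Log Format, which the article does not specify; the sample line, the pattern, and the function name parse_line are all made up for the example.

```python
import re

# Assumption: Common Log Format, e.g.
# 192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return the time, IP address, page, and status of one log line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "ip": match.group("ip"),
        "page": match.group("page"),
        "status": int(match.group("status")),
    }


# Hypothetical sample line using a reserved documentation address.
SAMPLE = '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Real logs often use a different or extended format, so the pattern would need adjusting to match your server's configuration.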

One of the major uses of an IP address is mapping it to a location such as a city, state, or country. You can simply purchase products that provide a list of IP blocks mapped to locations; MaxMind is one example. The data is relatively static, but you can easily obtain the latest data each month or quarter and use that. So you have the user’s IP address in one hand and a mapping of IP blocks to city and country in the other, and finding the user’s location may seem straightforward. It is, if you are doing it for a few IP addresses. However, it can become a huge performance issue if you are analyzing tens of thousands of IPs on a constant basis.
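To make the lookup concrete, here is a small in-memory sketch in Python. The block table is hypothetical (reserved documentation ranges with placeholder city and country names, not real vendor data), and the binary search is a technique swapped in for illustration, not necessarily the approach this article settles on.

```python
import bisect
import ipaddress

# Hypothetical block table: (first address, last address, city, country).
BLOCKS = [
    ("192.0.2.0", "192.0.2.255", "City A", "Country X"),
    ("198.51.100.0", "198.51.100.255", "City B", "Country Y"),
    ("203.0.113.0", "203.0.113.255", "City C", "Country Z"),
]


def ip_to_int(ip):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


# Sort blocks by starting address so they can be binary-searched.
ROWS = sorted(
    (ip_to_int(first), ip_to_int(last), city, country)
    for first, last, city, country in BLOCKS
)
STARTS = [row[0] for row in ROWS]


def locate(ip):
    """Return (city, country) for the block containing ip, or None."""
    value = ip_to_int(ip)
    # Find the last block whose start address is <= value.
    index = bisect.bisect_right(STARTS, value) - 1
    if index < 0:
        return None
    _start, end, city, country = ROWS[index]
    # The address may fall in a gap after that block ends.
    return (city, country) if value <= end else None


if __name__ == "__main__":
    print(locate("198.51.100.7"))
```

This assumes the blocks do not overlap. Each lookup then costs a handful of comparisons instead of a scan over every block.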

MySQL 4.x and 5.x do not make use of an index when performing range operations between two sets of data. But Geo City data maps a block of IP addresses, rather than a single address, to a city, and for good reason: IP authorities allocate groups of IP addresses to ISPs or even to particular countries, so there is no reason to store every IP address separately. This in turn forces you to use range operations such as greater than, less than or equal to, or BETWEEN.
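The shape of such a range query can be sketched as follows. This example uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; it is not MySQL and says nothing about MySQL's query plans. The table name geo_blocks, its columns, and the rows are hypothetical.

```python
import ipaddress
import sqlite3


def ip_to_int(ip):
    """Convert a dotted IPv4 string to an integer so it can be range-compared."""
    return int(ipaddress.IPv4Address(ip))


# In-memory stand-in database with a hypothetical block table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geo_blocks (start_ip INTEGER, end_ip INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO geo_blocks VALUES (?, ?, ?)",
    [
        (ip_to_int("192.0.2.0"), ip_to_int("192.0.2.255"), "City A"),
        (ip_to_int("198.51.100.0"), ip_to_int("198.51.100.255"), "City B"),
        (ip_to_int("203.0.113.0"), ip_to_int("203.0.113.255"), "City C"),
    ],
)


def city_for(ip):
    """Look up the city whose block contains ip using a BETWEEN range test."""
    row = conn.execute(
        "SELECT city FROM geo_blocks WHERE ? BETWEEN start_ip AND end_ip",
        (ip_to_int(ip),),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(city_for("203.0.113.9"))
```

Every address is stored as an integer, so one row covers a whole block and the lookup becomes a comparison against the block's two endpoints.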