Web analytics
Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.

Web analytics is not just a tool for measuring website traffic; it can also be used for business research and market research. Web analytics applications can also help companies measure the results of traditional print advertising campaigns, for example by estimating how traffic to a website changes after the launch of a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views. It helps gauge traffic and popularity trends, which is useful for market research.

There are two categories of web analytics: off-site and on-site web analytics.

Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website's potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole.

On-site web analytics measures a visitor's journey once on your website. This includes its drivers and conversions; for example, which landing pages encourage people to make a purchase. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators for performance, and used to improve a web site or marketing campaign's audience response.

Historically, web analytics has referred to on-site visitor measurement. However, in recent years this distinction has blurred, mainly because vendors are producing tools that span both categories.

On-site web analytics technologies

Many different vendors provide on-site web analytics software and services. There are two main technological approaches to collecting the data. The first method, log file analysis, reads the logfiles in which the web server records all its transactions. The second method, page tagging, uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser. Both collect data that can be processed to produce web traffic reports.

In addition, other data sources may be added to augment the data, for example e-mail response rates, direct mail campaign data, sales and lead information, user performance data such as click heat mapping, or other custom metrics as needed.

Web server logfile analysis

Web servers record some of their transactions in a logfile. It was soon realized that these logfiles could be read by a program to provide data on the popularity of the website. Thus arose web log analysis software.

In the early 1990s, web site statistics consisted primarily of counting the number of client requests (or hits) made to the web server. This was a reasonable method initially, since each web site often consisted of a single HTML file. However, with the introduction of images in HTML, and web sites that spanned multiple HTML files, this count became less useful. The first true commercial log analyzer was released by IPRO in 1994.

Two units of measure were introduced in the mid-1990s to gauge more accurately the amount of human activity on web servers. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. Page views and visits are still commonly displayed metrics, but are now considered rather rudimentary.
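The difference between hits, page views and visits can be sketched in Python. This is a minimal illustration rather than any particular log analyzer's method: the record layout, the list of page extensions and the function names are assumptions made for the example; only the 30-minute inactivity timeout comes from the definition above.

```python
from datetime import datetime, timedelta

# Assumed: path endings treated as "pages"; real analyzers make this configurable.
PAGE_ENDINGS = (".html", ".htm", "/")
VISIT_TIMEOUT = timedelta(minutes=30)  # inactivity limit from the definition above


def is_page(path):
    """Return True if the requested path counts as a page rather than a graphic."""
    return path.endswith(PAGE_ENDINGS)


def summarize(requests):
    """Count hits, page views and visits.

    `requests` is an iterable of (client_id, timestamp, path) tuples,
    a hypothetical, already-parsed form of the log entries.
    """
    hits = 0
    page_views = 0
    visits = 0
    last_seen = {}  # client_id -> time of that client's previous page request
    for client, when, path in sorted(requests, key=lambda r: r[1]):
        hits += 1  # every request is a hit
        if not is_page(path):
            continue
        page_views += 1
        previous = last_seen.get(client)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1  # first page request, or the previous visit has expired
        last_seen[client] = when
    return {"hits": hits, "page_views": page_views, "visits": visits}


t0 = datetime(1996, 1, 1, 12, 0)
sample = [
    ("client-a", t0, "/index.html"),
    ("client-a", t0 + timedelta(seconds=1), "/logo.gif"),
    ("client-a", t0 + timedelta(minutes=5), "/about.html"),
    ("client-a", t0 + timedelta(minutes=50), "/index.html"),  # 45 minutes of inactivity
    ("client-b", t0 + timedelta(minutes=2), "/index.html"),
]
print(summarize(sample))  # {'hits': 5, 'page_views': 4, 'visits': 3}
```

In the sample, the image request adds a hit but not a page view, and the request made 45 minutes after the previous page starts a new visit for the same client.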

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.
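As a rough sketch of these two responses, the Python below identifies visitors by cookie value instead of IP address and skips requests whose user agent matches a list of spider markers. The marker list, the record layout and the function names are hypothetical; real log analyzers rely on much longer, maintained lists.

```python
# Assumed: a tiny, hypothetical list of user-agent substrings.
KNOWN_SPIDER_MARKERS = ("spider", "crawler", "robot")


def is_known_spider(user_agent):
    """Return True if the user-agent string contains a known spider marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in KNOWN_SPIDER_MARKERS)


def count_human_visitors(requests):
    """Count distinct visitors by cookie, ignoring requests from known spiders.

    `requests` is an iterable of (ip_address, cookie_id, user_agent) tuples;
    the IP address is deliberately not used for identification.
    """
    return len({cookie for _, cookie, agent in requests if not is_known_spider(agent)})


sample = [
    ("10.0.0.1", "cookie-1", "Mozilla/4.0 (compatible; MSIE 5.0)"),
    ("10.0.0.2", "cookie-1", "Mozilla/4.0 (compatible; MSIE 5.0)"),  # same visitor, new IP
    ("10.0.0.3", "cookie-2", "Mozilla/4.0 (compatible; MSIE 5.0)"),
    ("10.0.0.4", "", "ExampleSpider/1.0"),  # hypothetical spider, ignored
]
print(count_human_visitors(sample))  # 2
```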

The extensive use of web caches also presented a problem for logfile analysis. If a person revisits a page, the second request will often be retrieved from the browser's cache, and so no request will be received by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.

Page tagging

Concerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or 'Web bugs'.

In the mid-1990s, web counters were commonly seen; these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated.
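The idea of passing information along with the image request can be illustrated in Python. In practice the tag itself runs as JavaScript in the browser; this sketch only shows how page and visitor details might be carried in the query string of the image URL and recovered by the collection server. The endpoint and the parameter names (`page`, `ref`, `screen`) are assumptions, not any vendor's actual format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collection endpoint; example.com is reserved for documentation.
COLLECTOR = "https://stats.example.com/pixel.gif"


def build_beacon_url(page, referrer, screen):
    """Build the image URL a page tag would request, with data in the query string."""
    return COLLECTOR + "?" + urlencode({"page": page, "ref": referrer, "screen": screen})


def decode_beacon(url):
    """Recover the reported fields on the collection side."""
    query = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in query.items()}


url = build_beacon_url("/products?id=7", "https://example.org/", "1024x768")
print(decode_beacon(url))
```

Because the data travels in the URL of an ordinary image request, the collection server needs nothing more than its own request log or a small handler to record it.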

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits. Cookie acceptance rates vary significantly between web sites and may affect the quality of data collected and reported.

Collecting web site data using a third-party data collection server (or even an in-house data collection server) requires an additional DNS look-up by the user's computer to determine the IP address of the collection server. On occasion, delays in completing successful or failed DNS look-ups may result in data not being collected.

With the increasing popularity of Ajax-based solutions, an alternative to the use of an invisible image is to implement a call back to the server from the rendered page. In this case, when the page is rendered on the web browser, a piece of Ajax code would call back to the server and pass information about the client that can then be aggregated by a web analytics company. This is in some ways flawed by browser restrictions on the servers which can be contacted with XMLHttpRequest objects. Also, this method can lead to slightly lower reported traffic levels, since the visitor may stop the page from loading in mid-response before the Ajax call is made.

Logfile analysis vs page tagging

Both logfile analysis programs and page tagging solutions are readily available to companies that wish to perform web analytics. In some cases, the same web analytics company will offer both approaches. The question then arises of which method a company should choose. There are advantages and disadvantages to each approach.

Advantages of logfile analysis

The main advantages of logfile analysis over page tagging are as follows:
  • The web server normally already produces logfiles, so the raw data is already available. No changes to the website are required.
  • The data is on the company's own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program.
  • Logfiles contain information on visits from search engine spiders, which generally do not execute JavaScript on a page and are therefore not recorded by page tagging. Although these should not be reported as part of the human activity, it is useful information for search engine optimization.
  • Logfiles require no additional DNS lookups. Thus there are no external server calls which can slow page load speeds, or result in uncounted page views.
  • The web server reliably records every transaction it makes, including e.g. serving PDF documents and content generated by scripts, and does not rely on the visitors' browsers co-operating.

Advantages of page tagging

The main advantages of page tagging over logfile analysis are as follows:
  • Counting is activated by opening the page (given that the web client runs the tag scripts), not requesting it from the server. If a page is cached, it will not be counted by the server. Cached pages can account for up to one-third of all pageviews. Not counting cached pages seriously skews many site metrics. It is for this reason server-based log analysis is not considered suitable for analysis of human activity on websites.
  • Data is gathered via a component ("tag") in the page, usually written in JavaScript, though Java can be used, and increasingly Flash is used. jQuery and Ajax can also be used in conjunction with a server-side scripting language (such as PHP) to manipulate the data and (usually) store it in a database, enabling complete control over how the data is represented.
  • The script may have access to additional information on the web client or on the user, not sent in the query, such as visitors' screen sizes and the price of the goods they purchased.
  • Page tagging can report on events which do not involve a request to the web server, such as interactions within Flash movies, partial form completion, and mouse events such as onClick, onMouseOver, onFocus and onBlur.
  • The page tagging service manages the process of assigning cookies to visitors; with logfile analysis, the server has to be configured to do this.
  • Page tagging is available to companies who do not have access to their own web servers.
  • Lately page tagging has become a standard in web analytics.

Economic factors

Logfile analysis is almost always performed in-house. Page tagging can be performed in-house, but it is more often provided as a third-party service. The economic difference between these two models can also be a consideration for a company deciding which to purchase.
  • Logfile analysis typically involves a one-off software purchase; however, some vendors are introducing maximum annual page views with additional costs to process additional information. In addition to commercial offerings, several open-source logfile analysis tools are available free of charge.
  • For logfile analysis you have to store and archive your own data, which often grows very large quickly. Although the cost of hardware to do this is minimal, the overhead for an IT department can be considerable.
  • For logfile analysis you need to maintain the software, including updates and security patches.
  • Complex page tagging vendors charge a monthly fee based on volume, i.e. the number of pageviews collected per month.


Which solution is cheaper to implement depends on the amount of technical expertise within the company, the vendor chosen, the amount of activity seen on the web sites, the depth and type of information sought, and the number of distinct web sites needing statistics.

Regardless of the vendor solution or data collection method employed, the cost of web visitor analysis and interpretation should also be included. That is, the cost of turning raw data into actionable information. This can be from the use of third party consultants, the hiring of an experienced web analyst, or the training of a suitable in-house person. A cost-benefit analysis can then be performed. For example, what revenue increase or cost savings can be gained by analysing the web visitor data?

Hybrid methods

Some companies are now producing programs that collect data through both logfiles and page tagging. By using a hybrid method, they aim to produce more accurate statistics than either method on its own. The first hybrid solution was produced in 1998 by Rufus Evison, who then spun the product out to create a company based upon the increased accuracy of hybrid methods.

Geolocation of visitors

With IP geolocation, it is possible to track visitors' locations. Using an IP geolocation database or API, visitors can be geolocated to city, region or country level.
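A lookup against an IP geolocation database can be sketched as follows. The table is entirely hypothetical: the networks are reserved documentation ranges and the locations are invented for illustration, whereas a real database maps a very large number of address ranges and is updated regularly.

```python
import ipaddress

# Hypothetical lookup table: documentation networks mapped to invented
# (country, region, city) values.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("GB", "England", "London")),
    (ipaddress.ip_network("198.51.100.0/24"), ("US", "California", "San Jose")),
]


def geolocate(ip_text):
    """Return (country, region, city) for an IP address, or None if unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, location in GEO_TABLE:
        if address in network:
            return location
    return None


print(geolocate("192.0.2.17"))   # ('GB', 'England', 'London')
print(geolocate("203.0.113.5"))  # None, because this range is not in the table
```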

IP Intelligence, or Internet Protocol (IP) Intelligence, is a technology that maps the Internet and catalogues IP addresses by parameters such as geographic location (country, region, state, city and postcode), connection type, Internet Service Provider (ISP), proxy information, and more. The first generation of IP Intelligence was referred to as geotargeting or geolocation technology. This information is used by businesses for online audience segmentation in applications such as online advertising, behavioral targeting, content localization (or website localization), digital rights management, personalization, online fraud detection, geographic rights management, localized search, enhanced analytics, global traffic management, and content distribution.

Click analytics

Click analytics is a special type of web analytics that gives special attention to clicks.

Commonly, click analytics focuses on on-site analytics. An editor of a web site uses click analytics to determine the performance of his or her particular site, with regard to where the users of the site are clicking.

Also, click analytics may happen in real time or "unreal" time, depending on the type of information sought. Typically, front-page editors on high-traffic news media sites will want to monitor their pages in real time, to optimize the content. Editors, designers or other types of stakeholders may analyze clicks over a wider time frame to help them assess the performance of writers, design elements, advertisements and so on.

Data about clicks may be gathered in at least two ways. Ideally, a click is "logged" when it occurs, and this method requires some functionality that picks up relevant information when the event occurs. Alternatively, one may institute the assumption that a page view is a result of a click, and therefore log a simulated click that led to that page view.
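The second approach can be sketched in Python: each page view after the first in a visit is treated as the result of a click on the previous page. The function names and the input layout are assumptions made for this example, and the inference is only as good as the assumption itself (it cannot see typed URLs, bookmarks or back-button use).

```python
from collections import Counter


def simulated_clicks(page_views):
    """Infer (from_page, to_page) clicks from one visit's ordered page views."""
    return list(zip(page_views, page_views[1:]))


def click_counts(visits):
    """Tally simulated clicks across many visits."""
    counts = Counter()
    for page_views in visits:
        counts.update(simulated_clicks(page_views))
    return counts


visits = [
    ["/", "/news", "/news/story-1"],
    ["/", "/news"],
    ["/sport"],  # a single page view produces no simulated click
]
print(click_counts(visits).most_common(1))  # [(('/', '/news'), 2)]
```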

Customer lifecycle analytics

Customer lifecycle analytics is a visitor-centric approach to measurement that falls under the umbrella of lifecycle marketing. Page views, clicks and other events (such as API calls, access to third-party services, etc.) are all tied to an individual visitor instead of being stored as separate data points. Customer lifecycle analytics attempts to connect all the data points into a marketing funnel that can offer insights into visitor behavior and website optimization.
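A minimal sketch of tying events to individual visitors and arranging them into a funnel is shown below. The stage names and the event layout are hypothetical; a real funnel depends on the business being measured.

```python
from collections import defaultdict

# Hypothetical funnel stages, in order.
FUNNEL = ["landing", "signup", "purchase"]


def funnel_counts(events):
    """Count how many visitors reached each funnel stage.

    `events` is an iterable of (visitor_id, event_name) pairs. A visitor counts
    for a stage only if they also reached every earlier stage.
    """
    seen = defaultdict(set)  # visitor_id -> set of event names
    for visitor, name in events:
        seen[visitor].add(name)
    counts = []
    for depth in range(1, len(FUNNEL) + 1):
        required = set(FUNNEL[:depth])
        counts.append(sum(1 for names in seen.values() if required <= names))
    return dict(zip(FUNNEL, counts))


events = [
    ("v1", "landing"), ("v1", "signup"), ("v1", "purchase"),
    ("v2", "landing"), ("v2", "signup"),
    ("v3", "landing"), ("v3", "api_call"),  # other events are kept but not staged
]
print(funnel_counts(events))  # {'landing': 3, 'signup': 2, 'purchase': 1}
```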

Other methods

Other methods of data collection are sometimes used. Packet sniffing collects data by sniffing the network traffic passing between the web server and the outside world. Packet sniffing involves no changes to the web pages or web servers. Integrating web analytics into the web server software itself is also possible. Both these methods claim to provide better real-time data than other methods.

Key definitions

There are no globally agreed definitions within web analytics, although the industry bodies have been trying for some time to agree on definitions that are useful and definitive. The main bodies who have had input in this area have been JICWEBS (the Joint Industry Committee for Web Standards in the UK and Ireland), ABCe (Audit Bureau of Circulations electronic, UK and Europe), the WAA (Web Analytics Association, US) and to a lesser extent the IAB (Interactive Advertising Bureau). This does not prevent the following list from being a useful guide, suffering only slightly from ambiguity. Both the WAA and the ABCe provide more definitive lists for those who are declaring their statistics using the metrics defined by either.
  • Hit - A request for a file from the web server. Available only in log analysis. The number of hits received by a website is frequently cited to assert its popularity, but this number is extremely misleading and dramatically over-estimates popularity. A single web page typically consists of multiple (often dozens of) discrete files, each of which is counted as a hit as the page is downloaded, so the number of hits is really an arbitrary number more reflective of the complexity of individual pages on the website than the website's actual popularity. The total number of visitors or page views provides a more realistic and accurate assessment of popularity.
  • Page view - In log analysis, a request for a file whose type is defined as a page; in page tagging, an occurrence of the script being run. In log analysis, a single page view may generate multiple hits as all the resources required to view the page (images, .js and .css files) are also requested from the web server.
  • Visit / Session - A visit is defined as a series of page requests from the same uniquely identified client with a time of no more than 30 minutes between each page request. A session is defined as a series of page requests from the same uniquely identified client with a time of no more than 30 minutes and no requests for pages from other domains intervening between page requests. In other words, a session ends when someone goes to another site, or 30 minutes elapse between pageviews, whichever comes first. A visit ends only after a 30 minute time delay. If someone leaves a site, then returns within 30 minutes, this will count as one visit but two sessions. In practice, most systems ignore sessions and many analysts use both terms for visits. Because time between pageviews is critical to the definition of visits and sessions, a single page view does not constitute a visit or a session (it is a "bounce").
  • First Visit / First Session - (also known as 'Absolute Unique Visitor') A visit from a visitor who has not made any previous visits.
  • Visitor / Unique Visitor / Unique User - The uniquely identified client generating requests on the web server (log analysis) or viewing pages (page tagging) within a defined time period (e.g. day, week or month). A Unique Visitor counts once within the timescale. A visitor can make multiple visits. Identification is made to the visitor's computer, not the person, usually via cookie and/or IP+User Agent. Thus the same person visiting from two different computers or with two different browsers will count as two Unique Visitors. Increasingly visitors are uniquely identified by Flash LSOs (Local Shared Objects), which are less susceptible to privacy enforcement.
  • Repeat Visitor - A visitor that has made at least one previous visit. The period between the last and current visit is called visitor recency and is measured in days.
  • New Visitor - A visitor that has not made any previous visits. This definition creates a certain amount of confusion (see common confusions below), and is sometimes substituted with analysis of first visits.
  • Impression - An impression is each time an advertisement loads on a user's screen. Anytime you see a banner, that is an impression.
  • Singletons - The number of visits where only a single page is viewed (a 'bounce'). While not a useful metric in and of itself, the number of singletons is indicative of various forms of click fraud, as well as being used to calculate bounce rate and in some cases to identify automatons (bots).
  • Bounce Rate - The percentage of visits where the visitor enters and exits at the same page without visiting any other pages on the site in between.
  • % Exit - The percentage of users who exit from a page.
  • Visibility time - The time a single page (or a blog, Ad Banner...) is viewed.
  • Session Duration - Average amount of time that visitors spend on the site each time they visit. This metric can be complicated by the fact that analytics programs can not measure the length of the final page view.
  • Page View Duration / Time on Page - Average amount of time that visitors spend on each page of the site. As with Session Duration, this metric is complicated by the fact that analytics programs can not measure the length of the final page view unless they record a page close event, such as onUnload.
  • Active Time / Engagement Time - Average amount of time that visitors spend actually interacting with content on a web page, based on mouse moves, clicks, hovers and scrolls. Unlike Session Duration and Page View Duration / Time on Page, this metric can accurately measure the length of engagement in the final page view.
  • Page Depth / Page Views per Session - Page Depth is the average number of page views a visitor consumes before ending their session. It is calculated by dividing total number of page views by total number of sessions and is also called Page Views per Session or PV/Session.
  • Frequency / Session per Unique - Frequency measures how often visitors come to a website. It is calculated by dividing the total number of sessions (or visits) by the total number of unique visitors. Sometimes it is used to measure the loyalty of your audience.
  • Click path - The sequence of hyperlinks one or more website visitors follows on a given site.
  • Click - "refers to a single instance of a user following a hyperlink from one page in a site to another". Some use click analytics to analyze their web sites.
  • Site Overlay is a technique in which graphical statistics are shown beside each link on the web page. These statistics represent the percentage of clicks on each link.
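The ratio metrics in the list above can be illustrated with a minimal Python sketch. The log format, the visitor and session identifiers, and the function names are all assumptions made for this example; they are not taken from any particular analytics product. The sketch computes Page Depth and Frequency as defined above, and shows why the duration of the final page view in a session is unknown when only page-request timestamps are recorded.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical page-view log: (visitor_id, session_id, page, timestamp).
PAGE_VIEWS = [
    ("v1", "s1", "/home",    datetime(2024, 1, 1, 9, 0, 0)),
    ("v1", "s1", "/pricing", datetime(2024, 1, 1, 9, 2, 0)),
    ("v1", "s1", "/signup",  datetime(2024, 1, 1, 9, 5, 0)),
    ("v1", "s2", "/home",    datetime(2024, 1, 2, 14, 0, 0)),
    ("v2", "s3", "/home",    datetime(2024, 1, 1, 10, 0, 0)),
    ("v2", "s3", "/blog",    datetime(2024, 1, 1, 10, 1, 0)),
]


def page_depth(views):
    """Total page views divided by total sessions (PV/Session)."""
    sessions = {session for _, session, _, _ in views}
    return len(views) / len(sessions)


def frequency(views):
    """Total sessions divided by total unique visitors."""
    sessions = {session for _, session, _, _ in views}
    visitors = {visitor for visitor, _, _, _ in views}
    return len(sessions) / len(visitors)


def time_on_page(views):
    """Seconds spent on each page, per session.

    The duration of a page is the gap until the next request in the
    same session. The final page has no next request, so its duration
    is reported as None rather than guessed.
    """
    by_session = defaultdict(list)
    for _, session, page, timestamp in views:
        by_session[session].append((timestamp, page))

    result = {}
    for session, hits in by_session.items():
        hits.sort()
        durations = []
        for current, following in zip(hits, hits[1:] + [None]):
            timestamp, page = current
            if following is None:
                seconds = None  # final page view: length unknown
            else:
                seconds = (following[0] - timestamp).total_seconds()
            durations.append((page, seconds))
        result[session] = durations
    return result


print(page_depth(PAGE_VIEWS))   # 6 page views / 3 sessions = 2.0
print(frequency(PAGE_VIEWS))    # 3 sessions / 2 visitors = 1.5
print(time_on_page(PAGE_VIEWS)["s1"])
```

In this sample data, session s2 consists of a single page view, so no duration at all can be derived for it from timestamps alone.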

The hotel problem

The hotel problem is generally the first problem encountered by a user of web analytics. The problem is that the unique visitors for each day in a month do not add up to the same total as the unique visitors for that month. This appears to an inexperienced user to be a problem in whatever analytics software they are using. In fact it is a simple property of the metric definitions.

The way to picture the situation is by imagining a hotel. The hotel has two rooms (Room A and Room B).

          Day 1    Day 2    Day 3    Total
Room A    John     John     Jane     2 Unique Users
Room B    Mark     Jane     Mark     2 Unique Users
Total     2        2        2        ?


As the table shows, the hotel has two unique users each day over three days. The sum of the totals with respect to the days is therefore six.

During the period each room has had two unique users. The sum of the totals with respect to the rooms is therefore four.

In fact, only three visitors have been in the hotel over this period. The problem is that a person who stays in a room for two nights is counted twice if they are counted once on each day, but only once in the total for the period. Web analytics software counts unique visitors correctly for whichever time period is requested, so the discrepancy appears only when a user adds up the totals of shorter periods and compares the sum with the total for the longer period.
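The arithmetic of the hotel example can be checked with a short Python sketch. The data structure and variable names are chosen for this illustration only; the guests and rooms are those of the example above.

```python
# Who stayed in which room on which day, as in the hotel example.
STAYS = {
    "Day 1": {"Room A": "John", "Room B": "Mark"},
    "Day 2": {"Room A": "John", "Room B": "Jane"},
    "Day 3": {"Room A": "Jane", "Room B": "Mark"},
}

# Unique guests per day.
daily_uniques = {day: len(set(rooms.values())) for day, rooms in STAYS.items()}

# Unique guests per room over the whole period.
room_guests = {}
for rooms in STAYS.values():
    for room, guest in rooms.items():
        room_guests.setdefault(room, set()).add(guest)

# Unique guests in the hotel over the whole period.
period_guests = {guest for rooms in STAYS.values() for guest in rooms.values()}

sum_of_daily_uniques = sum(daily_uniques.values())
sum_of_room_uniques = sum(len(guests) for guests in room_guests.values())

print(sum_of_daily_uniques)  # 6: each guest is counted once per day stayed
print(sum_of_room_uniques)   # 4: Jane is counted once in each room
print(len(period_guests))    # 3: John, Mark and Jane
```

Each of the three figures is a correct count of unique users for its own scope; they differ because unique counts are not additive across days or rooms.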

New visitors + Repeat visitors unequal to total visitors

Another common misconception in web analytics is that the sum of the new visitors and the repeat visitors ought to be the total number of visitors. Again this becomes clear if the visitors are viewed as individuals on a small scale, but still causes a large number of complaints that analytics software cannot be working because of a failure to understand the metrics.

Here the culprit is the metric of a new visitor. There is really no such thing as a new visitor when a web site is considered from an ongoing perspective. If a visitor makes their first visit on a given day and then returns to the web site on the same day, they are both a new visitor and a repeat visitor for that day. So if we look at them as an individual, which are they? The answer has to be both, so the definition of the metric is at fault.

A new visitor is not an individual; it is a fact of the web measurement. For this reason it is easiest to conceptualize the same facet as a first visit (or first session). This resolves the conflict and so removes the confusion. Nobody expects the number of first visits to add to the number of repeat visitors to give the total number of visitors. The metric will have the same number as the new visitors, but it is clearer that it will not add in this fashion.

On the day in question there was a first visit made by our chosen individual. There was also a repeat visit made by the same individual. The number of first visits and the number of repeat visits will add up to the total number of visits for that day.
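The distinction between counting visitors and counting visits can be made concrete with a small Python sketch. The visit log, the visitor names and the function name are hypothetical and serve only to illustrate the definitions above.

```python
# Hypothetical visit log in chronological order: (visitor_id, day).
VISITS = [
    ("alice", 1),  # Alice's first visit ever
    ("alice", 1),  # Alice returns on the same day
    ("bob", 1),    # Bob's first visit ever
    ("alice", 2),
    ("carol", 2),
]


def summarize_day(visits, day):
    """Count visits and visitors for one day, split into first and repeat."""
    seen = set()             # visitors seen at any earlier point in the log
    first_visits = 0
    repeat_visits = 0
    new_visitors = set()     # individuals who made a first visit that day
    repeat_visitors = set()  # individuals who made a repeat visit that day

    for visitor, visit_day in visits:
        is_first = visitor not in seen
        seen.add(visitor)
        if visit_day != day:
            continue
        if is_first:
            first_visits += 1
            new_visitors.add(visitor)
        else:
            repeat_visits += 1
            repeat_visitors.add(visitor)

    return {
        "first_visits": first_visits,
        "repeat_visits": repeat_visits,
        "total_visits": first_visits + repeat_visits,
        "new_visitors": len(new_visitors),
        "repeat_visitors": len(repeat_visitors),
        "total_visitors": len(new_visitors | repeat_visitors),
    }


day_one = summarize_day(VISITS, 1)
print(day_one)
```

For day 1 in this sample, first visits (2) plus repeat visits (1) equal the total visits (3), whereas new visitors (2) plus repeat visitors (1) exceed the total visitors (2), because Alice belongs to both groups.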

Problems with cookies

Historically, vendors of page-tagging analytics solutions have used third-party cookies sent from the vendor's domain instead of the domain of the website being browsed. Third-party cookies can handle visitors who cross multiple unrelated domains within the company's site, since the cookie is always handled by the vendor's servers.

However, third-party cookies in principle allow tracking an individual user across the sites of different companies, allowing the analytics vendor to collate the user's activity on sites where he provided personal information with his activity on other sites where he thought he was anonymous. Although web analytics companies deny doing this, other companies such as companies supplying banner ads have done so. Privacy concerns about cookies have therefore led a noticeable minority of users to block or delete third-party cookies. In 2005, some reports showed that about 28% of Internet users blocked third-party cookies and 22% deleted them at least once a month.

Most vendors of page tagging solutions have now moved to provide at least the option of using first-party cookies (cookies assigned from the client subdomain).

Another problem is cookie deletion. When web analytics depend on cookies to identify unique visitors, the statistics are dependent on a persistent cookie to hold a unique visitor ID. When users delete cookies, they usually delete both first- and third-party cookies. If this is done between interactions with the site, the user will appear as a first-time visitor at their next interaction point. Without a persistent and unique visitor ID, conversions, click-stream analysis, and other metrics dependent on the activities of a unique visitor over time cannot be accurate.
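The effect of cookie deletion on unique-visitor counts can be simulated with a minimal Python sketch. The Browser class, the track function and the cookie name visitor_id are assumptions made for this illustration and do not correspond to any specific analytics product.

```python
import uuid


class Browser:
    """A stand-in for a real browser's cookie store."""

    def __init__(self):
        self.cookies = {}

    def clear_cookies(self):
        self.cookies.clear()


def track(browser, known_ids):
    """Record one interaction; return True if it looks like a first-time visitor."""
    visitor_id = browser.cookies.get("visitor_id")
    is_new = visitor_id is None
    if is_new:
        # No persistent cookie found: issue a fresh visitor ID.
        visitor_id = uuid.uuid4().hex
        browser.cookies["visitor_id"] = visitor_id
    known_ids.add(visitor_id)
    return is_new


browser = Browser()
known_ids = set()

first = track(browser, known_ids)        # first interaction: new visitor
second = track(browser, known_ids)       # cookie present: recognized
browser.clear_cookies()                  # the user deletes cookies
after_delete = track(browser, known_ids) # appears as a first-time visitor again

print(first, second, after_delete)
print(len(known_ids))  # two visitor IDs recorded for a single person
```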

Cookies are used because IP addresses are not always unique to users and may be shared by large groups or proxies. In some cases, the IP address is combined with the user agent in order to more accurately identify a visitor if cookies are not available. However, this only partially solves the problem because often users behind a proxy server have the same user agent. Other methods of uniquely identifying a user are technically challenging and would limit the trackable audience or would be considered suspicious. Cookies are the selected option because they reach the lowest common denominator without using technologies regarded as spyware.
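The fallback described above, in which the IP address is combined with the user agent when no cookie is available, can be sketched in Python as follows. The function name visitor_key, the key prefixes and the sample addresses are hypothetical; hashing the combined string with SHA-256 is one possible way to form a key, not a method prescribed by the text.

```python
import hashlib


def visitor_key(cookie_id, ip, user_agent):
    """Return an identifier for a visitor.

    Prefers the cookie-based visitor ID; falls back to a hash of the
    IP address and user agent when no cookie is available.
    """
    if cookie_id:
        return "cookie:" + cookie_id
    combined = f"{ip}|{user_agent}".encode("utf-8")
    return "fallback:" + hashlib.sha256(combined).hexdigest()[:16]


PROXY_IP = "203.0.113.7"  # documentation-range address, assumed shared proxy
UA = "ExampleBrowser/1.0"

# Two different people behind the same proxy with the same user agent
# and no cookies collapse into a single visitor.
person_a = visitor_key(None, PROXY_IP, UA)
person_b = visitor_key(None, PROXY_IP, UA)

# A different user agent behind the same proxy is told apart.
person_c = visitor_key(None, PROXY_IP, "OtherBrowser/2.0")

# With cookies available, the same two people are distinguished.
person_a_cookie = visitor_key("id-a", PROXY_IP, UA)
person_b_cookie = visitor_key("id-b", PROXY_IP, UA)

print(person_a == person_b)                # True: the fallback cannot separate them
print(person_a == person_c)                # False
print(person_a_cookie == person_b_cookie)  # False
```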

Secure analytics (metering) methods

All the methods described above (and some other methods not mentioned here, like sampling) have the central problem of being vulnerable to manipulation (both inflation and deflation). This means these methods are imprecise and insecure (in any reasonable model of security). This issue has been addressed in a number of papers, but to date the solutions suggested in these papers remain theoretical, possibly due to lack of interest from the engineering community, or because of the financial gain the current situation provides to the owners of big websites. For more details, consult the aforementioned papers.

See also

  • Mobile Web Analytics
  • Eurocrypt
  • List of web analytics software
  • Web bug
  • Web log analysis software
  • Online video analytics
  • Post-click marketing
  • Geolocation
  • Geolocation software
  • Geomarketing
  • Geotargeting
  • Internet Protocol
  • IP Address
  • Website correlation
  • Website localization
  • Clickstream

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.