Questions About Website Traffic from Inside ChatGPT's Deep Research Mode



Watching ChatGPT's Deep Research search websites on my behalf, I had a few random questions about how those AI hits get tracked, so I asked Perplexity's Deep Research.

Prompt/Questions:

When ChatGPT is searching the web (in web search or Deep Research mode) and visiting a range of different websites to find answers to my questions, do the hits to those websites get counted by Google Analytics or whatever other traffic-metrics services those sites may be using?

Or is AI traffic undetectable to these services?

And is it detectable as coming from a single IP address?

Have site metrics caught up with the website-crawling hits generated by OpenAI's ChatGPT Deep Research-style agents?

Or does traffic appear to be down in these stats tools when it's really just coming from other places, like ChatGPT in web search mode?

Does that have an effect on ad tracking, like Google AdWords or whatever other ad services might be generating income for that website?


 

Note: The report below was generated by Perplexity Deep Research and reviewed by the author.




AI Web Crawling and Analytics: How Deep Research Impacts Website Traffic Metrics

As OpenAI's Deep Research and similar AI tools increasingly crawl the web for information, website owners and marketers face new challenges in understanding their traffic metrics. This report examines how AI research agents interact with analytics systems and what implications this has for website traffic measurement and monetization.

How AI Traffic Appears in Analytics Systems

When ChatGPT's Deep Research mode visits websites to gather information, these visits do generally register in website analytics platforms, though with important nuances. Website analytics systems like Google Analytics 4 (GA4) can detect and record these visits, but they may not categorize them the same way as regular human traffic[1].

Identification and Classification Challenges

AI crawlers and bots now represent a significant portion of overall web traffic. Current estimates suggest that bot traffic accounts for approximately half of all internet traffic worldwide, with "one bot hit for every human-generated hit to all web servers"[2]. However, distinguishing AI research tools from other types of traffic presents challenges for analytics systems:

  1. Default Classification Issues: Unless specifically configured, many analytics systems may record Deep Research visits as regular traffic, potentially skewing metrics and leading to misinterpretation of user behavior data[2].
  2. Referrer Data Limitations: When ChatGPT's Deep Research agent visits a website, the referrer information may not always clearly identify it as AI-generated traffic, making it difficult for standard analytics setups to categorize accurately[1].
  3. Variable Recognition: Some obviously automated traffic includes identifiable terms in the source name (like "bot"), but sophisticated agents like Deep Research may not be as easily identified without specific tracking configuration[2]; a rough user-agent check is sketched below.
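
As an illustration of that kind of check, here is a minimal sketch that flags a request whose User-Agent header contains one of the crawler names OpenAI and other vendors currently document (GPTBot, ChatGPT-User, OAI-SearchBot, and similar). The token list is an assumption that needs ongoing maintenance, and agents that fetch pages on a user's behalf may not announce themselves this way at all.

    import re

    # Substrings that appear in the User-Agent headers of some documented AI
    # crawlers and fetchers. Illustrative, not exhaustive; it will go stale.
    AI_AGENT_TOKENS = [
        "GPTBot",         # OpenAI crawler
        "ChatGPT-User",   # OpenAI on-demand fetcher acting for a user
        "OAI-SearchBot",  # OpenAI search crawler
        "PerplexityBot",  # Perplexity crawler
        "ClaudeBot",      # Anthropic crawler
    ]

    AI_AGENT_PATTERN = re.compile(
        "|".join(re.escape(token) for token in AI_AGENT_TOKENS), re.IGNORECASE
    )

    def looks_like_ai_agent(user_agent: str) -> bool:
        """Return True if the User-Agent string contains a known AI-crawler token."""
        return bool(AI_AGENT_PATTERN.search(user_agent or ""))

    print(looks_like_ai_agent("Mozilla/5.0; compatible; GPTBot/1.2"))   # True
    print(looks_like_ai_agent("Mozilla/5.0 (Windows NT 10.0; Win64)"))  # False
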

Tracking AI Traffic in Analytics Systems

Fortunately, website owners can implement specific tracking mechanisms to identify and measure AI-generated traffic:

Google Analytics 4 Implementation

GA4 users can set up explorations with specific dimensions and metrics to track AI chatbot traffic[1]:

  1. Creating an exploration with "Page Referrer" as the dimension and "Sessions" as the metric
  2. Applying regex filters designed to isolate traffic from AI platforms (an example pattern follows this list)
  3. Adding supplementary dimensions like "Landing Page + Query String" and "Session Source/Medium"[1]
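
For step 2, a Page Referrer regex along the following lines can isolate sessions referred from several well-known AI assistants; the domain list is an assumption and will need to be extended as new tools appear. The small Python script only demonstrates what the pattern matches.

    import re

    # Example Page Referrer pattern covering a few well-known AI assistants.
    # Illustrative only: referrer behavior varies by tool and this list will go stale.
    AI_REFERRER_REGEX = re.compile(
        r"chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
        r"gemini\.google\.com|copilot\.microsoft\.com|claude\.ai",
        re.IGNORECASE,
    )

    for referrer in [
        "https://chatgpt.com/",
        "https://www.perplexity.ai/search?q=example",
        "https://www.google.com/",
    ]:
        print(referrer, "->", bool(AI_REFERRER_REGEX.search(referrer)))
    # The first two match; ordinary Google search traffic does not.

In GA4 itself, the same alternation can be pasted into a "matches regex" condition on the Page Referrer dimension.
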

Similar approaches can be implemented in other analytics platforms like Adobe Analytics, where segments can be created using comparable logic[3].

IP Address Patterns and Detection

Multiple IP Utilization

Deep Research and similar AI tools likely don't operate from a single IP address. Sophisticated crawling systems typically distribute requests across multiple IP addresses, including residential IPs in some cases, to appear less conspicuous and avoid triggering rate limits or blocks[2]. This distribution makes detection based solely on IP patterns challenging.
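
Where a crawler operator does publish its egress IP ranges (OpenAI documents ranges for its named bots, for example), a server-side check against those CIDR blocks is more dependable than referrers or user-agent strings alone. A minimal sketch, assuming the published ranges have already been saved to a local file named ai_crawler_ranges.json (the filename and the plain-list format are assumptions):

    import ipaddress
    import json

    # Assumed local file holding a JSON list of CIDR strings, refreshed
    # periodically from the crawler operator's published list, e.g.
    # ["192.0.2.0/24", "198.51.100.0/28"] (example networks, not real ranges).
    with open("ai_crawler_ranges.json") as f:
        NETWORKS = [ipaddress.ip_network(cidr) for cidr in json.load(f)]

    def ip_in_published_ranges(client_ip: str) -> bool:
        """True if the client IP falls inside any published crawler CIDR block."""
        address = ipaddress.ip_address(client_ip)
        return any(address in network for network in NETWORKS)

    print(ip_in_published_ranges("198.51.100.7"))

Note that this only catches traffic from ranges an operator chooses to publish; crawling distributed across residential IPs, as described above, will not match.
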

Location Data Access

There's evidence that ChatGPT can determine a user's general location based on IP information when conducting searches[4]. This capability suggests that when Deep Research accesses websites, it likely presents with identifiable IP addresses that could theoretically be tracked by website owners.

Impact on Traffic Metrics and Revenue

Potential Traffic Measurement Distortion

The growing prevalence of AI research tools may be affecting how website traffic is measured and understood:

  1. Unrecognized Traffic: If AI visits aren't properly identified, website owners might be missing significant portions of their actual site usage in their metrics[1].
  2. Traditional Metrics Disruption: AI agents interact with websites differently than humans, potentially creating unusual patterns in metrics like bounce rate, time on page, and page views per session[2].
  3. Attribution Challenges: When content is accessed via AI tools rather than direct visits, traditional attribution models may fail to properly credit traffic sources[1].

Advertising Implications

While the search results don't directly address how AI traffic affects advertising, there are logical implications:

  1. Impression Counting: Ad impressions may be registered when AI tools view pages with advertisements, but these impressions don't represent potential human customers[2].
  2. Click-Through Irrelevance: AI research agents don't click on advertisements, so if AI impressions are counted in the denominator of CTR calculations, measured click-through rates will fall (a quick arithmetic sketch follows this list).
  3. Revenue Model Challenges: Website business models that rely on advertising revenue may need to adapt as the proportion of non-human traffic increases[1][2].
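
To make point 2 concrete, here is a quick arithmetic sketch using made-up numbers, showing how non-clicking AI impressions in the denominator dilute measured CTR:

    # Hypothetical numbers purely for illustration.
    human_impressions = 10_000
    human_clicks = 150              # 1.5% CTR among human visitors
    ai_impressions = 2_500          # AI agents view ad slots but never click

    human_only_ctr = human_clicks / human_impressions
    blended_ctr = human_clicks / (human_impressions + ai_impressions)

    print(f"Human-only CTR: {human_only_ctr:.2%}")  # 1.50%
    print(f"Blended CTR:    {blended_ctr:.2%}")     # 1.20%
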

Analytics Adaptation and Future Outlook

The analytics industry is beginning to adapt to the rise of AI traffic, though significant gaps remain:

  1. Built-in Limitations: Google Analytics offers basic bot filtering options that exclude known bots, but these primarily target traditional web crawlers rather than sophisticated AI research tools[2].
  2. Custom Solutions Emerging: Marketing professionals are developing specialized tracking approaches for identifying AI-sourced traffic, indicating growing awareness of the issue[1][3].
  3. Emerging Best Practices: Setting up dedicated segments or views to isolate AI traffic is becoming an important practice for accurate traffic analysis[1][2]; a simple log-based version of this idea is sketched below.
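
As a sketch of what such a dedicated view can look like outside the analytics UI, the snippet below buckets parsed server-log records into "ai" and "other" using the same kind of user-agent check shown earlier; the record format (a dict with a user_agent key) is an assumption about whatever log parser is in use.

    from collections import Counter

    AI_TOKENS = ("gptbot", "chatgpt-user", "oai-searchbot", "perplexitybot", "claudebot")

    def classify(user_agent: str) -> str:
        """Rough bucket: 'ai' if the User-Agent contains a known crawler token."""
        ua = (user_agent or "").lower()
        return "ai" if any(token in ua for token in AI_TOKENS) else "other"

    def summarize(records):
        """Count parsed log records per bucket and report each bucket's share."""
        counts = Counter(classify(r.get("user_agent", "")) for r in records)
        total = sum(counts.values()) or 1
        return {bucket: (n, n / total) for bucket, n in counts.items()}

    sample = [
        {"user_agent": "Mozilla/5.0; compatible; GPTBot/1.2"},
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
        {"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"},
    ]
    print(summarize(sample))  # roughly {'ai': (1, 0.33), 'other': (2, 0.67)}
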

Conclusion

When ChatGPT's Deep Research mode and similar AI tools crawl websites, they do generate hits that are recorded by analytics systems, though they may not be properly categorized without specific configuration. This growing segment of traffic has implications for how websites measure their audience, attribute traffic sources, and potentially monetize through advertising.

Website owners and marketers should implement specific tracking mechanisms to properly identify AI-generated traffic and develop a more accurate understanding of their true human audience. As AI tools continue to evolve, analytics systems will need to adapt further to provide meaningful data for business decision-making in a landscape where machine-generated visits represent an increasingly significant portion of overall web traffic.


  1. https://bendyourmarketing.com/blog/how-to-track-ai-and-llm-chatbot-traffic/         
  2. https://datadome.co/guides/bot-protection/exclude-bot-traffic-from-google-analytics/         
  3. https://experienceleaguecommunities.adobe.com/t5/adobe-analytics-questions/how-to-track-chat-gpt-bard-and-all-these-ai-searches/m-p/677086  
  4. https://www.reddit.com/r/ChatGPT/comments/1hosz7m/how_does_chatgpt_know_my_true_current_location_is/ 


I thought that was pretty interesting, given the other things I've read about how people are affected by this sort of thing. Hopefully you find it useful.