CRAWLER BEHAVIOR RESEARCH - README
===================================

Project: Site 4726837462198733423
Date: February 2026
Status: Active data collection

OVERVIEW
--------
This site is a controlled research environment for studying how modern web
crawlers -- particularly AI-powered bots from companies like OpenAI, Anthropic,
Amazon, Google, and Microsoft -- interact with different types of web content.

The site hosts a variety of file types: HTML documents, stylesheets,
JavaScript files, structured data (JSON, XML), images in multiple formats
(PNG, JPEG, GIF, SVG, WebP), a ZIP archive, and a file with no extension.
Each file contains substantial, non-trivial content, so that requests for it
yield meaningful observations about crawler behavior.

METHODOLOGY
-----------
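All files are served from an AWS S3 bucket with static website hosting
enabled. S3 server access logging records requests on a best-effort basis, so
a small number of requests may be missing or delivered late. Each log record
includes the following fields, among others: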

  - Timestamp (date and time of the request)
  - Remote IP address
  - HTTP method and URI
  - Response status code
  - Bytes transferred
  - Referrer (HTTP Referer header)
  - User-Agent string
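
As an illustration of how the fields above can be pulled out of a log record,
here is a minimal Python sketch. It assumes the standard space-delimited S3
server access log layout and only reads the leading fields; the sample line,
the bucket name, and the ExampleBot user agent are made up for the example,
and user agents containing escaped quotes are not handled.

```python
import re
from datetime import datetime

# Leading fields of an S3 server access log record, in order. Trailing
# fields (version ID, host ID, TLS details, ...) are ignored by this sketch.
LOG_PATTERN = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request>[^"]*)" '
    r'(?P<status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+) '
    r'(?P<object_size>\S+) (?P<total_time>\S+) (?P<turnaround>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def to_int(value):
    """Return an int for numeric fields, or None when S3 logs '-'."""
    return int(value) if value.isdigit() else None


def parse_line(line):
    """Parse one log record into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    # The request field looks like: GET /index.html HTTP/1.1
    method, _, rest = m["request"].partition(" ")
    uri = rest.rsplit(" ", 1)[0] if rest else ""
    return {
        "timestamp": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "remote_ip": m["remote_ip"],
        "method": method,
        "uri": uri,
        "status": to_int(m["status"]),
        "bytes_sent": to_int(m["bytes_sent"]),
        "referrer": m["referrer"],
        "user_agent": m["user_agent"],
    }


# Hypothetical record, for illustration only.
SAMPLE_LINE = (
    "79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be "
    "example-bucket [06/Feb/2026:00:00:38 +0000] 192.0.2.3 - "
    "3E57427F3EXAMPLE WEBSITE.GET.OBJECT index.html "
    '"GET /index.html HTTP/1.1" 200 - 2662 2662 41 40 '
    '"-" "Mozilla/5.0 (compatible; ExampleBot/1.0)" -'
)

record = parse_line(SAMPLE_LINE)
print(record["remote_ip"], record["method"], record["uri"], record["status"])
```

Lines that do not match the pattern come back as None rather than raising, so
a malformed record does not stop a batch run.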

This data allows us to identify specific crawlers, track their navigation
patterns, measure how they handle different content types, and observe
whether they respect directives in robots.txt.
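
One way to combine crawler identification with a robots.txt check is sketched
below in Python, using the standard library's urllib.robotparser. The token
list is an assumption based on commonly seen user-agent substrings and should
be verified against each operator's documentation. The robots.txt rules shown
are hypothetical and are not the rules deployed on this site.

```python
from urllib.robotparser import RobotFileParser

# Assumed mapping of user-agent substrings to operators; not exhaustive.
CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "Amazonbot": "Amazon",
    "Googlebot": "Google",
    "bingbot": "Microsoft",
}

# Hypothetical rules, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /data.json

User-agent: *
Allow: /
"""


def identify_crawler(user_agent):
    """Return (token, operator) for a known crawler, or None."""
    lowered = user_agent.lower()
    for token, operator in CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return token, operator
    return None


def violates_robots(robots_txt, user_agent, path):
    """Return True if the request was made to a path robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    match = identify_crawler(user_agent)
    # robotparser matches on the product token, so pass the bare token
    # rather than the full user-agent string when one is recognized.
    agent = match[0] if match else user_agent
    return not parser.can_fetch(agent, path)


ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"  # illustrative string
print(identify_crawler(ua))
print(violates_robots(EXAMPLE_ROBOTS_TXT, ua, "/data.json"))
```

Substring matching is deliberately loose. User-agent strings can be spoofed,
so a match only indicates what a client claims to be.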

FILE INVENTORY
--------------
Text formats:
  - index.html    : Main page with links to all other files
  - about.html    : Detailed research description
  - error.html    : Custom 404 error page
  - style.css     : Complete CSS stylesheet
  - app.js        : Client-side JavaScript
  - data.json     : Structured research dataset
  - sitemap.xml   : Standard XML sitemap
  - robots.txt    : Crawler directives
  - readme.txt    : This file
  - feed.xml      : RSS 2.0 feed
  - humans.txt    : Team credits
  - manifest.json : Web app manifest
  - server.log    : Simulated log file (large)

Image formats:
  - logo.png      : 256x256 PNG with geometric pattern
  - banner.jpg    : 800x200 JPEG gradient
  - icon.gif      : 32x32 animated GIF
  - diagram.svg   : SVG with shapes and paths
  - photo.webp    : 400x300 WebP image

Other:
  - config        : No file extension (binary/text content)
  - archive.zip   : ZIP archive containing multiple files
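
To compare how crawlers treat the file types listed above, requested URIs can
be grouped by extension. The following Python sketch shows one possible
approach; the function names and the "(none)" label for extensionless paths
such as config are choices made for this example.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def extension_of(uri):
    """Return the lowercased file extension of a request URI, or '(none)'."""
    path = urlsplit(uri).path  # drops any query string or fragment
    suffix = PurePosixPath(path).suffix.lower()
    return suffix or "(none)"


def tally_by_extension(uris):
    """Count requests per file extension."""
    return Counter(extension_of(uri) for uri in uris)


# Illustrative request URIs using file names from the inventory.
requests = ["/index.html", "/about.html", "/config", "/logo.png",
            "/photo.webp?cache=1", "/archive.zip"]
for ext, count in sorted(tally_by_extension(requests).items()):
    print(ext, count)
```

Requests for the bare root path "/" also fall under "(none)" here, so they
may need separate handling if the index document is served for them.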

CONTACT
-------
This is a research project. Data collected is limited to publicly available
HTTP request metadata (IP addresses, user-agent strings, referrers) and is
used solely for academic analysis of crawler behavior patterns.