AI Scraper Pest
If you've recently found evidence in your web logs of a persistent scraper
of your content with no identifying user agent string, I have the answer.
Artesian Solutions (https://www.artesian.co), an AI-powered sales
intelligence platform based in Boston, USA and Reading, near London, UK, have
unleashed a crawler that ignores "robots.txt" (it does not even consult
it) and is placing servers worldwide under great load.
The robot uses a standard web browser user agent string so as to mask its
identity:
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/67.0.3396.99 Safari/537.36"
While Artesian's website is hosted on Google Cloud, the rogue IP address to
look for in your web logs is [188.8.131.52]. It is owned by Fluidata (now
known as FluidOne, https://www.fluidone.co.uk) but assigned to Artesian
Solutions as part of a 32-IP block.
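To check your own logs, something along these lines may help. It is only a rough sketch: it assumes your server writes the common Apache/Nginx "combined" log format, and the function name suspicious_clients and the min_hits threshold are made up for illustration. It lists the addresses that send the user agent string quoted above, make many requests, and never ask for "robots.txt".

```python
import re
from collections import Counter

# User agent string quoted in the post above.
SCRAPER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
)

# Assumption: "combined" log format (ip, ident, user, time, request,
# status, size, referer, user agent). Adjust if your format differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def suspicious_clients(lines, min_hits=100):
    """Return {ip: hits} for clients that sent SCRAPER_UA at least
    min_hits times and never requested /robots.txt."""
    hits = Counter()
    fetched_robots = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not parse
        ip = m.group("ip")
        # A robots.txt request counts whatever user agent was sent.
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            fetched_robots.add(ip)
        if m.group("ua") == SCRAPER_UA:
            hits[ip] += 1
    return {
        ip: n
        for ip, n in hits.items()
        if n >= min_hits and ip not in fetched_robots
    }
```

Typical use would be to open the access log and pass the file object straight in, for example suspicious_clients(open("access.log")), where the path is whatever your host uses. Real visitors with an old Chrome build will send the same user agent string, so treat the output as a list of candidates to compare against the address above, not as proof.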
Formal complaints to both Fluidata and Artesian have gone unanswered.
Blocking via the ".htaccess" file is highly recommended to save your
server from this load. The RIPE record for the assigned block reads:
inetnum: 184.108.40.206 - 220.127.116.11
descr: Artesian Solutions IP Assignment
status: ASSIGNED PA
role: Fluidata Admin Team
address: 2 More London Riverside
address: SE1 2AP
source: RIPE # Filtered
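To block the whole assignment rather than a single address, the inetnum range can be turned into CIDR form and written out as ".htaccess" rules. The sketch below is one possible way to do that, assuming Apache 2.4 with ".htaccess" overrides enabled. The function name htaccess_block is hypothetical, and the range in the example is a placeholder from the documentation address space, not the real assignment. Substitute the first and last addresses from the inetnum line of the record.

```python
import ipaddress


def htaccess_block(first_ip, last_ip):
    """Build Apache 2.4 .htaccess rules that deny every address
    from first_ip to last_ip inclusive."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    # Collapse the range into the smallest set of CIDR networks.
    networks = ipaddress.summarize_address_range(first, last)
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net}" for net in networks]
    lines.append("</RequireAll>")
    return "\n".join(lines)


# Placeholder 32-address range (an assumption, for illustration only).
print(htaccess_block("203.0.113.32", "203.0.113.63"))
```

For the placeholder range this prints a RequireAll section containing "Require all granted" and a single "Require not ip 203.0.113.32/27" line, since an aligned block of 32 addresses collapses to one /27. On Apache 2.2 the older Order/Deny syntax would be needed instead.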