BrowserPipe

From Organic Design wiki

A lot of projects I work with require live, bidirectional communication with a browser instance: the browser's DOM content and events are sent down the channel, and location requests are sent up it.
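The channel itself can be very simple. The sketch below assumes, purely for illustration, that each message is one line of JSON carrying a `kind` and a `payload`. The kinds `dom` and `event` (browser to program) and `location` (program to browser), along with the `Message`, `encode` and `decode_stream` names, are hypothetical and not an existing BrowserPipe format. Newline-delimited JSON works as framing because `json.dumps` escapes any newlines inside strings, so a raw newline can only mean "end of message".

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical message kinds for this sketch, not a fixed BrowserPipe format.
DOWN_KINDS = {"dom", "event"}   # browser -> listening program
UP_KINDS = {"location"}         # listening program -> browser


@dataclass
class Message:
    kind: str
    payload: dict[str, Any] = field(default_factory=dict)


def encode(msg: Message) -> bytes:
    """Serialise one message as a single line of JSON.

    json.dumps escapes newlines inside strings, so the trailing "\n" is the
    only raw newline and can safely mark the end of the message.
    """
    if msg.kind not in DOWN_KINDS | UP_KINDS:
        raise ValueError(f"unknown message kind: {msg.kind!r}")
    line = json.dumps({"kind": msg.kind, "payload": msg.payload})
    return (line + "\n").encode("utf-8")


def decode_stream(buffer: bytes) -> tuple[list[Message], bytes]:
    """Split received bytes into complete messages plus any unfinished tail.

    The tail should be kept and prepended to the next chunk read from the pipe.
    """
    *lines, rest = buffer.split(b"\n")
    messages = []
    for line in lines:
        if line.strip():
            obj = json.loads(line)
            messages.append(Message(obj["kind"], obj.get("payload", {})))
    return messages, rest
```

Keeping the unfinished tail matters because a socket read can end in the middle of a message; the caller simply carries `rest` over to the next read.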

One use of this would be to run a headless browser on a server for scraping jobs where the content being scraped is the result of AJAX requests rather than being present in the initial page source. Scraping is becoming increasingly difficult as sites rely more and more on AJAX to deliver their content, so a general-purpose solution like this would be ideal.

It shouldn't be too difficult to write a Firefox extension to achieve this, as the required functionality is quite minimal. At its most basic, the extension only needs to let another program running on the same host query it for the current DOM structure, or tell it to send the browser to a new location.
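A minimal sketch of that basic query/navigate exchange, written in Python for illustration. It assumes the extension listens on a plain TCP socket on localhost and understands two hypothetical line commands, `GET_DOM` and `GOTO <url>`. Since no such extension exists here, the hypothetical `FakeExtensionHandler` stands in for it by holding a made-up URL and DOM, while `BrowserPipeClient` plays the other program on the same host.

```python
import socket
import socketserver
import threading

# Everything below is hypothetical: the command names, the reply format and the
# use of a plain TCP socket on localhost are assumptions for illustration.


class FakeExtensionHandler(socketserver.StreamRequestHandler):
    """Stand-in for the Firefox extension: keeps a current URL and a fake DOM."""

    def handle(self):
        state = self.server.browser_state
        for raw in self.rfile:  # one command per line until the client disconnects
            command, _, arg = raw.decode("utf-8").rstrip("\n").partition(" ")
            if command == "GET_DOM":
                reply = state["dom"]
            elif command == "GOTO":
                state["url"] = arg
                state["dom"] = f"<html><body>loaded {arg}</body></html>"
                reply = "OK"
            else:
                reply = "ERROR unknown command"
            self.wfile.write((reply + "\n").encode("utf-8"))


def start_fake_extension():
    """Run the stand-in extension on a free localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), FakeExtensionHandler)
    server.browser_state = {"url": "about:blank", "dom": "<html><body></body></html>"}
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class BrowserPipeClient:
    """The other program on the same host, talking to the extension."""

    def __init__(self, port, host="127.0.0.1"):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8", newline="\n")

    def _request(self, line):
        # Send one command line and block until the one-line reply arrives.
        self.sock.sendall((line + "\n").encode("utf-8"))
        return self.reader.readline().rstrip("\n")

    def get_dom(self):
        return self._request("GET_DOM")

    def goto(self, url):
        return self._request(f"GOTO {url}")

    def close(self):
        self.reader.close()
        self.sock.close()
```

The fake DOM fits on a single line; a real DOM dump contains newlines, so a real version would need proper message framing. Firefox's native messaging mechanism for WebExtensions may also be a better fit than a raw socket, though that is not explored here.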

The next level of functionality would be for the extension to send notifications of DOM changes down to the listening local program, so that it could know, for example, when AJAX requests have completed.
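With change notifications available, the listening program could treat a page as "done" once the DOM has stopped changing for a while. The sketch below is one such heuristic, not a guarantee: `wait_until_settled` returns `True` once no notification has arrived for `quiet` seconds, or `False` if the page is still changing after `timeout` seconds. It assumes the pipe reader pushes each DOM-change message into a `queue.Queue`; the function names, the message shape and `simulate_notifications` (a stand-in for the extension) are hypothetical. In a real extension the notifications would probably come from the DOM's `MutationObserver`.

```python
import queue
import threading
import time


def wait_until_settled(notifications, quiet=0.5, timeout=10.0):
    """Heuristic: the page has settled once no DOM-change notification has
    arrived for `quiet` seconds. Returns False if that never happens within
    `timeout` seconds. `notifications` is a queue.Queue fed by the (assumed)
    pipe reader."""
    deadline = time.monotonic() + timeout
    last_change = time.monotonic()
    while True:
        now = time.monotonic()
        if now - last_change >= quiet:
            return True
        if now >= deadline:
            return False
        # Sleep until either the quiet period would end or the deadline hits,
        # waking early if a notification arrives.
        wait = min(quiet - (now - last_change), deadline - now)
        try:
            notifications.get(timeout=wait)
            last_change = time.monotonic()
        except queue.Empty:
            pass


def simulate_notifications(notifications, count, interval):
    """Hypothetical stand-in for the extension: push `count` DOM-change
    messages, `interval` seconds apart, from a background thread."""
    def run():
        for i in range(count):
            notifications.put({"kind": "dom-changed", "seq": i})
            time.sleep(interval)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

A quiet period is only a proxy for "AJAX requests have completed": a slow request can leave the DOM unchanged for longer than `quiet`, so the threshold would likely need tuning per site.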

See also