
Implement an efficient web page thumbnail generator.

description

Using system components, it is possible to generate web pages from offline content. This is nevertheless quite difficult for pages that rely heavily on JavaScript. Although we have captured all the necessary objects, we need to modify how web pages are rendered so that these objects are served from the captured content rather than from the Internet.

Idea:

Use an existing web rendering component to visualize web pages. There are several options.

MSHTML
Use MSHTML components to render web pages, but wrap them in a host environment that acts as the responder for all requests.

See http://www.codeproject.com/Articles/9925/Offline-Browser-using-WinInet-URL-Moniker-and-MSHT
for an example of how to load web pages and resolve references using MSHTML and Moniker objects.

Another reference worth visiting:
http://technet.microsoft.com/en-us/exchange/bb508516(v=vs.85).aspx

Some utilities that can be found on the Internet: http://iecapt.sourceforge.net

WEBKIT
In comparison to MSHTML, this engine can be made standalone and seems to be easier to hack on. The project home page is here: https://www.webkit.org

The wiki page describing how to hack on WebKit (http://trac.webkit.org/wiki/QtWebKitHacking) provides information on QNetworkAccessManager, which processes all HTTP communication. This class should therefore be modified to work with the local cache instead.
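The idea of answering every request from captured content can be sketched without Qt. The following is only an illustration in standard C++17: the names CapturedResponse, parseCaptured and OfflineStore are hypothetical, and it assumes each captured object is stored as a raw HTTP response (status line, headers, blank line, body) keyed by its URL. In QtWebKit the same lookup would sit behind the network layer rather than in a standalone class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical record for one captured HTTP response.
struct CapturedResponse {
    int status = 0;
    std::map<std::string, std::string> headers;  // header names as stored
    std::string body;
};

// Parses "HTTP/1.1 200 OK\r\nName: value\r\n\r\nbody" into a record.
// Returns std::nullopt if the header terminator or status line is missing.
std::optional<CapturedResponse> parseCaptured(const std::string& raw) {
    const std::size_t headEnd = raw.find("\r\n\r\n");
    if (headEnd == std::string::npos) return std::nullopt;

    std::istringstream head(raw.substr(0, headEnd));
    std::string line, version;
    CapturedResponse out;

    // Status line, e.g. "HTTP/1.1 200 OK".
    if (!std::getline(head, line)) return std::nullopt;
    std::istringstream statusLine(line);
    if (!(statusLine >> version >> out.status) || version.rfind("HTTP/", 0) != 0)
        return std::nullopt;

    // Header lines: "Name: value", each possibly ending in '\r'.
    while (std::getline(head, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        const std::size_t valueStart = line.find_first_not_of(" \t", colon + 1);
        out.headers[line.substr(0, colon)] =
            valueStart == std::string::npos ? std::string() : line.substr(valueStart);
    }
    out.body = raw.substr(headEnd + 4);
    return out;
}

// Hypothetical offline store: URL -> captured response.
class OfflineStore {
public:
    // Returns false if the raw text is not a parsable HTTP response.
    bool add(const std::string& url, const std::string& raw) {
        assert(!url.empty());  // an empty URL cannot be looked up later
        auto parsed = parseCaptured(raw);
        if (!parsed) return false;
        entries_[url] = std::move(*parsed);
        return true;
    }

    // Never touches the network: unknown URLs yield a synthetic 404.
    CapturedResponse resolve(const std::string& url) const {
        const auto it = entries_.find(url);
        if (it != entries_.end()) return it->second;
        CapturedResponse miss;
        miss.status = 404;
        return miss;
    }

private:
    std::map<std::string, CapturedResponse> entries_;
};
```

Returning a synthetic 404 for unknown URLs is one possible design choice; it keeps the renderer from waiting on objects that were never captured.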

Moreover some similar utilities were developed based on this rendering engine, e.g., http://sourceforge.net/projects/cutycapt/

Procedure:

  1. Install QtWebKit
  2. Use CutyCapt as the basis and allow the standard QNetworkAccessManager to be replaced by a custom one:
    In CutyCapt.cpp, on line 388, the manager is instantiated; it can then be set using page.setNetworkAccessManager(&manager);
  3. Implement a NetworkAccessManager that reads content from offline storage. To do so, we need to modify the HTTP export function to include headers in the responses.
  4. Decide how to implement the custom network access manager. It seems that QNetworkAccessManager cannot be reimplemented, as it is part of the core of QtWebKit. Instead, we may use the cache mechanism to provide offline data:
void QNetworkAccessManager::setCache ( QAbstractNetworkCache * cache )
  5. Make an exportable function that can be called from C# code. For performance reasons, this function will perform all necessary steps.
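
The exportable function from the last step could look roughly like the sketch below. It is only an assumption about the shape of the boundary: the names GenerateThumbnail, ThumbStatus, THUMB_API and renderFromOfflineStore are hypothetical, and the rendering itself is a stub standing in for the QtWebKit work. What it shows is C linkage, plain C types in the signature, integer status codes, and no exception crossing into the caller.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes shared with the C# caller.
enum ThumbStatus : int {
    THUMB_OK = 0,
    THUMB_BAD_ARGUMENT = 1,
    THUMB_FAILED = 2
};

// Export with C linkage so the symbol name is not mangled.
#if defined(_WIN32)
#  define THUMB_API extern "C" __declspec(dllexport)
#else
#  define THUMB_API extern "C" __attribute__((visibility("default")))
#endif

// Stub for the real work (load from offline storage, render, scale, save).
// It always reports success; the actual QtWebKit steps are not shown here.
static bool renderFromOfflineStore(const std::string& url,
                                   const std::string& outPath,
                                   int width, int height) {
    assert(width > 0 && height > 0);  // already validated by the wrapper
    (void)url;
    (void)outPath;
    return true;
}

// Single entry point: one call from managed code performs every step.
THUMB_API int GenerateThumbnail(const char* url, const char* outPath,
                                int width, int height) noexcept {
    if (url == nullptr || outPath == nullptr) return THUMB_BAD_ARGUMENT;
    if (*url == '\0' || *outPath == '\0') return THUMB_BAD_ARGUMENT;
    if (width <= 0 || height <= 0) return THUMB_BAD_ARGUMENT;
    try {
        return renderFromOfflineStore(url, outPath, width, height)
                   ? THUMB_OK
                   : THUMB_FAILED;
    } catch (...) {
        return THUMB_FAILED;  // exceptions must not cross the C boundary
    }
}
```

On the C# side, a function like this would typically be declared with a DllImport attribute naming the native library; keeping all steps behind one call avoids repeated managed-to-native transitions per page.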
