We collect web data by crawling the internet with our own spider platform,
built on established academic research. The system retrieves only web data
that is new to the web or has recently been updated.
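As an illustration of how a crawler can skip unchanged pages, the sketch
below (a minimal, hypothetical Python example, not our actual implementation)
uses HTTP conditional requests: it sends the If-Modified-Since and
If-None-Match headers saved from the previous visit, so the server can answer
304 Not Modified and no data is transferred when nothing has changed.

    import urllib.error
    import urllib.request

    def fetch_if_changed(url, last_modified=None, etag=None):
        # Hypothetical sketch: ask the server to send the page only if it
        # has changed since the last visit.
        request = urllib.request.Request(url)
        if last_modified:
            request.add_header("If-Modified-Since", last_modified)
        if etag:
            request.add_header("If-None-Match", etag)
        try:
            with urllib.request.urlopen(request) as response:
                # Remember the validators the server returns for the next visit.
                return (response.read(),
                        response.headers.get("Last-Modified"),
                        response.headers.get("ETag"))
        except urllib.error.HTTPError as err:
            if err.code == 304:
                # 304 Not Modified: the page is unchanged, nothing was downloaded.
                return None, last_modified, etag
            raise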
Collecting web data is a 24/7 job. Thousands of spiders crawl
simultaneously, collecting new or updated data. The system adjusts its
behaviour to the environment it is working in, taking into account many
parameters, such as the response time of a webserver or which web data is
new or has been updated. The spider system continually tunes its algorithms
to match these patterns and retrieve the data as efficiently as possible.
This non-stop process combines the collection and structuring of data,
resulting in a continuous, valuable feed of web data and insight into the
web itself.
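For example, one simple way to adjust to a webserver's response time is to
grow the politeness delay between requests when responses slow down and
shrink it again when the server recovers. The sketch below is a hypothetical
Python illustration of that idea; the thresholds and factors are made up,
not the real system's parameters.

    import time
    import urllib.request

    def polite_fetch(url, delay, min_delay=1.0, max_delay=60.0):
        # Honour the current politeness delay before contacting the server.
        time.sleep(delay)
        start = time.monotonic()
        with urllib.request.urlopen(url) as response:
            body = response.read()
        elapsed = time.monotonic() - start
        # A slow response suggests a loaded server: double the delay.
        # A fast response lets the delay shrink again, within fixed bounds.
        if elapsed > 2.0:
            delay = min(delay * 2.0, max_delay)
        else:
            delay = max(delay * 0.75, min_delay)
        return body, delay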
For more information on our webcrawlers, please visit our webcrawler page.
Frequently asked questions
What is the purpose of your webcrawler?
We are constantly crawling the internet with a dozen different webcrawlers,
the largest being the Dutch crawler and the United Kingdom crawler. We run a
couple of public search engines, such as Track and Kobala, and we are also
working on setting up an index for the UK.
How can I control what the webcrawlers do on my site?
Please see the robots.txt FAQ on our webcrawler page.
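In short: robots.txt is a plain-text file placed in the root of your website
that tells crawlers which paths they may visit. A minimal example (using the
wildcard user-agent, which applies to all well-behaved crawlers) could look
like this:

    User-agent: *
    Disallow: /private/
    Crawl-delay: 10

Replacing the asterisk with a specific crawler's user-agent token restricts
the rules to that crawler. Crawl-delay is a non-standard directive, but
crawlers that support it will wait the given number of seconds between
requests.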
What is the name of your crawler?
Your crawler is visiting broken pages on my site. What can I do?
Use the form below to let us know about the problem. Please include one or
more of the broken URLs that our webcrawler visits on your site.