
The MediaWiki content management system creates many links whose target webpages I don't want search engine crawlers to discover.

It's not just that I don't want them indexed, or even that I don't want them crawled; I don't want them discovered at all!

In theory I could customize the skin (theme/template) of my MediaWiki website to remove the HTML elements linking to these webpages, but doing that sanely would require learning a great deal about the MediaWiki architecture, which I'd prefer to avoid if simpler solutions are available.

  • CSS display: none won't help, as the markup would still be present in the DOM
  • JavaScript document.querySelector("#x").remove(); won't help either, as crawlers may discover the link element before the script runs
  • I can't use PHP 8.1.3 to undo its own earlier output, because the moment any markup containing such a link has been generated, it is served to the user.
  • I could use robots.txt to try to prevent crawling (if not indexing) of these pages, but since my website URLs are multilingual and involve many URL patterns, this might be a hard task (a sketch of what I mean follows this list).
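
For illustration, a rough robots.txt along these lines is what I imagine; the localized prefixes (Spezial:, Sp%C3%A9cial:) are only placeholders for the kind of per-language patterns I would have to enumerate, and wildcard rules such as /*?action= are only honored by some crawlers:

    User-agent: *
    # Keep crawlers out of dynamically generated views (edit, history, diffs, ...)
    Disallow: /index.php?
    Disallow: /*?action=
    # The special-pages namespace is localized, so every content language needs its own rule
    Disallow: /wiki/Special:
    Disallow: /wiki/Spezial:
    Disallow: /wiki/Sp%C3%A9cial: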

The only trick that might be left to help me is to somehow ask the server not to serve any markup matching a given CSS ID or class.
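
As a rough, untested sketch of what I mean (assuming I first attach a hypothetical marker class no-crawl to the unwanted elements), something like this in LocalSettings.php, using MediaWiki's OutputPageBeforeHTML hook, might strip those elements server-side before the page is served:

    # LocalSettings.php -- untested sketch: remove every element carrying the
    # hypothetical "no-crawl" class from the page HTML before it is sent out.
    $wgHooks['OutputPageBeforeHTML'][] = function ( $out, &$text ) {
        $dom = new DOMDocument();
        libxml_use_internal_errors( true );   // tolerate the imperfect HTML fragment
        $dom->loadHTML( '<?xml encoding="utf-8"?>' . $text );
        libxml_clear_errors();

        $xpath = new DOMXPath( $dom );
        $query = '//*[contains(concat(" ", normalize-space(@class), " "), " no-crawl ")]';
        foreach ( iterator_to_array( $xpath->query( $query ) ) as $node ) {
            $node->parentNode->removeChild( $node );   // drop the whole element
        }

        // Re-serialize only the body children, since loadHTML() wraps the fragment
        $body = $dom->getElementsByTagName( 'body' )->item( 0 );
        $html = '';
        foreach ( $body->childNodes as $child ) {
            $html .= $dom->saveHTML( $child );
        }
        $text = $html;
        return true;
    };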

As brute-force as it may be, can that work? If not, what other options do I have left?

  • If you don't want stuff discovered, don't put it on the public web. Keep your private stuff behind required authentication.
    – Mat
    Commented Mar 4, 2022 at 18:20
  • If MediaWiki does not support your requirements, you should look into other software that does. That is the only reasonable and maintainable way to reach your objectives; all other methods require a lot of effort and can have many undesired side effects.
    – Tero Kilkanen
    Commented Mar 5, 2022 at 11:07
  • @TeroKilkanen I strongly agree; I would migrate to Drupal, but the site already has 2,400 webpages, manually transferring the content could take about 4 months and would be hard, and I also like the MediaWiki syntax a lot.
    Commented Mar 5, 2022 at 11:59
  • I could use robots.txt to try to prevent crawling (if not indexing) of these pages, but since my website URLs are multilingual and there are many patterns, this might be a hard task. Still, much easier than migrating to Drupal.
    Commented Mar 5, 2022 at 11:59
