Make a web server prevent parsing of certain HTML elements


Asked 2 years, 3 months ago

Modified 2 years, 3 months ago

Viewed 106 times

The MediaWiki content management system generates many links to webpages that I do not want search engine crawlers to discover.

It's not only that I don't want them indexed, and not only that I don't want them crawled: I don't even want them discovered!

In theory I could customize the skin (theme/template) of my MediaWiki website to remove the HTML elements linking to these webpages, but doing so sanely requires learning a great deal about the MediaWiki architecture, which I'd prefer to avoid if simpler solutions are available.

CSS display: none won't help, as the markup would still be present in the DOM.
JavaScript document.querySelector("#x").remove(); won't help, as crawlers may discover the link element before the script runs.
I cannot use PHP 8.1.3 to take back its own earlier output, because once any markup containing such a link has been processed, it is served to the user.
I can use robots.txt to try to prevent crawling (if not indexing) of these pages, but since my website's URLs are multilingual and follow many patterns, this might be a hard task.
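
To make the robots.txt point concrete, here is a minimal sketch in Python of generating the Disallow rules from a list of language prefixes and page patterns instead of writing each one by hand. The prefixes (/en, /de), the /wiki/Special: pattern and the wiki.example host are all hypothetical and not taken from my real site. Such rules only ask crawlers not to fetch the pages; they do not stop the links from being discovered.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration only; a real site would list its own.
LANGUAGE_PREFIXES = ["/en", "/de"]
BLOCKED_PATHS = ["/wiki/Special:"]


def build_robots_txt(prefixes, paths):
    """Return robots.txt text with one Disallow rule per prefix/path pair."""
    lines = ["User-agent: *"]
    for prefix in prefixes:
        for path in paths:
            lines.append(f"Disallow: {prefix}{path}")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, url):
    """Check a URL against the generated rules with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    rules = build_robots_txt(LANGUAGE_PREFIXES, BLOCKED_PATHS)
    print(rules)
    print(is_allowed(rules, "https://wiki.example/de/wiki/Special:RecentChanges"))
    print(is_allowed(rules, "https://wiki.example/de/wiki/Main_Page"))
```

Checking the generated file with urllib.robotparser before deploying it is a cheap way to catch a pattern that blocks too much or too little.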

The only trick that might be left to help me is to somehow ask the server not to serve any such markup, selected by CSS ID or class.

As brute as it may be, can it work? If not, what other option do I have left?
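
To illustrate what I mean by not serving markup selected by ID or class, here is a rough sketch using only Python's standard html.parser. It re-emits a page while dropping every element (and its children) whose id or class is on a block list. The names ElementStripper and strip_elements and the ids and classes in the usage lines are hypothetical. The sketch assumes balanced, well-formed markup, and it does not show how such a filter would be hooked in between PHP and the client.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementStripper(HTMLParser):
    """Copy HTML through, omitting subtrees with a blocked id or class."""

    def __init__(self, blocked_ids=(), blocked_classes=()):
        super().__init__(convert_charrefs=False)
        self.blocked_ids = set(blocked_ids)
        self.blocked_classes = set(blocked_classes)
        self.skip_depth = 0  # > 0 while inside a blocked subtree
        self.out = []

    def _is_blocked(self, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") in self.blocked_ids:
            return True
        classes = (attr_map.get("class") or "").split()
        return any(name in self.blocked_classes for name in classes)

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._is_blocked(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_blocked(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_elements(html, blocked_ids=(), blocked_classes=()):
    stripper = ElementStripper(blocked_ids, blocked_classes)
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)


if __name__ == "__main__":
    page = ('<ul><li id="keep"><a href="/a">A</a></li>'
            '<li id="hidden-link"><a href="/b">B<br></a></li></ul>')
    print(strip_elements(page, blocked_ids=["hidden-link"]))
```

Because the filtering happens before the response leaves the server, the removed links never reach a crawler, unlike the CSS and JavaScript approaches above.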

edited Mar 5, 2022 at 12:01

asked Mar 4, 2022 at 18:05

technology-liker


If you don't want stuff discovered, don't put it on the public web. Keep your private stuff behind required authentication.
– Mat
Commented Mar 4, 2022 at 18:20
If MediaWiki does not support your requirements, you should look into other software for the purpose that supports the requirements. That is the only reasonable and maintainable way to reach your objectives. All other methods require lots of effort and can have many undesired side effects.
– Tero Kilkanen
Commented Mar 5, 2022 at 11:07
@TeroKilkanen I strongly agree. I would migrate to Drupal, but the site already has 2400 webpages; manually transferring the content could take about 4 months and would be hard, and I also like MediaWiki syntax a lot.
– technology-liker
Commented Mar 5, 2022 at 11:59
I can use robots.txt to try to prevent crawling (if not indexing) of these pages, but since my website's URLs are multilingual and follow many patterns, this might be a hard task. Still, it would be much easier than migrating to Drupal.
– technology-liker
Commented Mar 5, 2022 at 11:59


Tagged: php, javascript, mediawiki, robots.txt, css
