How to find all pages on a website

In the SEO world, information reigns supreme. Access to it is essential for making informed decisions and crafting a successful strategy for your project. To win this game, you should focus on two things: staying up to date with the market and having a thorough understanding of your site.

In this article, we’ll focus on getting to know your site, as the greatest opportunities and potential pitfalls sometimes lurk in the most unexpected places. Without a full list of the pages on your website, you run the risk of missing out on valuable opportunities for improvement, which also means losing an edge over your competitors.

Let’s look at some possible reasons why you may need to find all the pages of a website, which tools will help you do this effectively, and exactly how.

Why do I need to find every single page?

Search engines are constantly introducing new algorithms and applying manual penalties to pages and sites. So if you don’t have a thorough understanding of all your website’s pages, you’re tiptoeing through an SEO minefield.

To avert a serious setback, you need to keep a close eye on all the pages that make up your website. Doing so will not only help you find pages you already knew about, but will also help you discover forgotten pages and pages you had no idea existed and would otherwise not be able to view.

There are several possible scenarios in which you need to know how to find all the web pages of a site, such as:

  • Changing website architecture
  • Moving to a new domain or changing the URL structure
  • Finding orphan pages
  • Finding duplicate pages
  • Finding 404 pages
  • Creating redirects
  • Creating a website hreflang file
  • Checking for canonical and noindex tags
  • Setting up internal linking
  • Creating an XML sitemap or robots.txt file

How to find all pages of a website

Now, when it comes down to finding all the web pages that belong to a single website, the following options are available:

  • Use SE Ranking’s Website Audit to find all crawlable web pages.
  • Use Google Search Console to discover pages that are only visible to Google.
  • Use Google Analytics to detect all pages that have ever been visited.
  • Use Bing Webmaster Tools to catch pages that are only visible to Bing.
  • Download the website pages list using command line commands.
  • Extract website URLs from Sitemap.xml.
  • View all pages of a website via WordPress and other plugins.

Let’s explore how to use these tools to scan a website for all pages.

Finding crawlable pages via SE Ranking’s Website Audit

Let’s start by collecting all the URLs that both people and search engine crawlers can visit by following your site’s internal links. Analyzing such pages should be your top priority, as they get the most attention.
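To make the idea of "pages reachable through internal links" concrete, here is a minimal Python sketch of a breadth-first crawl. It is only an illustration of the general technique, not how SE Ranking's crawler works. The `crawl` function, the `LinkParser` class, and the caller-supplied `fetch` function are all names made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=500):
    """Breadth-first walk over internal links, starting at start_url.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns its HTML, or None if it cannot be loaded.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and skip pages already queued.
            if urlparse(link).netloc != host or link in seen:
                continue
            if len(seen) >= max_pages:
                return sorted(seen)
            seen.add(link)
            queue.append(link)
    return sorted(seen)
```

In practice, `fetch` could wrap something like `urllib.request.urlopen`. Keeping it as a parameter lets you try the logic on a few in-memory pages first. Note that this sketch ignores robots.txt, redirects, and JavaScript-rendered links, all of which a real audit tool has to handle.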

To do this, we first need to access SE Ranking, add a website or select an existing one, and open the Overview tab in the Website Audit report.

Accessing Website Audit in SE Ranking

Note: The 14-day free trial gives you access to all of SE Ranking’s available tools and features, including Website Audit.

Next, let’s configure the settings to make sure we’re telling the crawler to go through the right pages. To access Website Audit settings, click on the Gear icon in the top right-hand corner:

Accessing Website Audit settings

Under settings, go to the Source of pages for website audit tab, and enable the system to scan Site pages, Subdomains, and XML sitemap. This ensures that we’re only scanning what’s been clearly specified, and that the site’s subdomains are included along with all their pages:

Accessing Source of pages for website audit

Then, go to Rules for scanning pages, and enable the Take into account the robots.txt directives option to tell the system to follow the instructions specified in the robots.txt file. Click ‘Apply Changes’ when you’re done:

Accessing Rules for scanning pages
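If you want to check for yourself which URLs a robots.txt file blocks, Python's standard `urllib.robotparser` module can do it. The sketch below is a rough illustration only. The `allowed_urls` helper is a name invented for this example, and the module's matching rules are simpler than those of some search engine crawlers, so results may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text does not block."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_urls(robots, [
    "https://example.com/",
    "https://example.com/private/a.html",
]))
```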

Now, return to the Overview tab and launch the audit with the new settings applied by hitting ‘Restart audit’:

Restarting Website Audit

Once the audit is complete, go to Crawled Pages to view the full list of all crawlable pages:

List of Crawled Pages in Website Audit

In addition to a list of all URLs found, you’ll see all of the site’s essential SEO metrics, including:

  • Referring pages: the number of pages on your website that link to the specific URL.
  • Number of errors on each page.
  • Number of characters in the page’s URL.
  • Type and version of URL protocol.
  • Whether the page is in the sitemap.
  • Server response code on the page.
  • Whether the page is blocked by robots.txt.

The above metrics appear as default columns in the Crawled Pages tab. However, you can customize the report to meet your specific requirements by adding other parameters. Simply go to the Columns section above the table on the right and enable or disable any metric as desired.

Crawled pages columns

Crawled pages can be sorted by the errors, warnings, and notices that they contain. This will help you prioritize pages when fixing issues. You can also compare reports once you have audited your website at least twice. This helps you track the changes in your crawled pages over time.

Comparing and sorting crawled pages
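If you keep the URL lists from two audits, the same kind of comparison can be done by hand with a set difference. This is a small sketch of the idea, and `compare_crawls` is a hypothetical helper name:

```python
def compare_crawls(previous, current):
    """Return pages that appeared and disappeared between two crawls."""
    before, after = set(previous), set(current)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }
```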

It’s also possible to use filters to sort out pages by different parameters. For instance, if you only want to see 200-status-code pages, meaning those that are working correctly, add a filter on the server response code.

Now it’s time to export the results. Go to Export and choose to save the data in either .xls or .csv format.

Exporting data in Website Audit
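Once you have a .csv export, you can also filter it outside the tool. The sketch below assumes the file has columns named "URL" and "Status code". Those header names are guesses for illustration, so adjust them to match your actual export.

```python
import csv
import io


def urls_with_status(csv_text, status="200", url_col="URL", status_col="Status code"):
    """Return URLs from a crawl export whose status column equals `status`.

    The default column names are assumptions; pass the real headers
    from your export file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[url_col] for row in reader if row[status_col].strip() == status]
```

To use it on a file, read the file's text first, for example with `open("export.csv", newline="").read()`, and pass the result in as `csv_text`.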

Finding all pages via Google Search Console

Another tool that you can use to find all pages on a website is Google Search Console. Keep in mind, however, that GSC will only show you pages that Google can access.

Start by opening up your account and going to the Indexing tab. Choose the Pages report and select ‘All known pages’ instead of ‘All submitted pages’.

Pages in GSC

The green box labeled “Indexed” will give you the number of URLs indexed by Google. Click on ‘View data about indexed pages’ below the graph. From here, you can find all URLs on a site that Google indexed, as well as the date of their last crawl. You can export them to Google Sheets or download them as .xls or .csv files.

Indexed pages in GSC

Now, let’s go back to check the Not Indexed pages (pages that weren’t indexed and won’t appear in Google). Get ready to roll up your sleeves and do a lot of manual work.

As you scroll down, you’ll see a list of the reasons why some of the pages on your website aren’t indexed.

Why Google doesn't index pages

You can view different categories, such as redirect errors, pages excluded by the ‘noindex’ tag, those blocked by robots.txt, and so on.

By clicking on an error category, you’ll see a list of affected pages. Going through each one of them will give you unfiltered access to every single page that Google has discovered on your site.

Pages blocked from indexing in GSC

Another useful report in GSC for finding all website pages is the Search Results report. This report displays all pages that have received at least one impression in search results. To access it, go to Search Results in the Performance tab, set the largest possible period, and choose Pages. Finally, export your report.

Pages that get impressions

You can also use the Search Analytics for Sheets add-on to retrieve data from GSC on demand and create automatic backups in Google Sheets.

Finding all pages with pageviews via Google Analytics

You can also find all website pages by carefully studying the data in your Google Analytics account. There is just one condition: your website must have been linked to your Google Analytics account from the get-go, so that it could collect data behind the scenes.

The logic here is simple: if someone has ever visited any page of your website, Google Analytics will have the data to prove it. And since these visits are made by people, we should ensure such pages serve a distinct SEO or marketing purpose.

Start by going to Reports → Engagement → Pages and screens and click on Page path and screen class.

Pages and screens report in GA

You can also click on ‘Views’ to get the arrow pointing up and sort the page URLs from least to most pageviews. As a result, the least visited pages will be shown at the top of the list.

Then, share or export the data into a .csv file.
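One practical use of this export is to compare it with a crawl. Any page path that received views but was never reached through internal links is a candidate orphan page. The sketch below is one possible way to do that comparison. The function name and inputs are hypothetical: it assumes you have a list of page paths from the analytics export and a list of full URLs from a crawl.

```python
from urllib.parse import urlparse


def orphan_candidates(analytics_paths, crawled_urls):
    """Paths that received views but were not reached by the crawler."""
    # Analytics reports paths ("/blog/"), while crawls list full URLs,
    # so reduce the crawled URLs to their path component first.
    crawled_paths = {urlparse(url).path or "/" for url in crawled_urls}
    return sorted(set(analytics_paths) - crawled_paths)
```

This comparison ignores query strings and treats "/page" and "/page/" as different paths, so expect to clean up the result by hand.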

Finding all website pages by using Bing Webmaster Tools

Google is considered a search giant, but Bing is still one of the most popular search engines to date. As of January 2023, its global search market share is 8.85%, which makes properly optimizing your website for Bing a sensible strategy.

You can also make use of Bing Webmaster Tools to find all pages on a website that are indexed by Bing. The process is straightforward, but before getting started, make sure that you have added and verified your website.

Once your site is set up, go to Site Explorer in the left-hand navigation bar and choose to filter by ‘All URLs’. The report will display all the pages that Bing sees on your site. Click on the Download button to export the data.

Pages in Bing

Finding all pages on a website using the command line

A more techie approach to finding the list of all pages on a website is to use the command line. This method involves interacting with a computer through text commands.

To get all URLs from a website via the command line, you’ll first need to install Wget. This command line tool allows you to retrieve files from the web. The installation process varies depending on the OS you’re using. For instance, on macOS you need to install the Homebrew package manager before installing command line tools like Wget.

To install Wget, open a terminal window and type in the following command:

brew install wget

Next, download your website. Enter:

wget -r www.examplesite.com

Wget will download your website recursively. It will start with the main page of the website and all its linked pages, images, and other files. From there, it will follow the links on those pages and download the pages they link to, and will continue this process until it has covered the site. Note that, by default, Wget follows links only five levels deep; use the -l option to change this limit.

Once the website is downloaded, you can list the saved files, whose paths mirror the site’s URLs, by typing:

find www.examplesite.com
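The file listing shows local paths, not URLs. If you would like actual URLs, a short script can convert one into the other. The sketch below relies on two assumptions: that Wget named the download folder after the host (its default behavior with -r), and that the site uses https, which the folder itself does not record. `mirrored_urls` is a name made up for this example.

```python
from pathlib import Path


def mirrored_urls(mirror_dir, scheme="https"):
    """Turn the files under a Wget mirror directory into URLs.

    Assumes the directory is named after the host, e.g.
    ./www.examplesite.com/, and that `scheme` is the right protocol.
    """
    root = Path(mirror_dir)
    urls = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Wget saves a directory URL such as /blog/ as blog/index.html.
        if relative == "index.html":
            relative = ""
        elif relative.endswith("/index.html"):
            relative = relative[: -len("index.html")]
        urls.append(f"{scheme}://{root.name}/{relative}")
    return urls
```

Calling `mirrored_urls("www.examplesite.com")` from the directory where you ran Wget returns the list. Keep in mind that it includes images and other assets, not just HTML pages.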

Finding all website pages via Sitemap.xml

An XML sitemap is also a great way to find all pages on a website because it provides a list of all the pages that the owner wants search engines to index. It works like a roadmap, guiding search engine bots through the site’s structure and making it easier for them to understand how everything is organized.

To get the list, follow these steps:

  • Find the sitemap.

The sitemap can usually be found in the website’s root directory or by typing “/sitemap.xml” at the end of the website’s URL, as in http://sitename.com/sitemap.xml. If the website has multiple sitemaps, they will be listed in the main sitemap file, which is usually named sitemap-index.xml. Here, you can find a list of sitemaps and choose the most relevant one to get a list of the website’s pages.

  • Extract URLs from your sitemap into Google Sheets.

Although this is another tech-heavy method, it will save you tons of time and effort by importing all URLs into Google Sheets in just a few seconds. However, you will need to do some preparatory work first. Begin by creating a new sheet and then go to Extensions. Next, choose ‘Apps Script’.

Google Sheets extensions

Now you’ll need to copy and paste the following custom JavaScript code into the script editor to create a new function:

function sitemap(sitemapUrl, namespace) {
  try {
    // Fetch the sitemap and parse it as XML.
    var xml = UrlFetchApp.fetch(sitemapUrl).getContentText();
    var doc = XmlService.parse(xml);
    var root = doc.getRootElement();
    var sitemapNameSpace = XmlService.getNamespace(namespace);

    // Collect the <loc> value of every <url> entry.
    var urls = root.getChildren('url', sitemapNameSpace);
    var locs = [];
    for (var i = 0; i < urls.length; i++) {
      locs.push(urls[i].getChild('loc', sitemapNameSpace).getText());
    }
    return locs;
  } catch (e) {
    // Return the error so that it shows up in the sheet cell.
    return e;
  }
}

Script editor in Google Sheets

From here, save and run the test. The editor will ask for access permission. If the script is implemented successfully, no error messages will appear. Instead, you’ll see that the execution has started and completed.

Execute new script

Now you can go back to your Google Sheet and enter the following formula:

=sitemap("Sitemap Url","Namespace Url")

Once you have located your sitemap, you should have the sitemap URL at hand, and you can copy the Namespace URL directly from the sitemap itself. In the screenshot below, you can see how this works with SE Ranking’s sitemap:

  • Sitemap URL is in the browser address bar.
  • Namespace URL is in the first line of sitemap content.

Sitemap and Namespace URLs

Copy these addresses and paste them as links into the formula. This is what you should get in the end:

=sitemap("https://seranking.com/sitemap.xml","http://www.google.com/schemas/sitemap/0.9")

Paste this formula into your Google Sheet, and press Enter to get all the URLs extracted from your sitemap.

Site URLs extracted from sitemap
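If you prefer to work outside Google Sheets, the same extraction can be sketched in Python with the standard library. This is only an illustrative equivalent, not part of the Sheets workflow. `sitemap_locs` is a hypothetical helper, and the default namespace is the one defined by the sitemaps.org protocol, which your sitemap may or may not use. The sketch also tells you whether the file is a regular sitemap or a sitemap index that lists other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol. Some sitemaps declare a
# different one, so check the first line of the file and pass it in.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(xml_text, namespace=SITEMAP_NS):
    """Return (kind, locs), where kind is 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    ns = {"sm": namespace}
    if root.tag == f"{{{namespace}}}sitemapindex":
        # A sitemap index lists child sitemaps rather than pages.
        entries = root.findall("sm:sitemap/sm:loc", ns)
        kind = "sitemapindex"
    else:
        entries = root.findall("sm:url/sm:loc", ns)
        kind = "urlset"
    return kind, [loc.text.strip() for loc in entries if loc.text]
```

When the result is a sitemap index, each returned address is itself a sitemap, so you would download it and call the function again to reach the page URLs.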

Finding all website pages via WordPress plugins

If your website runs on WordPress, there are plugins that can help you find all of its pages.

To use such a plugin, download and install it to your ‘/wp-content/plugins/’ directory. Once installed, activate it from your Plugins page in WordPress. Once you’ve completed the activation, you can list or export all URLs of your website.

If you use a different CMS, check whether it has built-in plugins for this purpose or has a separate one in the extension directory. For instance, Joomla has many plugins for generating sitemaps that can help you get a list of website pages. Similarly, Drupal has a sitemap module that can come in handy when collecting the pages on your site.

How to see all pages of a website on Google

We have discussed various tools that can help you find all the pages on your website, but we’ve saved the simplest option for last, and for good reason. While this method is the easiest, you should be careful when using it.

Google your site, but do it right. This means using Google search operators, which are special commands that refine the search based on set criteria.

Simply go to Google Search and type “site:website name”, as in “site:seranking.com”. You won’t need to add http:// or www, but make sure that there are no spaces between the operator and the query; otherwise, the results will be incorrect.

Website pages on Google

However, you need to keep in mind that this command wasn’t created to display all indexed pages on your site, as Google’s John Mueller has confirmed.

John Mueller

Search Advocate at Google

The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes … This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website.

One last tip is to use Google Search Console, but in a slightly different way than was explained in earlier sections. Focus on the page’s impressions (the number of times the page was shown in search), but for a short period of time, such as seven days.

Closing thoughts

Finding all of your website’s pages is key to improving its SEO and serves as the initial step towards further optimization activities. With this information, you can identify pages requiring improvement, update outdated content, find all website links and fix broken ones, and optimize your overall internal linking structure.

By using the tools and methods described in this article, you can quickly discover all of the pages on your website and gain valuable insights into its performance. So, take a brief moment to compile a page list to stay on top of your SEO and marketing efforts. Take it from the SE Ranking team: it won’t take much time at all when you’re using our tools.
