14
Jan

Google's PageRank Explained and How to Raise It

Posted by: admin in 本站SEO译文 (this site's SEO translations)

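Author: 福瑞
Originally published at: 三七魔域
Copyright notice: when reposting, credit the author and the original source with a link and keep this copyright notice. Any unlawful copying is strictly prohibited.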

English original: Google’s PageRank Explained and how to make the most of it

from: http://www.webworkshop.net/pagerank.html by Phil Craven

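What is PageRank?

PageRank is a numeric value that represents how important a page is on the web. Google figures that when one page links to another page, it is effectively casting a vote for that page. The more votes that are cast for a page, the more important the page is. The importance of the page casting the vote also determines how important the vote itself is. Google calculates a page's importance from the votes cast for it, and the weight of each vote is taken into account when the page's PageRank is calculated. PageRank is Google's way of deciding a page's importance. It matters because it is one of the factors that determine a page's ranking in the search results. It isn't the only factor that affects rankings, but it is an important one.

From here on, we'll occasionally refer to PageRank as "PR".

Note:

Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, inbound links cannot harm a site, but outbound links to penalized sites can. So be careful which sites you link to. If a site has PR0, it is very likely a penalized site, and it would be unwise to link to it.

How is PageRank calculated?

To calculate the PageRank for a page, all of its inbound links are taken into account. These include links from within the site as well as links from outside it.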

PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))

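That's the equation that calculates a page's PR. It is the original one that was published when PageRank was being developed. Google may well have made adjustments to it that we don't know about, but that doesn't matter, because this equation is good enough.

In the equation, 't1 - tn' are the pages linking to page A, 'C' is the number of outbound links that a page has, and 'd' is a damping factor, usually set to 0.85.

We can think of it in a simpler way:-

a page's PR = 0.15 + 0.85 * (the "share" of PR given to it by every page that links to it)

"share" = the linking page's PR divided by the total number of outbound links on that page.

A page "votes" an amount of PR onto each page that it links to. The amount of PR that it has to vote with is a little less than its own PR value (its own PR * 0.85). This value is shared equally between all the pages that it links to.

From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links. The PR of a page that links to yours is important, but the number of outbound links on that page is also important. The more outbound links the source page has, the less PR your page receives from it.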
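As a rough illustration of the simplified formula, the sketch below computes the "share" that a linking page passes to each page it links to. It treats the toolbar numbers 4 and 8 as if they were actual PageRank values, which is an assumption, and the function name `share` is ours rather than anything from the original article.

```python
D = 0.85  # damping factor from the published equation


def share(source_pr, outbound_links, d=D):
    """PR that one source page contributes to each page it links to."""
    return d * source_pr / outbound_links


pr4_with_5_links = share(4, 5)      # 0.85 * 4 / 5
pr8_with_100_links = share(8, 100)  # 0.85 * 8 / 100

print(pr4_with_5_links, pr8_with_100_links)
```

On these linear assumptions the PR4 page passes on ten times as much as the PR8 page.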

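If the differences in PR value between PR1, PR2, ..., PR10 were equal, that conclusion would hold up, but many people believe that the values between PR1 and PR10 are set on a logarithmic scale, and there is very good reason for believing it. Nobody outside Google knows for sure, but the differences are large, logarithmic or something close to it. If so, it takes a lot more additional PR for a page to move up to the next level than it took to move up from the previous level. That would reverse the previous conclusion, so that a link from a PR8 page with many outbound links passes on far more PR than a link from a PR4 page with only a few outbound links.

Whichever scale Google uses, we can be sure of one thing: a link from another site increases your site's PR. Just remember to avoid links from link farms.

Note that when a page votes its PR value to other pages, its own PR is not reduced by the value that it is voting. The page doing the voting doesn't give away its PR and end up with nothing. It isn't a transfer of PR. It is simply a vote according to the page's PR value. It's like a shareholders' meeting where each shareholder votes according to the number of shares held, but the shares themselves aren't given away. Even so, pages do lose some PR indirectly, as we'll see later.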

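OK so far? Good. Now let's look at how the calculations are actually done.

When a page's PR is calculated, its existing PR is abandoned completely and a fresh calculation is done that relies solely on the PR "voted" for it by its current inbound links, which may have changed since the last time the page's PR was calculated.

The equation shows clearly how a page's PR is arrived at. But it isn't quite that simple, because the equation doesn't work if the calculation is performed just once. Suppose we have 2 pages, A and B, which link to each other, and neither has any other links of any kind. Then:

Step 1: Calculate page A's PR from the value of its inbound links

Page A now has a new PR value. The calculation used the value passed from page B. But page B has an inbound link from page A, and its own new PR value hasn't been worked out yet, so page A's new PR value is based on inaccurate data and can't be accurate.

Step 2: Calculate page B's PR from the value of its inbound links

Page B now has a new PR value, but it can't be accurate because the calculation used page A's new PR value, which is inaccurate.

It's a Catch 22 situation. We can't work out A's PR until we know B's PR, and we can't work out B's PR until we know A's PR.

Now that both pages have newly calculated PR values, can't we just run the calculations again to arrive at accurate values? No. We can run the calculations again to get more accurate values, but we will always be using inaccurate values in the calculations, so the results will never be exact.

The problem is overcome by repeating the calculations many times. Each pass produces slightly more accurate values. In fact, total accuracy can never be achieved because the results are always based on inaccurate values. 40 to 50 iterations are sufficient to reach a point where further iterations wouldn't change the values enough to matter. This is what Google does at each update, and it is why PR updates take so long.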
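The two steps above can be sketched for the A and B pair as follows. This is only an illustration under stated assumptions: both pages start at an arbitrary value of 0, step 2 uses the value just produced by step 1, and the function name `iterate_pair` is ours.

```python
def iterate_pair(iterations, start=0.0, d=0.85):
    """Pages A and B link only to each other; each has one outbound link."""
    a = b = start
    for _ in range(iterations):
        a = (1 - d) + d * b  # step 1: A from B's current value
        b = (1 - d) + d * a  # step 2: B from A's new value
    return a, b


for n in (1, 5, 40):
    print(n, iterate_pair(n))
```

Each extra pass moves both values closer to 1.0, which they approach but never exactly reach.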

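One thing to bear in mind is that the results we get from the calculations are proportions. The figures must then be set against a scale, known only to Google, to arrive at each page's actual PR. Even so, we can use the calculations to channel the PR within a site around its pages so that certain pages receive a higher proportion of it than others.

Note:
You may come across explanations where the same PR equation is stated but the result of each iteration is added to the page's existing PR. The new value (result + existing PR) is then used in the calculations for other pages. These explanations are wrong for the following reasons:-

1. They quote the same published equation, but then change it

from PR(A) = (1-d) + d(……) to PR(A) = PR(A) + (1-d) + d(……)

It isn't correct, and it isn't necessary.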

2. We will be looking at how to organize links so that certain pages end up with a larger proportion of the PageRank than others. Adding to the page’s existing PageRank through the iterations produces different proportions than when the equation is used as published. Since the addition is not a part of the published equation, the results are wrong and the proportioning isn’t accurate.

According to the published equation, the page being calculated starts from scratch at each iteration. It relies solely on its inbound links. The ‘add to the existing PageRank’ idea doesn’t do that, so its results are necessarily wrong.
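To see the difference between the two readings, here is a small sketch for two pages that link only to each other. The function names are ours, both pages start at 1, and both pages are updated together from the previous iteration's values.

```python
def run(update, iterations=50, d=0.85):
    """Two pages, A and B, linking only to each other, both starting at 1."""
    a = b = 1.0
    for _ in range(iterations):
        # Both new values are computed from the previous iteration's values.
        a, b = update(a, b, d), update(b, a, d)
    return a, b


def published(own, other, d):
    return (1 - d) + d * other        # starts from scratch at each iteration


def additive(own, other, d):
    return own + (1 - d) + d * other  # the "add to existing PageRank" variant


published_result = run(published)
additive_result = run(additive)
print(published_result, additive_result)
```

With the published equation the pair settles at 1 each, while the additive variant keeps growing with every iteration instead of converging.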


Internal links

Fact: A website has a maximum amount of PageRank that is distributed between its pages by internal links.

The maximum PageRank in a site equals the number of pages in the site * 1. The maximum is increased by inbound links from other sites and decreased by outbound links to other sites. We are talking about the overall PageRank in the site and not the PageRank of any individual page. You don’t have to take my word for it. You can reach the same conclusion by using a pencil and paper and the equation.

Fact: The maximum amount of PageRank in a site increases as the number of pages in the site increases.

The more pages that a site has, the more PageRank it has. Again, by using a pencil and paper and the equation, you can come to the same conclusion. Bear in mind that the only pages that count are the ones that Google knows about.

Fact: By linking poorly, it is possible to fail to reach the site’s maximum PageRank, but it is not possible to exceed it.

Poor internal linkages can cause a site to fall short of its maximum but no kind of internal link structure can cause a site to exceed it. The only way to increase the maximum is to add more inbound links and/or increase the number of pages in the site.

Cautions: Whilst I thoroughly recommend creating and adding new pages to increase a site’s total PageRank so that it can be channeled to specific pages, there are certain types of pages that should not be added. These are pages that are all identical or very nearly identical and are known as cookie-cutters. Google considers them to be spam and they can trigger an alarm that causes the pages, and possibly the entire site, to be penalized. Pages full of good content are a must.

What can we do with this ‘overall’ PageRank?

We are going to look at some example calculations to see how a site’s PageRank can be manipulated, but before doing that, I need to point out that a page will be included in the Google index only if one or more pages on the web link to it. That’s according to Google. If a page is not in the Google index, any links from it can’t be included in the calculations.

For the examples, we are going to ignore that fact, mainly because other ‘Pagerank Explained’ type documents ignore it in the calculations, and it might be confusing when comparing documents. The calculator operates in two modes:- Simple and Real. In Simple mode, the calculations assume that all pages are in the Google index, whether or not any other pages link to them. In Real mode the calculations disregard unlinked-to pages. These examples show the results as calculated in Simple mode.

Let’s consider a 3 page site (pages A, B and C) with no links coming in from the outside. We will allocate each page an initial PageRank of 1, although it makes no difference whether we start each page with 1, 0 or 99. Apart from a few millionths of a PageRank point, after many iterations the end result is always the same. Starting with 1 requires fewer iterations for the PageRanks to converge to a suitable result than when starting with 0 or any other number. You may want to use a pencil and paper to follow this or you can follow it with the calculator.

The site’s maximum PageRank is the amount of PageRank in the site. In this case, we have 3 pages so the site’s maximum is 3.

At the moment, none of the pages link to any other pages and none link to them. If you make the calculation once for each page, you’ll find that each of them ends up with a PageRank of 0.15. No matter how many iterations you run, each page’s PageRank remains at 0.15. The total PageRank in the site = 0.45, whereas it could be 3. The site is seriously wasting most of its potential PageRank.
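For readers who prefer code to pencil and paper, here is a minimal sketch of the Simple-mode calculation applied to the unlinked 3 page site. It is a hypothetical stand-in for the article's calculator, not its actual code: the function name, the `links` dictionary layout, and the choice to update all pages together from the previous iteration's values are our assumptions.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over every page.

    `links` maps a page to the set of pages it links to.
    """
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


no_links = pagerank(["A", "B", "C"], links={})
total = sum(no_links.values())
print(no_links, total)
```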

Example 1

Now begin again with each page being allocated PR1. Link page A to page B and run the calculations for each page. We end up with:-
Page A = 0.15
Page B = 1
Page C = 0.15

Page A has “voted” for page B and, as a result, page B’s PageRank has increased. This is looking good for page B, but it’s only 1 iteration - we haven’t taken account of the Catch 22 situation. Look at what happens to the figures after more iterations:-

After 100 iterations the figures are:-
Page A = 0.15
Page B = 0.2775
Page C = 0.15

It still looks good for page B but nowhere near as good as it did. These figures are more realistic. The total PageRank in the site is now 0.5775 - slightly better but still only a fraction of what it could be.

NOTE:
Technically, these particular results are incorrect because of the special treatment that Google gives to dangling links, but they serve to demonstrate the simple calculation.
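The simple calculation in Example 1 can be reproduced with a short sketch. It ignores the dangling-link treatment mentioned in the note, and the `pagerank` helper is our own illustrative function, assumed to update all pages together at each iteration.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


pages = ["A", "B", "C"]
links = {"A": {"B"}}  # page A links to page B, nothing else

after_1 = pagerank(pages, links, iterations=1)
after_100 = pagerank(pages, links, iterations=100)
print(after_1)
print(after_100, sum(after_100.values()))
```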

Example 2

Try this linkage. Link all pages to all pages. Each page starts with PR1 again. This produces:-
Page A = 1
Page B = 1
Page C = 1

Now we’ve achieved the maximum. No matter how many iterations are run, each page always ends up with PR1. The same results occur by linking in a loop. E.g. A to B, B to C and C to A. View this in the calculator.

This has demonstrated that, by poor linking, it is quite easy to waste PageRank and by good linking, we can achieve a site’s full potential. But we don’t particularly want all the site’s pages to have an equal share. We want one or more pages to have a larger share at the expense of others. The kinds of pages that we might want to have the larger shares are the index page, hub pages and pages that are optimized for certain search terms. We have only 3 pages, so we’ll channel the PageRank to the index page - page A. It will serve to show the idea of channeling.

Example 3

Now try this. Link page A to both B and C. Also link pages B and C to A. Starting with PR1 all round, after 1 iteration the results are:-
Page A = 1.85
Page B = 0.575
Page C = 0.575

and after 100 iterations, the results are:-
Page A = 1.459459
Page B = 0.7702703
Page C = 0.7702703

In both cases the total PageRank in the site is 3 (the maximum) so none is being wasted. Also in both cases you can see that page A has a much larger proportion of the PageRank than the other 2 pages. This is because pages B and C are passing PageRank to A and not to any other pages. We have channeled a large proportion of the site’s PageRank to where we wanted it.
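The channeling in Example 3 can be checked with the same kind of sketch. The `pagerank` helper below is our own illustrative function, not the article's calculator, and it assumes all pages are updated together from the previous iteration's values.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


pages = ["A", "B", "C"]
example_3 = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

ex3_after_1 = pagerank(pages, example_3, iterations=1)
ex3_after_100 = pagerank(pages, example_3, iterations=100)
print(ex3_after_1)
print(ex3_after_100, sum(ex3_after_100.values()))
```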

Example 4

Finally, keep the previous links and add a link from page C to page B. Start again with PR1 all round. After 1 iteration:-
Page A = 1.425
Page B = 1
Page C = 0.575

By comparison to the 1 iteration figures in the previous example, page A has lost some PageRank, page B has gained some and page C stayed the same. Page C now shares its “vote” between A and B. Previously A received all of it. That’s why page A has lost out and why page B has gained.

And after 100 iterations:-
Page A = 1.298245
Page B = 0.9999999
Page C = 0.7017543

When the dust has settled, page C has lost a little PageRank because, having now shared its vote between A and B, instead of giving it all to A, A has less to give to C in the A–>C link. So adding an extra link from a page causes the page to lose PageRank indirectly if any of the pages that it links to return the link. If the pages that it links to don’t return the link, then no PageRank loss would have occurred. To make it more complicated, if the link is returned even indirectly (via a page that links to a page that links to a page etc), the page will lose a little PageRank. This isn’t really important with internal links, but it does matter when linking to pages outside the site.
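Example 4's indirect loss can be seen by running the calculation with the extra C to B link. As before, the `pagerank` helper is a hypothetical sketch of the Simple-mode calculation, not the article's calculator.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


pages = ["A", "B", "C"]
example_4 = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "B"}}

ex4_after_1 = pagerank(pages, example_4, iterations=1)
ex4_after_100 = pagerank(pages, example_4, iterations=100)
print(ex4_after_1)
print(ex4_after_100)
```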

Example 5: new pages

Adding new pages to a site is an important way of increasing a site’s total PageRank because each new page will add an average of 1 to the total. Once the new pages have been added, their new PageRank can be channeled to the important pages. We’ll use the calculator to demonstrate these.

Let’s add 3 new pages to Example 3. Three new pages but they don’t do anything for us yet. The small increase in the Total, and the new pages’ 0.15, are unrealistic as we shall see. So let’s link them into the site.

Link each of the new pages to the important page, page A. Notice that the Total PageRank has doubled, from 3 (without the new pages) to 6. Notice also that page A’s PageRank has almost doubled.

There is one thing wrong with this model. The new pages are orphans. They wouldn’t get into Google’s index, so they wouldn’t add any PageRank to the site and they wouldn’t pass any PageRank to page A. They each need to be linked to from at least one other page. If page A is the important page, the best page to put the links on is, surprisingly, page A. You can play around with the links but, from page A’s point of view, there isn’t a better place for them.
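The two link structures described for the new pages can be compared with a short sketch. It assumes Example 3's links, three new pages that we name D, E and F, Simple mode (so the orphan pages still count), and our own illustrative `pagerank` helper.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


pages = ["A", "B", "C", "D", "E", "F"]
example_3 = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

# New pages link to A, but nothing links to them (orphans).
orphans = dict(example_3, D={"A"}, E={"A"}, F={"A"})
# Same again, with page A also linking to each new page.
linked = dict(orphans, A={"B", "C", "D", "E", "F"})

orphan_pr = pagerank(pages, orphans)
linked_pr = pagerank(pages, linked)
print(orphan_pr["A"], sum(orphan_pr.values()))
print(linked_pr["A"], sum(linked_pr.values()))
```

In this small model the site total is 6 either way, and page A ends up with the same value whether or not it links back to the new pages.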

It is not a good idea for one page to link to a large number of pages so, if you are adding many new pages, spread the links around. The chances are that there is more than one important page in a site, so it is usually suitable to spread the links to and from the new pages. You can use the calculator to experiment with mini-models of a site to find the best links that produce the best results for its important pages.

Examples summary

You can see that, by organising the internal links, it is possible to channel a site’s PageRank to selected pages. Internal links can be arranged to suit a site’s PageRank needs, but it is only useful if Google knows about the pages, so do try to ensure that Google spiders them.

Inbound and Outbound links

Examples of these could be given but it is probably clearer to read about them (below) and to ‘play’ with them in the calculator.

Questions

When a page has several links to another page, are all the links counted?

E.g. if page A links once to page B and 3 times to page C, does page C receive 3/4 of page A’s shareable PageRank?

The PageRank concept is that a page casts votes for one or more other pages. Nothing is said in the original PageRank document about a page casting more than one vote for a single page. The idea seems to be against the PageRank concept and would certainly be open to manipulation by unrealistically proportioning votes for target pages. E.g. if an outbound link, or a link to an unimportant page, is necessary, add a bunch of links to an important page to minimize the effect.

Since we are unlikely to get a definitive answer from Google, it is reasonable to assume that a page can cast only one vote for another page, and that additional votes for the same page are not counted.

When a page links to itself, is the link counted?

Again, the concept is that pages cast votes for other pages. Nothing is said in the original document about pages casting votes for themselves. The idea seems to be against the concept and, also, it would be another way to manipulate the results. So, for those reasons, it is reasonable to assume that a page can’t vote for itself, and that such links are not counted.


Dangling links

“Dangling links are simply links that point to any page with no outgoing links. They affect the model because it is not clear where their weight should be distributed, and there are a large number of them. Often these dangling links are simply pages that we have not downloaded yet……….Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated. After all the PageRanks are calculated they can be added back in without affecting things significantly.” - extract from the original PageRank paper by Google’s founders, Sergey Brin and Lawrence Page.

A dangling link is a link to a page that has no links going from it, or a link to a page that Google hasn’t indexed. In both cases Google removes the links shortly after the start of the calculations and reinstates them shortly before the calculations are finished. In this way, their effect on the PageRank of other pages is minimal.

The results shown in Example 1 are wrong because page B has no links going from it, and so the link from page A to page B is dangling and would be removed from the calculations. The results of the calculations would show all three pages as having 0.15.
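A rough sketch of this treatment applied to Example 1 follows. It only models the removal step, not the later reinstatement. Repeating the removal until no dangling links remain is our assumption, and both function names are ours.

```python
def strip_dangling(links):
    """Repeatedly drop links that point to pages with no outgoing links."""
    links = {page: set(targets) for page, targets in links.items()}
    changed = True
    while changed:
        changed = False
        for page, targets in links.items():
            kept = {t for t in targets if links.get(t)}
            if kept != targets:
                links[page] = kept
                changed = True
    return {page: targets for page, targets in links.items() if targets}


def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


example_1 = {"A": {"B"}}  # B has no outgoing links, so A -> B is dangling
cleaned = strip_dangling(example_1)
result = pagerank(["A", "B", "C"], cleaned)
print(cleaned, result)
```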

It may suit site functionality to link to pages that have no links going from them without losing any PageRank from the other pages but it would be a waste of potential PageRank. Take a look at this example. The site’s potential is 5 because it has 5 pages, but without page E linked in, the site only has 4.15.

Link page A to page E and click Calculate. Notice that the site’s total has gone down very significantly. But, because the new link is dangling and would be removed from the calculations, we can ignore the new total and assume the previous 4.15 to be true. That’s the effect of functionally useful, dangling links in the site. There’s no overall PageRank loss.

However, some of the site’s potential total is still being wasted, so link Page E back to Page A and click Calculate. Now we have the maximum PageRank that is possible with 5 pages. Nothing is being wasted.

Although it may be functionally good to link to pages within the site without those pages linking out again, it is bad for PageRank. It is pointless wasting PageRank unnecessarily, so always make sure that every page in the site links out to at least one other page in the site.


Inbound links

Inbound links (links into the site from the outside) are one way to increase a site’s total PageRank. The other is to add more pages. Where the links come from doesn’t matter. Google recognizes that a webmaster has no control over other sites linking into a site, and so sites are not penalized because of where the links come from. There is an exception to this rule but it is rare and doesn’t concern this article. It isn’t something that a webmaster can accidentally do.

The linking page’s PageRank is important, but so is the number of links going from that page. For instance, if you are the only link from a page that has a lowly PR2, you will receive an injection of 0.15 + 0.85(2/1) = 1.85 into your site, whereas a link from a PR8 page that has another 99 links from it will increase your site’s PageRank by 0.15 + 0.85(8/100) = 0.218. Clearly, the PR2 link is much better - or is it? The probably logarithmic scale behind the toolbar values is a likely reason why this is not the case.

Once the PageRank is injected into your site, the calculations are done again and each page’s PageRank is changed. Depending on the internal link structure, some pages’ PageRank is increased, some are unchanged but no pages lose any PageRank.

It is beneficial to have the inbound links coming to the pages to which you are channeling your PageRank. A PageRank injection to any other page will be spread around the site through the internal links. The important pages will receive an increase, but not as much of an increase as when they are linked to directly. The page that receives the inbound link makes the biggest gain.

It is easy to think of our site as being a small, self-contained network of pages. When we do the PageRank calculations we are dealing with our small network. If we make a link to another site, we lose some of our network’s PageRank, and if we receive a link, our network’s PageRank is added to. But it isn’t like that. For the PageRank calculations, there is only one network - every page that Google has in its index. Each iteration of the calculation is done on the entire network and not on individual websites.

Because the entire network is interlinked, and every link and every page plays its part in each iteration of the calculations, it is impossible for us to calculate the effect of inbound links to our site with any realistic accuracy.


Outbound links

Outbound links are a drain on a site’s total PageRank. They leak PageRank. To counter the drain, try to ensure that the links are reciprocated. Because of the PageRank of the pages at each end of an external link, and the number of links out from those pages, reciprocal links can gain or lose PageRank. You need to take care when choosing where to exchange links.

When PageRank leaks from a site via a link to another site, all the pages in the internal link structure are affected. (This doesn’t always show after just 1 iteration). The page that you link out from makes a difference to which pages suffer the most loss. Without a program to perform the calculations on specific link structures, it is difficult to decide on the right page to link out from, but the generalization is to link from the one with the lowest PageRank.
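The effect of the linking-out page can be explored with a small model. The sketch below takes the 3 page site from Example 3, adds one external page that we call X (which does not link back), and compares linking out from page B with linking out from page A. Everything here is an illustrative assumption in Simple mode, including the `pagerank` helper.

```python
def pagerank(pages, links, iterations=100, d=0.85, start=1.0):
    """Simple-mode PageRank; `links` maps a page to the pages it links to."""
    pr = {p: start for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[q] / len(links[q]) for q in links if p in links[q]
            )
            for p in pages
        }
    return pr


pages = ["A", "B", "C", "X"]  # X stands for a page on another site
site = ["A", "B", "C"]

out_from_b = {"A": {"B", "C"}, "B": {"A", "X"}, "C": {"A"}}
out_from_a = {"A": {"B", "C", "X"}, "B": {"A"}, "C": {"A"}}

pr_b = pagerank(pages, out_from_b)
pr_a = pagerank(pages, out_from_a)
total_linking_from_b = sum(pr_b[p] for p in site)
total_linking_from_a = sum(pr_a[p] for p in site)
print(total_linking_from_b, total_linking_from_a)
```

In this toy model both site totals fall below 3, and the loss is smaller when the outbound link is placed on page B, one of the two lowest-PageRank pages.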

Many websites need to contain some outbound links that are nothing to do with PageRank. Unfortunately, all ‘normal’ outbound links leak PageRank. But there are ‘abnormal’ ways of linking to other sites that don’t result in leaks. PageRank is leaked when Google recognizes a link to another site. The answer is to use links that Google doesn’t recognize or count. These include form actions and links contained in javascript code.

Form actions
A form’s ‘action’ attribute does not need to be the url of a form parsing script. It can point to any html page on any site. Try it.

Example:
<form name="myform" action="http://www.domain.com/somepage.html"></form>
<a href="javascript:document.myform.submit()">Click here</a>

To be really sneaky, the action attribute could be in some javascript code rather than in the form tag, and the javascript code could be loaded from a ‘js’ file stored in a directory that is barred to Google’s spider by the robots.txt file.

Javascript
Example: <a href="javascript:goto('wherever')">Click here</a>

Like the form action, it is sneaky to load the javascript code, which contains the urls, from a separate ‘js’ file, and sneakier still if the file is stored in a directory that is barred to googlebot by the robots.txt file.

The “rel” attribute
As of 18th January 2005, Google, together with other search engines, is recognising a new attribute to the anchor tag. The attribute is “rel”, and it is used as follows:-

<a href="http://www.domain.com/somepage.html" rel="nofollow">link text</a>

The attribute tells Google to ignore the link completely. The link won’t help the target page’s PageRank, and it won’t help its rankings. It is as though the link doesn’t exist. With this attribute, there is no longer any need for javascript, forms, or any other method of hiding links from Google.
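To illustrate what ignoring such links could look like when building a link graph, here is a minimal sketch using Python's standard `html.parser`. It is not Google's implementation. The class name is ours, and `otherpage.html` is a made-up URL added for contrast.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets, setting aside anchors marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.counted = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.ignored.append(href)
        else:
            self.counted.append(href)


html = (
    '<a href="http://www.domain.com/somepage.html" rel="nofollow">link text</a>'
    '<a href="http://www.domain.com/otherpage.html">other link</a>'
)
collector = LinkCollector()
collector.feed(html)
print(collector.counted, collector.ignored)
```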


So how much additional PageRank do we need to move up the toolbar?

First, let me explain in more detail why the values shown in the Google toolbar are not the actual PageRank figures. According to the equation, and to the creators of Google, the billions of pages on the web average out to a PageRank of 1.0 per page. So the total PageRank on the web is equal to the number of pages on the web * 1, which equals a lot of PageRank spread around the web.

The Google toolbar range is from 1 to 10. (They sometimes show 0, but that figure isn’t believed to be a PageRank calculation result). What Google does is divide the full range of actual PageRanks on the web into 10 parts - each part is represented by a value as shown in the toolbar. So the toolbar values only show what part of the overall range a page’s PageRank is in, and not the actual PageRank itself. The numbers in the toolbar are just labels.

Whether or not the overall range is divided into 10 equal parts is a matter for debate - Google aren’t saying. But because it is much harder to move up a toolbar point at the higher end than it is at the lower end, many people (including me) believe that the divisions are based on a logarithmic scale, or something very similar, rather than the equal divisions of a linear scale.

Let’s assume that it is a logarithmic, base 10 scale, and that it takes 10 properly linked new pages to move a site’s important page up 1 toolbar point. It will take 100 new pages to move it up another point, 1000 new pages to move it up one more, 10,000 to the next, and so on. That’s why moving up at the lower end is much easier than at the higher end.

In reality, the base is unlikely to be 10. Some people think it is around the 5 or 6 mark, and maybe even less. Even so, it still gets progressively harder to move up a toolbar point at the higher end of the scale.
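The progression described above is easy to tabulate. The sketch below simply raises an assumed base to successive powers. The function name is ours, and both bases are guesses, as the text itself says.

```python
def pages_needed_per_point(points, base=10):
    """New pages needed for each successive toolbar point, under the
    assumed logarithmic scale."""
    return [base ** k for k in range(1, points + 1)]


base_10 = pages_needed_per_point(4)
base_5 = pages_needed_per_point(4, base=5)
print(base_10)
print(base_5)
```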

Note that as the number of pages on the web increases, so does the total PageRank on the web, and as the total PageRank increases, the positions of the divisions in the overall scale must change. As a result, some pages drop a toolbar point for no ‘apparent’ reason. If the page’s actual PageRank was only just above a division in the scale, the addition of new pages to the web would cause the division to move up slightly and the page would end up just below the division. Google’s index is always increasing and they re-evaluate each of the pages on more or less a monthly basis. It’s known as the “Google dance”. When the dance is over, some pages will have dropped a toolbar point. A number of new pages might be all that is needed to get the point back after the next dance.

The toolbar value is a good indicator of a page’s PageRank but it only indicates that a page is in a certain range of the overall scale. One PR5 page could be just above the PR5 division and another PR5 page could be just below the PR6 division - almost a whole division (toolbar point) between them.


Tips

Domain names and Filenames

To a spider, www.domain.com/, domain.com/, www.domain.com/index.html and domain.com/index.html are different urls and, therefore, different pages. Surfers arrive at the site’s home page whichever of the urls are used, but spiders see them as individual urls, and it makes a difference when working out the PageRank. It is better to standardize the url you use for the site’s home page. Otherwise each url can end up with a different PageRank, whereas all of it should have gone to just one url.

If you think about it, how can a spider know the filename of the page that it gets back when requesting www.domain.com/ ? It can’t. The filename could be index.html, index.htm, index.php, default.html, etc. The spider doesn’t know. If you link to index.html within the site, the spider could compare the 2 pages but that seems unlikely. So they are 2 urls and each receives PageRank from inbound links. Standardizing the home page’s url ensures that the PageRank it is due isn’t shared with ghost urls.

Example: Go to my UK Holidays and UK Holiday Accommodation site - how’s that for a nice piece of link text ;). Notice that the url in the browser’s address bar contains “www.”. If you have the Google Toolbar installed, you will see that the page has PR5. Now remove the “www.” part of the url and get the page again. This time it has PR1, and yet they are the same page. Actually, the PageRank is for the unseen frameset page.

When this article was first written, the non-www URL had PR4 due to using different versions of the link URLs within the site. It had the effect of sharing the page’s PageRank between the 2 pages (the 2 versions) and, therefore, between the 2 sites. That’s not the best way to do it. Since then, I’ve tidied up the internal linkages and got the non-www version down to PR1 so that the PageRank within the site mostly stays in the “www.” version, but there must be a site somewhere that links to it without the “www.” that’s causing the PR1.

Imagine the page, www.domain.com/index.html. The index page contains links to several relative urls; e.g. products.html and details.html. The spider sees those urls as www.domain.com/products.html and www.domain.com/details.html. Now let’s add an absolute url for another page, only this time we’ll leave out the “www.” part - domain.com/anotherpage.html. This page links back to the index.html page, so the spider sees the index pages as domain.com/index.html. Although it’s the same index page as the first one, to a spider, it is a different page because it’s on a different domain. Now look what happens. Each of the relative urls on the index page is also different because it belongs to the domain.com/ domain. Consequently, the link structure is wasting a site’s potential PageRank by spreading it between ghost pages.
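One way to avoid ghost urls is to run every internal link through a single normalizing function before it is written into a page. The sketch below is only one possible approach. The preferred host, the list of index filenames and the function name are assumptions taken from the examples in this section.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = {"index.html", "index.htm", "index.php", "default.html"}


def canonical_url(url, preferred_host="www.domain.com"):
    """Map www/non-www and index-file variants onto one preferred URL."""
    # Accept scheme-less input such as "domain.com/index.html".
    parts = urlsplit(url if "//" in url else "//" + url, scheme="http")
    host = parts.netloc.lower()
    if preferred_host.startswith("www."):
        bare_host = preferred_host[4:]
    else:
        bare_host = preferred_host
    if host == bare_host:
        host = preferred_host
    directory, _, filename = parts.path.rpartition("/")
    if filename in INDEX_FILES:
        path = directory + "/"
    else:
        path = parts.path or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "www.domain.com/",
    "domain.com/",
    "www.domain.com/index.html",
    "domain.com/index.html",
]
print({canonical_url(u) for u in variants})
print(canonical_url("domain.com/anotherpage.html"))
```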

Adding new pages

There is a possible negative effect of adding new pages. Take a perfectly normal site. It has some inbound links from other sites and its pages have some PageRank. Then a new page is added to the site and is linked to from one or more of the existing pages. The new page will, of course, acquire PageRank from the site’s existing pages. The effect is that, whilst the total PageRank in the site is increased, one or more of the existing pages will suffer a PageRank loss due to the new page making gains. Up to a point, the more new pages that are added, the greater is the loss to the existing pages. With large sites, this effect is unlikely to be noticed but, with smaller ones, it probably would.

So, although adding new pages does increase the total PageRank within the site, some of the site’s pages will lose PageRank as a result. The answer is to link new pages in such a way within the site that the important pages don’t suffer, or add sufficient new pages to make up for the effect (that can sometimes mean adding a large number of new pages), or better still, get some more inbound links.


Miscellaneous

The Google toolbar
If you have the Google toolbar installed in your browser, you will be used to seeing each page’s PageRank as you browse the web. But all isn’t always as it seems. Many pages that Google displays the PageRank for haven’t been indexed in Google and certainly don’t have any PageRank in their own right. What is happening is that one or more pages on the site have been indexed and a PageRank has been calculated. The PageRank figure for the site’s pages that haven’t been indexed is allocated on the fly - just for your toolbar. The PageRank itself doesn’t exist.

It’s important to know this so that you can avoid exchanging links with pages that really don’t have any PageRank of their own. Before making exchanges, search for the page on Google to make sure that it is indexed.

Sub-directories
Some people believe that Google drops a page’s PageRank by a value of 1 for each sub-directory level below the root directory. E.g. if the value of pages in the root directory is generally around 4, then pages in the next directory level down will be generally around 3, and so on down the levels. Other people (including me) don’t accept that at all. Either way, because some spiders tend to avoid deep sub-directories, it is generally considered to be beneficial to keep directory structures shallow (directories one or two levels below the root).

ODP and Yahoo!
It used to be thought that Google gave a PageRank boost to sites that are listed in the Yahoo! and ODP (a.k.a. DMOZ) directories, but these days general opinion is that they don’t. There is certainly a PageRank gain for sites that are listed in those directories, but the reason for it is now thought to be this:-

Google spiders the directories just like any other site and their pages have decent PageRank and so they are good inbound links to have. In the case of the ODP, Google’s directory is a copy of the ODP directory. Each time that sites are added and dropped from the ODP, they are added and dropped from Google’s directory when they next update it. The entry in Google’s directory is yet another good, PageRank boosting, inbound link. Also, the ODP data is used for searches on a myriad of websites - more inbound links!

Listings in the ODP are free but, because sites are reviewed by hand, it can take quite a long time to get in. The sooner a working site is submitted, the better. For tips on submitting to DMOZ, see this DMOZ article.

Comments and suggestions
Comments and suggestions are welcome. If you have any thoughts, please post them in the forum.

Further reading and resources

 

 

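  • Another PageRank explanation article (by Ian Rogers): here
  • Internet marketing articles, tips, tricks and secrets: here

Author: 福瑞
Originally published at: 三七魔域
Copyright notice: when reposting, credit the author and the original source with a link and keep this copyright notice. Any unlawful copying is strictly prohibited.

For the original, see: original address

Learn how to get higher PR and avoid Google penalties through effective internal linking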

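Recently, several clients have asked me how to increase the weight of particular sub-pages when targeting specific long-tail keywords. I have also been asked why some pages were devalued or ended up as supplemental results. In nearly all of these cases the cause was the same: these sites mostly have tens or even hundreds of thousands of pages, and many of those pages are buried deep within the site. To reach these deep pages, a visitor has to click several times before arriving at a page that has only one internal backlink. Pages like this tend not to carry much weight and are often indexed by Google as supplemental results.

To analyze the popularity of a deep page, I do the following:

1. First I look at how many of the site's pages Google has indexed. Go to Google and enter (for example): site:mysite.com. This command shows you how many pages your site consists of in Google's eyes.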

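2. Second, I go to Yahoo! to check how many internal pages link to the specific page. In Yahoo!, enter (for example):

link:http://www.mysite.com/widgets/blue/bright/small/id=373 site:mysite.com

This shows how many internal pages link to the specific page. It also flags whether the home page (the page with the most weight) links to it, and shows how many clicks it takes to get from the home page to this page. (What is this page's PR? Did I just say PR?)

3. Next, I look at whether any external links point directly at this page. To do this, go to Yahoo! and use the link command with the full URL of your page (for example):

link:http://www.mysite.com/widgets/blue/bright/small/id=373 -site:mysite.com

This shows how many pages on other sites link to this page. If there are results, pages on other sites do link to this specific page. However, you should click through to each of those pages to confirm that Google has cached them. Just a few days ago I looked at two of our links. One had 92 backlinks from other sites pointing at the specific page, but when I checked the linking pages, every one of them was an MFA or scraper page that Google had not cached. The other site (the translator believes this is a slip and that "page" was probably meant) had 52 backlinks, of which the first 10 were real and the rest were all MFA and scraper pages (but it at least had 10 real votes from real pages on other sites).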

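So, for example, if a site has 10,000 indexed pages and a specific page (like the one above) has only 10 internal links and no external links pointing to it, it is basically telling the search engines that this page is not important (low weight).

External backlinks from other sites may well matter more than all the internal links put together... but if you have 10,000 pages, getting a backlink from another site for every single page is very difficult. In that case, your internal link structure is all you have to work with, so it is very important to make it as good as it can be.

How to make your internal link structure stronger:

To start, you need a basic understanding of how PR works (hold on, did I just say PR again?)

Here is a somewhat old but very good page that tells the long and complicated story of how PR works in plain English.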

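If you are a math geek, here is the PR formula:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR = PageRank
d = damping factor (~0.85)
C = number of links on the page
PR(T1)/C(T1) = the PR of page 1 divided by the number of links on page 1 (the PR that is passed on)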

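Here is a simple way to look at it:

If 1,000 external sites link to your home page, your home page has a certain amount of weight. That weight is divided equally among the pages that the home page links to. So, for example, if your home page has 100 links on it, each link passes on only 1/100 of the weight, but if your home page has only 10 links, each link passes on 1/10 of the weight.

So if you have a page that can only be reached (starting from the home page) by going into the widgets listing, then the blue widgets listing, then the light blue widgets listing, and only then finding the specific page... well, there isn't much juice (PR) left for that page to drink, because a lot of the PR weight is diverted along the way. The chance of that page having a good PR is therefore low.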
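The dilution along a click path can be sketched with a few lines of arithmetic. This is a deliberately simplified model: it follows a single path, ignores the (1-d) baseline that every page receives, and uses made-up link counts. The function name is ours.

```python
def weight_reaching_page(start_weight, links_per_level, d=0.85):
    """Follow one click path; at each level the vote is damped by d and
    split evenly across that page's outbound links."""
    weight = start_weight
    for link_count in links_per_level:
        weight = d * weight / link_count
    return weight


# One click from a home page carrying 10 links.
shallow = weight_reaching_page(100.0, [10])
# Four clicks deep: home page with 100 links, then three listing pages
# with 20 links each (hypothetical numbers).
deep = weight_reaching_page(100.0, [100, 20, 20, 20])
print(shallow, deep)
```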

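Solutions for optimizing your internal link structure:

1. The first step is usually keyword research. Go back to Wordtracker or Keyword Discovery (two keyword tools) or, better still, your PPC or ROI (return on investment) statistics, and define your target keywords.

2. Then I make a list of phrases ranked by how important they are to you, based on search volume and return on investment. The most important pages should be given as many internal links as possible (site-wide, if you can manage it). If you have 20 very important pages and they are linked from every page, including the home page, those pages will carry the most weight of all your pages. (I have also seen people block their copyright pages or other unimportant pages.)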

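3. Get more internal links through cross-linking. Look at my example URL:

http://www.mysite.com/widgets/blue/bright/small/id=373

If this page has only 1 internal backlink (Clothing -> Men's clothing -> Men's casual clothing -> Jack Jones men's casual clothing), it will need a lot of luck to show up in any search.

But if all of your individual product pages link to one another, then every individual page will at least have some internal links. For example, say you add a 373rd page at the bottom level of "Clothing -> Men's clothing -> Men's casual clothing -> Jack Jones men's casual clothing" and link it to all the other pages under that category. Each page will then have at least a few internal backlinks. You could even cross-link all of the "Men's casual clothing" articles to one another, so that each of these individual pages has hundreds or thousands of internal links. Compared with the old approach of a single link, this raises the PR weight of the individual target pages (provided you do it only for some sections, not all of them). It also gives value to the end product pages that now have hundreds or thousands of internal links. A page with more of these cross backlinks will usually beat a page that has only one or a few internal links.

4. In PR terms, your weight is the portion that the sites linking to you allocate to you from their own weight, so find out where on your site that weight sits and make sure it flows freely from there.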

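A very good tool for checking which pages of your site have weight is the powerful subpage tool. Look at the pages that have many internal links, and make sure that the pages with many external links (the pages with the most weight, as determined by the number of external links) link to many of your important pages.

5. Identify the main pages of your internal category sitemap and try to link to those "nodes", so that your end pages sit closer to the pages that have backlinks. For example, if you sell imported bearings and have 5 main category pages, those category pages should have plenty of internal links, and the links should be concentrated in those areas (tied more closely to the end pages, to help pass PR along).

Finally, you may have to "sacrifice" some pages to Google's supplemental results, or accept that a page stays buried for lack of passed weight... but if you choose wisely which target pages you want weight passed to, and adjust your internal link structure properly, your important pages, and even a very large long tail, can carry the most weight (and so have the most ranking potential).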

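Translator's note: Put simply, the article above says to add cross-links and to raise, as far as possible, how often the content or keywords you consider important appear. I ran a hands-on experiment myself. On Baidu I picked a keyword with an index of about 150, "宋平顺情妇照片" (to speed up the experiment I did not pick a difficult term), and added an article with the same title to my site "全天津". About 5 days later (I don't remember exactly how many), both Baidu and Google had indexed the article. It ranked 24th on Baidu and somewhere in the 20s on Google. I then added the keyword "宋平顺情妇照片" to the right of the search box in the shared header file of "全天津" and linked it to the article, which in effect added a link to the article from every page of the site. In less than a week, the Baidu ranking had not changed (possibly because Baidu has devalued my site, since the article has never appeared under its own title in Baidu's results), the Google ranking rose to 5th, and the Yahoo! ranking rose to 3rd. During that time I built no external links for the term, which shows how important internal link structure is. Judging by the experiment, this kind of optimization is fairly effective for the overseas search engines. Of course, Baidu's erratic behavior of late may also have skewed the results. Offered for your reference.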