librelist archives

mails are not indexed by google

From:
deepak kannan
Date:
2011-03-25 @ 13:42
hi,
http://librelist.com/browser/ does not provide search so i tried using
google search.
But librelist.com/browser is not indexed by google.

To replicate, search "site:http://librelist.com/browser/homebrew" on google.
Maybe this is because http://librelist.com/robots.txt is empty; having
"Allow: All" would help.

cheers,
deepak

Re: mails are not indexed by google

From:
Dylan Grose
Date:
2011-04-01 @ 04:39
On 25/03/11 09:42 AM, deepak kannan wrote:
> hi, http://librelist.com/browser/ does not provide search so i tried
> using google search. But librelist.com/browser is not indexed by google.
>
> To replicate, search "site:http://librelist.com/browser/homebrew" on google.

The librelist archives are in fact indexed by Google. The problem is
simply that the hyperlinks to individual messages on the mailing list
archive pages contain an extra forward slash before the mailing list
name, so Google indexes the messages with that extra slash in their
URLs. As a side effect, site-specific searches on Google fail unless
that second slash is included in the site specification. Try using
"site:librelist.com/browser//homebrew" to search for a specific
message. Of course, those hyperlinks shouldn't contain that extra
slash.
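
The difference between the two URL forms, and one way the links could be
cleaned up, can be sketched in a few lines of Python. This is only an
illustration: the function name collapse_slashes is hypothetical, and it
assumes the archive server treats the single-slash and double-slash
paths as the same page.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path only, leaving 'http://' untouched."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


# The form Google currently sees in the archive links (extra slash).
indexed = "http://librelist.com/browser//homebrew/"
print(collapse_slashes(indexed))
```

Applying something like this when the archive pages are generated would
let the plain "site:librelist.com/browser/homebrew" search match.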

> Maybe this is because http://librelist.com/robots.txt is empty,
> having "Allow: All" would help.

That isn't necessary. Google's indexing bot indexes as much as it can
and skips a page only if there is a rule blocking it.
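
This can be checked with the robots.txt parser in Python's standard
library. The sketch below is not how Google evaluates the file; it only
shows how a standards-following parser treats an empty robots.txt
compared with a blocking rule. The helper name allowed and the
"Googlebot" agent string are chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://librelist.com/browser/homebrew/"

# An empty robots.txt contains no rules, so nothing is blocked.
print(allowed("", url))

# Only an explicit Disallow rule keeps a well-behaved bot out.
print(allowed("User-agent: *\nDisallow: /", url))
```

The first call should report that fetching is allowed and the second
that it is not, so an empty file needs no extra "allow" line.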

Dylan

Re: mails are not indexed by google

From:
Eric Wong
Date:
2011-03-25 @ 16:40
deepak kannan <kannan.deepak@gmail.com> wrote:
> hi,
> http://librelist.com/browser/ does not provide search so i tried using
> google search.
> But librelist.com/browser is not indexed by google.

I've found this an issue, too.  However I mirror lists I create to
gmane.org anyways for redundancy and those seem to get indexed.

> To replicate, search "site:http://librelist.com/browser/homebrew" on google.
> Maybe this is because http://librelist.com/robots.txt is empty, having
> "Allow: All" would help.

I figured it was an issue with over-reliance on AJAX/JavaScript, but I
hate fancy web UI stuff in general so I'm quick to blame it :)

-- 
Eric Wong

Re: mails are not indexed by google

From:
deepak kannan
Date:
2011-03-26 @ 18:46
hi,
Other than the robots.txt, a sitemap.xml would be needed so that Google
knows which pages to index.
It does seem old school to provide a separate file for following links
in a hypertext document.

This page gives some details on how to generate a sitemap for a large site.
http://dynamical.biz/blog/seo-technical/sitemap-strategy-large-sites-17.html

Overall, this is the process:
http://www.wikihow.com/Get-Your-Website-Indexed-by-Google

The robots.txt would help so that good bots are cleared to browse the site;
bad bots would not respect it anyway.
I did not search for a Python library, but Ruby has
https://github.com/alexrabarts/big_sitemap, and others are Rails/Merb plugins.
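
For Python, a basic sitemap can be sketched with the standard library
alone. This is a minimal illustration, not an existing librelist tool:
the function name build_sitemap and the example URLs are assumptions,
and only the required <loc> element of the sitemaps.org format is
written.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical list of archive pages to advertise to crawlers.
pages = [
    "http://librelist.com/browser/",
    "http://librelist.com/browser/homebrew/",
]
print(build_sitemap(pages))
```

A large archive would need to split its URLs across several files,
since the protocol caps a single sitemap at 50,000 URLs.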

Maybe we can submit the URL on "Google Webmaster Tools" and generate the
robots.txt. This is just to check whether Googlebot crawls the site when
robots.txt says to allow all.

http://www.google.com/support/webmasters/bin/answer.py?answer=156449

Please let me know if I can patch something.

cheers,
deepak

On Fri, Mar 25, 2011 at 10:10 PM, Eric Wong <normalperson@yhbt.net> wrote:

> deepak kannan <kannan.deepak@gmail.com> wrote:
> > hi,
> > http://librelist.com/browser/ does not provide search so i tried using
> > google search.
> > But librelist.com/browser is not indexed by google.
>
> I've found this an issue, too.  However I mirror lists I create to
> gmane.org anyways for redundancy and those seem to get indexed.
>
> > To replicate, search "site:http://librelist.com/browser/homebrew" on google.
> > Maybe this is because http://librelist.com/robots.txt is empty, having
> > "Allow: All" would help.
>
> I figured it was an issue with over-reliance on AJAX/JavaScript, but I
> hate fancy web UI stuff in general so I'm quick to blame it :)
>
> --
> Eric Wong
>

unsubscribe

From:
Shane Becker
Date:
2011-03-25 @ 16:21
unsubscribe
On Mar 25, 2011, at 6:42 AM, deepak kannan wrote:

> hi,
> http://librelist.com/browser/ does not provide search so i tried using google search.
> But librelist.com/browser is not indexed by google. 
> 
> To replicate, search "site:http://librelist.com/browser/homebrew" on google.
> Maybe this is because http://librelist.com/robots.txt is empty, having "Allow: All" would help.
> 
> cheers,
> deepak


--
Shane Becker
(801) 898-9481
http://iamshane.com
@veganstraightedge



Re: mails are not indexed by google

From:
Zed A. Shaw
Date:
2011-03-31 @ 13:26
On Fri, Mar 25, 2011 at 07:12:41PM +0530, deepak kannan wrote:
> hi,
> http://librelist.com/browser/ does not provide search so i tried using
> google search.
> But librelist.com/browser is not indexed by google.

Hmm, ok I'll check that out later.



-- 
Zed A. Shaw
http://zedshaw.com/