Thursday 26 January 2012

POP



POP, otherwise known as Post Office Protocol, is a protocol used by email clients to retrieve emails from a remote server over a TCP/IP connection.

Let us try retrieving mail from my Gmail account without using any mail client, starting from scratch.

Gmail supports both POP and IMAP; let us for the time being concentrate on POP.

The Wikipedia page on POP gives information which reads:

"A POP3 server listens on port 110. Encrypted communication for POP3 is either requested after protocol initiation, using the STLS command, if supported, or by POP3S, which connects to the server using secure sockets layer (SSL) on well-known TCP port 995 (e.g. google Gmail)."

So I need to connect using the SSL protocol (this is needed because Gmail accepts POP3 connections only over SSL).

The openssl command does this job. By the way, I need to create a client in which I can view the mails; this is also done by passing s_client as an argument to the openssl command:

$ openssl s_client

Wait, don't execute this command in the command line yet; the command is not complete.
We need to connect it to the Gmail POP server (the Gmail server that uses the POP protocol), so:

$ openssl s_client -connect pop.gmail.com:995

The port number comes from the Wikipedia passage I quoted above. If the connection succeeds, the server prints its certificate details and then a greeting line beginning with +OK.

So now, the tasks that we have completed so far are:

1) A client has been created

2) The client communicates using SSL (this is needed because Gmail has SSL protection)

3) We have established a connection between the client and the Gmail POP server

That's all; the rest is what we normally do in the GUI....

Just give the username and password, and then read the mail.
So the next things we need to type are the following POP3 commands.

user <type your user name here>

And another kind request:
Chase away those fellows standing nearby, because the password is exposed as you type.

pass <type your password here>
STAT
LIST
retr <type the number of the mail you want to read, picked from the list displayed>

(STAT reports the message count and the total mailbox size, LIST shows each message number with its size, and retr fetches one message.)


After reading the mail, you can quit and sign out using the command:

quit
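
By the way, if you ever want to script this whole conversation instead of typing it, Python's standard poplib module speaks the same commands. Here is a minimal sketch in Python 2 (the Gmail address is a placeholder, and POP access has to be enabled in your Gmail settings):

import getpass
import poplib

# Connect over SSL, the same thing "openssl s_client -connect pop.gmail.com:995" does
server = poplib.POP3_SSL('pop.gmail.com', 995)

server.user('your.username@gmail.com')        # the user command (placeholder address)
server.pass_(getpass.getpass('Password: '))   # the pass command, without echoing it

count, size = server.stat()                   # STAT: message count and total size
print '%d messages, %d bytes in the mailbox' % (count, size)

response, lines, octets = server.retr(1)      # retr 1: fetch the first message
print '\n'.join(lines)

server.quit()                                 # quit: sign out cleanly

getpass.getpass also solves the shoulder-surfing problem above, since nothing is echoed as you type.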

That's all I know about POP.

Bye,

Harish Kayarohanam

Tuesday 10 January 2012

Apache

Apache, in computer technology, is web server software.

Taking it for granted that Apache has already been installed on an Ubuntu system, the procedure for creating a module and loading it into Apache is as follows.


$ cd /etc/apache2

This is the place where we will keep the .c version of the newly created module.
Now create a module (here I have given the module the name pracmod3):

$ sudo apxs2 -g -n "pracmod3"

So we have created the module (-g generates a template module directory and -n gives it the name).
Now get into the module folder pracmod3 with the normal cd command:

$ cd pracmod3

Now we have to compile the .c file mod_pracmod3.c

$ sudo apxs2 -c -i mod_pracmod3.c

Now mod_pracmod3.so has been created (-c compiles the source and -i installs the result). This is a dynamic shared object, similar to DLL files on Windows.
Now it is our job to load it into Apache, which is done by the following sequence of steps.

$ cd ..


Then open mod_pracmod3.c and copy the configuration snippet that follows these lines in its header comment: "Then activate it in Apache's apache2.conf file for instance for the URL /pracmod3 as follows: #   apache2.conf".
The content to be copied will be as follows:

LoadModule pracmod3_module modules/mod_pracmod3.so
<Location /pracmod3>
SetHandler pracmod3
</Location>


$ sudo vim apache2.conf

Now paste the copied content below the line "Include sites-enabled/".

Now change the path of mod_pracmod3.so to wherever it actually is on your system.
On my system it was in
/usr/lib/apache2/modules
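
With that path, the LoadModule line I pasted becomes:

LoadModule pracmod3_module /usr/lib/apache2/modules/mod_pracmod3.so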

Now we have to restart Apache:

$ sudo apachectl restart

Now our job is over.
To check whether our module has been loaded properly, check whether it displays the content that was in the "sample content handler" portion of mod_pracmod3.c.

To do that, type the command:
$ lynx -mime_header http://localhost/pracmod3 


The result is seen as:
HTTP/1.1 200 OK
Date: Tue, 10 Jan 2012 05:25:55 GMT
Server: Apache/2.2.20 (Ubuntu)
Content-Length: 62
Connection: close
Content-Type: text/html

The sample page from mod_pracmod3.c and it was done by harish
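
The same check can be scripted in Python 2 (in the spirit of the screen-scraping post below); a small sketch, assuming Apache is serving on localhost:

import urllib

# Fetch the URL handled by our <Location /pracmod3> block
response = urllib.urlopen('http://localhost/pracmod3')
print response.info()    # the HTTP response headers
print response.read()    # the body emitted by our content handler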




Thank you.









Wednesday 4 January 2012

Screen Scraping

As soon as I saw the term in the training agenda sent by my educator, I felt that it was something related to "discarding a part of the screen as useless", since I thought the word scrap means waste material.
Yes, I was in a way correct that scrap means waste. But... then when I started studying this concept, I found that it is scraping, which comes from the word scrape, meaning "remove from something".
Then I understood that this topic deals with "EXTRACTING INFORMATION FROM THE SCREEN".

I took the website http://money.livemint.com and tried to extract the EPS (earnings per share) field.
I wanted to learn Python, so I thought of doing this task in Python itself.


import urllib
import re

def eps():
    # Fetch the TCS quote page (F132540 is its Livemint company code)
    base_url = 'http://money.livemint.com/IID42/F132540/QuickQuote/Company.aspx'
    content = urllib.urlopen(base_url).read()
    # Isolate the markup fragment around the "EPS (Rs.)" label ...
    me = re.search(r'EPS\s*\(Rs\.\)<.*?><.*?>\s*<.*>\s*\d*\.\d*\s*<.*>', content)
    eps = me.group()
    # ... then pull the decimal number out of that fragment
    ma = re.search(r'\d+\.\d+', eps)
    if ma:
        epse = ma.group()
    else:
        epse = 'no match available : '
    return epse
   

This does the screen scraping .....
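
To try it out (assuming the page layout has not changed), just call the function and print the result:

print eps()    # prints the scraped EPS figure as a string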

I wanted to give some colour and fragrance to this code so that it makes sense (the deep-seated hidden motive is to write the code in such a way that the user of our website never even suspects that the data has been scraped from somewhere).

import urllib
import re

# Livemint company codes for the supported symbols
codes = {
    'TCS': 'F132540',
    'INFOSYS': 'F100209',
    'WIPRO': 'F107685',
    'HCL': 'F132281',
}

def get_eps():
    baseone_url = 'http://money.livemint.com/IID42/'
    basethree_url = '/QuickQuote/Company.aspx'
    # raw_input (not input) so the reply is read as a plain string in Python 2
    symbol = raw_input('Enter the company name, one among TCS, INFOSYS, HCL, WIPRO: ')
    code = codes.get(symbol)
    if code is None:
        print "Enter a valid company name"
        return
    content = urllib.urlopen(baseone_url + code + basethree_url).read()
    # Isolate the markup fragment around the "EPS (Rs.)" label ...
    me = re.search(r'EPS\s*\(Rs\.\)<.*?><.*?>\s*<.*>\s*\d*\.\d*\s*<.*>', content)
    eps = me.group()
    # ... then pull the decimal number out of it
    ma = re.search(r'\d+\.\d+', eps)
    if ma:
        epse = ma.group()
        print 'EPS is ' + epse
    else:
        epse = 'no match available : '
        return epse
 
This code gets input from the user, fetches the page that belongs to that company, and displays the scraped data.
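
To run it, just call the function and answer the prompt:

get_eps()

Keeping the company-to-code mapping in a dictionary also means that supporting a new company is a one-line change.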

For your info: web scraping from a multitude of sites is known as WEB HARVESTING.

Thank you,

Meet you in the next post,

Harish Kayarohanam