TechiWarehouse.Com


What is CGI?

CGI stands for Common Gateway Interface, a specification for transferring information between a World Wide Web server and a CGI program. A CGI program is any program designed to accept and return data that conforms to the CGI specification. The program could be written in any programming language, including C, Perl, Java or Visual Basic.
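As a rough illustration of what "accept and return data that conforms to the CGI specification" means, here is a minimal Python sketch. It assumes a GET request, so the form data arrives in the QUERY_STRING environment variable. The function name `handle` and the field name `name` are hypothetical. Building the response in a function, separate from printing it, makes the logic testable without a web server.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle(environ: dict) -> str:
    """Build a complete CGI response (headers, blank line, body) from the environment."""
    # For GET requests the server passes the form data in QUERY_STRING.
    form = parse_qs(environ.get("QUERY_STRING", ""))
    # parse_qs returns a list of values per field; take the first, or a default.
    name = form.get("name", ["world"])[0]
    body = "<html><body><p>Hello, %s!</p></body></html>\n" % html.escape(name)
    # Header line, then the blank line that ends the CGI header block.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    # Run by the web server: answer the current request on standard output.
    sys.stdout.write(handle(dict(os.environ)))
```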

CGI programs are the most common way for Web servers to interact dynamically with users. Many HTML pages that contain forms, for example, use a CGI program to process the form's data once it's submitted. Another increasingly common way to provide dynamic feedback for Web users is to include scripts or programs that run on the user's machine rather than on the Web server. These programs can be Java applets, JavaScript scripts, or ActiveX controls. CGI, by contrast, is a server-side solution, because the processing occurs on the Web server.

One problem with CGI is that each time a CGI script is executed, a new process is started. For busy Web sites, this can slow down the server noticeably.


Tips

How to direct a browser to display a different HTML page

This is actually very simple to do in a CGI script. Instead of the usual header

Content-type: text/html

make your script print this

Location: URL to display

Don't forget the blank line afterwards.
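In Python the redirect header can be produced by a small helper like the sketch below. The function name is hypothetical, and the check that rejects CR/LF is an added safeguard (not part of the original tip) so that a URL taken from form data cannot smuggle in extra header lines. A script would write the returned text to standard output and print nothing else.

```python
def redirect(url: str) -> str:
    """Return a CGI redirect header block for the given URL."""
    # Refuse CR/LF so a form value can't inject additional header lines.
    if "\r" in url or "\n" in url:
        raise ValueError("URL must not contain line breaks")
    # The trailing blank line ends the header; no body is needed.
    return "Location: %s\n\n" % url
```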

How to write a No-Parse-Header script

When a server returns a script's output to a browser, it normally adds any header elements the script did not supply. So by the time the response reaches the browser, the Content-type header line output by your script is accompanied by header lines giving information such as the date, the status code, and the server type.

If you want to prevent this happening, so your script can control the entire header, you need to create a no-parse-header script. It is amazingly simple to do this. All you need to do for most servers is make sure the name of your script starts with.

How to tell the browser to do nothing

You need to create a no-parse-header script. This script should return the header

HTTP/1.0 204 No Response

Browsers that handle this header correctly will do nothing. Microsoft Internet Explorer is one browser that does not handle it correctly; it displays a message telling the user that the link "did not have a target".
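Because a no-parse-header script bypasses the server's header processing, it has to write the HTTP status line itself. The Python sketch below (the function name is hypothetical) builds that response. HTTP header lines end with CRLF, and an empty line ends the header. The reason phrase after the code is informational only; RFC 1945 calls status 204 "No Content", while the text above uses the older "No Response". A real script would pass in the SERVER_PROTOCOL environment variable and write the bytes to its binary standard output.

```python
def no_response(protocol: str = "HTTP/1.0") -> bytes:
    """Complete header for an NPH script that tells the browser to stay put."""
    # Status line followed immediately by the blank line that ends the header;
    # a 204 response carries no body.
    return ("%s 204 No Response\r\n\r\n" % protocol).encode("ascii")
```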

Putting the Script and Form in One File

Having your web form and the corresponding CGI script in separate files in separate directories often makes them more difficult to manage if you have a lot of them. It would help to be able to put the script and the associated web pages into one script file (at the expense of a larger file). But if they are in a single script file, how will the script know whether to return the initial web page with the form, or to use the corresponding script to process the form data?

The answer lies in the ReadParse function, which knows, when a browser accesses the script, whether any form data is being supplied. If not, you can presume that the browser is accessing the script for the first time, and wants the web form. If the browser is sending form data, then it has already received the web form, has filled it out and wants the data to be processed by the script.

ReadParse knows whether it has received any form data and it tells you so. If there was form data, it returns the number of characters in the CGI bundle.
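ReadParse belongs to the Perl cgi-lib.pl library; the following is only a Python analogue of the same idea, not ReadParse itself. The form page, the field name `name`, and the helper names are hypothetical. If the request carries no form data, the script returns the blank form; otherwise it processes what was submitted. In a real script you would call `respond(os.environ, sys.stdin.buffer)` and write the result to standard output.

```python
import html
from urllib.parse import parse_qs

# Hypothetical form; action="" posts it back to this same script.
FORM_PAGE = """<html><body>
<form method="POST" action="">
  Name: <input type="text" name="name">
  <input type="submit">
</form>
</body></html>
"""


def read_form_data(environ: dict, stdin) -> str:
    """Return the raw form data, or '' if the browser sent none."""
    if environ.get("REQUEST_METHOD") == "POST":
        # POST data arrives on (binary) stdin; read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii")
    return environ.get("QUERY_STRING", "")


def respond(environ: dict, stdin) -> str:
    raw = read_form_data(environ, stdin)
    if not raw:
        # No data: first visit, so send the empty form.
        return "Content-type: text/html\n\n" + FORM_PAGE
    # Data present: the user has filled in the form, so process it.
    name = parse_qs(raw).get("name", [""])[0]
    return "Content-type: text/html\n\n<p>Thanks, %s.</p>\n" % html.escape(name)
```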

 

Testing File Names

When your script writes to a new file, you probably want it to create a new and unique name for the file, one that doesn't conflict with (and so overwrite) any existing file. One way to create a unique file name is to incorporate the process ID and the time into the name. Perl's special variable $$ holds the current process ID, and $^T holds the time at which the script started running (in seconds since 1970). So you could use something like $filename = $$ . $^T . ".html"; Neither value alone guarantees uniqueness, since there are only a finite number of process IDs, which are recycled, and your script could be run twice within the same second.

This results in ugly filenames, something like "83498127310497.html". If you insist on prettier names, you can test for the existence of a file with the proposed new name using Perl's -e operator: -e $file_name is true if a file with that name already exists. In the example below, the variable $text holds some key text taken from the contents of the file that we want to use in the name. We're also assuming that the script is constructing a web page, so we add ".html" to the end of the name.

$file_name = $your_chosen_dir . $text . ".html";
if ( -e $file_name ) {
    ## do something to make it different, like
    ## substituting pid+time.html for html at the end
    $file_name =~ s/html$/$$ . $^T . ".html"/e;
}
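One weakness of testing with -e and then writing is that another request can create the same file between the test and the write. The Python sketch below keeps the same naming scheme (pretty name first, then process ID plus time) but swaps in a different technique: it opens the file in exclusive-create mode ("x", i.e. O_EXCL), so the existence check and the creation happen in one step. The function name and the directory argument are hypothetical.

```python
import os
import time


def create_unique_file(directory: str, text: str):
    """Create and open a new .html file whose name doesn't clash with an existing one."""
    candidates = [
        text + ".html",                                          # the "pretty" name first
        "%s.%d%d.html" % (text, os.getpid(), int(time.time())),  # pid + time fallback
    ]
    for name in candidates:
        path = os.path.join(directory, name)
        try:
            # Mode "x" fails if the file already exists, so testing and creating
            # are a single operation with no window for another request.
            return open(path, "x"), path
        except FileExistsError:
            continue
    raise FileExistsError("no free file name for %r" % text)
```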

Redirecting to Another Page

Most CGI scripts will process form data and then construct and return an acknowledgement page to the browser. To return a web page, a CGI script must first send an HTTP header to the server signaling that a web page is about to follow:

print "Content-type: text/html\n\n";    # send HTTP header
print $html_source_for_the_web_page;    # send the page

There are a number of other HTTP headers available, however, including the Location: header, which redirects the browser to another page. If you have a web page prepared in another file, your CGI script needn't read it in and then print it to the server. After processing the form data however you wish, return the header Location: some_url for that file. The URL may be absolute or relative (to your script).

You could use it in your script something like this:

&ReadParse;    # read the form data into %in
if ( $in{'some_variable_name'} eq 'first_choice' ) {
    print "Location: first_url.html\n\n";
} elsif ( $in{'some_variable_name'} eq 'second_choice' ) {
    print "Location: second_url.html\n\n";
} elsif ( $in{'some_variable_name'} eq 'third_choice' ) {
    print "Location: third_url.html\n\n";
}
exit;

This hard codes the URLs into the script. You can write a more general purpose script by placing them into the web form as data, perhaps as hidden variables if you don't wish to bother the user with them. In that case you might use a locution like:

if ( $in{'some_variable_name'} eq 'first_choice' ) {
    print "Location: $in{'first_url'}\n\n";
}

where the web form contained a hidden field named first_url holding the URL.

This technique was used in the web2mail anonymous web form remailer.

In general, you can't customize on the fly the web page to which you redirect, unless you are redirecting to another CGI script, in which case you might as well customize the page from the original script. So use this technique when you want to return a static web page, though you can return different pages depending on the form data that was supplied.
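The Perl if/elsif chain above prints no header at all when none of the choices match, which typically makes the server report an error. The Python sketch below keeps the URLs hard-coded but stores them in a lookup table with a fallback page. Because a visitor can only reach pages you have listed, this also avoids the risk of letting a hidden field send the browser to an arbitrary URL. The table contents and the fallback `index.html` are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from form choices to static pages.
PAGES = {
    "first_choice": "first_url.html",
    "second_choice": "second_url.html",
    "third_choice": "third_url.html",
}
DEFAULT_PAGE = "index.html"  # hypothetical fallback


def redirect_for(query: str) -> str:
    """Pick a page from the submitted choice and return the Location header."""
    choice = parse_qs(query).get("some_variable_name", [""])[0]
    # Unknown or missing choices fall back to a known page instead of no output.
    url = PAGES.get(choice, DEFAULT_PAGE)
    return "Location: %s\n\n" % url
```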

CGI Security

There are important security issues when mailing or calling any other program from a CGI script. But CGI security is deep magic, far beyond the scope of this tutorial. A collection of documents discussing security is available on the Web. I can only give a brief example here.

In general, you should never write scripts that allow a user's form data to be executed on your system. The most obvious example might be something like

exec "$in{message}";

This would allow a browser to execute commands on your system: whatever was submitted through the variable named message on your web form. (Perhaps rm -rf?) Perl has some built-in safeguards against this (taint checking, enabled with perl -T), as do most web servers, though they are not perfect and can sometimes be circumvented by crafty web surfers.

As a more devious and realistic example, suppose your mail program is mail and you put the recipient on the command line:

$recipient = $in{email_address};
open(MAIL, "|mail $recipient");


If the browser supplied her address as "nobody ; rm -rf", the second command might be executed after the mail program completed. (Recent versions of sendmail have safeguards against this sort of spoofing.)
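The attack works because the command string is handed to a shell, which treats ";" as a command separator. A Python sketch of the safer pattern: validate the address, then start the mail program from an argument list, so that no shell ever parses the recipient. The `mail` command name follows the example above; the deliberately strict address pattern is an assumption and will reject some legal but unusual addresses.

```python
import re
import subprocess

# Deliberately strict pattern (assumption); real addresses can be more varied.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\Z")


def mail_argv(recipient: str) -> list:
    """Validate the address and build the mail command as an argument list."""
    # A leading "-" could be taken as a command-line option, so reject it too.
    if recipient.startswith("-") or not ADDRESS.match(recipient):
        raise ValueError("unacceptable address: %r" % recipient)
    return ["mail", recipient]


def send(recipient: str, body: str) -> None:
    # With a list and the default shell=False, no shell is involved, so ";",
    # "|" or "`" in the address could not start a second command anyway.
    subprocess.run(mail_argv(recipient), input=body, text=True, check=True)
```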

So what can you do?

  • Realize you have no control over what form data is passed to your script, and anyone can bypass your form and access your script directly. All they need to do is point their own form at it.
  • Study the security documents linked above. These are technical issues, but they make for morbidly interesting reading.
  • You should be reasonably safe if you don't execute any other scripts (including mail or other CGI scripts) in your code. (This is the kind of sweeping statement that often proves wrong, so I can't guarantee it.)
  • Sanitize any form data that you pass to other scripts that you must execute. s/\W//g; will remove every character except letters, digits and the underscore from a variable, including punctuation (*.;'/). Even better would be to accept only a pre-determined list of possible answers.
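The last two suggestions translate directly into Python. In this sketch `re.sub(r"\W", "", ...)` plays the role of Perl's s/\W//g; the re.ASCII flag keeps \w limited to ASCII letters, digits and underscore, as Perl's traditional byte-oriented matching does. The set of allowed sizes is a hypothetical example of a pre-determined list of answers.

```python
import re

ALLOWED_SIZES = {"small", "medium", "large"}  # hypothetical pre-determined answers


def strip_non_word(value: str) -> str:
    """Python counterpart of Perl's s/\\W//g: keep only letters, digits and underscore."""
    # With re.ASCII, non-ASCII letters also count as non-word characters.
    return re.sub(r"\W", "", value, flags=re.ASCII)


def checked_size(value: str) -> str:
    """Stricter still: accept only an answer from a fixed list."""
    if value not in ALLOWED_SIZES:
        raise ValueError("unexpected value: %r" % value)
    return value
```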

I believe the code in this document (including sendmail -t, which keeps the email address off the command line) is reasonably secure. No guarantees though, and if you know otherwise, please let me know.

If you'd like to look at a script which goes to great lengths to be security conscious (because it's able to write to any file on your web site) see SiteMgr.

Debugging

Debugging is a challenge with CGI scripts, because the web server runs your script, not you, so you can't easily get access to standard error.

The first step is to ensure your script works when you run it by hand, even before you put it on the web server and try to fill out your forms. If your script doesn't parse, for example, your browser would only report "Document contains no data" or "Server error" or something even less informative. To run your script by hand, type perl script_name at the command prompt, or perl -d script_name if you want to use Perl's very useful debugger.

However, the point of a CGI script is to process a browser's form data, and these commands don't supply any to the script. If you use METHOD = POST (recommended) in your form, the data will be passed to the script on STDIN, and the script needs certain environment variables to interpret it properly. You therefore need to run your script in a special environment.

Most Unix systems support the env command, which does precisely this. Here is a simple example of its use on the command line:

env REQUEST_METHOD=POST CONTENT_LENGTH=47 perl -d script_name << HERE
name=John+Doe&email=a@a.com&msg=This+is+a+test.
HERE

This command first creates two new environment variables, REQUEST_METHOD and CONTENT_LENGTH, and sets their values as shown. It then executes perl -d under this augmented environment and passes the contents of the HERE document as STDIN.

The HERE document contains the user's form data. It consists of a &-separated list of name=value pairs, where name is the variable name you used for a form element in your form page, and value is the corresponding data that the browser supplied for that form element. Spaces are converted to + signs, and other special characters are encoded as %XX hexadecimal escapes. CONTENT_LENGTH must equal the number of characters in the HERE document, not counting the final newline that the shell appends.

This is fairly awkward (especially counting characters) so I have altered the ReadParse function to accept form data on the command line. Simply run your script something like

script_name name=John+Doe\&email=a@a.com\&msg=This+is+a+test.


supplying it with the list of name/value pairs as an argument on the command line. When ReadParse detects neither method GET nor method POST, it looks to the command-line arguments for the form data. Note, however, that the shell requires you to escape each & by preceding it with a backslash.
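The modified ReadParse itself is not shown here, but the same fallback is easy to express in Python. This sketch (function name hypothetical) reads POST data from stdin, GET data from QUERY_STRING, and, when no request method is set at all, takes the name=value string from the first command-line argument. In a real script you would call `read_form(os.environ, sys.argv, sys.stdin.buffer)`.

```python
from urllib.parse import parse_qs


def read_form(environ: dict, argv: list, stdin) -> dict:
    """Read form data from a GET/POST request, or from the command line when testing."""
    method = environ.get("REQUEST_METHOD")
    if method == "POST":
        raw = stdin.read(int(environ.get("CONTENT_LENGTH") or 0)).decode("ascii")
    elif method == "GET":
        raw = environ.get("QUERY_STRING", "")
    else:
        # No request method: assume the script was run by hand with the
        # name=value&... string as its first argument.
        raw = argv[1] if len(argv) > 1 else ""
    # parse_qs splits on "&" and "=" and turns "+" back into spaces.
    return parse_qs(raw)
```

Run from the shell, the & characters still need escaping, as in `script_name name=John+Doe\&email=a@a.com`.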

This is an overview of the CGI protocol (under method POST) and how web servers pass data to scripts. It should be sufficient for most debugging. There are a few more details, which you can learn by examining the ReadParse function, a tutorial called Reading CGI Data, or The Common Gateway Interface documentation.
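Counting characters for CONTENT_LENGTH by hand is error-prone, so it can help to let a short script do it. This Python sketch (helper names hypothetical) builds the two environment variables for the example data above and then reads the body the way a POST-handling script would: exactly CONTENT_LENGTH bytes from standard input, which leaves out the newline a here-document appends.

```python
import io
from urllib.parse import parse_qs


def post_environ(body: bytes) -> dict:
    """The two variables a script needs in order to read a POST body."""
    return {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}


def read_post(environ: dict, stdin) -> dict:
    """Read exactly CONTENT_LENGTH bytes and decode the name=value pairs."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stdin.read(length).decode("ascii")
    return parse_qs(raw)


body = b"name=John+Doe&email=a@a.com&msg=This+is+a+test."
env = post_environ(body)
# Simulate the here-document, which ends with a newline the script should not read.
form = read_post(env, io.BytesIO(body + b"\n"))
print(env["CONTENT_LENGTH"], form["msg"][0])  # 47 This is a test.
```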

After debugging your script by hand, you can run it from the web server by placing it in your site's cgi-bin directory. If it has been thoroughly debugged, the major additional problems might be file and directory access permissions. Make sure the script itself is world readable and executable, and that any directories and files the script must write to are world readable and writeable.

Permissions

You will also have to make the program executable. Most FTP programs, such as WS_FTP LE, can do this. Once the file is uploaded, select it and right-click it. You will see some options; look for either a "chmod" command or "change permissions".

Tick the "Execute" boxes and click "OK".
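Ticking read and execute for everyone, plus write for the owner, corresponds to mode 755; with shell access, `chmod 755 script_name` does the same. The Python sketch below (function name hypothetical) spells out what those boxes mean in terms of permission bits.

```python
import os
import stat


def make_cgi_executable(path: str) -> None:
    """Equivalent of `chmod 755`: owner rwx, group r-x, others r-x."""
    mode = (stat.S_IRWXU                     # owner: read, write, execute
            | stat.S_IRGRP | stat.S_IXGRP    # group: read, execute
            | stat.S_IROTH | stat.S_IXOTH)   # others: read, execute
    os.chmod(path, mode)
```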

I can't figure out the correct path to my files.

Paths to files uploaded to your account begin with your account's root path, which is:

/usr/local/etc/httpd/sites/yourdomain.com/

You can determine the rest of the path from the location of your file relative to your root directory. For example, a file called "data.dat" located in your htdocs directory would have a full path as follows:

/usr/local/etc/httpd/sites/yourdomain.com/htdocs/data.dat

If you will be referencing many files, it is probably easiest to create a variable called $root and assign it the full path to the root directory:

$root = "/usr/local/etc/httpd/sites/yourdomain.com";

You can then simply prepend the root path variable to the relative locations of your files throughout your script. For example:

$datafile = "$root/htdocs/data.dat";

Unless you need to reference files outside your cgi-bin, we recommend using the shorter and simpler relative paths. A relative path is resolved against the script's current working directory, which most servers set to the directory containing the script, so in practice it gives the location of the file relative to your script. For example, if you have a cgi-lib.pl library file in a directory called "Library" inside your cgi-bin, the relative path would be:

"./Library/cgi-lib.pl";

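If you would rather not depend on the server's choice of working directory, a Python script can build paths from its own location instead. In this sketch the helper name is hypothetical; a real script would pass `__file__` as the first argument, for example `beside_script(__file__, "Library", "cgi-lib.pl")`.

```python
import os


def beside_script(script_path: str, *parts: str) -> str:
    """Build an absolute path relative to the directory holding the script."""
    # Anchoring on the script's own location works whatever the current
    # working directory happens to be.
    return os.path.join(os.path.dirname(os.path.abspath(script_path)), *parts)
```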
HTTP variables

If a script is executed outside the server (from the command line, for example), the #! line will start a fresh Python interpreter process and the code will be executed, but remember that in this case the usual HTTP variables ($HTTP_REFERER, $QUERY_STRING, etc.) won't be set. If your script relies on values being available for these variables, you'll need to test for this and set sensible default values.
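In Python these variables live in `os.environ`, and `dict.get` with a default covers the case where the script is run by hand. The defaults in this sketch (including the SCRIPT_NAME value) are hypothetical choices, not values any server provides.

```python
import os


def request_info(environ=None) -> dict:
    """Collect common CGI variables, with defaults for runs outside the server."""
    env = os.environ if environ is None else environ
    return {
        "referer": env.get("HTTP_REFERER", ""),                # absent when run by hand
        "query": env.get("QUERY_STRING", ""),
        "method": env.get("REQUEST_METHOD", "GET"),            # assumed default
        "script": env.get("SCRIPT_NAME", "/cgi-bin/test.py"),  # hypothetical default
    }
```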

Cross-site scripting

The cross-site scripting issue shouldn't be ignored. One recommended measure is to set the default character set in your httpd.conf:

AddDefaultCharset ISO-8859-1


(This does, however, ignore the problem of the large amount of content in character sets other than ISO-8859-1.)
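Setting the character set is one measure; the other usual defence, added here rather than taken from the original tip, is to escape any user-supplied text before writing it into a page. A minimal Python sketch with a hypothetical function name:

```python
import html


def echo_comment(comment: str) -> str:
    """Return an HTML fragment that shows user input as text, never as markup."""
    # html.escape turns <, >, & and (with quote=True) both quote characters into
    # entities, so a submitted <script> tag is displayed instead of executed.
    return "<p>You wrote: %s</p>" % html.escape(comment, quote=True)
```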

Incoming data

It almost goes without saying, but never pass any received QUERY_STRING or PATH_INFO data to external programs (e.g. sendmail) without first escaping potentially problematic characters; see Paul Phillip's Safe CGI paper (go on, you know you should, and take a bookmark while you're there).

Here's something I prepared earlier. Note: it ignores CR and LF, which it probably shouldn't.

import re

def esc_chars(tgt):
    # will change, for example, a!!a to a\!\!a
    matchstr = re.compile(r"""([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])""")
    return matchstr.sub(r'\\\1', tgt)

e.g.:

>>> attack = """a!!a"""
>>> print(esc_chars(attack))
a\!\!a
>>> attack = """!#/bin/bash"""
>>> print(esc_chars(attack))
\!\#/bin/bash

Active Error Documents

I would imagine that most server administrators are familiar with the idea of improving on the old "500 Server Error" message. I've found it useful to create a Python script which tells the user that an error has occurred, informs them that the janitor (me) will automatically be emailed the details of the error, and offers them the opportunity to send me their email address so that I can get back to them when the problem has been fixed.

Here's how the error page appears to the user, here's the source for the Error Document, and here's the source for the "fixit.py" script, which processes the form field in which they enter their email address.
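The original sources are not reproduced here, so the following is only a Python sketch of the same idea. It assumes Apache, which passes details of the failed request to an ErrorDocument script in environment variables prefixed REDIRECT_ (such as REDIRECT_STATUS and REDIRECT_URL). The form's action path `/cgi-bin/fixit.py` and both function names are hypothetical, and actually sending the report (for example through sendmail -t) is left out.

```python
import html
from email.message import EmailMessage


def error_page(environ: dict) -> str:
    """User-facing page produced by the ErrorDocument script."""
    status = html.escape(environ.get("REDIRECT_STATUS", "500"))
    url = html.escape(environ.get("REDIRECT_URL", "this page"))
    return (
        "Content-type: text/html\n\n"
        "<html><body><h1>Sorry, an error (%s) occurred at %s</h1>\n"
        "<p>The janitor has been notified automatically.</p>\n"
        '<form method="POST" action="/cgi-bin/fixit.py">\n'
        'Email me when it is fixed: <input type="text" name="email">\n'
        '<input type="submit"></form></body></html>\n' % (status, url)
    )


def janitor_report(environ: dict, to_addr: str) -> EmailMessage:
    """Build (but do not send) the notification for the janitor."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "Error %s at %s" % (
        environ.get("REDIRECT_STATUS", "?"), environ.get("REDIRECT_URL", "?"))
    # Include only the REDIRECT_* variables that describe the failed request.
    msg.set_content("\n".join("%s=%s" % kv for kv in sorted(environ.items())
                              if kv[0].startswith("REDIRECT_")))
    return msg
```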

Infinite Loops

If your CGI somehow gets into an infinite loop, the web server may well wait forever for the CGI to return results. This, in turn, means that the user will probably be left staring at a blank or partially filled browser for quite some time. Or worse, they'll just hit the Back button and then try again, putting another infinitely long CGI in motion on your server, and thus using up CPU time that produces nothing.

CGI programs don't know if and when the user hits the Stop button on the browser. The program often finds out only when it tries to output HTML and receives a SIGPIPE signal because the socket is no longer valid, but this may depend on the configuration of the operating system and web server.
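A Python CGI behaves a little differently here: the interpreter ignores SIGPIPE at startup, so writing to a connection the browser has abandoned raises BrokenPipeError (EPIPE) instead of killing the process. The sketch below (function name hypothetical) turns that into a return value, so a long-running loop can check it after each write and stop working once the client has gone.

```python
import os


def write_to_client(fd: int, data: bytes) -> bool:
    """Write all of `data`; return False if the reader has gone away."""
    try:
        view = memoryview(data)
        while view:
            # os.write may write only part of the buffer, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        return True
    except BrokenPipeError:
        return False
```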

How to find and kill infinitely looping CGIs

To kill an infinitely looping CGI, you must first find its process ID (PID). The classic way to do this is with the Unix ps command. Under Solaris, for example, you can list all of your processes like this:

ps -ef

Look for unusually large values in the TIME column and note the PID for that process. Note that you can't trust the name given by ps, because it can be set on some systems by setting argv[0] in the executing program. Once you have the PID of the looping CGI, you can kill it with the kill command, like this:

kill 2353

However, this is not guaranteed to stop processes that choose to ignore the TERM signal. If the process is still present after a few seconds, try the -9 option, as in kill -9 2353. This should not be your first option because processes killed with the -9 option do not get a chance to clean up temp files or finish writing buffered output to a file. The kill command may leave a zombie process on the system, which cannot be killed but occupies only minimal system resources. Zombie processes are marked with Z or defunct in ps output. If a process is not a zombie but cannot be killed, then it is probably waiting on an NFS call or a stuck device.

There are a number of more user-friendly tools for hunting down rogue processes, such as top, skill, and killall.
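The ps-and-kill procedure above can also be scripted. This Python sketch assumes you capture the output of `ps -eo pid,time` (the POSIX -o format avoids the STIME column of ps -ef, which can contain spaces and complicate splitting). It converts the TIME column to seconds, lists processes above a CPU-time limit, and stops a process with TERM first, using -9 (KILL) only if it is still present after a grace period. All function names and the 5-second grace period are hypothetical choices.

```python
import os
import signal
import time


def cpu_seconds(field: str) -> int:
    """Convert a ps TIME value ([[dd-]hh:]mm:ss) to seconds."""
    days, _, clock = field.rpartition("-")
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds + int(days or 0) * 86400


def hogs(ps_output: str, limit: int) -> list:
    """PIDs whose accumulated CPU time exceeds `limit` seconds."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the PID/TIME header line
        pid, cpu = line.split()[:2]
        if cpu_seconds(cpu) > limit:
            pids.append(int(pid))
    return pids


def stop(pid: int, grace: float = 5.0) -> None:
    """Send TERM, and fall back to KILL only if the process is still there."""
    os.kill(pid, signal.SIGTERM)
    time.sleep(grace)
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return
    os.kill(pid, signal.SIGKILL)
```

As the text warns, check each PID before stopping it, since the command name shown by ps cannot always be trusted.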

UserTime.cgi

UserTime is a CGI (common gateway interface) program. It was written by Jochen "Joe" Savelberg to give the customers of euregio.net an up-to-date indication of the time they've spent online.

How does it work?

First, the user has to enter some information, such as his/her username and password, and select some more options. Please note that the username and password are case-sensitive! Just enter the same information that is in your PPP connection script and you'll be fine.

The web server passes the information to HyperCard - the program the UserTime.cgi was written in. HyperCard then calls another script which sends some TCP/IP commands to euregio.net's UNIX server. This server returns the login times for the requested account. Then HyperCard takes over again and calculates the costs and creates the information page which is sent back to the web server and to the user's WWW client (such as Netscape Navigator, Mosaic, etc.).

When several users are requesting this service at the same time it works according to the principle of FIFO (first in, first out). That means that the first request will be handled first and the remaining requests will be queued.

The drawback is that there is a certain time-out (in our case 360 seconds). Whenever there is a request that is in the queue for a longer period than this time-out, the request will be cancelled. The user gets a message that the gateway timed out. The user could try to send the form again a little bit later.
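The internals of UserTime.cgi are not described, so this is only a Python sketch of the queueing behaviour the text describes: requests are served first in, first out, and any request that has waited longer than the time-out (360 seconds here) is dropped, which corresponds to the "gateway timed out" message. The class name and the injectable clock, which makes the behaviour easy to test, are hypothetical.

```python
import time
from collections import deque


class RequestQueue:
    """First in, first out queue that drops requests waiting longer than `timeout`."""

    def __init__(self, timeout: float = 360.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._items = deque()  # (arrival_time, request) pairs, oldest first

    def put(self, request) -> None:
        self._items.append((self.clock(), request))

    def get(self):
        """Return the oldest request still within the time-out, or None.

        Requests that waited too long are discarded here; a real gateway
        would send those users a time-out message instead.
        """
        while self._items:
            arrived, request = self._items.popleft()
            if self.clock() - arrived <= self.timeout:
                return request
        return None
```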

While UserTime.cgi is processing the request, the user can still continue his quest for the holy grail, i.e. he can continue surfing the Internet. All he or she has to do is choose 'New window' from the browser's menu. The other window will still be waiting for the results of UserTime.cgi.


Did You Know?

  • CGI is not a language in itself at all. CGI is basically a way in which programs written in many languages can be used on the net. Perl has long been the most popular language for CGI, with C second.
  • CGI is language independent: C, Perl, Java, Unix shell script.
  • Because CGIs run on the server, they need to take care not to allow a
    rogue client to compromise the server, either by harming it or by serving up information
    that was supposed to be secret.







