
The whole Python FAQ

See also the Python FAQ Wizard, which has a search engine and allows PSA members to update entries!

Last changed on Fri May 08 11:49:26 1998 EDT

(Entries marked with ** were changed within the last 24 hours; entries marked with * were changed within the last 7 days.)


1. General information and availability


2. Python in the real world


3. Building Python and Other Known Bugs


4. Programming in Python


5. Extending Python


6. Python's design


7. Using Python on non-UNIX platforms


8. Python on Windows


1. General information and availability


1.1. What is Python?

Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many brands of UNIX, on the Mac, and on PCs under MS-DOS, Windows, Windows NT, and OS/2.

To find out more, the best thing to do is to start reading the tutorial from the documentation set (see a few questions further down).

See also question 1.17 (what is Python good for).


1.2. Why is it called Python?

Apart from being a computer scientist, I'm also a fan of "Monty Python's Flying Circus" (a BBC comedy series from the seventies, in the -- unlikely -- case you didn't know). It occurred to me one day that I needed a name that was short, unique, and slightly mysterious. And I happened to be reading some scripts from the series at the time... So then I decided to call my language Python. But Python is not a joke. And don't you associate it with dangerous reptiles either! (If you need an icon, use an image of the 16-ton weight from the TV series or of a can of SPAM :-)


1.3. How do I obtain a copy of the Python source?

The latest complete Python source distribution is always available by anonymous ftp, e.g. ftp://ftp.python.org/pub/python/src/python1.4.tar.gz. It is a gzipped tar file containing the complete C source, LaTeX documentation, Python library modules, example programs, and several useful pieces of freely distributable software. This will compile and run out of the box on most UNIX platforms. (See section 7 for non-UNIX information.)

An index of said ftp directory can be found in the file INDEX. An HTML version of the index can be found in the file index.html, ftp://ftp.python.org/pub/python/index.html.


1.4. How do I get documentation on Python?

All documentation is available on-line, starting at http://www.python.org/doc/.

The LaTeX source for the documentation is part of the source distribution. If you don't have LaTeX, the latest Python documentation set is available in various formats, such as PostScript and HTML, by anonymous ftp - visit the above URL for links to the current versions.

PostScript for a high-level description of Python is in the file nluug-paper.ps (a separate file on the ftp site).


1.5. Are there other ftp sites that mirror the Python distribution?

The following anonymous ftp sites keep mirrors of the Python distribution:

USA:

        ftp://ftp.python.org/pub/python/
        ftp://gatekeeper.dec.com/pub/plan/python/
        ftp://ftp.uu.net/languages/python/
        ftp://ftp.wustl.edu/graphics/graphics/sgi-stuff/python/
        ftp://ftp.sterling.com/programming/languages/python/
        ftp://uiarchive.cso.uiuc.edu/pub/lang/python/
        ftp://ftp.pht.com/mirrors/python/python/
        ftp://ftp.cdrom.com/pub/python/
Europe:

        ftp://ftp.cwi.nl/pub/python/
        ftp://ftp.funet.fi/pub/languages/python/
        ftp://ftp.sunet.se/pub/lang/python/
        ftp://unix.hensa.ac.uk/mirrors/uunet/languages/python/
        ftp://ftp.ibp.fr/pub/python/
        ftp://sunsite.cnlab-switch.ch/mirror/python/
        ftp://ftp.informatik.tu-muenchen.de/pub/comp/programming/languages/python/
Australia:

        ftp://ftp.dstc.edu.au/pub/python/


1.6. Is there a newsgroup or mailing list devoted to Python?

There is a newsgroup, comp.lang.python, and a mailing list. The newsgroup and mailing list are gatewayed into each other -- if you can read news it's unnecessary to subscribe to the mailing list. Send e-mail to <python-list-request@cwi.nl> to (un)subscribe to the mailing list (a person reads this, don't use majordomo or listserv commands).

More info about the newsgroup and mailing list, and about other lists, can be found at http://www.python.org/python/MailingLists.html.

Recent archives of the newsgroup are kept by Deja News and accessible through the "locator" web page, http://www.python.org/locator/. This page also contains pointers to older archival collections.


1.7. Is there a WWW page devoted to Python?

Yes, http://www.python.org/ is the official Python home page.


1.8. Is the Python documentation available on the WWW?

Yes, see http://www.python.org/ (Python's home page). It contains pointers to hypertext versions of the whole documentation set (not just PostScript).

If you wish to browse this collection of HTML files on your own machine, it is available bundled up by anonymous ftp, e.g. ftp://ftp.python.org/pub/python/doc/html.tar.gz.

An Emacs-INFO set containing the library manual is also available by ftp, e.g. ftp://ftp.python.org/pub/python/doc/lib-info.tar.gz.


1.9. Are there any books on Python?

Yes, several:

      + Internet Programming with Python 
        by Aaron Watters, Guido van Rossum, and James Ahlstrom
        MIS Press/Henry Holt publishers
        ISBN: 1-55851-484-8
        First published October, 1996
      + Programming Python 
        by Mark Lutz
        O'Reilly & Associates
        ISBN: 1-56592-197-6
        First published October, 1996
      + Das Python-Buch (in German)
        by Martin von Loewis and Nils Fischbeck
        Addison-Wesley-Longman, 1997
        ISBN: 3-8273-1110-1


1.10. Are there any published articles about Python that I can reference?

If you can't reference the web site, and you don't want to reference the books (see previous question), there are several articles on Python that you could reference.

The only place that has references to most articles published on Python is currently the News Flashes page on the web site (search for "article"):

    http://www.python.org/python/News.html
There's also a very old article by Python's author:

    Guido van Rossum and Jelke de Boer, "Interactively Testing Remote
    Servers Using the Python Programming Language", CWI Quarterly, Volume
    4, Issue 4 (December 1991), Amsterdam, pp 283-303.


1.11. Are there short introductory papers or talks on Python?

There are several - you can find links to some of them collected at http://www.python.org/doc/Hints.html#intros.


1.12. How does the Python version numbering scheme work?

Python versions are numbered A.B.C or A.B. A is the major version number -- it is only incremented for major changes in functionality or source structure. B is the minor version number, incremented for less earth-shattering changes to a release. C is the patchlevel -- it is incremented for each new patch release. Not all releases have patch releases. Note that in the past, patches have added significant changes; in fact the changeover from 0.9.9 to 1.0.0 was the first time that either A or B changed!

Beta versions have an additional suffix of "betaN" for some small number N. Note that (for instance) all versions labeled 1.4betaN precede the actual release of 1.4. 1.4b3 is short for 1.4beta3.
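
The numbering scheme above can be sketched as a small parser. This is only an illustration, written for a current Python 3 interpreter; the names parse_version and sort_key and the layout of the returned tuple are assumptions made for this example, not part of Python itself.

```python
import re

# Hypothetical helper for illustration; not part of the Python distribution.
# Accepts "A.B", "A.B.C", "A.BbetaN" and the short form "A.BbN".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(?:beta|b)(\d+))?$")


def parse_version(text):
    """Return (A, B, C, beta); C defaults to 0, beta is None for a final release."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a recognized version string: %r" % (text,))
    major, minor, patch, beta = match.groups()
    return (int(major), int(minor), int(patch or 0),
            int(beta) if beta is not None else None)


def sort_key(text):
    """Sort key that places every betaN before the final release it precedes."""
    major, minor, patch, beta = parse_version(text)
    # A final release gets an infinite beta number so it sorts after all betas.
    return (major, minor, patch, float("inf") if beta is None else beta)


releases = ["1.4", "1.4b3", "0.9.9", "1.4beta1", "1.0.0"]
print(sorted(releases, key=sort_key))
# ['0.9.9', '1.0.0', '1.4beta1', '1.4b3', '1.4']
```

Treating "b" and "beta" as the same marker mirrors the statement that 1.4b3 is short for 1.4beta3.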


1.13. How do I get a beta test version of Python?

If there are any beta releases, they are published in the normal source directory (e.g. ftp://ftp.python.org/pub/python/src/).

Alpha releases are only open to PSA members. See http://www.python.org/psa/ for information on how to join ($50/year).


1.14. Are there copyright restrictions on the use of Python?

Hardly. You can do anything you want with the source, as long as you leave the copyrights in, and display those copyrights in any documentation about Python that you produce. Also, don't use the author's institute's name in publicity without prior written permission, and don't hold them responsible for anything (read the actual copyright for a precise legal wording).

In particular, if you honor the copyright rules, it's OK to use Python for commercial use, to sell copies of Python in source or binary form, or to sell products that enhance Python or incorporate Python (or part of it) in some form. I would still like to know about all commercial use of Python!


1.15. Why was Python created in the first place?

Here's a very brief summary of what got me started:

I had extensive experience with implementing an interpreted language in the ABC group at CWI, and from working with this group I had learned a lot about language design. This is the origin of many Python features, including the use of indentation for statement grouping and the inclusion of very-high-level data types (although the details are all different in Python).

I had a number of gripes about the ABC language, but also liked many of its features. It was impossible to extend the ABC language (or its implementation) to remedy my complaints -- in fact its lack of extensibility was one of its biggest problems. I had some experience with using Modula-2+ and talked with the designers of Modula-3 (and read the M3 report). M3 is the origin of the syntax and semantics used for exceptions, and some other Python features.

I was working in the Amoeba distributed operating system group at CWI. We needed a better way to do system administration than by writing either C programs or Bourne shell scripts, since Amoeba had its own system call interface which wasn't easily accessible from the Bourne shell. My experience with error handling in Amoeba made me acutely aware of the importance of exceptions as a programming language feature.

It occurred to me that a scripting language with a syntax like ABC but with access to the Amoeba system calls would fill the need. I realized that it would be foolish to write an Amoeba-specific language, so I decided that I needed a language that was generally extensible.

During the 1989 Christmas holidays, I had a lot of time on my hands, so I decided to give it a try. During the next year, while still mostly working on it in my own time, Python was used in the Amoeba project with increasing success, and the feedback from colleagues made me add many early improvements.

In February 1991, after just over a year of development, I decided to post to USENET. The rest is in the Misc/HISTORY file.


1.16. Do I have to like "Monty Python's Flying Circus"?

No, but it helps. Pythonistas like the occasional reference to SPAM, and of course, nobody expects the Spanish Inquisition.

The two main reasons to use Python are:

 - Portable
 - Easy to learn
The three main reasons to use Python are:

 - Portable
 - Easy to learn
 - Powerful standard library
(And nice red uniforms.)

And remember, there is no rule six.


1.17. What is Python good for?

Python is used in many situations where a great deal of dynamism, ease of use, power, and flexibility are required.

For basic text manipulation, core Python (without any non-core extensions) is easier to use than, and roughly as fast as, just about any other language. This makes Python a good fit for many system administration tasks, for CGI programming, and for other application areas that manipulate text and strings.

When augmented with standard extensions (such as PIL, COM, Numeric, oracledb, kjbuckets, tkinter, win32api, etc.) or special purpose extensions (that you write, perhaps using helper tools such as SWIG, or using object protocols such as ILU/CORBA or COM), Python becomes a very convenient "glue" or "steering" language that helps make heterogeneous collections of unrelated software packages work together. For example, by combining Numeric with oracledb you can help your SQL database do statistical analysis, or even Fourier transforms. One of the features that makes Python excel in the "glue language" role is Python's simple, usable, and powerful C language runtime API.

Many developers also use Python extensively as a graphical user interface development aid.


1.18. Can I use the FAQ Wizard software to maintain my own FAQ?

Sure. Version 0.9.0 was distributed in the Tools subdirectory of the Python 1.5 source release at

  http://www.python.org/ftp/python/src/python1.5.tar.gz


2. Python in the real world


2.1. How many people are using Python?

I don't know, but the maximum number of simultaneous subscriptions to the Python mailing list before it was gatewayed into the newsgroup was about 180 (several of which were local redistribution lists). I believe that many active Python users don't bother to subscribe to the list, and now that there's a newsgroup the mailing list subscription is even less meaningful. I see new names on the newsgroup all the time and my best guess is that there are currently at least several thousand users.

Another statistic is the number of accesses to the Python WWW server. Have a look at http://www.python.org/stats/.


2.2. Have any significant projects been done in Python?

At CWI (the former home of Python), we have written a 20,000 line authoring environment for transportable hypermedia presentations, a 5,000 line multimedia teleconferencing tool, as well as many many smaller programs.

At CNRI (Python's new home), we have written two large applications: Grail, a fully featured web browser (see http://grail.cnri.reston.va.us), and the Knowbot Operating Environment, a distributed environment for mobile code.

The University of Virginia uses Python to control a virtual reality engine. See http://alice.cs.cmu.edu.

The ILU project at Xerox PARC can generate Python glue for ILU interfaces. See ftp://ftp.parc.xerox.com/pub/ilu/ilu.html. ILU is a free CORBA compliant ORB which supplies distributed object connectivity to a host of platforms using a host of languages.

Mark Hammond and Greg Stein and others are interfacing Python to Microsoft's COM and ActiveX architectures. This means, among other things, that Python may be used in active server pages or as a COM controller (for example to automatically extract information from, or insert it into, Excel or MSAccess or any other COM aware application). Mark claims Python can even be an ActiveX scripting host (which means you could embed JScript inside a Python application, if you had a strange sense of humor). Python/AX/COM is distributed as part of the PythonWin distribution.

The University of California, Irvine uses a student administration system called TELE-Vision written entirely in Python. Contact: Ray Price <rlprice@uci.edu>.

The Melbourne Cricket Ground (MCG) in Australia (a 100,000+ person venue) has its scoreboard system written largely in Python on MS Windows. Python expressions are used to create almost every scoring entry that appears on the board. The move to Python/C++ away from exclusive C++ has provided a level of functionality that would simply not have been viable otherwise.

See also the next question.

If you have done a significant project in Python that you'd like to be included in the list above, send me email!


2.3. Are there any commercial projects going on using Python?

Yes, there's lots of commercial activity using Python. See http://www.python.org/python/Users.html for a list.


2.4. How stable is Python?

Very stable. While the current version number would suggest it is in the early stages of development, in fact new, stable releases (numbered 0.9.x through 1.4) have been coming out at intervals of roughly 3 to 12 months for the past four years.


2.5. What new developments are expected for Python in the future?

Follow the newsgroup discussions! The workshop proceedings (http://www.python.org/workshops/) may also contain interesting looks into the future.


2.6. Is it reasonable to propose incompatible changes to Python?

In general, no. There are already millions of lines of Python code around the world, so any change in the language that invalidates more than a very small fraction of existing programs has to be frowned upon. Even if you can provide a conversion program, there still is the problem of updating all documentation. Providing a gradual upgrade path is the only way if a feature has to be changed.


2.7. What is the future of Python?

If I knew, I'd be rich :-)

Seriously, the formation of the PSA (Python Software Activity, see http://www.python.org/psa/) ensures some kind of support even in the (unlikely!) event that I were hit by a bus (actually, here in the U.S., a car accident would be more likely :-), joined a nunnery, or were head-hunted. A large number of Python users have become experts at Python programming as well as maintenance of the implementation, and would easily fill the vacuum created by my disappearance.

In the meantime, I have no plans to disappear -- rather, I am committed to improving Python, and my current benefactor, CNRI (see http://www.cnri.reston.va.us) is just as committed to continuing its support of Python and the PSA. In fact, we have great plans for Python -- we just can't tell yet!


2.8. What is the PSA, anyway?

The Python Software Activity (http://www.python.org/psa/) was created by a number of Python aficionados who want Python to be more than the product and responsibility of a single individual. It has found a home at CNRI (http://www.cnri.reston.va.us). Anybody who wishes Python well should join the PSA.


2.9. How do I join the PSA?

The full scoop is available on the web, see http://www.python.org/psa/Joining.html. Summary: send a check for at least $50 to CNRI/PSA, 1895 Preston White Drive, Suite 100, Reston, VA 20191. Full-time students pay $25. Prices drop by half in the second half of the fiscal year (April - September). Companies can join for a mere $500. Pets may join for only $15!


2.10. What are the benefits of joining the PSA?

Like National Public Radio, without your support, Python will wither.

If you join, your name will be mentioned on the PSA's web server. Workshops organized by the PSA (http://www.python.org/workshops/) are only accessible to PSA members (you can join at the door). The PSA is working on additional benefits, such as reduced prices for books and software, and early access to alpha versions of Python. (The latter has been realized -- the 1.5 alpha testing program is accessible only to PSA members.)

You might also consider becoming a member of the starship project. It is a free group of Python enthusiasts, and you get a free account. They just happen to admit only PSA members. Check out http://starship.skyport.net for further information.


2.11. Is Python Y2K (Year 2000) Compliant?

Since Python is available free of charge, I don't want to make any absolute guarantees. If there is a problem that I didn't foresee, I don't want to be sued for damages.

That said, I'm pretty convinced that there are no Y2K problems anywhere in the core distribution, either 1.5 or 1.4. Python does few date manipulations, and what it does is all based on the Unix representation for time (even on non-Unix systems) which uses seconds since 1970 and won't overflow until 2038.
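
The 2038 figure can be checked with a little arithmetic. The sketch below runs on a current Python 3 interpreter and assumes the seconds-since-1970 counter is stored in a signed 32-bit integer, which is the case the answer above has in mind; the constant names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value of a signed 32-bit counter (assumed width)

# The last moment such a counter can represent.
last_moment = EPOCH + timedelta(seconds=MAX_INT32)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# 1 January 2000 lies comfortably inside the representable range.
y2k = datetime(2000, 1, 1, tzinfo=timezone.utc)
seconds_at_y2k = int((y2k - EPOCH).total_seconds())
print(seconds_at_y2k, seconds_at_y2k < MAX_INT32)  # 946684800 True
```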


3. Building Python and Other Known Bugs


3.1. Is there a test set?

Sure. You can run it after building with "make test", or you can run it manually with the command

	import test.autotest
In 1.4 or earlier, use

	import autotest
The test set doesn't test all features of Python, but it goes a long way to confirm that Python is actually working.

NOTE: if "make test" fails, don't just mail the output to the newsgroup -- this doesn't give enough information to debug the problem. Instead, find out which test fails, and run that test manually from an interactive interpreter. For example, if "make test" reports that test_spam fails, try this interactively:

	import test.test_spam
This generally produces more verbose output which can be diagnosed to debug the problem.


3.2. When running the test set, I get complaints about floating point operations, but when playing with floating point operations I cannot find anything wrong with them.

The test set makes occasional unwarranted assumptions about the semantics of C floating point operations. Until someone donates a better floating point test set, you will have to comment out the offending floating point tests and execute similar tests manually.


3.3. Link errors after rerunning the configure script.

It is generally necessary to run "make clean" after a configuration change.


3.4. The python interpreter complains about options passed to a script (after the script name).

You are probably linking with GNU getopt, e.g. through -liberty. Don't. The reason for the complaint is that GNU getopt, unlike System V getopt and other getopt implementations, doesn't consider a non-option to be the end of the option list. A quick (and compatible) fix for scripts is to add "--" to the interpreter line, like this:

        #! /usr/local/bin/python --
You can also use this interactively:

        python -- script.py [options]
Note that a working getopt implementation is provided in the Python distribution (in Python/getopt.c) but not automatically used.
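
The difference between the two scanning styles can be seen from Python itself. The sketch below uses the getopt module of a current Python 3 release as a stand-in for the C library routines discussed above; the option letters -v and -x and the name script.py are made up for illustration.

```python
import getopt
import os

# gnu_getopt() honors POSIXLY_CORRECT; clear it so the demonstration is stable.
os.environ.pop("POSIXLY_CORRECT", None)

# -x is meant for the script, not for the program doing the parsing.
argv = ["-v", "script.py", "-x"]

# Traditional behavior: scanning stops at the first non-option argument.
posix_opts, posix_rest = getopt.getopt(argv, "vx")
print(posix_opts, posix_rest)   # [('-v', '')] ['script.py', '-x']

# GNU behavior: options after the script name are consumed as well.
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, "vx")
print(gnu_opts, gnu_rest)       # [('-v', ''), ('-x', '')] ['script.py']

# An explicit "--" ends option scanning even in GNU mode.
guarded_opts, guarded_rest = getopt.gnu_getopt(["-v", "--", "script.py", "-x"], "vx")
print(guarded_opts, guarded_rest)  # [('-v', '')] ['script.py', '-x']
```

The last call shows why adding "--" works as a fix: everything after it is passed through untouched.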


3.5. When building on the SGI, make tries to run python to create glmodule.c, but python hasn't been built or installed yet.

Comment out the line mentioning glmodule.c in Setup and build a python without gl first; install it or make sure it is in your $PATH, then edit the Setup file again to turn on the gl module, and make again. You don't need to do "make clean"; you do need to run "make Makefile" in the Modules subdirectory (or just run "make" at the toplevel).


3.6. I use VPATH but some targets are built in the source directory.

On some systems (e.g. Sun), if the target already exists in the source directory, it is created there instead of in the build directory. This is usually because you have previously built without VPATH. Try running "make clobber" in the source directory.


3.7. Trouble building or linking with the GNU readline library.

Consider using readline 2.0. Some hints:

You can use the GNU readline library to improve the interactive user interface: this gives you line editing and command history when calling python interactively. You need to configure and build the GNU readline library before running the configure script. Its sources are no longer distributed with Python; you can ftp them from any GNU mirror site, or from its home site ftp://slc2.ins.cwru.edu/pub/dist/readline-2.0.tar.gz (or a higher version number -- using version 1.x is not recommended).

Pass the Python configure script the option --with-readline=DIRECTORY where DIRECTORY is the absolute pathname of the directory where you've built the readline library.

On SGI IRIX 5, you may have to add the following to rldefs.h:

        #ifndef sigmask
        #define sigmask(sig) (1L << ((sig)-1))
        #endif
On most systems, you will have to add #include "rldefs.h" to the top of several source files, and if you use the VPATH feature, you will have to add dependencies of the form foo.o: foo.c to the Makefile for several values of foo.

The readline library requires use of the termcap library. A known problem with this is that it contains entry points which cause conflicts with the STDWIN and SGI GL libraries. The STDWIN conflict can be solved by adding a line saying '#define werase w_erase' to the stdwin.h file (in the STDWIN distribution, subdirectory H). The GL conflict has been solved in the Python configure script by a hack that forces use of the static version of the termcap library.

Check the newsgroup gnu.bash.bug for specific problems with the readline library (I don't read this group but I've been told that it is the place for readline bugs).


3.8. Trouble with socket I/O on older Linux 1.x versions.

Once you've built Python, use it to run the regen.py script in the Lib/linux1 directory. Apparently the files as distributed don't match the system headers on some Linux versions.


3.9. Trouble with prototypes on Ultrix.

Ultrix cc seems broken -- use gcc, or edit config.h to #undef HAVE_PROTOTYPES.


3.10. Other trouble building Python on platform X.

Please email the details to <guido@cnri.reston.va.us> and I'll look into it. Please provide as many details as possible. In particular, if you don't tell me what type of computer and what operating system (and version) you are using it will be difficult for me to figure out what is the matter. If you get a specific error message, please email it to me too.


3.11. How to configure dynamic loading on Linux.

This is now automatic as long as your Linux version uses the ELF object format (all recent Linuxes do).


3.12. I can't get shared modules to work on Linux 2.0 (Slackware96)?

This is a bug in the Slackware96 release. The fix is simple: make sure that there is a link from /lib/libdl.so to /lib/libdl.so.1 so that the following links are set up:

        /lib/libdl.so -> /lib/libdl.so.1
        /lib/libdl.so.1 -> /lib/libdl.so.1.7.14
You may have to rerun the configure script, after rm'ing the config.cache file, before you attempt to rebuild python after this fix.


3.13. Trouble when making modules shared on Linux.

This happens when you have built Python for static linking and then enable
  *shared*
in the Setup file. Shared library code must be compiled with "-fpic". If a .o file for the module already exists that was compiled for static linking, you must remove it or do "make clean" in the Modules directory.


3.14. How to use threads on Linux.

[Greg Stein] You need to have a very recent libc, or even better, get the LinuxThreads-0.5 distribution. Note that if you install LinuxThreads normally, then you shouldn't need to specify the directory to the --with-thread configuration switch. The configure script ought to find it without a problem. To make sure everything builds properly, do a "make clean", remove config.cache, re-run configure with that switch, and then build.


3.15. Errors when linking with a shared library containing C++ code.

Link the main Python binary with C++. Change the definition of LINKCC in Modules/Makefile to be your C++ compiler. You may have to edit config.c slightly to make it compilable with C++.


3.16. I built with tkintermodule.c enabled but get 'Tkinter not found'

Tkinter.py (note: upper case T) lives in a subdirectory of Lib, Lib/tkinter. If you are using the default module search path, you probably didn't enable the line in the Modules/Setup file defining TKPATH; if you use the environment variable PYTHONPATH, you'll have to add the proper tkinter subdirectory.

For Windows, see question 7.11.


3.17. I built with Tk 4.0 but Tkinter complains about the Tk version.

Several things could cause this. You most likely have a Tk 3.6 installation that wasn't completely eradicated by the Tk 4.0 installation (which tends to add "4.0" to its installed files). You may have the Tk 3.6 support library installed in the place where the Tk 4.0 support files should be (default /usr/local/lib/tk/); you may have compiled Python with the old tk.h header file (yes, this actually compiles!); you may actually have linked with Tk 3.6 even though Tk 4.0 is also around. The same applies to Tcl 7.4 vs. Tcl 7.3.


3.18. Link errors for Tcl/Tk symbols when linking with Tcl/Tk.

Quite possibly, there's a version mismatch between the Tcl/Tk header files (tcl.h and tk.h) and the Tcl/Tk libraries you are using (the "-ltk4.0" and "-ltcl7.4" arguments for _tkinter in the Setup file). If you have installed both versions 7.4/4.0 and 7.5/4.1 of Tcl/Tk, most likely your header files are for the newer versions, but the Setup line for _tkinter in some Python distributions references 7.4/4.0 by default. Changing this to 7.5/4.1 should take care of this.


3.19. I configured and built Python for Tcl/Tk but "import Tkinter" fails.

Most likely, you forgot to enable the line in Setup that says "TKPATH=:$(DESTLIB)/tkinter".


3.20. Tk doesn't work right on DEC Alpha.

You probably compiled either Tcl, Tk or Python with gcc. Don't. For this platform, which has 64-bit integers, gcc is known to generate broken code. The standard cc (which comes bundled with the OS!) works. If you still prefer gcc, at least try recompiling with cc before reporting problems to the newsgroup or the author; if this fixes the problem, report the bug to the gcc developers instead. (As far as we know, there are no problems with gcc on other platforms -- the instabilities seem to be restricted to the DEC Alpha.) See also question 3.6.

There's also a 64-bit bugfix for Tcl/Tk; see

	http://grail.cnri.reston.va.us/grail/info/patches/tk64bit.txt


3.21. Several common system calls are missing from the posix module.

Most likely, all test compilations run by the configure script are failing for some reason or another. Have a look in config.log to see what could be the reason. A common reason is specifying a directory to the --with-readline option that doesn't contain the libreadline.a file.


3.22. ImportError: No module named string, on MS Windows.

Most likely, your PYTHONPATH environment variable should be set to something like:

set PYTHONPATH=c:\python;c:\python\lib;c:\python\scripts

(assuming Python was installed in c:\python)
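
To see why the lib directory has to be listed, it helps to picture how a search path is walked. The following is a rough sketch for a current Python 3 interpreter, not the interpreter's actual import code; split_search_path and candidate_files are hypothetical helpers, and it assumes the library module string.py sits in c:\python\lib.

```python
import ntpath  # Windows path rules, usable on any platform


def split_search_path(value, sep=";"):
    """Split a PYTHONPATH-style string into its directory entries."""
    return [entry for entry in value.split(sep) if entry]


def candidate_files(module_name, directories):
    """List the .py files that would be tried, in search order."""
    return [ntpath.join(directory, module_name + ".py")
            for directory in directories]


entries = split_search_path(r"c:\python;c:\python\lib;c:\python\scripts")
print(entries)

# Only the second candidate exists under the assumption above; without
# c:\python\lib on the path it would never be tried.
for path in candidate_files("string", entries):
    print(path)
```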


3.23. Core dump on SGI when using the gl module.

There are conflicts between entry points in the termcap and curses libraries and an entry point in the GL library. There's a hack of a fix for the termcap library if it's needed for the GNU readline library, but it doesn't work when you're using curses. In short, you can't build a Python binary containing both the curses and gl modules.


3.24. "Initializer not a constant" while building DLL on MS-Windows

Static type object initializers in extension modules may cause compiles to fail with an error message like "initializer not a constant". Fredrik Lundh <Fredrik.Lundh@image.combitech.se> explains:

This shows up when building a DLL under MSVC. There are two ways to address this: either compile the module as C++, or change your code to something like:

  statichere PyTypeObject bstreamtype = {
      PyObject_HEAD_INIT(NULL) /* must be set by init function */
      0,
      "bstream",
      sizeof(bstreamobject),
  ...
  void
  initbstream()
  {
      /* Patch object type */
      bstreamtype.ob_type = &PyType_Type;
      Py_InitModule("bstream", functions);
      ...
  }


3.25. Output directed to a pipe or file disappears on Linux.

Some people have reported that when they run their script interactively, it runs great, but that when they redirect it to a pipe or file, no output appears.

    % python script.py
    ...some output...
    % python script.py >file
    % cat file
    % # no output
    % python script.py | cat
    % # no output
    %
Nobody knows what causes this, but it is apparently a Linux bug. Most Linux users are not affected by this.

There's at least one report of someone who reinstalled Linux (presumably a newer version) and Python and got rid of the problem; so this may be the solution.


3.26. Syntax Errors all over the place in Linux with libc 5.4

``I have installed python1.4 on my Linux system. When I try run the import statement I get the following error message:''

   File "<stdin>", line 1
       import sys
          ^
   Syntax Error: "invalid syntax"
Did you compile it yourself? This usually is caused by an incompatibility between libc 5.4.x and earlier libc's. In particular, programs compiled with libc 5.4 give incorrect results on systems which had libc 5.2 installed because the ctype.h file is broken. In this case, Python can't recognize which characters are letters and so on. The fix is to install the C library which was used when building the binary that you installed, or to compile Python yourself. When you do this, make sure the C library header files which get used by the compiler match the installed C library.

[adapted from an answer by Martin v. Loewis]

PS [adapted from Andreas Jung]: If you have upgraded to libc 5.4.x and the problem persists, check your library path for an older version of libc. Try a clean update of libc, including both the libraries and the header files, and then recompile everything.


3.27. Crash in XIO on Linux when using Tkinter.

When Python is built with threads under Linux, use of Tkinter can cause crashes like the following:

  >>> from Tkinter import *
  >>> root = Tk()
  XIO:  fatal IO error 0 (Unknown error) on X server ":0.0"
        after 45 requests (40 known processed) with 1 events remaining.
The reason is that the default Xlib is not built with support for threads. If you rebuild Xlib with threads enabled the problems go away. Alternatively, you can rebuild Python without threads ("make clean" first!).

(Disclaimer: this is from memory.)


3.28. How can I test if Tkinter is working?

Try the following:

  python
  >>> import _tkinter
  >>> import Tkinter
  >>> Tkinter._test()
This should pop up a window with two buttons, one "Click me" and one "Quit".

If the first statement (import _tkinter) fails, your Python installation probably has not been configured to support Tcl/Tk. On Unix, if you have installed Tcl/Tk, you have to rebuild Python after editing the Modules/Setup file to enable the _tkinter module and the TKPATH environment variable.

It is also possible to get complaints about Tcl/Tk version number mismatches or missing TCL_LIBRARY or TK_LIBRARY environment variables. These have to do with Tcl/Tk installation problems.

A common problem is to have installed versions of tcl.h and tk.h that don't match the installed version of the Tcl/Tk libraries; this usually results in linker errors or (when using dynamic loading) complaints about missing symbols during loading the shared library.


4. Programming in Python


4.1. Is there a source code level debugger with breakpoints, step, etc.?

Yes. Check out module pdb. It is documented in the Library Reference Manual; pdb.help() also prints the documentation. You can write your own debugger by using the code for pdb as an example.

Pythonwin also has a GUI debugger available, based on bdb, which colors breakpoints and has quite a few cool features (including debugging non-Pythonwin programs). The interface needs some work, but is interesting nonetheless. A reference can be found at http://www.python.org/ftp/python/pythonwin/pwindex.html


4.2. Can I create an object class with some methods implemented in C and others in Python (e.g. through inheritance)? (Also phrased as: Can I use a built-in type as base class?)

No, but you can easily create a Python class which serves as a wrapper around a built-in object, e.g. (for dictionaries):

        # A user-defined class behaving almost identically
        # to a built-in dictionary.
        class UserDict:
                def __init__(self): self.data = {}
                def __repr__(self): return repr(self.data)
                def __cmp__(self, dict):
                        if type(dict) == type(self.data):
                                return cmp(self.data, dict)
                        else:
                                return cmp(self.data, dict.data)
                def __len__(self): return len(self.data)
                def __getitem__(self, key): return self.data[key]
                def __setitem__(self, key, item): self.data[key] = item
                def __delitem__(self, key): del self.data[key]
                def keys(self): return self.data.keys()
                def items(self): return self.data.items()
                def values(self): return self.data.values()
                def has_key(self, key): return self.data.has_key(key)
A2. See Jim Fulton's ExtensionClass for an example of a mechanism which allows you to have superclasses which you can inherit from in Python -- that way you can have some methods from a C superclass (call it a mixin) and some methods from either a Python superclass or your subclass. See http://www.digicool.com/papers/ExtensionClass.html.
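
As a small illustration of the wrapper approach from the first answer, here is a sketch of a wrapper that adds one behavior the built-in dictionary lacks. The class name DefaultingDict and its default argument are made up for this example; they are not part of the standard library.

```python
# Hypothetical wrapper: a dictionary that returns a default value
# for missing keys instead of raising KeyError.
class DefaultingDict:
    def __init__(self, default):
        self.data = {}
        self.default = default
    def __len__(self):
        return len(self.data)
    def __setitem__(self, key, item):
        self.data[key] = item
    def __getitem__(self, key):
        # Delegate to the wrapped dictionary, falling back to the default.
        try:
            return self.data[key]
        except KeyError:
            return self.default

counts = DefaultingDict(0)
counts['spam'] = counts['spam'] + 1
counts['spam'] = counts['spam'] + 1
# counts['spam'] is now 2; counts['eggs'] yields the default, 0
```

Only the methods that differ need any thought; the rest simply forward to self.data, as in UserDict above.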


4.3. Is there a curses/termcap package for Python?

[Andrew Kuchling] The standard Python distribution comes with a curses module in the Modules/ subdirectory, though it's not compiled by default. However, that module only supports plain curses; you can't use ncurses features like colors with it (though it will link with ncurses).

Oliver Andrich has an enhanced module that does support such features; there's an older version available at http://www.uni-koblenz.de/~andrich/projects.html, but e-mail him and ask for a copy of the current version.


4.4. Is there an equivalent to C's onexit() in Python?

Yes, if you import sys and assign a function to sys.exitfunc, it will be called when your program exits, is killed by an unhandled exception, or (on UNIX) receives a SIGHUP or SIGTERM signal.
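
A minimal sketch of the registration, assuming an interpreter that honors sys.exitfunc as described in this answer. The cleanup function and its message are hypothetical.

```python
import sys

def cleanup():
    # Hypothetical cleanup work; replace with whatever your program needs.
    sys.stderr.write("cleaning up\n")

# Register the function through the hook described above.
sys.exitfunc = cleanup
```

Since there is only one such slot, a module that sets it should take care not to silently discard a function some other module stored there earlier.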


4.5. When I define a function nested inside another function, the nested function seemingly can't access the local variables of the outer function. What is going on? How do I pass local data to a nested function?

Python does not have arbitrarily nested scopes. When you need to create a function that needs to access some data which you have available locally, create a new class to hold the data and return a method of an instance of that class, e.g.:

        class MultiplierClass:
            def __init__(self, factor):
                self.factor = factor
            def multiplier(self, argument):
                return argument * self.factor
        def generate_multiplier(factor):
            return MultiplierClass(factor).multiplier
        twice = generate_multiplier(2)
        print twice(10)
        # Output: 20
An alternative solution uses default arguments, e.g.:

        def generate_multiplier(factor):
            def multiplier(arg, fact = factor):
                return arg*fact
            return multiplier
        twice = generate_multiplier(2)
        print twice(10)
        # Output: 20


4.6. How do I iterate over a sequence in reverse order?

If it is a list, the fastest solution is

        list.reverse()
        try:
                for x in list:
                        "do something with x"
        finally:
                list.reverse()

This has the disadvantage that while you are in the loop, the list is temporarily reversed. If you don't like this, you can make a copy. This appears expensive but is actually faster than other solutions:

        rev = list[:]
        rev.reverse()
        for x in rev:
                <do something with x>

If it's not a list, a more general but slower solution is:

        for i in range(len(sequence)-1, -1, -1):
                x = sequence[i]
                <do something with x>

A more elegant solution is to define a class which acts as a sequence and yields the elements in reverse order (solution due to Steve Majewski):

        class Rev:
                def __init__(self, seq):
                        self.forw = seq
                def __len__(self):
                        return len(self.forw)
                def __getitem__(self, i):
                        return self.forw[-(i + 1)]

You can now simply write:

        for x in Rev(list):
                <do something with x>

Unfortunately, this solution is slowest of all, due to the method call overhead...
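
For a concrete run of the Rev class, here is a self-contained sketch. The sample list and the name collected are invented for the example. The loop stops by itself because indexing past the start of the underlying sequence raises IndexError.

```python
class Rev:
    def __init__(self, seq):
        self.forw = seq
    def __len__(self):
        return len(self.forw)
    def __getitem__(self, i):
        # Index from the far end; raises IndexError once i runs past
        # the start of the sequence, which ends the for loop.
        return self.forw[-(i + 1)]

letters = ['a', 'b', 'c']
collected = []
for x in Rev(letters):
    collected.append(x)
# collected is now ['c', 'b', 'a']; letters itself is unchanged
```

Since Rev only uses len() and negative indexing, it works equally well on strings and tuples.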


4.7. My program is too slow. How do I speed it up?

That's a tough one, in general. There are many tricks to speed up Python code; I would consider rewriting parts in C only as a last resort. One thing to notice is that function and (especially) method calls are rather expensive; if you have designed a purely OO interface with lots of tiny functions that don't do much more than get or set an instance variable or call another method, you may consider using a more direct way, e.g. directly accessing instance variables. Also see the standard module "profile" (described in the Library Reference manual) which makes it possible to find out where your program is spending most of its time (if you have some patience -- the profiling itself can slow your program down by an order of magnitude).

Remember that many standard optimization heuristics you may know from other programming experience may well apply to Python. For example it may be faster to send output to output devices using larger writes rather than smaller ones in order to avoid the overhead of kernel system calls. Thus CGI scripts that write all output in "one shot" may be notably faster than those that write lots of small pieces of output.

Also, be sure to use "aggregate" operations where appropriate. For example the "slicing" feature allows programs to chop up lists and other sequence objects in a single tick of the interpreter mainloop using highly optimized C implementations. Thus to get the same effect as

  L2 = []
  for i in range(3):
       L2.append(L1[i])
it is much shorter and far faster to use

  L2 = list(L1[:3]) # "list" is redundant if L1 is a list.
Note that the map() function, particularly when used with builtin methods or builtin functions, can be a convenient accelerator. For example to pair the elements of two lists together:

  >>> map(None, [1,2,3], [4,5,6])
  [(1, 4), (2, 5), (3, 6)]
or to compute a number of sines:

  >>> import math
  >>> map(math.sin, (1,2,3,4))
  [0.841470984808, 0.909297426826, 0.14112000806, -0.756802495308]
The map operation completes very quickly in such cases.

Other examples of aggregate operations include the join, joinfields, split, and splitfields functions of the standard string module. For example if s1..s7 are large (10K+) strings then string.joinfields([s1,s2,s3,s4,s5,s6,s7], "") may be far faster than the more obvious s1+s2+s3+s4+s5+s6+s7, since the "summation" will compute many subexpressions, whereas joinfields does all copying in one pass. For manipulating strings also consider the regular expression libraries and the "substitution" operations String % tuple and String % dictionary. Also be sure to use the list.sort builtin method to do sorting, and see questions 4.51 and 4.59 for examples of moderately advanced usage -- list.sort beats other techniques for sorting in all but the most extreme circumstances.

There are many other aggregate operations available in the standard libraries and in contributed libraries and extensions.

Another common trick is to "push loops into functions or methods." For example suppose you have a program that runs slowly and you use the profiler (profile.run) to determine that a Python function ff is being called lots of times. If you notice that ff

   def ff(x):
       ...do something with x computing result...
       return result
tends to be called in loops like (A)

   list = map(ff, oldlist)
or (B)

   for x in sequence:
       value = ff(x)
       ...do something with value...
then you can often eliminate function call overhead by rewriting ff to

   def ffseq(seq):
       resultseq = []
       for x in seq:
           ...do something with x computing result...
           resultseq.append(result)
       return resultseq
and rewrite (A) to

    list = ffseq(oldlist)
and (B) to

    for value in ffseq(sequence):
        ...do something with value...
Other single calls ff(x) translate to ffseq([x])[0] with little penalty. Of course this technique is not always appropriate and there are other variants, which you can figure out.
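
To make the rewrite concrete, here is a minimal sketch in which the body of ff is a made-up computation (squaring and adding one). Only the shape of the transformation matters, not the arithmetic.

```python
def ff(x):
    result = x * x + 1
    return result

def ffseq(seq):
    # Same computation as ff, but the loop now lives inside the
    # function, so there is one call instead of one per element.
    resultseq = []
    for x in seq:
        result = x * x + 1
        resultseq.append(result)
    return resultseq

oldlist = [1, 2, 3]
newlist = ffseq(oldlist)    # replaces map(ff, oldlist)
single = ffseq([5])[0]      # replaces a lone ff(5)
```

The price is that the computation is now written out twice; if ff is still needed, it can be redefined as a thin wrapper that returns ffseq([x])[0].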

For an anecdote related to optimization, see

	http://grail.cnri.reston.va.us/python/essays/list2str.html


4.8. When I have imported a module, then edit it, and import it again (into the same Python process), the changes don't seem to take place. What is going on?

For reasons of efficiency as well as consistency, Python only reads the module file the first time a module is imported. (Otherwise a program consisting of many modules, each of which imports the same basic module, would read the basic module over and over again.) To force rereading of a changed module, do this:

        import modname
        reload(modname)
Warning: this technique is not 100% fool-proof. In particular, modules containing statements like

        from modname import some_objects
will continue to work with the old version of the imported objects.


4.9. How do I find the current module name?

A module can find out its own module name by looking at the (predefined) global variable __name__. If this has the value '__main__' you are running as a script.


4.10. I have a module in which I want to execute some extra code when it is run as a script. How do I find out whether I am running as a script?

See the previous question. E.g. if you put the following on the last line of your module, main() is called only when your module is running as a script:

        if __name__ == '__main__': main()


4.11. I try to run a program from the Demo directory but it fails with ImportError: No module named ...; what gives?

This is probably an optional module (written in C!) which hasn't been configured on your system. This especially happens with modules like "Tkinter", "stdwin", "gl", "Xt" or "Xm". For Tkinter, STDWIN and many other modules, see Modules/Setup.in for info on how to add these modules to your Python, if it is possible at all. Sometimes you will have to ftp and build another package first (e.g. Tcl and Tk for Tkinter). Sometimes the module only works on specific platforms (e.g. gl only works on SGI machines).

NOTE: if the complaint is about "Tkinter" (upper case T) and you have already configured module "tkinter" (lower case t), the solution is not to rename tkinter to Tkinter or vice versa. There is probably something wrong with your module search path. Check out the value of sys.path.

For X-related modules (Xt and Xm) you will have to do more work: they are currently not part of the standard Python distribution. You will have to ftp the Extensions tar file, i.e. ftp://ftp.python.org/pub/python/src/X-extension.tar.gz and follow the instructions there.

See also the next question.


4.12. I have successfully built Python with STDWIN but it can't find some modules (e.g. stdwinevents).

There's a subdirectory of the library directory named 'stdwin' which should be in the default module search path. There's a line in Modules/Setup(.in) that you have to enable for this purpose -- unfortunately in the latest release it's not near the other STDWIN-related lines so it's easy to miss it.


4.13. What GUI toolkits exist for Python?

Depending on what platform(s) you are aiming at, there are several.

Currently supported solutions:

There's a neat object-oriented interface to the Tcl/Tk widget set, called Tkinter. It is part of the standard Python distribution and well-supported -- all you need to do is build and install Tcl/Tk and enable the _tkinter module and the TKPATH definition in Modules/Setup when building Python. This is probably the easiest to install and use, and the most complete widget set. It is also very likely that in the future the standard Python GUI API will be based on or at least look very much like the Tkinter interface. For more info about Tk, including pointers to the source, see the Tcl/Tk home page at http://sunscript.sun.com. Tcl/Tk is now fully portable to the Mac and Windows platforms (NT and 95 only); you need Python 1.4beta3 or later and Tk 4.1patch1 or later.

There's an interface to X11, including the Athena and Motif widget sets (and a few individual widgets, like Mosaic's HTML widget and SGI's GL widget) available from ftp://ftp.python.org/pub/python/src/X-extension.tar.gz. Support by Sjoerd Mullender <sjoerd@cwi.nl>.

On top of the X11 interface there's the (recently revived) vpApp toolkit by Per Spilling, now also maintained by Sjoerd Mullender <sjoerd@cwi.nl>. See ftp://ftp.cwi.nl/pub/sjoerd/vpApp.tar.gz.

The Mac port has a rich and ever-growing set of modules that support the native Mac toolbox calls. See the documentation that comes with the Mac port. See ftp://ftp.python.org/pub/python/mac. Support by Jack Jansen <jack@cwi.nl>.

The NT port supported by Mark Hammond <MHammond@skippinet.com.au> (see question 7.2) includes an interface to the Microsoft Foundation Classes and a Python programming environment using it that's written mostly in Python. See ftp://ftp.python.org/pub/python/pythonwin/.

There's an object-oriented GUI based on the Microsoft Foundation Classes model called WPY, supported by Jim Ahlstrom <jim@interet.com>. Programs written in WPY run unchanged and with native look and feel on Windows NT/95, Windows 3.1 (using win32s), and on Unix (using Tk). Source and binaries for Windows and Linux are available in ftp://ftp.python.org/pub/python/wpy/.

Obsolete or minority solutions:

There's an interface to wxWindows. wxWindows is a portable GUI class library written in C++. It supports XView, Motif, MS-Windows as targets. There is some support for Macs and CURSES as well. wxWindows preserves the look and feel of the underlying graphics toolkit. See the wxPython WWW page at http://www.aiai.ed.ac.uk/~jacs/wx/wxpython/wxpython.html. Support for wxPython (by Harri Pasanen <pa@tekla.fi>) appears to have a low priority.

For SGI IRIX only, there are unsupported interfaces to the complete GL (Graphics Library -- low level but very good 3D capabilities) as well as to FORMS (a buttons-and-sliders-etc package built on top of GL by Mark Overmars -- ftp'able from ftp://ftp.cs.ruu.nl/pub/SGI/FORMS/). This is probably also becoming obsolete, as OpenGL takes over.

There's an interface to STDWIN, a platform-independent low-level windowing interface for Mac and X11. This is totally unsupported and rapidly becoming obsolete. The STDWIN sources are at ftp://ftp.cwi.nl/pub/stdwin/. (For info about STDWIN 2.0, please refer to Steven Pemberton <steven@cwi.nl> -- I believe it is also dead.)

There is an interface to WAFE, a Tcl interface to the X11 Motif and Athena widget sets. WAFE is at http://www.wu-wien.ac.at/wafe/wafe.html.

(The Fresco port that was mentioned in earlier versions of this FAQ no longer seems to exist. Inquire with Mark Linton.)


4.14. Are there any interfaces to database packages in Python?

There's a whole collection of them in the contrib area of the ftp server, see http://www.python.org/ftp/python/contrib/Database/.


4.15. Is it possible to write obfuscated one-liners in Python?

Yes. See the following three examples, due to Ulf Bartelt:

        # Primes < 1000
        print filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
        map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000)))
        # First 10 Fibonacci numbers
        print map(lambda x,f=lambda x,f:(x<=1) or (f(x-1,f)+f(x-2,f)): f(x,f),
        range(10))
        # Mandelbrot set
        print (lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y,
        Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
        Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
        i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
        >=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
        64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
        ))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)
        #    \___ ___/  \___ ___/  |   |   |__ lines on screen
        #        V          V      |   |______ columns on screen
        #        |          |      |__________ maximum of "iterations"
        #        |          |_________________ range on y axis
        #        |____________________________ range on x axis
Don't try this at home, kids!


4.16. Is there an equivalent of C's "?:" ternary operator?

Not directly. In many cases you can mimic a?b:c with "a and b or c", but there's a flaw: if b is zero (or empty, or None -- anything that tests false) then c will be selected instead. In many cases you can prove by looking at the code that this can't happen (e.g. because b is a constant or has a type that can never be false), but in general this can be a problem.

Tim Peters (who wishes it was Steve Majewski) suggested the following solution: (a and [b] or [c])[0]. Because [b] is a singleton list it is never false, so the wrong path is never taken; then applying [0] to the whole thing gets the b or c that you really wanted. Ugly, but it gets you there in the rare cases where it is really inconvenient to rewrite your code using 'if'.
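
The flaw and the workaround can be seen side by side in the following sketch. The values of a, b and c are arbitrary, chosen so that b tests false.

```python
a = 1      # the condition is true, so we want b
b = 0      # ...but b itself tests false
c = 5

naive = a and b or c            # picks c, which is wrong here
safe = (a and [b] or [c])[0]    # picks b, as a ? b : c would
```

When a is false, both forms agree and select c.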


4.17. My class defines __del__ but it is not called when I delete the object.

There are several possible reasons for this.

The del statement does not necessarily call __del__ -- it simply decrements the object's reference count, and if this reaches zero __del__ is called.

If your data structures contain circular links (e.g. a tree where each child has a parent pointer and each parent has a list of children) the reference counts will never go back to zero. You'll have to define an explicit close() method which removes those pointers. Please don't ever call __del__ directly -- __del__ should call close() and close() should make sure that it can be called more than once for the same object.

If the object has ever been a local variable (or argument, which is really the same thing) to a function that caught an exception in an except clause, chances are that a reference to the object still exists in that function's stack frame as contained in the stack trace. Normally, deleting (better: assigning None to) sys.exc_traceback will take care of this. If a stack was printed for an unhandled exception in an interactive interpreter, delete sys.last_traceback instead.

There is code that deletes all objects when the interpreter exits, but it is not called if your Python has been configured to support threads (because other threads may still be active). You can define your own cleanup function using sys.exitfunc (see question 4.4).

Finally, if your __del__ method raises an exception, this will be ignored. Starting with Python 1.4beta3, a warning message is printed to sys.stderr when this happens.

See also question 6.14 for a discussion of the possibility of adding true garbage collection to Python.
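
The explicit close() method suggested above for circular links might look like the following sketch. The Node class is hypothetical. What matters is that close() removes both the parent pointer and the child list, that it may safely be called twice, and that __del__ merely delegates to it.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        # Break the parent <-> child cycle. Clearing the attributes
        # first makes a second call a harmless no-op.
        children = self.children
        self.children = []
        self.parent = None
        for child in children:
            child.close()

    def __del__(self):
        self.close()

root = Node()
kid = Node(root)
root.close()    # after this, no circular references remain
```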


4.18. How do I change the shell environment for programs called using os.popen() or os.system()? Changing os.environ doesn't work.

You must be either using a version of Python before 1.4, or running on a (rare) system that doesn't have the putenv() library function.

Before Python 1.4, modifying the environment passed to subshells was left out of the interpreter because there seemed to be no well-established portable way to do it (in particular, some systems have putenv(), others have setenv(), and some have none at all). As of Python 1.4, almost all Unix systems do have putenv(), and so does the Win32 API, and thus the os module was modified so that changes to os.environ are trapped and the corresponding putenv() call is made.
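
On a system where this works, the effect can be checked with a sketch like the one below. It assumes a Unix-style shell (for the $VARIABLE syntax), and the variable name FAQ_GREETING is made up for the example.

```python
import os

# Hypothetical variable name, chosen only for this example.
os.environ['FAQ_GREETING'] = 'hello'

# The child shell sees the change (assumes a Unix-style shell).
p = os.popen('echo $FAQ_GREETING')
output = p.read()
p.close()
```

If output comes back as just a newline, your interpreter is not passing the change on to the subshell.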


4.19. What is a class?

A class is the particular object type that is created by executing a class statement. Class objects are used as templates, to create class instance objects, which embody both the data structure and program routines specific to a datatype.


4.20. What is a method?

A method is a function that you normally call as x.name(arguments...) for some object x. The term is used for methods of classes and class instances as well as for methods of built-in objects. (The latter have a completely different implementation and only share the way their calls look in Python code.) Methods of classes (and class instances) are defined as functions inside the class definition.


4.21. What is self?

Self is merely a conventional name for the first argument of a method -- i.e. a function defined inside a class definition. A method defined as meth(self, a, b, c) should be called as x.meth(a, b, c) for some instance x of the class in which the definition occurs; the called method will think it is called as meth(x, a, b, c).


4.22. What is an unbound method?

An unbound method is a method defined in a class that is not yet bound to an instance. You get an unbound method if you ask for a class attribute that happens to be a function. You get a bound method if you ask for an instance attribute. A bound method knows which instance it belongs to and calling it supplies the instance automatically; an unbound method only knows which class it wants for its first argument (a derived class is also OK). Calling an unbound method doesn't "magically" derive the first argument from the context -- you have to provide it explicitly.

Trivia note regarding bound methods: each reference to a bound method of a particular object creates a bound method object. If you have two such references (a = inst.meth; b = inst.meth), they will compare equal (a == b) but are not the same (a is not b).
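
A short sketch of both kinds of call, using a made-up Greeter class:

```python
class Greeter:
    def __init__(self, name):
        self.name = name
    def greet(self, greeting):
        return greeting + ", " + self.name

inst = Greeter("Brian")

# Asked for on the instance: the instance is supplied automatically.
bound = inst.greet
first = bound("Hello")

# Asked for on the class: the instance must be passed explicitly.
second = Greeter.greet(inst, "Hello")

# Each reference to inst.greet creates a new bound method object.
a = inst.greet
b = inst.greet
same_value = (a == b)     # true
same_object = (a is b)    # false
```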


4.23. How do I call a method defined in a base class from a derived class that overrides it?

If your class definition starts with "class Derived(Base): ..." then you can call method meth defined in Base (or one of Base's base classes) as Base.meth(self, arguments...). Here, Base.meth is an unbound method (see previous question).
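
For instance, with an invented method named describe:

```python
class Base:
    def describe(self):
        return "base"

class Derived(Base):
    def describe(self):
        # Call the overridden version explicitly, passing self along.
        return "derived, extending " + Base.describe(self)

text = Derived().describe()
```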


4.24. How do I call a method from a base class without using the name of the base class?

DON'T DO THIS. REALLY. I MEAN IT. It appears that you could call self.__class__.__bases__[0].meth(self, arguments...) but this fails when another class is derived from your class: for its instances, self.__class__.__bases__[0] is your class, not its base class -- so (assuming you are doing this from within Derived.meth) you would start a recursive call.

Often when you want to do this you are forgetting that classes are first class in Python. You can "point to" the class you want to delegate an operation to either at the instance or at the subclass level. For example if you want to use a "glorp" operation of a superclass you can point to the right superclass to use.

  class subclass(superclass1, superclass2, superclass3):
      delegate_glorp = superclass2
      ...
      def glorp(self, arg1, arg2):
          ... subclass specific stuff ...
          self.delegate_glorp.glorp(self, arg1, arg2)
      ...
  class subsubclass(subclass):
      delegate_glorp = superclass3
      ...
Note, however, that setting delegate_glorp to subclass in subsubclass would cause an infinite recursion on subclass.delegate_glorp. Careful! Maybe you are getting too fancy for your own good. Consider simplifying the design (?).
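
A runnable miniature of the delegation idea, with two stand-in superclasses (Super1 and Super2 are invented names, as are the strings they return):

```python
class Super1:
    def glorp(self, arg):
        return "Super1 got " + arg

class Super2:
    def glorp(self, arg):
        return "Super2 got " + arg

class Sub(Super1, Super2):
    delegate_glorp = Super2
    def glorp(self, arg):
        # Subclass-specific work would go here, then delegate.
        return self.delegate_glorp.glorp(self, arg)

class SubSub(Sub):
    # Redirect the delegation without touching Sub.glorp.
    delegate_glorp = Super1

from_sub = Sub().glorp("x")
from_subsub = SubSub().glorp("x")
```

Because delegate_glorp is looked up through self, the subclass's choice wins even though the method body is inherited unchanged.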


4.25. How can I organize my code to make it easier to change the base class?

You could define an alias for the base class, assign the real base class to it before your class definition, and use the alias throughout your class. Then all you have to change is the value assigned to the alias. Incidentally, this trick is also handy if you want to decide dynamically (e.g. depending on availability of resources) which base class to use. Example:

        BaseAlias = <real base class>
        class Derived(BaseAlias):
                def meth(self):
                        BaseAlias.meth(self)
                        ...
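
Here is a sketch of the dynamic variant. The two logger classes and the use_fancy switch are hypothetical; a real program might instead test whether some optional module can be imported.

```python
class PlainLogger:
    def log(self, message):
        return "plain: " + message

class FancyLogger:
    def log(self, message):
        return "fancy: " + message

# Hypothetical switch deciding which base class to use.
use_fancy = 0

if use_fancy:
    BaseAlias = FancyLogger
else:
    BaseAlias = PlainLogger

class Derived(BaseAlias):
    def log(self, message):
        # Only the alias is mentioned, never the real base class.
        return BaseAlias.log(self, "[derived] " + message)

line = Derived().log("started")
```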


4.26. How can I find the methods or attributes of an object?

This depends on the object type.

For an instance x of a user-defined class, instance attributes are found in the dictionary x.__dict__, and methods and attributes defined by its class are found in x.__class__.__bases__[i].__dict__ (for i in range(len(x.__class__.__bases__))). You'll have to walk the tree of base classes to find all class methods and attributes.

Many, but not all, built-in types define a list of their method names in x.__methods__, and if they have data attributes, their names may be found in x.__members__. However this is only a convention.

For more information, read the source of the standard (but undocumented) module newdir.
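
For instances of user-defined classes, the walk over the base classes can be sketched as follows. The helper names class_attributes and all_attributes are invented for this example and are not taken from newdir.

```python
def class_attributes(klass, found=None):
    # Collect attribute names from klass and, recursively, its bases.
    if found is None:
        found = {}
    for name in klass.__dict__.keys():
        found[name] = 1
    for base in klass.__bases__:
        class_attributes(base, found)
    return found

def all_attributes(x):
    # Instance attributes first, then everything reachable via the class.
    found = {}
    for name in x.__dict__.keys():
        found[name] = 1
    return class_attributes(x.__class__, found)

class Base:
    def base_meth(self): pass

class Derived(Base):
    def meth(self): pass

x = Derived()
x.colour = "red"
names = list(all_attributes(x).keys())
names.sort()
```

The resulting list contains colour, meth and base_meth, along with whatever special names the interpreter stores in the class dictionaries.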


4.27. I can't seem to use os.read() on a pipe created with os.popen().

os.read() is a low-level function which takes a file descriptor (a small integer). os.popen() creates a high-level file object -- the same type used for sys.std{in,out,err} and returned by the builtin open() function. Thus, to read n bytes from a pipe p created with os.popen(), you need to use p.read(n).
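
A small sketch, assuming an echo command is available to the shell:

```python
import os

p = os.popen('echo hello')   # a file object, not a file descriptor
first = p.read(3)            # read n bytes with the object's own method
rest = p.read()              # read whatever remains
p.close()
```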


4.28. How can I create a stand-alone binary from a Python script?

The "freeze" tool in "Tools/freeze/" does what you want. See the README.

This works by scanning your source recursively for import statements (both forms) and looking for the modules on the standard Python path as well as in the source directory (for built-in modules). It then "compiles" the modules written in Python to C code (array initializers that can be turned into code objects using the marshal module) and creates a custom-made config file that only contains those built-in modules which are actually used in the program. It then compiles the generated C code and links it with the rest of the Python interpreter to form a self-contained binary which acts exactly like your script.

Hint: the freeze program only works if your script's filename ends in ".py".


4.29. What WWW tools are there for Python?

See the chapter titled "Internet and WWW" in the Library Reference Manual. There's also a web browser written in Python, called Grail -- see http://grail.cnri.reston.va.us/grail/.


4.30. How do I run a subprocess with pipes connected to both input and output?

Use the standard popen2 module. For example:

	import popen2
	fromchild, tochild = popen2.popen2("command")
	tochild.write("input\n")
	tochild.flush()
	output = fromchild.readline()
Warning: in general, it is unwise to do this, because you can easily cause a deadlock where your process is blocked waiting for output from the child, while the child is blocked waiting for input from you. This can be caused because the parent expects the child to output more text than it does, or it can be caused by data being stuck in stdio buffers due to lack of flushing. The Python parent can of course explicitly flush the data it sends to the child before it reads any output, but if the child is a naive C program it can easily have been written to never explicitly flush its output, even if it is interactive, since flushing is normally automatic.

Note on a bug in popen2: unless your program calls wait() or waitpid(), finished child processes are never removed, and eventually calls to popen2 will fail because of a limit on the number of child processes. Calling os.waitpid with the os.WNOHANG option can prevent this; a good place to insert such a call would be before calling popen2 again.

In many cases, all you really need is to run some data through a command and get the result back. Unless the data is infinite in size, the easiest (and often the most efficient!) way to do this is to write it to a temporary file and run the command with that temporary file as input. The standard module tempfile exports a function mktemp() which generates unique temporary file names.
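
The temporary-file approach might be sketched like this. It assumes a Unix-style shell with a sort command, and the sample data is made up.

```python
import os
import tempfile

data = "pear\napple\nfig\n"

# Write the input to a uniquely named temporary file.
name = tempfile.mktemp()
f = open(name, 'w')
f.write(data)
f.close()

try:
    # Run the command with the temporary file as its standard input.
    p = os.popen('sort < "%s"' % name)
    result = p.read()
    p.close()
finally:
    # Remove the temporary file even if the command fails.
    os.unlink(name)
```

There is only one pipe here and the input is complete before the command starts, so the deadlock described above cannot occur.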

Note that many interactive programs (e.g. vi) don't work well with pipes substituted for standard input and output. You will have to use pseudo ttys ("ptys") instead of pipes. There is some undocumented code to use these in the library module pty.py -- I'm afraid you're on your own here.

A different answer is a Python interface to Don Libes' "expect" library. A Python extension that interfaces to expect is called "expy" and available from ftp://ftp.python.org/pub/python/contrib/System/.

A pure Python solution that works like expect is PIPE by John Croix. A prerelease of PIPE is available from ftp://ftp.python.org/pub/python/contrib/System/.


4.31. How do I call a function if I have the arguments in a tuple?

Use the built-in function apply(). For instance,

    func(1, 2, 3)
is equivalent to

    args = (1, 2, 3)
    apply(func, args)
Note that func(args) is not the same -- it calls func() with exactly one argument, the tuple args, instead of three arguments, the integers 1, 2 and 3.


4.32. How do I enable font-lock-mode for Python in Emacs?

If you are using XEmacs 19.14 or later, any XEmacs 20, FSF Emacs 19.34 or any Emacs 20, font-lock should work automatically for you if you are using the latest python-mode.el.

If you are using an older version of XEmacs or Emacs you will need to put this in your .emacs file:

        (defun my-python-mode-hook ()
          (setq font-lock-keywords python-font-lock-keywords)
          (font-lock-mode 1))
        (add-hook 'python-mode-hook 'my-python-mode-hook)


4.33. Is there a scanf() or sscanf() equivalent?

Not as such.

For simple input parsing, the easiest approach is usually to split the line into whitespace-delimited words using string.split(), and to convert decimal strings to numeric values using string.atoi(), string.atol() or string.atof(). (Python's atoi() is 32-bit and its atol() is arbitrary precision.) If you want to use another delimiter than whitespace, use string.splitfield() (possibly combining it with string.strip() which removes surrounding whitespace from a string).

For more complicated input parsing, regular expressions (see module regex) are better suited and more powerful than C's sscanf().
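
Both approaches can be sketched as follows. This assumes a current Python 3 interpreter, where the string methods and the built-in int() and float() take the place of string.split(), string.atoi() and string.atof(), and module re takes the place of regex. The record layout and the helper names are invented for the example:

```python
import re

def parse_fields(line):
    """Split a whitespace-delimited record into (name, count, ratio)."""
    name, count, ratio = line.split()
    return name, int(count), float(ratio)

# A regular expression copes with messier layouts, e.g. "x=10, y=-3".
POINT = re.compile(r"x=(-?\d+),\s*y=(-?\d+)")

def parse_point(text):
    """Return (x, y) as integers, or None if the text does not match."""
    m = POINT.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

record = parse_fields("spam 42 0.5")
point = parse_point("origin at x=10, y=-3")
print(record, point)
```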

There's a contributed module that emulates sscanf(), by Steve Clift; see contrib/Misc/sscanfmodule.c of the ftp site:

    http://www.python.org/ftp/python/contrib/Misc/sscanfmodule.c


4.34. Can I have Tk events handled while waiting for I/O?

Yes, and you don't even need threads! But you'll have to restructure your I/O code a bit. Tk has the equivalent of Xt's XtAddInput() call, which allows you to register a callback function which will be called from the Tk mainloop when I/O is possible on a file descriptor. Here's what you need:

        from Tkinter import tkinter
        tkinter.createfilehandler(file, mask, callback)
The file may be a Python file or socket object (actually, anything with a fileno() method), or an integer file descriptor. The mask is one of the constants tkinter.READABLE or tkinter.WRITABLE. The callback is called as follows:

        callback(file, mask)
You must unregister the callback when you're done, using

        tkinter.deletefilehandler(file)
Note: since you don't know *how many bytes* are available for reading, you can't use the Python file object's read or readline methods, since these will insist on reading a predefined number of bytes. For sockets, the recv() or recvfrom() methods will work fine; for other files, use os.read(file.fileno(), maxbytecount).


4.35. How do I write a function with output parameters (call by reference)?

[Mark Lutz] The thing to remember is that arguments are passed by assignment in Python. Since assignment just creates references to objects, there's no alias between an argument name in the caller and callee, and so no call-by-reference per se. But you can simulate it in a number of ways:

1) By using global variables; but you probably shouldn't :-)

2) By passing a mutable (changeable in-place) object:

      def func1(a):
          a[0] = 'new-value'     # 'a' references a mutable list
          a[1] = a[1] + 1        # changes a shared object
      args = ['old-value', 99]
      func1(args)
      print args[0], args[1]     # output: new-value 100
3) By returning a tuple, holding the final values of arguments:

      def func2(a, b):
          a = 'new-value'        # a and b are local names
          b = b + 1              # assigned to new objects
          return a, b            # return new values
      x, y = 'old-value', 99
      x, y = func2(x, y)
      print x, y                 # output: new-value 100
4) And other ideas that fall out from Python's object model. For instance, it might be clearer to pass in a mutable dictionary:

      def func3(args):
          args['a'] = 'new-value'     # args is a mutable dictionary
          args['b'] = args['b'] + 1   # change it in-place
      args = {'a': 'old-value', 'b': 99}
      func3(args)
      print args['a'], args['b']
5) Or bundle up values in a class instance:

      class callByRef:
          def __init__(self, **args):
              for (key, value) in args.items():
                  setattr(self, key, value)
      def func4(args):
          args.a = 'new-value'        # args is a mutable callByRef
          args.b = args.b + 1         # change object in-place
      args = callByRef(a='old-value', b=99)
      func4(args)
      print args.a, args.b
   But there's probably no good reason to get this complicated :-).
[Python's author favors solution 3 in most cases.]


4.36. Please explain the rules for local and global variables in Python.

[Ken Manheimer] In Python, procedure variables are implicitly global, unless they are assigned anywhere within the block. In that case they are implicitly local, and you need to explicitly declare them as 'global'.

Though a bit surprising at first, a moment's consideration explains this. On one hand, requirement of 'global' for assigned vars provides a bar against unintended side-effects. On the other hand, if global were required for all global references, you'd be using global all the time. E.g., you'd have to declare as global every reference to a builtin function, or to a component of an imported module. This clutter would defeat the usefulness of the 'global' declaration for identifying side-effects.
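
The three cases can be seen side by side in a short sketch (written for a current Python 3 interpreter, which is an assumption; the variable and function names are made up):

```python
counter = 0          # a module-level (global) variable

def read_only():
    # No assignment to 'counter' in this function, so the name is global.
    return counter + 1

def shadow():
    # Assignment makes 'counter' local here; the global is untouched.
    counter = 100
    return counter

def bump():
    # The declaration is needed because the function assigns to the name.
    global counter
    counter = counter + 1
    return counter

# Evaluated left to right: the global only changes once bump() runs.
results = (read_only(), shadow(), counter, bump(), counter)
print(results)
```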


4.37. How can I have modules that mutually import each other?

Jim Roskind recommends the following order in each module:

First: all exports (like globals, functions, and classes that don't need imported base classes).

Then: all import statements.

Finally: all active code (including globals that are initialized from imported values).

Python's author doesn't like this approach much because the imports appear in a strange place, but has to admit that it works. His recommended strategy is to avoid all uses of "from <module> import *" (so everything from an imported module is referenced as <module>.<name>) and to place all code inside functions. Initializations of global variables and class variables should use constants or built-in functions only.


4.38. How do I copy an object in Python?

There is no generic copying operation built into Python; however, most object types have some way to create a clone. Here's how for the most common objects:

For immutable objects (numbers, strings, tuples), cloning is unnecessary since their value can't change. For lists (and generally for mutable sequence types), a clone is created by the expression l[:]. For dictionaries, the following function returns a clone:

        def dictclone(o):
            n = {}
            for k in o.keys(): n[k] = o[k]
            return n
Finally, for generic objects, the "copy" module defines two functions for copying objects. copy.copy(x) returns a copy as shown by the above rules. copy.deepcopy(x) also copies the elements of composite objects. See the section on this module in the Library Reference Manual.
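
The difference between the two functions shows up as soon as a composite object is changed after copying. A minimal sketch, assuming a current Python 3 interpreter (the sample data is invented):

```python
import copy

original = {"name": "spam", "tags": ["a", "b"]}

shallow = copy.copy(original)      # new dict, same inner list object
deep = copy.deepcopy(original)     # new dict and a new inner list

original["tags"].append("c")       # mutate the shared inner list

print(shallow["tags"])   # the shallow copy sees the change
print(deep["tags"])      # the deep copy does not

# Lists can still be cloned with a full slice.
items = [1, 2, 3]
clone = items[:]
```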


4.39. How to implement persistent objects in Python? (Persistent == automatically saved to and restored from disk.)

The library module "pickle" now solves this in a very general way (though you still can't store things like open files, sockets or windows), and the library module "shelve" uses pickle and (g)dbm to create persistent mappings containing arbitrary Python objects. For possibly better performance also look for the latest version of the relatively recent cPickle module.

A more awkward way of doing things is to use pickle's little sister, marshal. The marshal module provides very fast ways to store noncircular basic Python types to files and strings, and back again. Although marshal does not do fancy things like store instances or handle shared references properly, it does run extremely fast. For example loading a half megabyte of data may take less than a third of a second (on some machines). This often beats doing something more complex and general such as using gdbm with pickle/shelve.
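
A quick round trip through both modules might look like this. It is a sketch for a current Python 3 interpreter (an assumption); it works on in-memory byte strings with dumps() and loads() instead of files, and the sample data is made up:

```python
import marshal
import pickle

shared = [1, 2, 3]
data = {"first": shared, "second": shared, "label": "spam"}

# pickle keeps track of shared references: both keys still point
# at one list object after the round trip.
restored = pickle.loads(pickle.dumps(data))
shared_preserved = restored["first"] is restored["second"]

# marshal is meant for basic types such as numbers, strings and lists.
basic = marshal.loads(marshal.dumps([1, 2.5, "three"]))

print(restored, shared_preserved, basic)
```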


4.40. I try to use __spam and I get an error about _SomeClassName__spam.

Variables with double leading underscore are "mangled" to provide a simple but effective way to define class private variables. See the chapter "New in Release 1.4" in the Python Tutorial.
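
The mangling can be observed directly. The following sketch assumes a current Python 3 interpreter; the class simply reuses the names from the question:

```python
class SomeClassName:
    def __init__(self):
        self.__spam = 42          # stored as _SomeClassName__spam

    def get_spam(self):
        return self.__spam        # mangled the same way inside the class

obj = SomeClassName()

# Outside the class body the name is not mangled to _SomeClassName__spam,
# so this lookup fails.
try:
    obj.__spam
    outside_lookup_failed = False
except AttributeError:
    outside_lookup_failed = True

print(obj.get_spam(), obj._SomeClassName__spam, outside_lookup_failed)
```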


4.41. How do I delete a file? And other file questions.

Use os.remove(filename) or os.unlink(filename); for documentation, see the posix section of the library manual. They are the same; unlink() is simply the Unix name for this function. In earlier versions of Python, only os.unlink() was available.

To remove a directory, use os.rmdir(); use os.mkdir() to create one.

To rename a file, use os.rename().

To truncate a file, open it using f = open(filename, "r+"), and use f.truncate(offset); offset defaults to the current seek position. (Don't open it with "w+" for this purpose: that mode already truncates the file to zero length when it is opened.) There's also os.ftruncate(fd, offset) for files opened with os.open() -- for advanced Unix hacks only.
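
These calls can be tried out together in a scratch directory. The sketch assumes a current Python 3 interpreter and invented file names; it opens the file in "r+" mode so that the existing contents survive the open() call:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                 # scratch directory for the demo
subdir = os.path.join(workdir, "sub")
os.mkdir(subdir)                             # create a directory

old_name = os.path.join(subdir, "old.txt")
new_name = os.path.join(subdir, "new.txt")
with open(old_name, "w") as f:
    f.write("0123456789")

os.rename(old_name, new_name)                # rename the file

with open(new_name, "r+") as f:              # "r+" keeps the existing data
    f.truncate(4)                            # keep only the first 4 bytes
size_after_truncate = os.path.getsize(new_name)

os.remove(new_name)                          # delete the file
os.rmdir(subdir)                             # directories must be empty
os.rmdir(workdir)
cleaned_up = not os.path.exists(workdir)
print(size_after_truncate, cleaned_up)
```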


4.42. How to modify urllib or httplib to support HTTP/1.1?

Apply the following patch to the vanilla Python 1.4 httplib.py:

  41c41
  < replypat = regsub.gsub('\\.', '\\\\.', HTTP_VERSION) + \
  ---
  > replypat = regsub.gsub('\\.', '\\\\.', 'HTTP/1.[0-9]+') + \


4.43. Inexplicable syntax errors in compile() or exec.

When a statement suite (as opposed to an expression) is compiled by compile(), exec or execfile(), it must end in a newline. In some cases, when the source ends in an indented block it appears that at least two newlines are required.


4.44. How do I convert a string to a number?

To convert, e.g., the string '144' to the number 144, import the module string and use the string.atoi() function. For floating point numbers, use string.atof(); for long integers, use string.atol(). See the library reference manual section for the string module for more details. While you could use the built-in function eval() instead of any of those, this is not recommended, because someone could pass you a Python expression that might have unwanted side effects (like reformatting your disk).
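
For comparison, here is a sketch for a current Python 3 interpreter (an assumption), where the built-in int() and float() do the work of string.atoi(), string.atol() and string.atof(). It also shows that malformed input is rejected rather than evaluated:

```python
as_int = int("144")            # plays the role of string.atoi()/atol()
as_float = float("1.5e3")      # plays the role of string.atof()
as_hex = int("ff", 16)         # an explicit base is accepted as well

# Malformed input raises ValueError; nothing in it is executed.
try:
    int("144; import os")
    rejected = False
except ValueError:
    rejected = True

print(as_int, as_float, as_hex, rejected)
```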


4.45. How do I convert a number to a string?

To convert, e.g., the number 144 to the string '144', use the built-in function repr() or the backquote notation (these are equivalent). If you want a hexadecimal or octal representation, use the built-in functions hex() or oct(), respectively. For fancy formatting, use the % operator on strings, just like C printf formats, e.g. "%04d" % 144 yields '0144' and "%.3f" % (1/3.0) yields '0.333'. See the library reference manual for details.
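
The conversions mentioned above, collected in one place (a sketch for a current Python 3 interpreter, which is an assumption; the backquote notation is left out because later Python versions dropped it):

```python
n = 144
plain = repr(n)                 # '144'
hexa = hex(n)                   # hexadecimal, with a 0x prefix
octal = oct(n)                  # octal; the prefix differs between Python versions
padded = "%04d" % n             # zero-padded to four digits
third = "%.3f" % (1 / 3.0)      # three digits after the decimal point
print(plain, hexa, octal, padded, third)
```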


4.46. How do I copy a file?

Most of the time this will do:

   infile = open("file.in", "rb")
   outfile = open("file.out", "wb")
   outfile.write(infile.read())
However for huge files you may want to do the reads/writes in pieces (or you may have to), and if you dig deeper you may find other technical problems.
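
Copying in pieces, as suggested above, might look like the following sketch. It assumes a current Python 3 interpreter; the function name copy_in_chunks, the chunk size and the scratch files are made up for the example:

```python
import os
import tempfile

def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path without reading the whole file at once."""
    with open(src_path, "rb") as infile, open(dst_path, "wb") as outfile:
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:          # an empty result means end of file
                break
            outfile.write(chunk)

# Small self-check using scratch files in a temporary directory.
scratch = tempfile.mkdtemp()
src = os.path.join(scratch, "file.in")
dst = os.path.join(scratch, "file.out")
payload = b"spam and eggs\n" * 10000
with open(src, "wb") as f:
    f.write(payload)

copy_in_chunks(src, dst, chunk_size=4096)

with open(dst, "rb") as f:
    copied = f.read()

os.remove(src)
os.remove(dst)
os.rmdir(scratch)
```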

Unfortunately, there's no totally platform independent answer. On Unix, you can use os.system() to invoke the "cp" command (see your Unix manual for how it's invoked). On DOS or Windows, use os.system() to invoke the "COPY" command. On the Mac, use macostools.copy(srcpath, dstpath). It will also copy the resource fork and Finder info.

There's also the shutil module which contains a copyfile() function that implements the copy loop; but in Python 1.4 and earlier it opens files in text mode, and even in Python 1.5 it still isn't good enough for the Macintosh: it doesn't copy the resource fork and Finder info.


4.47. How do I check if an object is an instance of a given class or of a subclass of it?

If you are developing the classes from scratch it might be better to program in a more proper object-oriented style -- instead of doing a different thing based on class membership, why not use a method and define the method differently in different classes?

However, there are some legitimate situations where you need to test for class membership.

In Python 1.5, you can use the built-in function isinstance(obj, cls).
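
A minimal sketch of isinstance() in use, assuming a current Python 3 interpreter; the class Bell is a hypothetical subclass invented for the example:

```python
class Audible:
    pass

class Bell(Audible):        # hypothetical subclass
    pass

class Visual:
    pass

bell = Bell()
checks = (
    isinstance(bell, Bell),       # instance of its own class
    isinstance(bell, Audible),    # ... and of the base class
    isinstance(bell, Visual),     # unrelated class
)
print(checks)
```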

The following approaches can be used with earlier Python versions:

An unobvious method is to raise the object as an exception and to try to catch the exception with the class you're testing for:

	def is_instance_of(the_instance, the_class):
	    try:
		raise the_instance
	    except the_class:
		return 1
	    except:
		return 0
This technique can be used to distinguish "subclassness" from a collection of classes as well:

        try:
            raise the_instance
        except Audible:
            the_instance.play(largo)
        except Visual:
            the_instance.display(gaudy)
        except Olfactory:
            sniff(the_instance)
        except:
            raise ValueError, "dunno what to do with this!"
This uses the fact that exception catching tests for class or subclass membership.

A different approach is to test for the presence of a class attribute that is presumably unique for the given class. For instance:

	class MyClass:
	    ThisIsMyClass = 1
	    ...
	def is_a_MyClass(the_instance):
	    return hasattr(the_instance, 'ThisIsMyClass')
This version is easier to inline, and probably faster (inlined it is definitely faster). The disadvantage is that someone else could cheat:

	class IntruderClass:
	    ThisIsMyClass = 1    # Masquerade as MyClass
	    ...
but this may be seen as a feature (anyway, there are plenty of other ways to cheat in Python). Another disadvantage is that the class must be prepared for the membership test. If you do not "control the source code" for the class it may not be advisable to modify the class to support testability.


4.48. What is delegation?

Delegation refers to an object oriented technique Python programmers may implement with particular ease. Consider the following:

  from string import upper
  class UpperOut:
        def __init__(self, outfile):
              self.__outfile = outfile
        def write(self, str):
              self.__outfile.write( upper(str) )
        def __getattr__(self, name):
              return getattr(self.__outfile, name)
Here the UpperOut class redefines the write method to convert the argument string to upper case before calling the underlying self.__outfile.write method, but all other methods are delegated to the underlying self.__outfile object. The delegation is accomplished via the "magic" __getattr__ method. Please see the language reference for more information on the use of this method.

Note that for more general cases delegation can get trickier. Particularly when attributes must be set as well as gotten, the class must define a __setattr__ method too, and it must do so carefully.

The basic implementation of __setattr__ is roughly equivalent to the following:

   class X:
        ...
        def __setattr__(self, name, value):
             self.__dict__[name] = value
        ...
Most __setattr__ implementations must modify self.__dict__ to store local state for self without causing an infinite recursion.
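
One way this can play out is sketched here for a current Python 3 interpreter (an assumption). The classes Target and Proxy and the attribute names are hypothetical; the point is only that the proxy's own state goes through self.__dict__ while everything else is forwarded:

```python
class Target:
    """Hypothetical object being wrapped."""
    def __init__(self):
        self.colour = "red"

class Proxy:
    def __init__(self, target):
        # Go through __dict__ directly; a plain 'self._target = target'
        # would call our own __setattr__ before _target exists.
        self.__dict__["_target"] = target
        self.__dict__["writes"] = 0

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _target and writes
        # are found in self.__dict__ without coming through here.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        # Local state is updated via __dict__ to avoid recursion;
        # everything else is forwarded to the wrapped object.
        self.__dict__["writes"] = self.writes + 1
        setattr(self._target, name, value)

target = Target()
proxy = Proxy(target)
proxy.colour = "blue"
print(target.colour, proxy.colour, proxy.writes)
```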


4.49. How do I test a Python program or component?

First, it helps to write the program so that it may be easily tested by using good modular design. In particular your program should have almost all functionality encapsulated in either functions or class methods -- and this sometimes has the surprising and delightful effect of making the program run faster (because local variable accesses are faster than global accesses). Furthermore the program should avoid depending on mutating global variables, since this makes testing much more difficult to do.

The "global main logic" of your program may be as simple as

  if __name__=="__main__":
       main_logic()
at the bottom of the main module of your program.

Once your program is organized as a tractable collection of functions and class behaviours you should write test functions that exercise the behaviours. A test suite can be associated with each module which automates a sequence of tests. This sounds like a lot of work, but since Python is so terse and flexible it's surprisingly easy. You can make coding much more pleasant and fun by writing your test functions in parallel with the "production code", since this makes it easy to find bugs and even design flaws earlier.

"Support modules" that are not intended to be the main module of a program may include a "test script interpretation" which invokes a self test of the module.

   if __name__ == "__main__":
      self_test()
Even programs that interact with complex external interfaces may be tested when the external interfaces are unavailable by using "fake" interfaces implemented in Python. For an example of a "fake" interface, the following class defines (part of) a "fake" file interface:

 import string
 testdata = "just a random sequence of characters"
 class FakeInputFile:
   data = testdata
   position = 0
   closed = 0
   def read(self, n=None):
       self.testclosed()
       p = self.position
       if n is None:
          result= self.data[p:]
       else:
          result= self.data[p: p+n]
       self.position = p + len(result)
       return result
   def seek(self, n, m=0):
       self.testclosed()
       last = len(self.data)
       p = self.position
       if m==0: 
          final=n
       elif m==1:
          final=n+p
       elif m==2:
          final=len(self.data)+n
       else:
          raise ValueError, "bad m"
       if final<0:
          raise IOError, "negative seek"
       self.position = final
   def isatty(self):
       return 0
   def tell(self):
       return self.position
   def close(self):
       self.closed = 1
   def testclosed(self):
       if self.closed:
          raise IOError, "file closed"
Try f=FakeInputFile() and test out its operations.


4.50. My multidimensional list (array) is broken! What gives?

You probably tried to make a multidimensional array like this.

   A = [[None] * 2] * 3
This makes a list containing 3 references to the same list of length two. Changes to one row will show in all rows, which is probably not what you want. The following works much better:

   A = [None]*3
   for i in range(3):
        A[i] = [None] * 2
This generates a list containing 3 different lists of length two.

If you feel weird, you can also do it in the following way:

   w, h = 2, 3
   A = map(lambda i,w=w: [None] * w, range(h))
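
The aliasing described above can be checked directly. This sketch assumes a current Python 3 interpreter and uses a list comprehension, a later addition to the language, to build the separate rows:

```python
rows, cols = 3, 2

aliased = [[None] * cols] * rows          # three references to ONE row
aliased[0][0] = "x"

separate = [[None] * cols for _ in range(rows)]   # three distinct rows
separate[0][0] = "x"

print(aliased)    # the change shows up in every row
print(separate)   # only the first row changes
```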


4.51. I want to do a complicated sort: can you do a Schwartzian Transform in Python?

Yes, and in Python you only have to write it once:

 def st(List, Metric):
     def pairing(element, M = Metric):
           return (M(element), element)
     paired = map(pairing, List)
     paired.sort()
     return map(stripit, paired)
 def stripit(pair):
     return pair[1]
This technique, attributed to Randal Schwartz, sorts the elements of a list by a metric which maps each element to its "sort value". For example, if L is a list of strings then

   import string
   Usorted = st(L, string.upper)
   def intfield(s):
         return string.atoi( string.strip(s[10:15] ) )
   Isorted = st(L, intfield)
Usorted gives the elements of L sorted as if they were upper case, and Isorted gives the elements of L sorted by the integer values that appear in the string slices starting at position 10 and ending at position 15. Note that Isorted may also be computed by

   def Icmp(s1, s2):
         return cmp( intfield(s1), intfield(s2) )
   Isorted = L[:]
   Isorted.sort(Icmp)
but since this method computes intfield many times for each element of L, it is slower than the Schwartzian Transform.
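
For readers on a current Python 3 interpreter (an assumption), the same decorate, sort, undecorate idea can be sketched as follows. The index placed in each tuple is an addition of this sketch, not part of the version above: it keeps Python from comparing the elements themselves when two metrics are equal. The counting metric is only there to show that the metric runs once per element:

```python
def st(items, metric):
    """Sort items by metric(item), calling metric once per element."""
    # The index breaks ties so that the elements are never compared.
    decorated = [(metric(item), index, item)
                 for index, item in enumerate(items)]
    decorated.sort()
    return [item for (_key, _index, item) in decorated]

words = ["banana", "Cherry", "apple"]
usorted = st(words, str.upper)

calls = []
def counting_len(s):
    """Metric that records each call it receives."""
    calls.append(s)
    return len(s)

by_length = st(words, counting_len)
print(usorted, by_length, len(calls))
```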


4.52. How to convert between tuples and lists?

The function tuple(seq) converts any sequence into a tuple with the same items in the same order. For example, tuple([1, 2, 3]) yields (1, 2, 3) and tuple('abc') yields ('a', 'b', 'c'). If the argument is a tuple, it does not make a copy but returns the same object, so it is cheap to call tuple() when you aren't sure that an object is already a tuple.

The function list(seq) converts any sequence into a list with the same items in the same order. For example, list([1, 2, 3]) yields [1, 2, 3] and list('abc') yields ['a', 'b', 'c']. If the argument is a list, it makes a copy just like seq[:] would.
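
The copy-or-not behaviour can be confirmed with the "is" operator. A small sketch, assuming a current CPython 3 interpreter:

```python
t = (1, 2, 3)
lst = [1, 2, 3]

same_tuple = tuple(t) is t          # no copy is made for a tuple argument
new_list = list(lst) is not lst     # list() builds a new list

from_list = tuple(lst)
from_string = list("abc")
print(same_tuple, new_list, from_list, from_string)
```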


4.53. Files retrieved with urllib contain leading garbage that looks like email headers.

The server is using HTTP/1.1; the vanilla httplib in Python 1.4 only recognizes HTTP/1.0. See question 4.42 for a patch.


4.54. How do I get a list of all instances of a given class?

Python does not keep track of all instances of a class (or of a built-in type).

You can program the class's constructor to keep track of all instances, but unless you're very clever, this has the disadvantage that the instances never get deleted, because your list of all instances keeps a reference to them.

(The trick is to regularly inspect the reference counts of the instances you've retained, and if the reference count is below a certain level, remove it from the list. Determining that level is tricky -- it's definitely larger than 1.)
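
For comparison only: later Python versions offer the weakref module, which is a different technique from the reference-count inspection described above and is not available in the Python versions this FAQ covers. A sketch assuming a current Python 3 interpreter, with a hypothetical class Tracked:

```python
import gc
import weakref

class Tracked:
    """Hypothetical class that remembers its live instances."""
    _instances = weakref.WeakSet()   # weak references do not keep objects alive

    def __init__(self, name):
        self.name = name
        Tracked._instances.add(self)

    @classmethod
    def live_names(cls):
        return sorted(obj.name for obj in cls._instances)

a = Tracked("a")
b = Tracked("b")
before = Tracked.live_names()

del b
gc.collect()          # not needed on CPython, but harmless elsewhere
after = Tracked.live_names()
print(before, after)
```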


4.55. A regular expression fails with regex.error: match failure.

This is usually caused by too much backtracking; the regular expression engine has a fixed size stack which holds at most 4000 backtrack points. Every character matched by e.g. ".*" accounts for a backtrack point, so even a simple search like

  regex.match('.*x',"x"*5000)
will fail.

This is fixed in Python 1.5; see the string-sig archives for more regular expression news.


4.56. I can't get signal handlers to work.

The most common problem is that the signal handler is declared with the wrong argument list. It is called as

	handler(signum, frame)
so it should be declared with two arguments:

	def handler(signum, frame):
		...


4.57. I can't use a global variable in a function? Help!

Did you do something like this?

   x = 1 # make a global
   def f():
         print x # try to print the global
         ...
         for j in range(100):
              if q>3:
                 x=4
If you did, all references to x in f are local, not global, by virtue of the "x=4" assignment. Any variable assigned in a function is local to that function unless it is declared global. Consequently the "print x" attempts to print an uninitialized local variable and will trigger a NameError.
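
The failure and its cure, as a sketch for a current Python 3 interpreter (an assumption; there the error raised is UnboundLocalError, a subclass of NameError). The function names are made up:

```python
x = 1                      # a global

def broken():
    # The assignment further down makes x local throughout the function,
    # so this read happens before the local has a value.
    value = x
    x = 4
    return value

def fixed():
    global x               # now both the read and the assignment use the global
    value = x
    x = 4
    return value

try:
    broken()
    error_name = None
except NameError as exc:   # also catches the UnboundLocalError subclass
    error_name = type(exc).__name__

old_value = fixed()
print(error_name, old_value, x)
```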


4.58. What's a negative index? Why doesn't list.insert() use them?

Python sequences are indexed with positive numbers and negative numbers. For positive numbers, 0 is the first index, 1 is the second index, and so forth. For negative indices, -1 is the last index, -2 is the penultimate (next to last) index, and so forth. Think of seq[-n] as the same as seq[len(seq)-n].

Using negative indices can be very convenient. For example if the string Line ends in a newline then Line[:-1] is all of Line except the newline.

Sadly the list builtin method L.insert does not observe negative indices. This feature could be considered a mistake but since existing programs depend on this feature it may stay around forever. L.insert for negative indices inserts at the start of the list. To get "proper" negative index behaviour use L[n:n] = [x] in place of the insert method.
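
The indexing rule and the slice-assignment idiom can be checked with a short sketch (assuming a current Python 3 interpreter; the sample values are invented, and L.insert itself is left out):

```python
seq = ["a", "b", "c", "d"]

# seq[-n] names the same item as seq[len(seq) - n].
same_item = all(seq[-n] == seq[len(seq) - n]
                for n in range(1, len(seq) + 1))

line = "some text\n"
without_newline = line[:-1]        # everything except the final character

# Slice assignment honours negative indices: insert before the last item.
items = ["a", "b", "d"]
items[-1:-1] = ["c"]
print(same_item, repr(without_newline), items)
```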


4.59. How can I sort one list by values from another list?

You can sort lists of tuples.

  >>> list1 = ["what", "I'm", "sorting", "by"]
  >>> list2 = ["something", "else", "to", "sort"]
  >>> pairs = map(None, list1, list2)
  >>> pairs
  [('what', 'something'), ("I'm", 'else'), ('sorting', 'to'), ('by', 'sort')]
  >>> pairs.sort()
  >>> pairs
  [("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]
  >>> result = pairs[:]
  >>> for i in xrange(len(result)): result[i] = result[i][1]
  ...
  >>> result
  ['else', 'sort', 'to', 'something']
And if you didn't understand the question, please see the example above ;c). Note that "I'm" sorts before "by" because uppercase "I" comes before lowercase "b" in the ASCII order. Also see 4.51.
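
On a current Python 3 interpreter (an assumption), the same pairing can be written with zip() and sorted(). A short sketch reusing the lists from the session above:

```python
list1 = ["what", "I'm", "sorting", "by"]
list2 = ["something", "else", "to", "sort"]

pairs = sorted(zip(list1, list2))   # sorts on list1, ties broken by list2
result = [second for (_first, second) in pairs]
print(result)
```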


4.60. Why doesn't dir() work on builtin types like files and lists?

It should have -- and it does starting with Python 1.5 (currently in development -- see Questions 1.13 and 2.10).

Using 1.4, you can find out which methods a given object supports by looking at its __methods__ attribute:

    >>> List = []
    >>> List.__methods__
    ['append', 'count', 'index', 'insert', 'remove', 'reverse', 'sort']


4.61. How can I mimic CGI form submission (METHOD=POST)?

I would like to retrieve web pages that are the result of POSTing a form. Is there existing code that would let me do this easily?

Yes. Here's a simple example that uses httplib.

    #!/usr/local/bin/python
    import httplib, sys, time
    ### build the query string
    qs = "First=Josephine&MI=Q&Last=Public"
    ### connect and send the server a path
    httpobj = httplib.HTTP('www.some-server.out-there', 80)
    httpobj.putrequest('POST', '/cgi-bin/some-cgi-script')
    ### now generate the rest of the HTTP headers...
    httpobj.putheader('Accept', '*/*')
    httpobj.putheader('Connection', 'Keep-Alive')
    httpobj.putheader('Content-type', 'application/x-www-form-urlencoded')
    httpobj.putheader('Content-length', '%d' % len(qs))
    httpobj.endheaders()
    httpobj.send(qs)
    ### find out what the server said in response...
    reply, msg, hdrs = httpobj.getreply()
    if reply != 200:
        sys.stdout.write(httpobj.getfile().read())
Note that in general for "url encoded posts" (the default) query strings must be "quoted" to, for example, change equals signs and spaces to an encoded form when they occur in name or value. Use urllib.quote to perform this quoting. For example to send name="Guy Steele, Jr.":

   >>> from urllib import quote
   >>> x = quote("Guy Steele, Jr.")
   >>> x
   'Guy%20Steele,%20Jr.'
   >>> query_string = "name="+x
   >>> query_string
   'name=Guy%20Steele,%20Jr.'
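
In a current Python 3 interpreter (an assumption) the quoting functions live in urllib.parse, and the exact set of characters left unescaped may differ from the session above. A sketch that quotes one value and builds a whole form body with urlencode():

```python
from urllib.parse import quote, unquote, urlencode

value = "Guy Steele, Jr."
quoted = quote(value)              # spaces and other unsafe characters are escaped
query_string = "name=" + quoted

# urlencode() builds a form body from name/value pairs.
body = urlencode({"First": "Josephine", "MI": "Q", "Last": "Public"})
print(query_string, body)
```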


4.62. If my program crashes with a bsddb (or anydbm) database open, it gets corrupted. How come?

Databases opened for write access with the bsddb module (and often by the anydbm module, since it will preferentially use bsddb) must explicitly be closed using the close method of the database. The underlying libdb package caches database contents which need to be converted to on-disk form and written, unlike regular open files which already have the on-disk bits in the kernel's write buffer, where they can just be dumped by the kernel when the program exits.

If you have initialized a new bsddb database but not written anything to it before the program crashes, you will often wind up with a zero-length file and encounter an exception the next time the file is opened.


4.63. How do I make a Python script executable on Unix?

You need to do two things: the script file's mode must be executable (include the 'x' bit), and the first line must begin with #! followed by the pathname for the Python interpreter.

The first is done by executing 'chmod +x scriptfile' or perhaps 'chmod 755 scriptfile'.

The second can be done in a number of ways. The most straightforward way is to write

  #!/usr/local/bin/python
as the very first line of your file - or whatever the pathname is where the python interpreter is installed on your platform.

If you would like the script to be independent of where the python interpreter lives, you can use the "env" program. On almost all platforms, the following will work, assuming the python interpreter is in a directory on the user's $PATH:

  #! /usr/bin/env python
Note -- *don't* do this for CGI scripts. The $PATH variable for CGI scripts is often very minimal, so you need to use the actual absolute pathname of the interpreter.

Occasionally, a user's environment is so full that the /usr/bin/env program fails; or there's no env program at all. In that case, you can try the following hack (due to Alex Rezinsky):

  #! /bin/sh
  """:"
  exec python $0 ${1+"$@"}
  """
The disadvantage is that this defines the script's __doc__ string. However, you can fix that by adding

  __doc__ = """...Whatever..."""


4.64. How do you remove duplicates from a list?

Generally, if you don't mind reordering the List:

   if List:
      List.sort()
      last = List[-1]
      for i in range(len(List)-2, -1, -1):
          if last==List[i]: del List[i]
          else: last=List[i]
If all elements of the list may be used as dictionary keys (i.e., they are all hashable) this is often faster:

   d = {}
   for x in List: d[x]=x
   List = d.values()
Also, for extremely large lists you might consider more optimal alternatives to the first one. The second one is pretty good whenever it can be used.
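
Both ideas, restated as a sketch for a current Python 3 interpreter (an assumption). The function names are made up, and unlike the in-place version above both return a new list:

```python
def unique_sorted(items):
    """Sort a copy of items, then drop neighbours that compare equal."""
    result = []
    for item in sorted(items):
        if not result or result[-1] != item:
            result.append(item)
    return result

def unique_hashable(items):
    """Use dictionary keys to drop duplicates; items must be hashable."""
    return list(dict.fromkeys(items))

data = [3, 1, 2, 3, 1]
print(unique_sorted(data), unique_hashable(data))
```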


4.65. Are there any known year 2000 problems in Python?

I am not aware of year 2000 deficiencies in Python 1.5. Python does very few date calculations and for what it does, it relies on the C library functions. Python generally represents times either as seconds since 1970 or as a tuple (year, month, day, ...) where the year is expressed with four digits, which makes Y2K bugs unlikely. So as long as your C library is okay, Python should be okay. Of course, I cannot vouch for your Python code!

Given the nature of freely available software, I have to add that this statement is not legally binding. The Python copyright notice contains the following disclaimer:

  STICHTING MATHEMATISCH CENTRUM AND CNRI DISCLAIM ALL WARRANTIES WITH
  REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
  MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH
  CENTRUM OR CNRI BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
  DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
  PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
  TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  PERFORMANCE OF THIS SOFTWARE.
The good news is that if you encounter a problem, you have full source available to track it down and fix it!


4.66. I want a version of map that applies a method to a sequence of objects! Help!

Get fancy!

  def method_map(objects, method, arguments):
       """method_map([a,b], "flog", (1,2)) gives [a.flog(1,2), b.flog(1,2)]"""
       nobjects = len(objects)
       methods = map(getattr, objects, [method]*nobjects)
       return map(apply, methods, [arguments]*nobjects)
It's generally a good idea to get to know the mysteries of map and apply and getattr and the other dynamic features of Python.
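
For a current Python 3 interpreter (an assumption; apply() is gone there), the same helper can be sketched with getattr() and argument unpacking. The sample calls use string methods so that the example is self-contained:

```python
def method_map(objects, method, arguments):
    """method_map([a, b], "count", ("x",)) gives [a.count("x"), b.count("x")]"""
    return [getattr(obj, method)(*arguments) for obj in objects]

counts = method_map(["xax", "bxxb", "c"], "count", ("x",))
parts = method_map(["a,b", "c"], "split", (",",))
print(counts, parts)
```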


4.67. How do I generate random numbers in Python?

The standard library module "whrandom" implements a random number generator. Usage is simple:

    import whrandom
    whrandom.random()
This returns a random floating point number in the range [0, 1).

There are also other specialized generators in this module:

    randint(a, b) chooses an integer in the range [a, b] (both ends included)
    choice(S) chooses from a given sequence
    uniform(a, b) chooses a floating point number in the range [a, b)
To force the random number generator's initial setting, use

    seed(x, y, z) set the seed from three integers in [1, 256)
There's also a class, whrandom, which you can instantiate to create independent multiple random number generators.

The module "random" contains functions that approximate various standard distributions.

All this is documented in the library reference manual. Note that the module "rand" is obsolete.
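
Later Python versions dropped whrandom, so the following sketch assumes a current Python 3 interpreter and the module "random", whose Random class provides the same operations. The seed value is arbitrary:

```python
import random

gen_a = random.Random(12345)      # independent generators, each with its own state
gen_b = random.Random(12345)      # same seed, so the same sequence

f = gen_a.random()                # float in [0, 1)
i = gen_a.randint(1, 6)           # integer from 1 to 6, both ends included
c = gen_a.choice("abc")           # one item of a sequence
u = gen_a.uniform(2.0, 3.0)       # float between 2.0 and 3.0

same_start = gen_b.random() == f  # seeding makes runs repeatable
print(f, i, c, u, same_start)
```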


4.68. How do I access the serial port?

There's a Win95 serial communication module at

  http://www.python.org/ftp/python/contrib/System/siomodule.README
  http://www.python.org/ftp/python/contrib/System/siomodule.zip
For DOS, try Hans Nowak's Python-DX, which supports this, at:

  http://www.cuci.nl/~hnowak/
For Unix, search Deja News (using http://www.python.org/locator/) for "serial port" with author Mitch Chapman (his post is a little too long to include here).


4.69. Images on Tk-Buttons don't work in Py15?

They do work, but you must keep your own reference to the image object now. More verbosely, you must make sure that, say, a global variable or a class attribute refers to the object.

Quoting Fredrik Lundh from the mailing list:

  Well, the Tk button widget keeps a reference to the internal
  photoimage object, but Tkinter does not.  So when the last
  Python reference goes away, Tkinter tells Tk to release the
  photoimage.  But since the image is in use by a widget, Tk
  doesn't destroy it.  Not completely.  It just blanks the image,
  making it completely transparent...
  And yes, there was a bug in the keyword argument handling
  in 1.4 that kept an extra reference around in some cases.  And
  when Guido fixed that bug in 1.5, he broke quite a few Tkinter
  programs...


4.70. Where is the math.py (socket.py, regex.py, etc.) source file?

If you can't find a source file for a module it may be a builtin or dynamically loaded module implemented in C, C++ or other compiled language. In this case you may not have the source file or it may be something like mathmodule.c, somewhere in a C source directory (not on the Python Path).

Fredrik Lundh (<fredrik@pythonware.com>) explains (on the python-list):

There are (at least) three kinds of modules in Python: 1) modules written in Python (.py); 2) modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc); 3) modules written in C and linked with the interpreter; to get a list of these, type:

    import sys
    print sys.builtin_module_names


4.71. How do I send mail from a Python script?

On Unix, it's very simple, using sendmail. The location of the sendmail program varies between systems; sometimes it is /usr/lib/sendmail, sometimes /usr/sbin/sendmail. The sendmail manual page will help you out. Here's some sample code:

  SENDMAIL = "/usr/sbin/sendmail" # sendmail location
  import os
  p = os.popen("%s -t" % SENDMAIL, "w")
  p.write("To: <cary@ratatosk.org>\n")
  p.write("Subject: test\n")
  p.write("\n") # blank line separating headers from body
  p.write("Some text\n")
  p.write("some more text\n")
  sts = p.close() # exit status; a false value means success
  if sts:
      print "Sendmail exit status", sts
On non-Unix systems (and on Unix systems too, of course!), you can use SMTP to send mail to a nearby mail server. A library for SMTP (smtplib.py) was posted to comp.lang.python by The Dragon De Monsyne (<dragondm@integral.org>) not long ago; use Deja News to locate it.

Another SMTP library is available from http://starship.skyport.net/crew/gandalf/


4.72. How do I avoid blocking in connect() of a socket?

The select module is widely known to help with asynchronous I/O on sockets once they are connected. However, it is less than common knowledge how to avoid blocking on the initial connect() call. Jeremy Hylton has the following advice (slightly edited):

To prevent the TCP connect from blocking, you can set the socket to non-blocking mode. Then when you do the connect(), you will either connect immediately (unlikely) or get an exception that contains the errno. errno.EINPROGRESS indicates that the connection is in progress, but hasn't finished yet. Different OSes will return different errnos, so you're going to have to check. I can tell you that different versions of Solaris return different errno values.

In Python 1.5 and later, you can use connect_ex() to avoid creating an exception. It will just return the errno value.

To poll, you can call connect_ex() again later -- 0 or errno.EISCONN indicate that you're connected -- or you can pass this socket to select (checking to see if it is writeable).
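
The advice above might be sketched as follows. This is only an outline under some assumptions: the helper names start_connect and wait_connected are invented here, and the extra SO_ERROR check after select() is my addition for catching a failed connect, not part of Jeremy's text.

```python
import errno
import select
import socket

def start_connect(address):
    """Begin a non-blocking connect; return (socket, errno value)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(0)               # connect_ex() now returns at once
    code = sock.connect_ex(address)   # 0, or an errno such as EINPROGRESS
    return sock, code

def wait_connected(sock, timeout):
    """Wait until the socket is writable; true if the connect succeeded."""
    readable, writable, errored = select.select([], [sock], [], timeout)
    if not writable:
        return 0                      # still in progress after the timeout
    # A writable socket may still carry a pending connect error.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0
```

A caller would treat 0, errno.EINPROGRESS (or whatever its platform returns for "in progress") and errno.EISCONN from start_connect() as acceptable, and anything else as a failure.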


4.73. How do I specify hexadecimal and octal integers?

To specify an octal integer, precede the octal value with a zero. For example, to set the variable "a" to the octal value "10" (8 in decimal), type:

    >>> a = 010
To verify that this works, you can type "a" and hit enter while in the interpreter, which will cause Python to spit out the current value of "a" in decimal:

    >>> a
    8
Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero, and then a lower or uppercase "x". Hexadecimal digits can be specified in lower or uppercase. For example, in the Python interpreter:

    >>> a = 0xa5
    >>> a
    165
    >>> b = 0XB2
    >>> b
    178


4.74. How to get a single keypress at a time? *

For Windows, see question 8.2. Here is an answer for Unix.

There are several solutions; some involve using curses, which is a pretty big thing to learn. Here's a solution without curses, due to Andrew Kuchling (adapted from code to do a PGP-style randomness pool):

        import termios, TERMIOS, sys, os
        fd = sys.stdin.fileno()
        old = termios.tcgetattr(fd)
        new = termios.tcgetattr(fd)
        new[3] = new[3] & ~TERMIOS.ICANON & ~TERMIOS.ECHO
        new[6][TERMIOS.VMIN] = 1
        new[6][TERMIOS.VTIME] = 0
        termios.tcsetattr(fd, TERMIOS.TCSANOW, new)
        s = ''    # We'll save the characters typed and add them to the pool.
        try:
            while 1:
                c = os.read(fd, 1)
                print "Got character", `c`
                s = s+c
        finally:
            termios.tcsetattr(fd, TERMIOS.TCSAFLUSH, old)
You need the termios module for any of this to work, and I've only tried it on Linux, though it should work elsewhere. It turns off stdin's echoing and disables canonical mode, and then reads a character at a time from stdin, printing each character as it is read.


4.75. How can I overload constructors (or methods) in Python?

(This actually applies to all methods, but somehow the question usually comes up first in the context of constructors.)

Where in C++ you'd write

    class C {
    public:
        C() { cout << "No arguments\n"; }
        C(int i) { cout << "Argument is " << i << "\n"; }
    };
in Python you have to write a single constructor that catches all cases using default arguments. For example:

    class C:
        def __init__(self, i=None):
            if i is None:
                print "No arguments"
            else:
                print "Argument is", i
This is not entirely equivalent, but close enough in practice.

You could also try a variable-length argument list, e.g.

        def __init__(self, *args):
            ....
The same approach works for all method definitions.
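
For what it's worth, a variable-length argument list version of the class above might look like the sketch below. Storing the message in an attribute called description (a name invented for this example) instead of printing it is just a convenience for checking the result.

```python
class C:
    def __init__(self, *args):
        # args is a tuple holding whatever positional arguments were passed
        if len(args) == 0:
            self.description = "No arguments"
        elif len(args) == 1:
            self.description = "Argument is %s" % (args[0],)
        else:
            raise TypeError("C() takes at most one argument")
```

Here C().description is "No arguments" and C(5).description is "Argument is 5"; unlike the default-argument version, this one has to reject surplus arguments by hand.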


4.76. How do I pass keyword arguments from one method to another?

Use apply. For example:

    class Account:
        def __init__(self, **kw):
            self.accountType = kw.get('accountType')
            self.balance = kw.get('balance')
    class CheckingAccount(Account):
        def __init__(self, **kw):
            kw['accountType'] = 'checking'
            apply(Account.__init__, (self,), kw)
    myAccount = CheckingAccount(balance=100.00)


4.77. What module should I use to help with generating HTML?

Check out HTMLgen written by Robin Friedrich. It's a class library of objects corresponding to all the HTML 3.2 markup tags. It's used when you are writing in Python and wish to synthesize HTML pages for generating a web or for CGI forms, etc.

It can be found in the FTP contrib area on python.org or on the Starship. Use the search engines there to locate the latest version.


5. Extending Python


5.1. Can I create my own functions in C?

Yes, you can create built-in modules containing functions, variables, exceptions and even new types in C. This is explained in the document "Extending and Embedding the Python Interpreter" (the LaTeX file Doc/ext.tex). Also read the chapter on dynamic loading.

There's more information on this in each of the Python books: Programming Python, Internet Programming with Python, and Das Python-Buch (in German).


5.2. Can I create my own functions in C++?

Yes, using the C-compatibility features found in C++. Basically you place extern "C" { ... } around the Python include files and put extern "C" before each function that is going to be called by the Python interpreter. Global or static C++ objects with constructors are probably not a good idea.


5.3. How can I execute arbitrary Python statements from C?

The highest-level function to do this is PyRun_SimpleString(). It takes a single string argument, executes it in the context of module __main__, and returns 0 for success and -1 when an exception occurred (including SyntaxError). If you want more control, use PyRun_String(); see the source for PyRun_SimpleString() in Python/pythonrun.c.


5.4. How can I evaluate an arbitrary Python expression from C?

Call the function PyRun_String() from the previous question with the start symbol eval_input (Py_eval_input starting with 1.5a1); it parses an expression, evaluates it and returns its value.


5.5. How do I extract C values from a Python object?

That depends on the object's type. If it's a tuple, PyTuple_Size(o) returns its length and PyTuple_GetItem(o, i) returns its i'th item; similar for lists with PyList_Size(o) and PyList_GetItem(o, i). For strings, PyString_Size(o) returns its length and PyString_AsString(o) a pointer to its value (note that Python strings may contain null bytes so strlen() is not safe). To test which type an object is, first make sure it isn't NULL, and then use PyString_Check(o), PyTuple_Check(o), PyList_Check(o), etc.

There is also a high-level API to Python objects which is provided by the so-called 'abstract' interface -- read Include/abstract.h for further details. It allows for example interfacing with any kind of Python sequence (e.g. lists and tuples) using calls like PySequence_Length(), PySequence_GetItem(), etc., as well as many other useful protocols.


5.6. How do I use Py_BuildValue() to create a tuple of arbitrary length?

You can't. Use t = PyTuple_New(n) instead, and fill it with objects using PyTuple_SetItem(t, i, o) -- note that this "eats" a reference count of o. Similar for lists with PyList_New(n) and PyList_SetItem(l, i, o). Note that you must set all the tuple items to some value before you pass the tuple to Python code -- PyTuple_New(n) initializes them to NULL, which isn't a valid Python value.


5.7. How do I call an object's method from C?

Here's a function (untested) that might become part of the next release in some form. It uses <stdarg.h> to allow passing the argument list on to vmkvalue():

        object *call_method(object *inst, char *methodname, char *format, ...)
        {
                object *method;
                object *args;
                object *result;
                va_list va;
                method = getattr(inst, methodname);
                if (method == NULL) return NULL;
                va_start(va, format);
                args = vmkvalue(format, va);
                va_end(va);
                if (args == NULL) {
                        DECREF(method);
                        return NULL;
                }
                result = call_object(method, args);
                DECREF(method);
                DECREF(args);
                return result;
        }
This works for any instance that has methods -- whether built-in or user-defined. You are responsible for eventually DECREF'ing the return value.

To call, e.g., a file object's "seek" method with arguments 10, 0 (assuming the file object pointer is "f"):

        res = call_method(f, "seek", "(ii)", 10, 0);
        if (res == NULL) {
                ... an exception occurred ...
        }
        else {
                DECREF(res);
        }
Note that since call_object() always wants a tuple for the argument list, to call a function without arguments, pass "()" for the format, and to call a function with one argument, surround the argument in parentheses, e.g. "(i)".


5.8. How do I catch the output from PyErr_Print()?

(Due to Mark Hammond):

In Python code, define an object that supports the "write()" method. Redirect sys.stdout and sys.stderr to this object. Call PyErr_Print(), or just allow the standard traceback mechanism to work.

Then, the output will go wherever your write() method sends it.
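
The Python side of this recipe could look roughly like the sketch below. The class name OutputCatcher and the helper capture_traceback are invented for the example, and traceback.print_exc() stands in for the C-level call so that the whole thing can be tried from Python alone.

```python
import sys
import traceback

class OutputCatcher:
    """File-like object: anything with a write() method will do."""
    def __init__(self):
        self.data = ""
    def write(self, text):
        self.data = self.data + text

def capture_traceback(func):
    """Call func(); return the traceback text if it raises, else ""."""
    catcher = OutputCatcher()
    saved = sys.stderr
    sys.stderr = catcher
    try:
        try:
            func()
        except Exception:
            traceback.print_exc()   # writes to the current sys.stderr
    finally:
        sys.stderr = saved          # always restore the real stream
    return catcher.data
```

An embedding application would typically install the catcher once, run its Python code, and then read catcher.data from C through the usual attribute calls.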


5.9. How do I access a module written in Python from C?

You can get a pointer to the module object as follows:

        module = PyImport_ImportModule("<modulename>");
If the module hasn't been imported yet (i.e. it is not yet present in sys.modules), this initializes the module; otherwise it simply returns the value of sys.modules["<modulename>"]. Note that it doesn't enter the module into any namespace -- it only ensures it has been initialized and is stored in sys.modules.

You can then access the module's attributes (i.e. any name defined in the module) as follows:

        attr = PyObject_GetAttrString(module, "<attrname>");
Calling PyObject_SetAttrString(), to assign to variables in the module, also works.


5.10. How do I interface to C++ objects from Python?

Depending on your requirements, there are many approaches. To do this manually, begin by reading the "Extending and Embedding" document (Doc/ext.tex, see also http://www.python.org/doc/). Realize that for the Python run-time system, there isn't a whole lot of difference between C and C++ -- so the strategy to build a new Python type around a C structure (pointer) type will also work for C++ objects.

A useful automated approach (which also works for C) is SWIG: http://www.cs.utah.edu/~beazley/SWIG/.


5.11. mSQLmodule (or other old module) won't build with Python 1.5 (or later)

Since Python 1.4, "Python.h" includes all the header files needed by an extension module. Backward compatibility with the old header names was dropped after version 1.4, so mSQLmodule.c will not build because "allobjects.h" cannot be found. The following change in mSQLmodule.c is harmless when building with 1.4 and necessary when building with later Python versions:

Remove lines:

	#include "allobjects.h"
	#include "modsupport.h"
And insert instead:

	#include "Python.h"
You may also need to add

                #include "rename2.h"
if the module uses "old names".

This may happen with other ancient python modules as well, and the same fix applies.


5.12. I added a module using the Setup file and the make fails! Huh?

Setup must end in a newline; if the final newline is missing, the build process gets very sad. Aside from this possibility, maybe you have other non-Python-specific linkage problems.


6. Python's design


6.1. Why isn't there a switch or case statement in Python?

You can do this easily enough with a sequence of if... elif... elif... else. There have been some proposals for switch statement syntax, but there is no consensus (yet) on whether and how to do range tests.
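
For instance, a C-style switch on a small integer could be written along these lines (the function describe and its return strings are made up for illustration):

```python
def describe(n):
    # Each elif plays the role of a "case"; the final else is the "default".
    if n == 0:
        return "zero"
    elif n == 1:
        return "one"
    elif n == 2:
        return "two"
    else:
        return "many"
```

Note that there is no fall-through to worry about: exactly one branch runs.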


6.2. Why does Python use indentation for grouping of statements?

Basically I believe that using indentation for grouping is extremely elegant and contributes a lot to the clarity of the average Python program. Most people learn to love this feature after a while. Some arguments for it:

Since there are no begin/end brackets there cannot be a disagreement between grouping perceived by the parser and the human reader. I remember long ago seeing a C fragment like this:

        if (x <= y)
                x++;
                y--;
        z++;
and staring a long time at it wondering why y was being decremented even for x > y... (And I wasn't a C newbie then either.)

Since there are no begin/end brackets, Python is much less prone to coding-style conflicts. In C there are loads of different ways to place the braces (including the choice whether to place braces around single statements in certain cases, for consistency). If you're used to reading (and writing) code that uses one style, you will feel at least slightly uneasy when reading (or being required to write) another style. Many coding styles place begin/end brackets on a line by themselves. This makes programs considerably longer and wastes valuable screen space, making it harder to get a good overview of a program. Ideally, a function should fit on one basic tty screen (say, 20 lines). 20 lines of Python are worth a LOT more than 20 lines of C. This is not solely due to the lack of begin/end brackets (the lack of declarations also helps, and the powerful operations of course), but it certainly helps!


6.3. Why are Python strings immutable?

There are two advantages. One is performance: knowing that a string is immutable makes it easy to lay it out at construction time -- fixed and unchanging storage requirements. (This is also one of the reasons for the distinction between tuples and lists.) The other is that strings in Python are considered as "elemental" as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string "eight" to anything else. (Adapted from Jim Roskind)


6.4. Why don't strings have methods like index() or sort(), like lists?

Good question. Strings currently don't have methods at all (likewise tuples and numbers). Long ago, it seemed unnecessary to implement any of these functions in C, so a standard library module "string" written in Python was created that performs string related operations. Since then, the cry for performance has moved most of them into the built-in module strop (this is imported by module string, which is still the preferred interface, without loss of performance except during initialization). Some of these functions (e.g. index()) could easily be implemented as string methods instead, but others (e.g. sort()) can't, since their interface prescribes that they modify the object, while strings are immutable (see the previous question).


6.5. Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?

Functions are used for those operations that are generic for a group of types and which should work even for objects that don't have methods at all (e.g. numbers, strings, tuples). Also, implementing len(), max(), min() as built-in functions is actually less code than implementing them as methods for each type. One can quibble about individual cases but it's really too late to change such things fundamentally now.


6.6. Why can't I derive a class from built-in types (e.g. lists or files)?

This is caused by the relatively late addition of (user-defined) classes to the language -- the implementation framework doesn't easily allow it. See the answer to question 4.2 for a work-around. This may be fixed in the (distant) future.


6.7. Why must 'self' be declared and used explicitly in method definitions and calls?

By asking this question you reveal your C++ background. :-) When I added classes, this was (again) the simplest way of implementing methods without too many changes to the interpreter. I borrowed the idea from Modula-3. It turns out to be very useful, for a variety of reasons.

First, it makes it more obvious that you are using a method or instance attribute instead of a local variable. Reading "self.x" or "self.meth()" makes it absolutely clear that an instance variable or method is used even if you don't know the class definition by heart. In C++, you can sort of tell by the lack of a local variable declaration (assuming globals are rare or easily recognizable) -- but in Python, there are no local variable declarations, so you'd have to look up the class definition to be sure.

Second, it means that no special syntax is necessary if you want to explicitly reference or call the method from a particular class. In C++, if you want to use a method from a base class that is overridden in a derived class, you have to use the :: operator -- in Python you can write baseclass.methodname(self, <argument list>). This is particularly useful for __init__() methods, and in general in cases where a derived class method wants to extend the base class method of the same name and thus has to call the base class method somehow.

Lastly, for instance variables, it solves a syntactic problem with assignment: since local variables in Python are (by definition!) those variables to which a value is assigned in a function body (and that aren't explicitly declared global), there has to be some way to tell the interpreter that an assignment was meant to assign to an instance variable instead of to a local variable, and it should preferably be syntactic (for efficiency reasons). C++ does this through declarations, but Python doesn't have declarations and it would be a pity having to introduce them just for this purpose. Using the explicit "self.var" solves this nicely. Similarly, for using instance variables, having to write "self.var" means that references to unqualified names inside a method don't have to search the instance's directories.
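
To illustrate the second point, here is a small sketch of a derived class that extends both __init__() and an ordinary method by calling the base class versions explicitly. The class and attribute names are invented for the example.

```python
class Base:
    def __init__(self, name):
        self.name = name
    def describe(self):
        return "Base " + self.name

class Derived(Base):
    def __init__(self, name, extra):
        Base.__init__(self, name)      # explicit call, self passed by hand
        self.extra = extra
    def describe(self):
        # extend, rather than replace, the base class method
        return Base.describe(self) + " plus " + self.extra
```

With these definitions, Derived("a", "b").describe() gives "Base a plus b".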


6.8. Can't you emulate threads in the interpreter instead of relying on an OS-specific thread implementation?

Unfortunately, the interpreter pushes at least one C stack frame for each Python stack frame. Also, extensions can call back into Python at almost random moments. Therefore a complete threads implementation requires thread support for C.


6.9. Why can't lambda forms contain statements?

Python lambda forms cannot contain statements because Python's syntactic framework can't handle statements nested inside expressions.

However, in Python, this is not a serious problem. Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you're too lazy to define a function.

Functions are already first class objects in Python, and can be declared in a local scope. Therefore the only advantage of using a lambda form instead of a locally-defined function is that you don't need to invent a name for the function -- but that's just a local variable to which the function object (which is exactly the same type of object that a lambda form yields) is assigned!
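
To make the equivalence concrete, here is a small sketch in which one function is built with lambda and another with def in a local scope (make_adders and the two local names are invented for the example):

```python
def make_adders():
    # A lambda form: an expression that yields a function object.
    add_one = lambda x: x + 1
    # A locally defined function: same kind of object, just with a name.
    def add_two(x):
        return x + 2
    return add_one, add_two
```

Both results can be passed around and called in the same way, and type() reports the same type for each.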


6.10. Why don't lambdas have access to variables defined in the containing scope?

Because they are implemented as ordinary functions. See question 4.5 above.


6.11. Why can't recursive functions be defined inside other functions?

See question 4.5 above. But actually recursive functions can be defined in other functions with some trickery.

    def test():
        class factorial:
             def __call__(self, n):
                 if n<=1: return 1
                 return n * self(n-1)
        return factorial()
    fact = test()
The instance created by factorial() above acts like the recursive factorial function.

Mutually recursive functions can be passed to each other as arguments.
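
A sketch of that last remark, using a deliberately silly even/odd test: each nested function receives both functions as arguments, and default arguments carry them into the entry point. All the names here are made up for the example.

```python
def make_is_even():
    def is_even(n, even, odd):
        if n == 0:
            return 1
        return odd(n - 1, even, odd)
    def is_odd(n, even, odd):
        if n == 0:
            return 0
        return even(n - 1, even, odd)
    # Default arguments hand both functions to the entry point.
    def entry(n, even=is_even, odd=is_odd):
        return even(n, even, odd)
    return entry
```

The function returned by make_is_even() yields 1 for 10 and 0 for 7; it only handles non-negative integers.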


6.12. Why is there no more efficient way of iterating over a dictionary than first constructing the list of keys()?

Have you tried it? I bet it's fast enough for your purposes! In most cases such a list takes only a few percent of the space occupied by the dictionary. Apart from the fixed header, the list needs only 4 bytes (the size of a pointer) per key. A dictionary uses 12 bytes per key plus between 30 and 70 percent hash table overhead, plus the space for the keys and values. By necessity, all keys are distinct objects, and a string object (the most common key type) costs at least 20 bytes plus the length of the string. Add to that the values contained in the dictionary, and you see that 4 bytes more per item really isn't that much more memory...

A call to dict.keys() makes one fast scan over the dictionary (internally, the iteration function does exist) copying the pointers to the key objects into a pre-allocated list object of the right size. The iteration time isn't lost, since you'll have to iterate anyway -- unless in the majority of cases your loop terminates very prematurely (which I doubt since you're getting the keys in random order).

I don't expose the dictionary iteration operation to Python programmers because the dictionary shouldn't be modified during the entire iteration -- if it is, there's a small chance that the dictionary is reorganized because the hash table becomes too full, and then the iteration may miss some items and see others twice. Exactly because this only occurs rarely, it would lead to hidden bugs in programs: it's easy never to have it happen during test runs if you only insert or delete a few items per iteration -- but your users will surely hit upon it sooner or later.
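
One practical consequence is that a loop over a separate list of keys may freely modify the dictionary. A minimal sketch, with an invented function name: the list() call is harmless where keys() already builds a new list, and it makes the snapshot explicit on any interpreter where keys() might return something tied to the dictionary (an assumption on my part, not something stated above).

```python
def remove_negative(d):
    # Iterate over a snapshot of the keys, so deleting is safe.
    for key in list(d.keys()):
        if d[key] < 0:
            del d[key]
    return d
```

For example, remove_negative({'a': 1, 'b': -2, 'c': 3}) leaves only the entries for 'a' and 'c'.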


6.13. Can Python be compiled to machine code, C or some other language?

Not easily. Python's high level data types, dynamic typing of objects and run-time invocation of the interpreter (using eval() or exec) together mean that a "compiled" Python program would probably consist mostly of calls into the Python run-time system, even for seemingly simple operations like "x+1". Thus, the performance gain would probably be minimal.

Internally, Python source code is always translated into a "virtual machine code" or "byte code" representation before it is interpreted (by the "Python virtual machine" or "bytecode interpreter"). In order to avoid the overhead of parsing and translating modules that rarely change over and over again, this byte code is written to a file whose name ends in ".pyc" whenever a module is parsed (from a file whose name ends in ".py"). When the corresponding .py file is changed, it is parsed and translated again and the .pyc file is rewritten. There is no performance difference once the .pyc file has been loaded (the bytecode read from the .pyc file is exactly the same as the bytecode created by direct translation). The only difference is that loading code from a .pyc file is faster than parsing and translating a .py file, so the presence of precompiled .pyc files will generally improve start-up time of Python scripts. If desired, the Lib/compileall.py module/script can be used to force creation of valid .pyc files for a given set of modules.

If you are looking for a way to translate Python programs in order to distribute them in binary form, without the need to distribute the interpreter and library as well, have a look at the freeze.py script in the Tools/freeze directory. This creates a single binary file incorporating your program, the Python interpreter, and those parts of the Python library that are needed by your program. Of course, the resulting binary will only run on the same type of platform as that used to create it.

Hints for proper usage of freeze.py:

- the script must be in a file whose name ends in .py
- you must have installed Python fully:

With Python 1.3, this meant:

        make install
        make libinstall
        make inclinstall
        make libainstall
With Python 1.4 or later, this just means:

        make install


6.14. How does Python manage memory? Why not full garbage collection?

Python uses reference counting memory management. This means that when an object is no longer in use Python frees the object automatically, with a few exceptions.

1) If the object lies on a circular reference path it won't be freed unless the circularities are broken. For example:

       List = [None]
       List[0] = List
List will not be freed unless the circularity (List[0] is List) is broken. Although List may become inaccessible, it still contains a reference to itself, and reference counting only deallocates an object when all references to it are destroyed. To break the circular reference path we must destroy the reference, as in

       List[0] = None
So, if your program creates circular references (and if it is long running and/or consumes lots of memory) it may have to do some explicit management of circular structures. In many application domains this is needed rarely, if ever.
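
If you want to watch the reference counts move, a sketch like the following may help. It relies on sys.getrefcount(), which reports counts for the C implementation of Python and includes temporary references of its own, so only the differences between the three numbers are meaningful. The function name cycle_demo is invented.

```python
import sys

def cycle_demo():
    lst = [None]
    before = sys.getrefcount(lst)
    lst[0] = lst                    # the list now refers to itself
    during = sys.getrefcount(lst)
    lst[0] = None                   # break the cycle again
    after = sys.getrefcount(lst)
    return before, during, after
```

The middle value should be one higher than the other two: that extra reference is the one that would keep the list alive after the last outside reference went away.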

2) Sometimes objects get stuck in "tracebacks" temporarily and hence are not deallocated when you might expect. Clear the tracebacks via

       import sys
       sys.exc_traceback = sys.last_traceback = None
Tracebacks are used for reporting errors and implementing debuggers and related things. They contain a portion of the program state extracted during the handling of an exception (usually the most recent exception).

In the absence of circularities and modulo tracebacks, Python programs need not explicitly manage memory.

It is often suggested that Python could benefit from fully general garbage collection. It's looking less and less likely that Python will ever get "automatic" garbage collection (GC). For one thing, unless this were added to C as a standard feature, it's a portability pain in the ass. And yes, I know about the Xerox library. It has bits of assembler code for most common platforms. Not for all. And although it is mostly transparent, it isn't completely transparent (when I once linked Python with it, it dumped core).

"Proper" GC also becomes a problem when Python gets embedded into other applications. While in a stand-alone Python it may be fine to replace the standard malloc() and free() with versions provided by the GC library, an application embedding Python may want to have its own substitute for malloc() and free(), and may not want Python's. Right now, Python works with anything that implements malloc() and free() properly.

Besides, the predictability of destructor calls in Python is kind of attractive. With GC, the following code (which is fine in current Python) will run out of file descriptors long before it runs out of memory:

        for file in <very long list of files>:
                f = open(file)
                c = f.read(1)
Using the current reference counting and destructor scheme, each new assignment to f closes the previous file. Using GC, this is not guaranteed. Sure, you can think of ways to fix this. But it's not off-the-shelf technology.

All that said, somebody has managed to add GC to Python using the GC library from Xerox, so you can see for yourself. See

	http://starship.skyport.net/crew/gandalf/gc-ss.html
See also question 4.17 for ways to plug some common memory leaks manually.


6.15. Why are there separate tuple and list data types?

This is done so that tuples can be immutable while lists are mutable.

Immutable tuples are useful in situations where you need to pass a few items to a function and don't want the function to modify the tuple; for example,

	point1 = (120, 140)
	point2 = (200, 300)
	record(point1, point2)
	draw(point1, point2)
You don't want to have to think about what would happen if record() changed the coordinates -- it can't, because the tuples are immutable.

On the other hand, when creating large lists dynamically, it is absolutely crucial that they are mutable -- adding elements to a tuple one by one requires using the concatenation operator, which makes it quadratic in time.
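
The difference can be seen in a sketch like this one (both function names are invented): the list version extends one object in place, while the tuple version builds a brand new tuple, copying everything collected so far, on every pass through the loop.

```python
def build_list(n):
    result = []
    for i in range(n):
        result.append(i)           # extends the same list in place
    return result

def build_tuple(n):
    result = ()
    for i in range(n):
        result = result + (i,)     # copies all existing items each time
    return result
```

Both produce the same items in the same order; if a tuple is wanted in the end, building a list and calling tuple() on it once avoids the repeated copying.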

As a general guideline, use tuples like you would use structs in C or records in Pascal; use lists like (variable length) arrays.


6.16. How are lists implemented?

Despite what a Lisper might think, Python's lists are really variable-length arrays. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array (as well as its length) in a list head structure.

This makes indexing a list (a[i]) an operation whose cost is independent of the size of the list or the value of the index.

When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don't require an actual resize.


6.17. How are dictionaries implemented?

Python's dictionaries are implemented as resizable hash tables.

Compared to B-trees, this gives better performance for lookup (the most common operation by far) under most circumstances, and the implementation is simpler.


6.18. Why must dictionary keys be immutable?

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could change. But since whoever changes the key object can't tell that it is incorporated in a dictionary, it can't move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary, it won't be found, since its hash value is different; and if you try to look up the old value, it won't be found either, since the value of the object found in that hash bin differs.

If you think you need to have a dictionary indexed with a list, try to use a tuple instead. The function tuple(l) creates a tuple with the same entries as the list l.
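
In practice that usually means converting at the point of use, roughly as in this sketch (store and fetch are names invented for the example):

```python
def store(table, key_list, value):
    # tuple() makes an immutable copy that is safe to use as a key
    table[tuple(key_list)] = value

def fetch(table, key_list):
    # an equal list, converted the same way, finds the same entry
    return table[tuple(key_list)]
```

After store(d, [1, 2], '12'), a later fetch(d, [1, 2]) returns '12' even though the two lists are different objects, because the tuples compare equal.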

Some unacceptable solutions that have been proposed:

- Hash lists by their address (object ID). This doesn't work because if you construct a new list with the same value it won't be found; e.g.,

  d = {[1,2]: '12'}
  print d[[1,2]]
will raise a KeyError exception because the id of the [1,2] used in the second line differs from that in the first line. In other words, dictionary keys should be compared using '==', not using 'is'.

- Make a copy when using a list as a key. This doesn't work because the list (being a mutable object) could contain a reference to itself, and then the copying code would run into an infinite loop.

- Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs that I'd rather not see; it invalidates an important invariant of dictionaries (every value in d.keys() is usable as a key of the dictionary).

- Mark lists as read-only once they are used as a dictionary key. The problem is that it's not just the top-level object that could change its value; you could use a tuple containing a list as a key. Entering anything as a key into a dictionary would require marking all objects reachable from there as read-only -- and again, self-referential objects could cause an infinite loop again (and again and again).

There is a trick to get around this if you need to, but use it at your own risk: You can wrap a mutable structure inside a class instance which has both a __cmp__ and a __hash__ method.

   class listwrapper:
       def __init__(self, the_list):
           self.the_list = the_list
       def __cmp__(self, other):
           # __cmp__ must return 0 when the two lists are equal
           return cmp(self.the_list, other.the_list)
       def __hash__(self):
           l = self.the_list
           result = 98767 - len(l)*555
           for i in range(len(l)):
               try:
                   result = result + (hash(l[i]) % 9999999) * 1001 + i
               except (TypeError, OverflowError):
                   # unhashable member or overflow: fold result back down
                   result = (result % 7777777) + i * 333
           return result
Note that the hash computation is complicated by the possibility that some members of the list may be unhashable and also by the possibility of arithmetic overflow.

You must make sure that the hash value for all such wrapper objects that reside in a dictionary (or other hash based structure) remains fixed while the object is in the dictionary (or other structure).

Furthermore it must always be the case that if o1 == o2 (i.e., o1.__cmp__(o2)==0) then hash(o1)==hash(o2) (i.e., o1.__hash__() == o2.__hash__()), regardless of whether the object is in a dictionary or not. If you fail to meet these restrictions, dictionaries and other hash based structures may misbehave!

In the case of listwrapper above, whenever the wrapper object is in a dictionary, the wrapped list must not change, to avoid anomalies. Don't do this unless you are prepared to think hard about the requirements and the consequences of not meeting them correctly. You've been warned!


6.19. How the heck do you make an array in Python?

["this", 1, "is", "an", "array"]

Lists are arrays in the C or Pascal sense of the word (see question 6.16). The array module also provides methods for creating arrays of fixed types with compact representations (but they are slower to index than lists). Also note that the Numerics extensions and others define array-like structures with various characteristics as well.

To get Lisp-like lists, emulate cons cells

    lisp_list = ("like",  ("this",  ("example", None) ) )
using tuples (or lists, if you want mutability). Here the analogue of lisp car is lisp_list[0] and the analogue of cdr is lisp_list[1]. Only do this if you're sure you really need to (it's usually a lot slower than using Python lists).
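
For instance, walking such a structure might look like the sketch below. The helper names car, cdr and to_python_list are only illustrative; nothing like them is built in.

```python
lisp_list = ("like", ("this", ("example", None)))

def car(cell):
    return cell[0]        # first element of the cons cell

def cdr(cell):
    return cell[1]        # the rest of the list (or None at the end)

def to_python_list(cell):
    # Follow the chain of cells until the None terminator
    result = []
    while cell is not None:
        result.append(car(cell))
        cell = cdr(cell)
    return result

words = to_python_list(lisp_list)
```

Note that reaching the n-th item means following n cells, whereas an ordinary Python list gets there in one step.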

Think of Python lists as mutable heterogeneous arrays of Python objects (say that 10 times fast :) ).


6.20. Why doesn't list.sort() return the sorted list?

In situations where performance matters, making a copy of the list just to sort it would be wasteful. Therefore, list.sort() sorts the list in place. In order to remind you of that fact, it does not return the sorted list. This way, you won't be fooled into accidentally overwriting a list when you need a sorted copy but also need to keep the unsorted version around.

As a result, here's the idiom to iterate over the keys of a dictionary in sorted order:

	keys = dict.keys()
	keys.sort()
	for key in keys:
		...do whatever with dict[key]...
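
A short sketch of what this means in practice (the variable names are only for illustration):

```python
scores = [30, 10, 20]

# sort() works in place and returns None, not the sorted list
result = scores.sort()

# To keep the original order around, copy first and sort the copy
original = [30, 10, 20]
ordered = original[:]      # slice copy
ordered.sort()
```

Afterwards result is None and scores itself has been reordered, while original is untouched and ordered holds the sorted copy.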


6.21. How do you specify and enforce an interface spec in Python?

An interface specification for a module, as provided by languages such as C++ and Java, describes the prototypes for the methods and functions of the module. Many feel that compile-time enforcement of interface specifications helps in the construction of large programs. Python does not support interface specifications directly, but many of their advantages can be obtained by an appropriate test discipline for components, which can often be very easily accomplished in Python.

A good test suite for a module can at once provide a regression test and serve as a module interface specification (even better since it also gives example usage). Look to many of the standard libraries which often have a "script interpretation" which provides a simple "self test." Even modules which use complex external interfaces can often be tested in isolation using trivial "stub" emulations of the external interface.

An appropriate testing discipline (if enforced) can help build large complex applications in Python as well as having interface specifications would do (or better). Of course Python allows you to get sloppy and not do it. Also you might want to design your code with an eye to making it easy to test.


6.22. Why do all classes have the same type? Why do instances all have the same type?

The Pythonic use of the word "type" is quite different from common usage in much of the rest of the programming language world. A "type" in Python is a description for an object's operations as implemented in C. All classes have the same operations implemented in C which sometimes "call back" to differing program fragments implemented in Python, and hence all classes have the same type. Similarly at the C level all class instances have the same C implementation, and hence all instances have the same type.

Remember that in Python usage "type" refers to a C implementation of an object. To distinguish among instances of different classes use Instance.__class__, and also look to 4.47. Sorry for the terminological confusion, but at this point in Python's development nothing can be done!


6.23. Why isn't all memory freed when Python exits?

Objects referenced from Python module global name spaces are not deallocated when Python exits. There are at least 2 reasons for this.

(1) To make Python exit more quickly. If Python is exiting, the operating system is about to free the entire address space anyway, so why bother? :)

(2) So the interpreter doesn't do finalizations in the wrong order. For example, if an object in a global name space has a __del__ method, the __del__ method may require the existence of other objects in global namespaces and if these objects have been deleted already the __del__ method may fail. In this situation there is no reasonable way for the interpreter to guess the safe order for deallocations. Instead Python relies on the programmer to explicitly perform any needed deletions in the proper (application specific) order.

If you want to force Python to delete certain things on exit, use the sys.exitfunc hook to force those deletions. For example, if you are debugging an extension module using a memory analysis tool and you wish to make Python deallocate almost everything, you might use an exitfunc like this one:

  import sys
  def my_exitfunc():
       print "cleaning up"
       import sys
       # do order dependent deletions here
       ...
       # now delete everything else in arbitrary order
       for x in sys.modules.values():
            d = x.__dict__
            for name in d.keys():
                 del d[name]
  sys.exitfunc = my_exitfunc
Other exitfuncs can be less drastic, of course.


6.24. Why no class methods or mutable class variables?

The notation

    instance.attribute(arg1, arg2)
usually translates to the equivalent of

    Class.attribute(instance, arg1, arg2)
where Class is a (super)class of instance. Similarly

    instance.attribute = value
sets an attribute of an instance (overriding any attribute of a class that instance inherits).
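
Both rules can be seen in a few lines. The Account class and its attribute names are invented for this sketch.

```python
class Account:
    rate = 5                         # class attribute, shared by default

    def describe(self, prefix):
        return prefix + str(self.rate)

acct = Account()

# The two spellings of a method call give the same result
bound = acct.describe("rate=")
unbound = Account.describe(acct, "rate=")

# Assigning through the instance creates an instance attribute;
# the class attribute is left alone
acct.rate = 7
```

After the assignment acct.rate is 7, but Account.rate (and the rate seen by any other instance) is still 5.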

Sometimes programmers want to have different behaviours -- they want a method which does not bind to the instance and a class attribute which changes in place. Python does not preclude these behaviours, but you have to adopt a convention to implement them. One way to accomplish this is to use "list wrappers" and global functions.

   def C_hello():
       print "hello"
   class C:
       hello = [C_hello]
       counter = [0]
   I = C()
Here I.hello[0]() acts very much like a "class method" and I.counter[0] = 2 alters C.counter (and doesn't override it). If you don't understand why you'd ever want to do this, that's because you are pure of mind, and you probably never will want to do it! This is dangerous trickery, not recommended when avoidable. (Inspired by Tim Peters' discussion.)


6.25. Why are default values sometimes shared between objects?

It is often expected that a function CALL creates new objects for default values. This is not what happens. Default values are created when the function is DEFINED, that is, there is only one such object that all calls to the function refer to. If that object is changed, subsequent calls to the function will refer to this changed object. By definition, immutable objects (like numbers, strings, tuples, None) are safe from change. Changes to mutable objects (like dictionaries, lists, class instances) are what cause the confusion.

Because of this feature it is good programming practice not to use mutable objects as default values, but to introduce them in the function. Don't write:

	def foo(dict={}):  # XXX shared reference to one dict for all calls
	    ...
but:
	def foo(dict=None):
		if dict is None:
			dict = {} # create a new dict for local namespace
See page 182 of "Internet Programming with Python" for one discussion of this feature. Or see the top of page 144 or bottom of page 277 in "Programming Python" for another discussion.
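
To see the sharing in action, compare the two styles side by side. The function names append_shared and append_fresh are only illustrative.

```python
def append_shared(item, seq=[]):
    # The default list is created once, when the def statement runs
    seq.append(item)
    return seq

def append_fresh(item, seq=None):
    if seq is None:
        seq = []          # a new list on every call that omits seq
    seq.append(item)
    return seq

first = append_shared(1)
second = append_shared(2)   # same list object as first, now [1, 2]

third = append_fresh(1)
fourth = append_fresh(2)    # independent lists: [1] and [2]
```

The first two results are the very same list object, so the item from the first call shows up in the second.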


6.26. Why no goto?

Actually, you can use exceptions to provide a "structured goto" that even works across function calls. Many feel that exceptions can conveniently emulate all reasonable uses of the "go" or "goto" constructs of C, Fortran, and other languages. For example:

   class label: pass # declare a label
   try:
        ...
        if (condition): raise label() # goto label
        ...
   except label: # where to goto
        pass
   ...
This doesn't allow you to jump into the middle of a loop, but that's usually considered an abuse of goto anyway. Use sparingly.
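
One common use is leaving several nested loops at once. The sketch below is only an illustration: the names Found and find_pair are made up, and the label class derives from Exception here, which is an assumption about the interpreter in use (it keeps the example valid on interpreters that require exception classes to do so).

```python
class Found(Exception):
    # the "label" we jump to
    pass

def find_pair(rows, target):
    position = None
    try:
        for i in range(len(rows)):
            for j in range(len(rows[i])):
                if rows[i][j] == target:
                    position = (i, j)
                    raise Found()      # "goto" the except clause
    except Found:
        pass                           # execution resumes here
    return position

hit = find_pair([[1, 2], [3, 4]], 3)
miss = find_pair([[1, 2], [3, 4]], 9)
```

Here hit is (1, 0) and miss is None; a single raise leaves both loops without any flag variables.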


6.27. How do you make a higher order function in Python?

You have two choices: you can use default arguments and override them or you can use "callable objects." For example suppose you wanted to define linear(a,b) which returns a function f where f(x) computes the value a*x+b. Using default arguments:

     def linear(a,b):
         def result(x, a=a, b=b):
             return a*x + b
         return result
Or using callable objects:

     class linear:
        def __init__(self, a, b):
            self.a, self.b = a,b
        def __call__(self, x):
            return self.a * x + self.b
In both cases:

     taxes = linear(0.3,2)
gives a callable object where taxes(10e6) == 0.3 * 10e6 + 2.

The defaults strategy has the disadvantage that the default arguments could be accidentally or maliciously overridden. The callable objects approach has the disadvantage that it is a bit slower and a bit longer. Note however that a collection of callables can share their signature via inheritance. For example:

      class exponential(linear):
         # __init__ inherited
         def __call__(self, x):
             return self.a * (x ** self.b)
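
Putting the two classes together shows why the shared signature is handy. This sketch repeats the definitions so it can be run on its own; the list name curves and the sample values are arbitrary.

```python
class linear:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __call__(self, x):
        return self.a * x + self.b

class exponential(linear):
    # __init__ inherited
    def __call__(self, x):
        return self.a * (x ** self.b)

# Both kinds of object are called the same way, so they can be mixed freely
curves = [linear(3, 2), exponential(3, 2)]
results = []
for f in curves:
    results.append(f(10))
```

The loop never needs to know which kind of callable it has; results comes out as [32, 300].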


7. Using Python on non-UNIX platforms


7.1. Is there a Mac version of Python?

Yes, see the "mac" subdirectory of the distribution sites, e.g. ftp://ftp.python.org/pub/python/mac/.


7.2. Are there DOS and Windows versions of Python?

Yes. There is a plethora of not-always-compatible versions. See the "pythonwin", "wpy", "nt" and "pc" subdirectories of the distribution sites. All of these versions are (or soon will be) available from http://www.python.org/windows/

A quick comparison:

PythonWin: Extensive support for the 32-bit native Windows API and GUI building using MFC. Windows NT and Windows 95 only. http://www.python.org/windows/

WPY: Ports to DOS, Windows 3.1(1), Windows 95, Windows NT and OS/2. Also contains a GUI package that offers portability between Windows (not DOS) and Unix, and native look and feel on both. ftp://ftp.python.org/pub/python/wpy/.

NT: Basic ports built straight from the 1.4 distribution for Windows 95 and Windows NT. This will eventually provide core support for both PythonWin and WPY on all 32-bit Microsoft platforms. ftp://ftp.python.org/pub/python/nt/. A build including Tk and PIL can be found via http://starship.skyport.net/crew/fredrik/py14.

PC: Old, unsupported ports to DOS, Windows 3.1(1) and OS/2. ftp://ftp.python.org/pub/python/pc/.


7.3. Is there an OS/2 version of Python?

Yes, see the "pc" and "wpy" subdirectories of the distribution sites (see above).


7.4. Is there a VMS version of Python?

Yes, there is a port of Python 1.4 to OpenVMS and a few ports of 1.2 to VMS. See ftp://ftp.python.org/pub/python/contrib/Porting/vms/.


7.5. What about IBM mainframes, or other non-UNIX platforms?

I haven't heard about these, except I remember hearing about an OS/9 port and a port to VxWorks (both operating systems for embedded systems). If you're interested in any of this, go directly to the newsgroup and ask there; you may find exactly what you need. For example, a port to MPE/iX 5.0 on HP3000 computers was just announced, see http://www.allegro.com/software/.


7.6. Where are the source or Makefiles for the non-UNIX versions?

The standard sources can (almost) be used. Additional sources can be found in the platform-specific subdirectories of the distribution.


7.7. What is the status and support for the non-UNIX versions?

I don't have access to most of these platforms, so in general I am dependent on material submitted by volunteers. However I strive to integrate all changes needed to get it to compile on a particular platform back into the standard sources, so porting of the next version to the various non-UNIX platforms should be easy.


7.8. I have a PC version but it appears to be only a binary. Where's the library?

If you are running any version of Windows, then you have the wrong distribution. The FAQ lists current Windows versions. Notably, Pythonwin and wpy provide fully functional installations.

But if you are sure you have the only distribution with a hope of working on your system, then...

You still need to copy the files from the distribution directory "python/Lib" to your system. If you don't have the full distribution, you can get the file lib<version>.tar.gz from most ftp sites carrying Python; this is a subset of the distribution containing just those files, e.g. ftp://ftp.python.org/pub/python/src/lib1.4.tar.gz.

Once you have installed the library, you need to point sys.path to it. Assuming the library is in C:\misc\python\lib, the following commands will point your Python interpreter to it (note the doubled backslashes -- you can also use single forward slashes instead):

        >>> import sys
        >>> sys.path.insert(0, 'C:\\misc\\python\\lib')
        >>>
For a more permanent effect, set the environment variable PYTHONPATH, as follows (talking to a DOS prompt):

        C> SET PYTHONPATH=C:\misc\python\lib


7.9. Where's the documentation for the Mac or PC version?

The documentation for the Unix version also applies to the Mac and PC versions. Where applicable, differences are indicated in the text.


7.10. How do I create a Python program file on the Mac or PC?

Use an external editor. On the Mac, BBEdit seems to be a popular no-frills text editor. I work like this: start the interpreter; edit a module file using BBEdit; import and test it in the interpreter; edit again in BBEdit; then use the built-in function reload() to re-read the imported module; etc. In the 1.4 distribution you will find a BBEdit extension that makes life a little easier: it can tell the interpreter to execute the current window. See :Mac:Tools:BBPy:README.

Regarding the same question for the PC, Kurt Wm. Hemr writes: "While anyone with a pulse could certainly figure out how to do the same on MS-Windows, I would recommend the NotGNU Emacs clone for MS-Windows. Not only can you easily resave and "reload()" from Python after making changes, but since WinNot auto-copies to the clipboard any text you select, you can simply select the entire procedure (function) which you changed in WinNot, switch to QWPython, and shift-ins to reenter the changed program unit."

If you're using Windows 95 or Windows NT, you should also know about PythonWin, which provides a GUI framework, with a mouse-driven editor, an object browser, and a GUI-based debugger. See

       http://www.python.org/ftp/python/pythonwin/
for details.


7.11. How can I use Tkinter on Windows 95/NT?

PythonWin does not come with Tkinter support. (You need the _tkinter extension which is not provided.) There is Tkinter support for Win95/NT in the Python Imaging Library (PIL) though. A simple step-by-step guide to installing Python and Tkinter on Win95 can be found here (should work on NT as well):

  http://www.netaxs.com/~mryan/python/install_win95.html
(There is a link to this page in the "Python Software" page on the Python web site at http://www.python.org/python/; search for "binaries".)

If you feel a bit more brave, you can also try out the "Python 1.4 for Win32" kit, available from:

  http://starship.skyport.net/crew/fredrik/py14
It includes all you need in a single package, but may include alpha and beta versions of some components. Use at your own risk.


7.12. cgi.py (or other CGI programming) doesn't work sometimes on NT or win95!

Be sure you have the latest python.exe, that you are using python.exe rather than a GUI version of python and that you have configured the server to execute

     "...\python.exe -u ..."
for the cgi execution. The -u (unbuffered) option on NT and win95 prevents the interpreter from altering newlines in the standard input and output. Without it post/multipart requests will seem to have the wrong length and binary (eg, GIF) responses may get garbled (resulting in, eg, a "broken image").


7.13. Why doesn't os.popen() work in PythonWin on NT?

os.popen() doesn't work from within PythonWin because of a bug in Microsoft's C Runtime Library (CRT): the CRT assumes you have a Win32 console attached to the process.

You should use the win32pipe module's popen() instead which doesn't depend on having an attached Win32 console.

Example:

 import win32pipe
 f = win32pipe.popen('dir /c c:\\')
 print f.readlines()
 f.close()


7.14. How do I use different functionality on different platforms with the same program?

Remember that Python is extremely dynamic and that you can use this dynamism to configure a program at run-time to use available functionality on different platforms. For example you can test the sys.platform and import different modules based on its value.

   import sys
   if sys.platform == "win32":
      import win32pipe
      popen = win32pipe.popen
   else:
      import os
      popen = os.popen
(See FAQ 7.13 for an explanation of why you might want to do something like this.) Also you can try to import a module and use a fallback if the import fails:

    try:
         import really_fast_implementation
         choice = really_fast_implementation
    except ImportError:
         import slower_implementation
         choice = slower_implementation


7.15. Is there an Amiga version of Python?

Yes. See the AmigaPython homepage at http://www.geocities.com/ResearchTriangle/Lab/3172/python.html.


8. Python on Windows


8.1. Using Python for CGI on Microsoft Windows

Setting up the Microsoft IIS Server/Peer Server:

On the Microsoft IIS server or on the Win95 MS Personal Web Server you set up Python in the same way that you would set up any other scripting engine.

Run regedt32 and go to:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\ScriptMap

and enter the following line (making any specific changes that your system may need)

.py :REG_SZ: c:\<path to python>\python.exe -u %s %s

This line will allow you to call your script with a simple reference like: http://yourserver/scripts/yourscript.py provided "scripts" is an "executable" directory for your server (which it usually is by default). The "-u" flag specifies unbuffered and binary mode for stdin, which is needed when working with binary data.

In addition, people who would know advise that ".py" may not be a good choice of file extension in this context (you might want to reserve *.py for support modules and use *.cgi or *.cgp for "main program" scripts). However, that issue is beyond this Windows FAQ entry.

Netscape Servers: Information on this topic exists at: http://home.netscape.com/comprod/server_central/support/fasttrack_man/programs.htm#1010870


8.2. How to check for a keypress without blocking?

Use the msvcrt module. This is a standard Windows-specific extension in Python 1.5 and beyond. It defines a function kbhit() which checks whether a keyboard hit is present; also getch() which gets one character without echo. Plus a few other goodies.

(Search for "keypress" to find an answer for Unix as well.)
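
A minimal sketch of a non-blocking check built on those two functions. The wrapper name poll_key is made up, and returning None on other platforms is just a convention chosen for this example.

```python
import sys

def poll_key():
    """Return one pending character, or None if no key is waiting."""
    if sys.platform != "win32":
        return None            # msvcrt only exists on Windows
    import msvcrt
    if msvcrt.kbhit():         # is a keypress waiting?
        return msvcrt.getch()  # fetch it without echoing
    return None

key = poll_key()
```

Because kbhit() returns immediately, poll_key() can be called from inside a loop that is busy doing other work.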