How to limit *length* of PrettyPrinter


Let me preface this reply with the concern that my level of competence, 
in this area, is insufficient. However, there are a number of folk 
'here' who are 'into' Python's internals, and will (hopefully) jump-in...


Also, whilst we appear to be concentrating on understanding the content 
of a data-structure, have we adequately defined "length"?
(per msg title)

- total number of o/p lines (per paper.ref)
- total number of characters 'printed'
- total number of elements l-r (of any/all embedded data-structures)
- the number of elements in each embedded data-structure
- the number of characters displayed from each embedded d-s
- depth of data-structure t-d
- something else?
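
To make a few of these candidate definitions concrete, here is a rough sketch that computes them for a single object. The helper names `nesting_depth` and `length_metrics` are invented for illustration, and only the built-in container types are considered (dict keys are ignored when measuring depth).

```python
from pprint import pformat

CONTAINERS = (list, tuple, set, frozenset, dict)


def nesting_depth(obj):
    """Depth of nesting, top-down: 0 for a scalar, 1 for a flat container."""
    if not isinstance(obj, CONTAINERS):
        return 0
    children = obj.values() if isinstance(obj, dict) else obj
    return 1 + max((nesting_depth(child) for child in children), default=0)


def length_metrics(obj):
    """Collect several possible meanings of 'length' for one object."""
    text = pformat(obj)
    return {
        "output_lines": len(text.splitlines()),
        "characters": len(text),
        "top_level_elements": len(obj) if isinstance(obj, CONTAINERS) else 1,
        "depth": nesting_depth(obj),
    }


sample = {"name": "demo", "values": list(range(100)), "nested": [1, [2, [3]]]}
print(length_metrics(sample))
```

Which of these numbers should be capped is exactly the question that still needs an answer.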


On 25/07/2020 10:52, Stavros Macrakis wrote:
> dn, Thanks again.
> 
> For background, I come from C and Lisp hacking (one of the MIT 
> developers of Macsyma <https://en.wikipedia.org/wiki/Macsyma>/Maxima 
> <https://sourceforge.net/p/maxima/wiki/Home/>) and also play with 
> R, though I haven't been a professional developer for many years. I know 
> better than to Reply to a Digest -- sorry about that, I was just being 
> sloppy.

Us 'silver-surfers' have to stick-together! Also in the seventies I 
decided Lisp was not for me...


> The reason I wanted print-length limitation was that I wanted to get an 
> overview of an object I'd created, which contains some very long lists. 
> I expected that this was standard functionality that I simply couldn't 
> find in the docs.
> 
> I'm familiar with writing pretty-printer ("grind") functions with string 
> output (from way back: see section II.I, p. 12 
> <http://bitsavers.trailing-edge.com/pdf/mit/ai/aim/AIM-279.pdf>), but 
> I'm not at all familiar with Python's type/class system, which is why 
> I'm trying to understand it by playing with it.

I accept, one might say 'on faith', that in Python "everything is an 
object", and proceed from there. Sorry!

Similarly, I've merely accepted the limitations of pprint() - and 
probably use it less-and-less, as I become more-and-more oriented 
towards TDD...
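
The standard library's reprlib module may be closer to what was wanted, since it produces abbreviated reprs with per-type size limits. A minimal sketch follows. `make_limited_repr` and its parameters are hypothetical names; the `maxlist`, `maxtuple`, `maxset`, `maxfrozenset`, `maxdict` and `maxlevel` attributes belong to `reprlib.Repr`. Note that the result is a one-line string, not pretty-printed output.

```python
import reprlib


def make_limited_repr(max_items=3, max_depth=2):
    """Build a repr-like function that truncates long or deep containers."""
    limiter = reprlib.Repr()
    # One limit per container type; here they all share the same value.
    limiter.maxlist = max_items
    limiter.maxtuple = max_items
    limiter.maxset = max_items
    limiter.maxfrozenset = max_items
    limiter.maxdict = max_items
    limiter.maxlevel = max_depth      # how many nesting levels to show
    return limiter.repr


short = make_limited_repr(max_items=3, max_depth=2)
print(short(list(range(10))))                      # [0, 1, 2, ...]
print(short({"a": 1, "b": 2, "c": 3, "d": 4}))     # {'a': 1, 'b': 2, 'c': 3, ...}
print(short([[[1]]]))                              # [[[...]]]
```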


> I did try looking at the Python Standard Library docs, but I don't see 
> where it mentions the superclasses of the numerics or of the collection 
> types or the equivalent of *numberp*. If I use *type(4).__bases__*, I 
> get just *(<class 'object'>,)*, which isn't very helpful. I suspect that 
> that isn't the correct way of finding a class's superclasses -- what is?

If you haven't already, try:
- The Python Language Reference Manual (see Python docs)
	in particular "Data Model"
- PSL: Data Types = "types -- Dynamic type creation and names for 
built-in types"
- PSL: collections
- PSL: collections.abc

Another source of 'useful background' are PEPs (Python Enhancement 
Proposals). Note that some have been accepted and are part of the 
current-language - so the "proposal" part has become an historic record. 
In comparison: some have been rejected, and others are still 
under-discussion...

- PEP 0: an index
- PEP 3119 -- Introducing Abstract Base Classes
- PEP 3141 -- A Type Hierarchy for Numbers
- and no-doubt many more, which will keep you happily entertained, and 
save the members of your local flock of sheep from thinking that to you 
they are a mere number...
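
On the superclass question, a short experiment may help. `__bases__` lists only the direct parents and `__mro__` gives the full method resolution order. The abstract classes in the numbers module are attached by registration rather than inheritance, which is why they appear in neither. The helper `is_number` is a hypothetical stand-in for numberp.

```python
import numbers

print(int.__bases__)     # direct parents only
print(bool.__mro__)      # full chain: bool, int, object


def is_number(obj):
    """Rough counterpart of numberp, based on the numbers ABCs."""
    return isinstance(obj, numbers.Number)


for value in (4, 4.0, 3 + 0j, True, "4"):
    print(repr(value), is_number(value), isinstance(value, numbers.Integral))

# The ABC is recognised by isinstance() without being a real base class.
print(numbers.Integral in int.__mro__)
```

Be aware that bool is a subclass of int, so True and False count as numbers under this test.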


> BTW, where do I look to understand the difference between *dir(type(4))*
> (which does not include *__bases__*) and *type(4).__dir__(type(4))*
> (which does)? According to Martelli (2017, p. 127), *dir(x)* just 
> calls *x.__dir__()*; but *type(4).__dir__()* => ERR for me. Has this 
> changed since 3.5, or is Martelli just wrong?

I don't know - and I'm not about to question Alex!
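
That said, a small experiment suggests an explanation, offered as a sketch rather than an authoritative account. dir(x) appears to look up `__dir__` on the type of x, so dir(int) ends up calling `type.__dir__(int)`, which reports the attributes of the class and its bases but leaves out metaclass attributes such as `__bases__`. Writing `int.__dir__(int)` instead finds `object.__dir__` and treats int as an ordinary instance, so the attributes of its class (type) are included. With no argument at all there is no instance to work on, hence the error.

```python
cls = type(4)                                # int

via_dir = dir(cls)                           # what dir() reports for the class
via_type = sorted(type(cls).__dir__(cls))    # __dir__ looked up on the class's type
via_object = cls.__dir__(cls)                # object.__dir__ with int as the instance

print("__bases__" in via_dir)
print(via_type == via_dir)
print("__bases__" in via_object)

try:
    cls.__dir__()                            # no instance supplied
except TypeError as exc:
    print("TypeError:", exc)
```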


> There's nothing else obvious in dir(0) or in dir(type(0)). After some 
> looking around, I find that the base classes are not built-in, but need 
> to be added with the *numbers* and *collections.abc* modules? That's a 
> surprise!

Yes, to me there is much mystery in this (hence "faith", earlier).

Everything is a sub-class of object. When I need to differentiate, eg 
between a list and a dict, I either resort to isinstance() or go back to 
the helpful table/taxonomy in collections.abc and hasattr(). Thus a 
tuple is a Collection and a Sequence, but not a MutableSequence like a 
list. A set looks like a list until it comes to duplicate values or 
behaving as a Sequence. A dict is a MutableMapping, but as you say 
(below), when considered as a Collection it will only behave as a list 
of keys. So, we then chase the *View-s...
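
The taxonomy above can be turned into a small classifier. This is only a sketch: the function name `kind` and the labels it returns are invented here, and treating strings as scalars is a design choice rather than anything the library dictates.

```python
from collections.abc import Mapping, MutableSequence, Sequence, Set


def kind(obj):
    """Classify an object using the collections.abc taxonomy."""
    if isinstance(obj, (str, bytes, bytearray)):
        return "scalar"              # sequences, but usually treated as atoms
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, Set):
        return "set"
    if isinstance(obj, MutableSequence):
        return "mutable sequence"
    if isinstance(obj, Sequence):
        return "sequence"
    return "scalar"


for sample in ([1, 2], (1, 2), {1, 2}, {"a": 1}, "text", 42):
    print(f"{sample!r:>10} -> {kind(sample)}")

print(list({"a": 1, "b": 2}))            # iterating a dict yields its keys
print(list({"a": 1, "b": 2}.items()))    # the items view yields (key, value) pairs
```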


> You suggested I try *pp.__builtins__.__dict__()* . I couldn't figure out 
> what you meant by *pp* here (the module name *pprint*? the class 
> *pprint.PrettyPrint*? the configured function 
> *pprint.PrettyPrinter(width=20,indent=3).pprint*? none worked...). I 
> finally figured out that you must have meant something like 
> *pp=pprint.PrettyPrinter(width=80).pprint; pp(__builtins__.__dict__)*. 
> Still not sure which attributes could be useful.

With apologies: "pp" is indeed pprint. The code-example should have been 
prefaced with:

     from pprint import pprint as pp

This is (perhaps only my) short-hand, whenever I'm using pprint within a 
module. (You will find our number-crunching friends referring to "np" 
rather than the full "numpy", and similar...)
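
With the alias in place, usage looks something like the sketch below. It goes through the `builtins` module rather than `__builtins__` (the latter is an implementation detail), and it also shows the `depth` and `width` arguments, which are the limits pprint itself offers. The variable names are illustrative only.

```python
import builtins
from pprint import pformat, pprint as pp

nested = [1, [2, [3, [4]]]]
pp(nested, depth=2)                  # levels below the second appear as [...]

# The names defined by the builtins module, a few at a time.
builtin_names = sorted(vars(builtins))
pp(builtin_names[:5], width=20)      # a narrow width forces one item per line

# pformat() returns the same text as a string instead of printing it.
summary = pformat(nested, depth=2)
```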


>     With bottom-up prototyping it is wise to start with the 'standard'
>     cases! (and to 'extend' one-bite at a time)
> Agreed! I started with lists, but couldn't figure out how to extend that 
> to tuples and sets. I was thinking I could build a list then convert it 
> to a tuple or set. The core recursive step looks something like this:
> 
>       CONVERT( map( lambda i: limit(i, length, depth-1) , obj[0:length] )
>                + ( [] if len(obj) <= length else ['...'] ) )
> 
> ... since map returns an iterator, not a collection of the same type as 
> its input -- so how do I convert to the right result type (CONVERT)?
> 
> After discovering that typ.__new__(typ,obj) doesn't work for mutables, 
> and thrashing for a while, I tried this:
> 
>       def convert(typ,obj):
>           newobj = typ.__new__(typ,obj)
>           newobj.__init__(obj)
>           return newobj
> 
> which is pretty ugly, because the *__new__* initializer is magically 
> ignored for a mutable (with no exception) and the *__init__* setter is 
> magically ignored for an immutable (with no exception). But it seems to 
> work....
> 
> Now, on to dictionaries! Bizarrely, the *list()* of a dictionary, and 
> its iterator, return only the keys, not the key-value pairs. No problem! 
> We'll create yet another special case, and use */set/.items()* (which 
> for some reason doesn't exist for other collection types). And /mirabile 
> dictu/, *convert* works correctly for that!:
> 
>     dicttype = type({'a':1})
>     test = {'a':1,'b':2}
>     convert(dicttype,test.items()) => {'a':1,'b':2}
> 
> So we're almost done. Now all we have to do is slice the result to the 
> desired length:
> 
>     convert(dicttype,test.items()[0:1])      # ERR
> 
> 
> But */dict/.items()* is not sliceable. However, it /is/ iterable... but 
> we need another count variable (or is there a better way?):
> 
> 
>     c = 0
>     convert(dicttype, [ i for i in test.items() if (c:=c+1)<2 ]) 
> 
> 
> Phew! That was a lot of work, and I'm left with a bunch of special 
> cases, but it works. Now I need to understand from a Python guru what 
> the Pythonic way of doing this is which /doesn't/ require all this ugliness.
> 
> (This doesn't really work for the original problem, because there's no 
> way of putting "..." at the end of a dictionary object, but I still 
> think I learned something about Python.)
> 
> I did take a look at the pprint source code, and could no doubt modify 
> it to handle print-length, but at this point, I'm still trying to 
> understand how Python code can be written generically. So I was 
> disappointed to see that *_pprint_list*, *_pprint_tuple*, and *_pprint_set* 
> are not written generically, but as three separate functions. I also 
> wonder what the '({' case is supposed to cover.
> 
> A lot of questions -- probably based on a lot of misunderstandings!

...and any response severely limited by my competence in these topics, 
eg my assumption that the data would be presented with different 
brackets, ie the "({", according to type.
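
For what it is worth, the reprs of the built-in containers seem to bear out the brackets idea. A frozenset, or a subclass of set, cannot be written with bare braces, so its repr wraps them in the type name, which is presumably where an opening "({" comes from. The snippet below only inspects repr() output and makes no claim about pprint's internals; `TaggedSet` is a hypothetical class defined purely for the demonstration.

```python
class TaggedSet(set):
    """A hypothetical set subclass, only here to show its repr."""


samples = [[1], (1,), {1}, frozenset({1}), TaggedSet({1}), {"a": 1}]
for obj in samples:
    # Each type announces itself with a different opening bracket.
    print(f"{type(obj).__name__:>10}: {obj!r}")
```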

Recently, I offered a "Friday Finking" to the list, relating a Junior 
Programmer's wrestling with the challenge of expanding an existing API 
from scalar values to a choice between scalars and a tuple. (or was it a 
list - or does it really matter?) There seemed to be no suggestion 
beyond isinstance().

In this case, there will be a 'ladder' of if...elif...else clauses, and 
quite possibly needed in two places - parsing and printing. (The 
ref.paper talked of two passes, so...)
PS there is talk of a case/switch which will handle class-distinction, 
but alas, not currently available (PEP 622, Python 3.10).
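
One alternative to repeating the ladder in two places is functools.singledispatch from the standard library, which picks an implementation according to the type of the first argument. A minimal sketch, in which the function name `describe` and its output format are made up for illustration:

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback for anything not registered below (treated as a scalar)."""
    return f"scalar {obj!r}"


@describe.register(list)
@describe.register(tuple)
@describe.register(set)
@describe.register(frozenset)
def _(obj):
    # One implementation shared by all the 'flat' container types.
    return f"{type(obj).__name__} of {len(obj)} element(s)"


@describe.register(dict)
def _(obj):
    return f"dict with {len(obj)} key(s)"


for item in (3, [1, 2], (1,), {1, 2}, {"a": 1}):
    print(describe(item))
```

Supporting a new type then means registering one more function, rather than editing every ladder.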

Is the challenge one of attempting to retain and represent the values 
within the data-structure? There is an implicit issue here, that a 
first-approach may be essentially replication (ie storage-expensive).
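
If replication is acceptable, one way to keep it modest is to copy only the first few items at each level. The sketch below also touches the conversion question from the quoted text: calling `type(obj)` on an iterable rebuilds the same container type, and `itertools.islice` truncates anything iterable, including a dict's items view. The names `limit` and `MARKER` are assumptions made for illustration, and the approach assumes the container's type accepts a single iterable (true of list, tuple, set, frozenset and dict, but not of namedtuples, for example).

```python
from collections.abc import Mapping, Sequence, Set
from itertools import islice

MARKER = "..."   # arbitrary truncation marker; real data could contain it too


def limit(obj, length=3, depth=2):
    """Return a truncated copy of obj that keeps each container's type."""
    if isinstance(obj, (str, bytes, bytearray, range)):
        return obj                           # treated as scalars here
    if isinstance(obj, Mapping):
        if depth <= 0:
            return MARKER
        pairs = [(key, limit(value, length, depth - 1))
                 for key, value in islice(obj.items(), length)]
        if len(obj) > length:
            pairs.append((MARKER, MARKER))   # stand-in for a trailing "..."
        return type(obj)(pairs)
    if isinstance(obj, (Sequence, Set)):
        if depth <= 0:
            return MARKER
        items = [limit(item, length, depth - 1) for item in islice(obj, length)]
        if len(obj) > length:
            items.append(MARKER)
        return type(obj)(items)
    return obj


big = {"ids": list(range(1000)), "point": (1, 2), "tags": {"a", "b"}}
print(limit(big, length=3))
```

The truncated copy can then be handed to pprint unchanged. For sets, which items survive the cut is arbitrary, since sets are unordered.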

Nevertheless, proceeding in this fashion, remember that a Python list is 
not the "array" of other languages! Content-elements need not be 
homogeneous. So, an 'accumulator' list could contain a dict, a set, 
sundry scalars, and/or inner-lists, in perfect happiness.

Through zip(), Python has a very handy (IMHO) way of linking two lists, 
without any effort on my part. (I work very hard at being this lazy!) 
So, during 'parsing', it is possible to (say) build one list recording 
'type', eg list, dict, set, ... and a parallel list containing the 
values (or k-v pairs, or i,j, or...). Yet more, if/when a 'length' 
metric can be computed... Thereafter when it comes to presentation, the 
assembled lists can be zip-ped together in a for-loop to yield the 
final-presentation.
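
A minimal sketch of that parallel-list idea follows. The sample data and the crude 'length' metric are invented for illustration.

```python
data = [[1, 2, 3], {"a": 1}, {4, 5}, 42, "text"]

kinds = []    # one entry per item: the name of its type
sizes = []    # a crude 'length' metric for the same item
for item in data:
    kinds.append(type(item).__name__)
    sizes.append(len(item) if hasattr(item, "__len__") else 1)

# Presentation pass: zip() walks the parallel lists in step.
for kind, size, value in zip(kinds, sizes, data):
    print(f"{kind:>5} ({size}): {value!r}")
```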

Apologies, the above seems to be 'fluff around the edges' rather than 
addressing the central needs of the problem. I'm also a little concerned 
about your expertise level and the likelihood that I may be 'talking down'.
-- 
Regards =dn