
How to expand and flatten a nested list of dictionaries of varied lengths?


If I may, a couple of items of list-etiquette (polite behavior), as I 
understand them:
1 please reply to the list (rather than only to me) because @Mats (who 
responded earlier) and others on this list are much smarter than me, and 
might be able to help you more quickly
2 top-posting seems to take the form 'answer, then question' which is 
illogical to everyone, except apparently Microsoft. It is better to have 
the conversation 'develop' as it proceeds - all the early information at 
the beginning, and the more detailed towards the 'end'. That is not to 
say that we can't "snip" or 'do some gardening', to remove unnecessary 
or erroneous material, as the conversation progresses. You will notice 
(as below) that this also enables a posting with multiple questions, to 
be discussed point-by-point.


Now to work...


 > On Sun, 18 Oct 2020 at 21:48, dn via Python-list <python-list at python.org
 > <mailto:python-list at python.org>> wrote:
 >
 >     On 19/10/2020 09:09, Shaozhong SHI wrote:
 >      > Even worse is that, in some cases, an addition called
 >      > serviceRatings as a key occur with new data unexpectedly.
 >
 >     "Even worse" than what?
 >
 >     Do you need to keep a list of acceptable/applicable/available keys?
 >     (and reject or deal with others in some alternate fashion)
 >
 >
 >      > How to produce a robust Python/Panda script to coping with
 >      > all these?

...
[I often use ellipsis to indicate that I have snipped 'stuff in the 
middle', others are more overt and will write "<snip>" or similar]


 >     You may find it helpful to use the pprint ("pretty printing")
 >     library to print data-structures in a more readable/structured
 >     format.
 >
 >     To "flatten" a dictionary, you must first be sure that there will
 >     be no keys that will clash (else the second entry will completely
 >     replace the first, without trace).
 >
 >     Thus, we will need to understand more about this particular
 >     definition of "flatten" in relation to the range of incoming data.
 >     Perhaps explain them in English first...
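
To illustrate the key-clash warning above, here is a minimal sketch. It 
assumes a made-up record in which 'rating' appears at two levels. The 
flatten() helper and its dotted key names are my own invention for 
illustration, not part of any library.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dicts into one dict, prefixing each key with its path."""
    flat = {}
    for key, value in nested.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hypothetical record: 'rating' exists at the top level and inside 'overall'.
report = {"rating": "Good", "overall": {"rating": "Requires improvement"}}

# Naive merge: the inner 'rating' replaces the outer one, without trace.
naive = dict(report)
naive.update(naive.pop("overall"))

# Path-prefixed keys keep the two values apart.
safe = flatten(report)
```

Whether prefixed keys are the "flatten" wanted here depends on the 
answers to the questions above.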

On 19/10/2020 12:14, Shaozhong SHI wrote:
> Hi, DN,
> 
> This is the result of pprint.

[{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Well-led',
                                         u'rating': u'Requires improvement'}],
                u'rating': u'Requires improvement'},
   u'reportDate': u'2019-10-04',
   u'reportLinkId': u'63ff05ec-4d31-406e-83de-49a271cfdc43'},
  {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                         u'rating': u'Good'},
                                        {u'name': u'Well-led',
                                         u'rating': u'Good'},
                                        {u'name': u'Caring',
                                         u'rating': u'Good'},
                                        {u'name': u'Responsive',
                                         u'rating': u'Good'},
                                        {u'name': u'Effective',
                                         u'rating': u'Requires improvement'}],
                u'rating': u'Good'},
   u'reportDate': u'2017-09-08',
   u'reportLinkId': u'4f20da40-89a4-4c45-a7f9-bfd52b48f286'},
  {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Well-led',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Caring',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Responsive',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Effective',
                                         u'rating': u'Good'}],
                u'rating': u'Requires improvement'},
   u'reportDate': u'2016-06-11',
   u'reportLinkId': u'0cc4226b-401e-4f0f-ba35-062cbadffa8f'},
  {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                         u'rating': u'Good'},
                                        {u'name': u'Well-led',
                                         u'rating': u'Good'},
                                        {u'name': u'Caring',
                                         u'rating': u'Good'},
                                        {u'name': u'Responsive',
                                         u'rating': u'Requires improvement'},
                                        {u'name': u'Effective',
                                         u'rating': u'Good'}],
                u'rating': u'Good'},
   u'reportDate': u'2015-01-12',
   u'reportLinkId': u'a11c1e52-ddfd-4cd8-8b56-1b96ac287c96'}]


Well done! This looks so much better, and more to the point, it is 
easier for 'us' to see the structure - but oh dear, doesn't email 
wrapping make our lives difficult!
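
With the structure visible, one possible reading of "expand and flatten" 
is one output row per key question, with the report-level fields repeated 
on each row. The sketch below assumes that reading. The function name 
expand_reports() and the output column names are hypothetical, and the 
sample is the first two reports shown above.

```python
def expand_reports(reports):
    """Return one flat dict per key-question rating, whatever the list length."""
    rows = []
    for report in reports:
        overall = report.get("overall", {})
        for question in overall.get("keyQuestionRatings", []):
            rows.append({
                "reportDate": report.get("reportDate"),
                "reportLinkId": report.get("reportLinkId"),
                "overallRating": overall.get("rating"),
                "question": question.get("name"),
                "questionRating": question.get("rating"),
            })
    return rows


sample = [
    {"overall": {"keyQuestionRatings": [
        {"name": "Safe", "rating": "Requires improvement"},
        {"name": "Well-led", "rating": "Requires improvement"}],
        "rating": "Requires improvement"},
     "reportDate": "2019-10-04",
     "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43"},
    {"overall": {"keyQuestionRatings": [
        {"name": "Safe", "rating": "Good"},
        {"name": "Well-led", "rating": "Good"},
        {"name": "Caring", "rating": "Good"},
        {"name": "Responsive", "rating": "Good"},
        {"name": "Effective", "rating": "Requires improvement"}],
        "rating": "Good"},
     "reportDate": "2017-09-08",
     "reportLinkId": "4f20da40-89a4-4c45-a7f9-bfd52b48f286"},
]

rows = expand_reports(sample)
```

A list of plain dictionaries such as rows can be passed directly to 
pandas.DataFrame(), should a table be wanted.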


> Normally, it is like this.
> But sometimes, serviceRatings is added to the key list - [u'overall', 
> u'reportDate', u'reportLinkId']
> 
> That is what I meant about dynamically growing tree.

OK, (and only you/your user can answer this question) why do all the 
examples (above) not have a service-rating?

I am wondering if the use of the word "unexpectedly" has translated 
accurately between languages - if a data-item is part of the data-input, 
then our code must be able to handle it or "clean" it, as specified (by 
the user).

- are you able to add a service-rating to each "overall" entry?
- where service-ratings are not currently-available, would it be 
acceptable to add the field with a value of None (or some other 
"sentinel-value")?
- if the analysis-phase does not consider service-ratings, can we write 
code to read the field from the data-source, but discard it whilst 
loading everything else into a Pandas DataFrame?
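
The last two options might be sketched as follows. This is only a sketch 
under assumptions: the key names come from the list quoted above, the 
structure of serviceRatings is unknown to me (so the value used below is a 
placeholder), and normalise() is a hypothetical helper name.

```python
EXPECTED_KEYS = ("overall", "reportDate", "reportLinkId")
OPTIONAL_KEYS = ("serviceRatings",)


def normalise(record, keep_optional=True):
    """Return (clean_record, unknown_keys) for one incoming report."""
    # Expected keys are always present in the result (None when missing).
    clean = {key: record.get(key) for key in EXPECTED_KEYS}
    if keep_optional:
        for key in OPTIONAL_KEYS:
            # None acts as the sentinel when the field is absent.
            clean[key] = record.get(key)
    # Anything else is reported back, so that new keys do not pass unnoticed.
    unknown = sorted(set(record) - set(EXPECTED_KEYS) - set(OPTIONAL_KEYS))
    return clean, unknown


# Placeholder records: the serviceRatings value is invented for illustration.
without_rating = {"overall": {"rating": "Good"},
                  "reportDate": "2017-09-08",
                  "reportLinkId": "4f20da40-89a4-4c45-a7f9-bfd52b48f286"}
with_rating = dict(without_rating, serviceRatings=["placeholder"])

padded, _ = normalise(without_rating)                        # sentinel added
discarded, _ = normalise(with_rating, keep_optional=False)   # field dropped
```

Returning the unknown keys separately leaves the decision to reject, log 
or accept them with the caller (and thus with the user's specification).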


> How best to handle this?

This requires understanding how the service-rating value will be used in 
the analysis, and thus how relevant records may be selected/ignored. 
Just because it features in the data, doesn't mean it needs to be 
included in the analysis!


Have I understood the question?
-- 
Regards =dn