


issue in handling CSV data


Sharan Basappa <sharan.basappa at gmail.com> writes:

>> 
>> Note that the commas are within the quotes. I'd say Andrea is correct:
>> This is a tab-separated file, not a comma-separated file. But for some
>> reason all fields except the last end with a comma. 
>>

However, genfromtxt is not a full-fledged CSV parser. It does not obey quotes. So the commas inside the quotes ARE treated as separators.

> Hi Peter,
>
> I respectfully disagree that it is not comma-separated. Let me explain why.
> If you look at the following line in the code, it specifies comma as the delimiter:
>
> ########################
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> ########################
>
> Now, if you see the print after getting the data, it looks like this:
>
> ############################## 
> [['"\t"81' '"\t5c'] 
>  ['"\t"04' '"\t11']
>  ['"\t"e1' '"\t17']
>  ['"\t"6a' '"\t6c']
>  ['"\t"53' '"\t69']
>  ['"\t"98' '"\t87']
>  ['"\t"5c' '"\t4b']
> ############################## 

1) Where did the other fields (address, length) go?
>
> If you look closely, the commas have disappeared. That, I think, is because
> it actually treated this as a CSV file.

2) As I said above, if you choose ',' as the separator, the commas will disappear. Similarly, if you choose TAB as the separator, the TABs will disappear. As the format is a strange mixture of the two, you can use either one. But if the file were read with a real CSV reader that obeys the quote convention, then using ',' as the separator would not work. Only TAB would work.
But in both cases you would have to do some pre- or post-processing to get the data the way you want them.
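
To illustrate the difference, here is a minimal sketch using only the standard library. The sample lines are made up to imitate the format described in this thread (tab-separated, every field quoted, a comma inside the quotes of every field except the last); they are not taken from the real constraints.csv. The plain str.split call stands in for any reader that ignores quotes, and stripping the trailing comma is just one possible form of post-processing.

```python
import csv
import io

# Hypothetical sample, NOT the real file: tab-separated, quoted fields,
# with a comma inside the quotes of every field except the last one.
sample = (
    '"0x10,"\t"4,"\t"81,"\t"5c"\n'
    '"0x20,"\t"4,"\t"04,"\t"11"\n'
)

# Splitting on ',' without honoring quotes: the commas vanish, but the
# quotes and tabs stay glued to the neighbouring fields.
naive = [line.split(',') for line in sample.splitlines()]

# Quote-aware parsing with TAB as the delimiter: the quotes are consumed
# by the parser and the commas survive as part of each field.
reader = csv.reader(io.StringIO(sample), delimiter='\t', quotechar='"')

# Post-processing step: drop the trailing comma from each field.
rows = [[field.rstrip(',') for field in row] for row in reader]

print(naive[0])
print(rows)
```

With the quote-aware reader and the extra rstrip step, each row comes out as a clean list of four strings, which can then be converted to numbers as needed.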

> Anyway, I am checking to see if I can discard the tabs and process this.
> I will keep everyone posted.

-- 
Piet van Oostrum <piet-l at vanoostrum.org>
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]