issue in handling CSV data


On Sunday, 8 September 2019 12:45:45 UTC-4, Peter J. Holzer  wrote:
> On 2019-09-08 05:41:07 -0700, Sharan Basappa wrote:
> > On Sunday, 8 September 2019 04:56:29 UTC-4, Andrea D'Amore  wrote:
> > > On Sun, 8 Sep 2019 at 02:19, Sharan Basappa <sharan.basappa at gmail.com> wrote:
> > > > As you can see, the string "\t"81 is causing the error.
> > > > It seems to be due to char "\t".
> > > 
> > > It is not clear what format do you expect to be in the file.
> > > You say "it is CSV" so your actual payload seems to be a pair of three
> > > bytes (a tab and two hex digits in ASCII) per line.
> > 
> > The issue seems to be presence of tabs along with the numbers in a single string. So, when I try to convert strings to numbers, it fails due to presence of tabs.
> > 
> > Here is the hex dump:
> > 
> > 22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 
> > 74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a 22 
> > 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 
> ...
> 
> This looks like this:
> 
> "address,"      "length,"       "81,"   5c
> "address,"      "length,"       "04,"   11
> "address,"      "length,"       "e1,"   17
> "address,"      "length,"       "6a,"   6c
> ...
> 
> Note that the commas are within the quotes. I'd say Andrea is correct:
> This is a tab-separated file, not a comma-separated file. But for some
> reason all fields except the last end with a comma. 
> 
> I would 
> 
> a) try to convince the person producing the file to clean up the mess
> 
> b) if that is not successful, use the csv module to read the file with
>    separator tab and then discard the trailing commas.
> 

Hi Peter,

I respectfully disagree that it is not a comma-separated file. Let me explain why.
If you look at the following line in the code, it specifies a comma as the delimiter:

########################
my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
########################

Now, if you look at the printed output after loading the data, it looks like this:

############################## 
[['"\t"81' '"\t5c']
 ['"\t"04' '"\t11']
 ['"\t"e1' '"\t17']
 ['"\t"6a' '"\t6c']
 ['"\t"53' '"\t69']
 ['"\t"98' '"\t87']
 ['"\t"5c' '"\t4b']
############################## 

If you look closely, the commas have disappeared. That, I think, is because genfromtxt actually treated this as a CSV file and split each line on the commas.
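
To illustrate what I mean, here is a small sketch that splits one line by hand. The sample line is my reconstruction from the hex dump quoted above (without the trailing CR/LF), and plain str.split only approximates what genfromtxt does internally, so take it as an illustration rather than a description of numpy's behaviour:

```python
# Sample line reconstructed from the hex dump quoted earlier in the thread.
line = '"address,"\t"length,"\t"81,"\t5c'

# Splitting on ',' consumes the commas but leaves the quotes and tabs
# attached to the neighbouring fields.
fields = line.split(",")
print(fields)
```

The last two fields come out as '"\t"81' and '"\t5c', which is the same shape as the strings in the print above.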

Anyway, I am checking to see if I can discard the tabs and process this.
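
What I have in mind is roughly the following. This is only a sketch: clean_hex_field is a helper name I made up, the rows are copied from the print above, and I am assuming the values are two-digit hex numbers (as Andrea suggested) and that the fields are str; if genfromtxt returns bytes, they would need to be decoded first.

```python
def clean_hex_field(field):
    """Strip surrounding quotes, tabs and whitespace, then parse as hex."""
    return int(field.strip('"\t \r\n'), 16)

# First three rows from the printed array above.
raw_rows = [['"\t"81', '"\t5c'], ['"\t"04', '"\t11'], ['"\t"e1', '"\t17']]

# Convert every field in every row to an integer.
values = [[clean_hex_field(f) for f in row] for row in raw_rows]
print(values)
```
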
I will keep everyone posted.
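
For comparison, Peter's option (b) could look something like the sketch below. It uses the standard csv module with a tab delimiter and then drops the trailing commas. The sample data is the first two rows as Peter rendered them, and read_rows is just a name I picked for illustration; with the real file one would pass open('constraints.csv', newline='') instead of the StringIO object.

```python
import csv
import io

# First two rows as rendered in Peter's message, with CR/LF line endings.
sample = (
    '"address,"\t"length,"\t"81,"\t5c\r\n'
    '"address,"\t"length,"\t"04,"\t11\r\n'
)

def read_rows(f):
    """Read tab-separated rows and discard the trailing comma in each field."""
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        yield [field.rstrip(",") for field in row]

rows = list(read_rows(io.StringIO(sample, newline="")))
print(rows)
```
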