lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
Data compression (help) |
Posted: Sun Aug 09, 2015 2:36 pm |
|
|
Hello,
I'm working on a telemetry project and need a data compression routine to reduce my internet traffic and the space needed to store the data.
Since my application handles sensor readings, I can't apply common algorithms such as run-length encoding, because the data doesn't repeat!
Does anyone know a way to compress this kind of data? Perhaps using bitwise operations... |
|
|
temtronic
Joined: 01 Jul 2010 Posts: 9245 Location: Greensville,Ontario
|
|
Posted: Sun Aug 09, 2015 3:08 pm |
|
|
You should tell us the amount of data you're talking about. Quantity and types of data.
You might find the 'overhead' of internet communications is more than the actual data you're transmitting.
Jay |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
|
Posted: Sun Aug 09, 2015 3:18 pm |
|
|
temtronic wrote: | You should tell us the amount of data you're talking about. Quantity and types of data.
You might find the 'overhead' of internet communications is more than the actual data you're transmitting.
Jay |
Hi,
One example of a packet:
@codeOfClient;signal;day/hour;sensor1;sensor2;sensor3;sensor4;sensor5;sensor6;sensor7;sensor8;
@54;85;25/23:22;827346872443;23434525234;12342341234;345643645465;456456456;4564564;456456456;300;
I send one packet per minute. |
|
|
soonc
Joined: 03 Dec 2013 Posts: 215
|
Try this |
Posted: Sun Aug 09, 2015 6:04 pm |
|
|
If the data is sent as ASCII the way you show it, then this example is 98 bytes long (99 with a trailing newline).
You did not say what the range is for each data field.
At first look, you can drop all the ; : and / characters and then convert each field into a binary value.
Doing this, you can reduce it to 37 bytes very easily.
That's about 2.6 times less data to send!
Doing this means that at the receiving end you have to "decode" the binary back into ASCII, but that's simple enough.
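As a rough sketch of the first half of that idea (turning the ASCII line into numeric fields), something like the following could work in standard C. The struct, the function name and the integer widths are assumptions made up for illustration; the thread does not give the real field ranges.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: names follow the header line in the thread,
   widths are guesses because the real field ranges are not stated. */
typedef struct {
    uint8_t  client, signal, day, hour, minute;
    uint64_t sensor[8];          /* wide enough for the 12-digit samples */
} reading_t;

/* Parse one ASCII packet. Returns 1 on success, 0 if the text is
   malformed or one of the small fields does not fit in a byte. */
int parse_packet(const char *text, reading_t *out)
{
    unsigned c, g, d, h, m;
    unsigned long long s[8];
    int n = sscanf(text,
                   "@%u;%u;%u/%u:%u;%llu;%llu;%llu;%llu;%llu;%llu;%llu;%llu;",
                   &c, &g, &d, &h, &m,
                   &s[0], &s[1], &s[2], &s[3], &s[4], &s[5], &s[6], &s[7]);
    if (n != 13 || c > 255 || g > 255 || d > 255 || h > 255 || m > 255)
        return 0;
    out->client = (uint8_t)c;
    out->signal = (uint8_t)g;
    out->day    = (uint8_t)d;
    out->hour   = (uint8_t)h;
    out->minute = (uint8_t)m;
    for (int i = 0; i < 8; i++)
        out->sensor[i] = s[i];
    return 1;
}
```

Once the values are held as numbers, the punctuation no longer has to be sent; only the bytes of each value do.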
Good luck
soonc |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
Re: Try this |
Posted: Sun Aug 09, 2015 7:59 pm |
|
|
Cool, very interesting!
But I did not understand the logic of converting to binary.
How does it work?
Can you help me with an example of encoding and decoding? |
|
|
soonc
Joined: 03 Dec 2013 Posts: 215
|
Re: Try this |
Posted: Sun Aug 09, 2015 8:43 pm |
|
|
@54;85;25/23:22;827346872443;23434525234;12342341234;345643645465;456456456;4564564;456456456;300;
@ remains @
The value 54 in ASCII is the character 5 followed by the character 4, i.e. two bytes. If the field is decimal, two digits imply a maximum value of 99.
You can use binary instead: then you represent 00 to 99 as 0x00 to 0x63, which fits in 1 byte. That's a reduction of 50%.
You KNOW field one is ONE BYTE, and you also know field two is ONE BYTE, since the second field has the same value limitation (00 to 99, or 0x00 to 0x63). Drop the ; delimiter, as you know the format of your data.
So far you have @ 0x36 0x55: three bytes.
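A minimal sketch of that step, assuming the field always arrives as exactly two ASCII decimal digits (the function names are made up for illustration):

```c
#include <stdint.h>

/* Convert two ASCII decimal digits, e.g. "54", into one byte (0..99). */
uint8_t two_digits_to_byte(const char *p)
{
    return (uint8_t)((p[0] - '0') * 10 + (p[1] - '0'));
}

/* Reverse step for the receiver: one byte back into two ASCII digits. */
void byte_to_two_digits(uint8_t v, char out[3])
{
    out[0] = (char)('0' + v / 10);
    out[1] = (char)('0' + v % 10);
    out[2] = '\0';
}
```

With this, "54" becomes 0x36 and "85" becomes 0x55, and the receiver can rebuild the original digits.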
Again, assuming all values in your data are ASCII representations of decimal values, fields 1 through 5 can each be a single-byte binary value.
Field 6 requires a number larger than 32 bits. Note! Earlier, when I went through the data, I misread that field as 32 bits and calculated with a 32-bit value. You need to decide what the maximum value for each field can be and represent that in binary. You know (I don't) whether the maximum may be less than 48 bits...
Don't be limited by the 8, 16, 32 and 64 bit data types which are "normal" these days; there is nothing to stop you making a union to represent 48-bit values and simply handling that in the decode. This of course implies you know what the maximum range for each field will be.
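One way to handle a 48-bit field might look like the sketch below. Note that it uses shifts rather than the union mentioned above, so the byte order on the wire does not depend on the CPU, and it assumes a compiler that provides uint64_t from <stdint.h> (not every embedded compiler does). The function names are hypothetical.

```c
#include <stdint.h>

#define U48_BYTES 6

/* Store the low 48 bits of v, most significant byte first. */
void put_u48(uint8_t out[U48_BYTES], uint64_t v)
{
    for (int i = 0; i < U48_BYTES; i++)
        out[i] = (uint8_t)(v >> (8 * (U48_BYTES - 1 - i)));
}

/* Rebuild the value from six bytes on the receiving side. */
uint64_t get_u48(const uint8_t in[U48_BYTES])
{
    uint64_t v = 0;
    for (int i = 0; i < U48_BYTES; i++)
        v = (v << 8) | in[i];
    return v;
}
```

The sample value 827346872443 is 0xC0A1B7687B, so it would even fit in 40 bits if that really is the largest value the field can take.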
Anyway, you may also need to decide on a starting character that CANNOT be a value in any other field. This is only needed if you're streaming data and need to be able to pick it up at any point in the stream.
So you may have to sacrifice a few bytes to do that, depending upon the nature of your data. If you are sending a fixed N-byte packet you can simply rely on the @ as a delimiter. One more point: it may pay to include a simple CRC on the end of the compressed data packet. All this adds to the data again, but...
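As an illustration of the "simple CRC" idea, here is a bitwise CRC-8 sketch. The choice of polynomial (0x07, initial value 0) is an assumption for the example; nothing in the thread says which CRC to use, and a 16-bit CRC may be preferable for longer packets.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-8, polynomial 0x07 (x^8 + x^2 + x + 1), initial value 0,
   no bit reflection, no final XOR. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Receiver check: with these parameters, the CRC computed over the
   payload plus its trailing CRC byte comes out as 0 for a good packet. */
int packet_ok(const uint8_t *pkt, size_t len_with_crc)
{
    return len_with_crc > 1 && crc8(pkt, len_with_crc) == 0;
}
```

The sender appends crc8(payload, n) as the last byte; the receiver calls packet_ok() on everything it received. The cost is one extra byte per packet.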
If you understand your data you can compress even further. This scheme also provides a small means to obfuscate the data, keeping the average nosy person in the dark.
I can't spend more time now; this should be enough to get you started, and you should be able to reduce the byte count a LOT.
Good luck
soonc |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
Re: Try this |
Posted: Sun Aug 09, 2015 10:15 pm |
|
|
Thank you, friend!!
I have difficulty understanding some details due to translation errors, but I understood most of it.
I understood the first part of the example, with the first fields.
But wouldn't there be a simpler way?
Wouldn't it be interesting to convert EVERYTHING to binary?
I know the length of my data; almost all fields are int32.
The problem is that I do not know the value of a field in advance, which can be (232) or (2343442342), etc.
What if I take the whole string, convert it to binary and send that?
There could be a way that always compresses.
How would I do it for an int32, for example?
e.g. 4294967290
Does it work like this:
1- convert it to decimal
2- convert everything to ASCII
42 94 96 72 90
* ^ ` H Z
When you have time, please explain it to me.
Thank you!!!
It helped me a lot to understand some things, but I still did not understand everything. |
|
|
soonc
Joined: 03 Dec 2013 Posts: 215
|
Re: Try this |
Posted: Mon Aug 10, 2015 6:36 am |
|
|
Pity your example was incorrect. Some of the larger numbers are greater than 32 bits; that's why I corrected my statement and talked about 48 bits etc.
You need to "understand" your data, not know the value! There is a big difference!
Then, using the same principle I described for the first 5 fields, you can apply it to any value that may appear in that field.
i.e. you do NOT have to know the value, but understanding the data can tell you its limits. Example: the data may have a limit of 165000!
So using a 32-bit binary field is a waste, and that data field can be limited to 24 bits.
Sit down and figure out what the MAXIMUM value can ever possibly be for each data field.
Then convert the ASCII to BINARY.
Understanding unions and structs can also be a great help. i.e. if you know the data never exceeds 24 bits, declare a union of a 32-bit value and a 4-byte array.
Now you can manipulate the 32-bit value all day long as struct.n32 when you need it, or send it byte by byte as struct.a[n]. As you can see, the transmission is reduced to 3 bytes for a data field that you KNOW can never exceed 165000!
At the receiving end you will "know" the limits for each data field and the overall format of the compressed packet. Now it's a simple matter of reversing what you just did:
Example: you know field X is coming over as three bytes, and you understand that it must be converted back into a 32-bit value for the PC to use it.
Why always send the fourth byte when you know it's always 0?
That is a 25% saving right there.
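A sketch of the union trick, assuming a little-endian CPU so that a[0] is the low byte (on a big-endian machine the indices would have to be reversed). The type and function names are made up for the example.

```c
#include <stdint.h>

/* The same four bytes viewed either as one 32-bit number or as an array.
   Assumption: little-endian layout, so a[3] is the most significant byte. */
typedef union {
    uint32_t n32;
    uint8_t  a[4];
} u32_bytes_t;

/* Sender: keep only the three low bytes of a value known to fit in 24 bits. */
void pack_u24(uint8_t out[3], uint32_t value)
{
    u32_bytes_t u;
    u.n32 = value;
    out[0] = u.a[0];
    out[1] = u.a[1];
    out[2] = u.a[2];
}

/* Receiver: put the three bytes back and restore the fourth as 0. */
uint32_t unpack_u24(const uint8_t in[3])
{
    u32_bytes_t u;
    u.a[0] = in[0];
    u.a[1] = in[1];
    u.a[2] = in[2];
    u.a[3] = 0;
    return u.n32;
}
```

The program keeps working with an ordinary 32-bit variable; only the transmit and receive routines know that three bytes travel on the wire.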
If you understand the data you can do a lot to reduce what gets sent!
I'm not going to do it for you.
Good luck
soonc |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
Re: Try this |
Posted: Mon Aug 10, 2015 11:53 am |
|
|
Hello!
My example was only meant to show the structure (predominantly numbers). I erred in not giving a real example of the data!
Basically my data would be:
@9999;100;25/10/2015-23:22:34;int32;int32;int32;int32;int32;int32;int32;int8;
@(NumberOfClientMax);(signalMax);(time stamp); integers ranging .....
Does my conversion example work the same way?
I did not understand what you mean by converting ASCII to binary. Can you give me an example of a single compressed packet so I can understand?
For this one:
@0999;100;25/10/15-23:22:34;4294967290;500;0;15555;72;4294967290;9;50;
I don't want you to do it for me, but I want to learn how to do it on my own!
I am struggling to understand, step by step, how to do it the way you told me.
Thank you very much for your help!! You have helped me a lot to understand. |
|
|
Ttelmah
Joined: 11 Mar 2010 Posts: 19546
|
|
Posted: Mon Aug 10, 2015 12:09 pm |
|
|
First thing we can now see is that you have a date and time. As text this is seventeen characters; it can be stored/sent as just four bytes. |
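The post does not say which four-byte encoding is meant; a 32-bit count of seconds since some epoch is one common choice. Another possibility, sketched here purely as an assumption, is to pack the six components into bit fields of a single 32-bit word (6 bits year since 2000, 4 month, 5 day, 5 hour, 6 minute, 6 second). All names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t year;    /* years since 2000, 0..63 */
    uint8_t month;   /* 1..12 */
    uint8_t day;     /* 1..31 */
    uint8_t hour;    /* 0..23 */
    uint8_t minute;  /* 0..59 */
    uint8_t second;  /* 0..59 */
} stamp_t;

/* 6 + 4 + 5 + 5 + 6 + 6 = 32 bits exactly. */
uint32_t pack_stamp(const stamp_t *t)
{
    return ((uint32_t)(t->year   & 0x3F) << 26) |
           ((uint32_t)(t->month  & 0x0F) << 22) |
           ((uint32_t)(t->day    & 0x1F) << 17) |
           ((uint32_t)(t->hour   & 0x1F) << 12) |
           ((uint32_t)(t->minute & 0x3F) << 6)  |
            (uint32_t)(t->second & 0x3F);
}

stamp_t unpack_stamp(uint32_t w)
{
    stamp_t t;
    t.year   = (uint8_t)((w >> 26) & 0x3F);
    t.month  = (uint8_t)((w >> 22) & 0x0F);
    t.day    = (uint8_t)((w >> 17) & 0x1F);
    t.hour   = (uint8_t)((w >> 12) & 0x1F);
    t.minute = (uint8_t)((w >> 6)  & 0x3F);
    t.second = (uint8_t)(w & 0x3F);
    return t;
}
```

Either way, "25/10/15-23:22:34" shrinks from 17 characters to 4 bytes, and the receiver can print it back in any format it likes.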
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
|
Posted: Mon Aug 10, 2015 5:05 pm |
|
|
Ttelmah wrote: | First thing we can now see is you have a date and time. Now this is seventeen characters, that can be stored/sent as just four. |
Perfect!
But I need to understand the process of compression and decompression.
I don't understand :(
Can you teach me? |
|
|
newguy
Joined: 24 Jun 2004 Posts: 1909
|
|
Posted: Mon Aug 10, 2015 5:18 pm |
|
|
No one has actually suggested that you "compress" your data. All they've suggested is that you NOT send ASCII data, but instead send the binary equivalent.
As a very simple example, IF one of the quantities that you're transferring is an int8:
Data to be transferred: 0xa0
ASCII equivalent = 0x31 0x36 0x30 ("160")
Instead of sending 3 bytes (the ASCII equivalent of 160, or 0xa0), all they're suggesting is that you simply send the 0xa0, or in other words the raw binary data itself.
This approach saves bytes over the ASCII alternative. There is no compression per se, it's just that this approach is more efficient in terms of the transfer of actual number of bytes. |
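To make the same idea concrete for a whole packet, here is a sketch for the layout lucasromeiro described (client up to 9999, signal up to 100, a time stamp, seven 32-bit values and one 8-bit value). The struct, the function names, the big-endian byte order and the use of unsigned 32-bit fields are all assumptions for illustration, and the time stamp is taken as already reduced to 32 bits by whatever scheme you choose.

```c
#include <stddef.h>
#include <stdint.h>

#define PACKET_LEN 37   /* '@' + 2 + 1 + 4 + 7*4 + 1 */

typedef struct {
    uint16_t client;     /* 0..9999 */
    uint8_t  signal;     /* 0..100 */
    uint32_t stamp;      /* date/time, already packed into 32 bits */
    uint32_t sensor[7];
    uint8_t  last;       /* the final int8 field */
} record_t;

/* Write the low nbytes of v, most significant byte first. */
static uint8_t *put_be(uint8_t *p, uint32_t v, int nbytes)
{
    while (nbytes-- > 0)
        *p++ = (uint8_t)(v >> (8 * nbytes));
    return p;
}

/* Read nbytes, most significant byte first, into *v. */
static const uint8_t *get_be(const uint8_t *p, uint32_t *v, int nbytes)
{
    *v = 0;
    while (nbytes-- > 0)
        *v = (*v << 8) | *p++;
    return p;
}

size_t encode_record(const record_t *r, uint8_t out[PACKET_LEN])
{
    uint8_t *p = out;
    *p++ = '@';
    p = put_be(p, r->client, 2);
    p = put_be(p, r->signal, 1);
    p = put_be(p, r->stamp, 4);
    for (int i = 0; i < 7; i++)
        p = put_be(p, r->sensor[i], 4);
    p = put_be(p, r->last, 1);
    return (size_t)(p - out);
}

int decode_record(const uint8_t in[PACKET_LEN], record_t *r)
{
    uint32_t v;
    const uint8_t *p = in;
    if (*p++ != '@')
        return 0;
    p = get_be(p, &v, 2); r->client = (uint16_t)v;
    p = get_be(p, &v, 1); r->signal = (uint8_t)v;
    p = get_be(p, &r->stamp, 4);
    for (int i = 0; i < 7; i++)
        p = get_be(p, &r->sensor[i], 4);
    get_be(p, &v, 1);     r->last = (uint8_t)v;
    return 1;
}
```

With this layout the 70-character line "@0999;100;25/10/15-23:22:34;4294967290;500;0;15555;72;4294967290;9;50;" goes out as 37 bytes, and the receiver rebuilds every field because both ends agree on the order and width of the fields.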
|
|
asmboy
Joined: 20 Nov 2007 Posts: 2128 Location: albany ny
|
|
Posted: Mon Aug 10, 2015 6:49 pm |
|
|
Let me see if I have this right.
You have gone to the trouble of setting up a TCP/IP stack and all the attendant support for your design.
I'll assume IPv4. OK.
And you get time/date-stamped data, amounting to about 100 bytes of ASCII numerals, all pretty universally easy to parse because it has a reserved character as field delimiter.
You wouldn't have much to do to make it pure .CSV format, really.
So your IPv4 packet has at least 20 bytes of header overhead no matter what you send (a TCP header adds at least 20 more).
https://en.wikipedia.org/wiki/IPv4
And with your data tacked on, it's a whole smoking 120 bytes or so.
All this agita about compression in a (usually) high-baud transport layer where a 1 kbyte payload is considered small.
AND unless your transport layer is SLIP, OR you pay for internet by the BYTE, you have a real non-problem, lacking any reasonable need for compression in the first place.
I just can't see the need. Sorry. |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
|
Posted: Mon Aug 10, 2015 6:56 pm |
|
|
newguy wrote: | No one has actually suggested that you "compress" your data. All they've suggested is that you NOT send ASCII data, but instead send the binary equivalent.
As a very simple example, IF one of the quantities that you're transferring is an int8:
Data to be transferred: 0xa0
ASCII equivalent = 0x31 0x36 0x30 ("160")
Instead of sending 3 bytes (the ASCII equivalent of 160, or 0xa0), all they're suggesting is that you simply send the 0xa0, or in other words the raw binary data itself.
This approach saves bytes over the ASCII alternative. There is no compression per se, it's just that this approach is more efficient in terms of the transfer of actual number of bytes. |
Exactly,
I understood what they said.
Techniques to save bytes.
But I consider it a form of data compression, since it saves space without losing data.
However, what I did not understand is the step-by-step procedure for building a packet, for example this one:
@0999;100;25/10/15-23:22:34;4294967290;500;0;15555;72;4294967290;9;50;
Can you help me understand it step by step?
Do you know other data compression techniques? |
|
|
lucasromeiro
Joined: 27 Mar 2010 Posts: 167
|
|
Posted: Mon Aug 10, 2015 7:02 pm |
|
|
Thank you for your contribution!
This is an example of a single packet.
There may be thousands of these packets, depending on the operating mode.
You may not see the need, but it exists!
Any saving is welcome, because my data traffic with the mobile operators (3G, GPRS) is too expensive at thousands of packets per day. |
|
|
|