[biomart-users] Trying to understand some code for configuring a new 0.9 BioMart


joe carlson
Hello,

I'm currently migrating a 0.7 BioMart database to 0.9, and I have not been completely successful in moving to a new configuration. I'm running into NullPointerExceptions, which I suspect are the cause of my problem. The exceptions are thrown after a small block of code that I can't quite understand.

I'm attempting to add the existing database as an RDBMS data source in MartConfigurator. This database has 4 'main' tables and a collection of 'dm' tables associated with them. The problems arise in org.biomart.configurator.controller.requestCreateDataSetFromTarget.

This little block of code scans the tables in the database looking for the main tables and keeping track of the columns that end with _key. It stores each table name in mainArray at an index that depends on the number of key columns in that table:

        // build the mainKeyMap and order the mainTable
        String[] mainArray = new String[mainTableList.size()];
        int mCtr = 0;
        for (String tblName : mainTableList) {
            if (!isMainTable(tblName))
                continue;
            List<String> keyList = new ArrayList<String>();
            for (String colStr : tblColMap.get(tblName)) {
                if (colStr.endsWith(Resources.get("keySuffix")))
                    // if(colStr.indexOf(Resources.get("keySuffix"))>=0)
                    keyList.add(colStr);
            }
            mainKeyMap.put(tblName, keyList);
            if (keyList.size() < 1) {
                JOptionPane.showMessageDialog(null, "no key column in " + tblName);
                return null;
            }
            mainArray[keyList.size() - 1] = tblName;
        }



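In my database with 4 main tables, some have 1 key field and others have 2 key fields. The suspicious line is the last one in the loop: mainArray[keyList.size() - 1] = tblName; After this block, mainArray has 2 non-null entries and 2 null entries. Only one table per key count ends up in the array (for example, only one of the tables with 1 key field), because every table is written to index keyList.size() - 1 and overwrites any earlier table with the same number of keys. The NullPointerException is thrown later, when the null entries in the array are accessed.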
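To make the overwrite concrete, here is a self-contained reproduction of the indexing logic in the loop above. It is a sketch, not the MartConfigurator code: the table names, the hard-coded "_key" suffix and the simplified tblColMap are hypothetical stand-ins, and the isMainTable() check and the Swing dialog are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MainArrayRepro {
    // Hypothetical suffix; the real code reads Resources.get("keySuffix").
    static final String KEY_SUFFIX = "_key";

    // Mirrors the loop: each table goes to index (number of key columns - 1).
    // Assumes every table has at least one key column (the real code shows a dialog otherwise).
    static String[] buildMainArray(Map<String, List<String>> tblColMap) {
        String[] mainArray = new String[tblColMap.size()];
        for (Map.Entry<String, List<String>> e : tblColMap.entrySet()) {
            List<String> keyList = new ArrayList<>();
            for (String col : e.getValue()) {
                if (col.endsWith(KEY_SUFFIX)) {
                    keyList.add(col);
                }
            }
            // Tables with the same key count land on the same slot and overwrite each other.
            mainArray[keyList.size() - 1] = e.getKey();
        }
        return mainArray;
    }

    public static void main(String[] args) {
        // Hypothetical layout: two tables with 1 key column, two with 2 key columns.
        Map<String, List<String>> tblColMap = new LinkedHashMap<>();
        tblColMap.put("gene__main", List.of("gene_key", "name"));
        tblColMap.put("protein__main", List.of("gene_key", "protein_key"));
        tblColMap.put("marker__main", List.of("gene_key", "marker_name"));
        tblColMap.put("transcript__main", List.of("gene_key", "transcript_key", "biotype"));
        // prints [marker__main, transcript__main, null, null]
        System.out.println(Arrays.toString(buildMainArray(tblColMap)));
    }
}
```

With four tables but only two distinct key counts, two slots stay null, which matches the symptom described above.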

Is the intention of this block of code to produce a list of tables ordered by the number of keys? If so, something is not working right, since not every table gets into the list. Or is there an assumption that the main tables will always have an arrangement of 1,2,...n keys? Or is it OK that mainArray does not contain every table?

The symptom is that not all main tables show up in the new mart I'm building. I've also tried the backward-compatibility code for importing an old BioMart, but I'm running into some other issues with it.

Any advice would be appreciated.

--
You received this message because you are subscribed to the Google Groups "biomart-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Visit this group at http://groups.google.com/group/biomart-users.
For more options, visit https://groups.google.com/d/optout.
Re: [biomart-users] Trying to understand some code for configuring a new 0.9 BioMart

Arek Kasprzyk
Hi Joe,

You are right in saying that main tables will always have an arrangement of 1, 2, ..., n keys. The submains 'inherit' all the keys from the table immediately above them (the special case, of course, is a single main table with one key). You may find the section discussing our relational model in the following paper helpful: http://database.oxfordjournals.org/content/2011/bar038.full.
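Given the 1, 2, ..., n rule, the ordering step could fail loudly instead of leaving null slots. The sketch below is hypothetical (MainChain and orderMains are not from the BioMart code base): it sorts main tables by key count and reports the first table that breaks the chain, i.e. a duplicate key count, a gap in the counts, or a submain whose keys do not include all keys of the table above it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MainChain {
    /**
     * Orders main tables by number of key columns and checks the 1, 2, ..., n rule.
     * keysByTable maps a table name to its key columns (hypothetical input shape).
     */
    static List<String> orderMains(Map<String, List<String>> keysByTable) {
        List<String> ordered = new ArrayList<>(keysByTable.keySet());
        ordered.sort(Comparator.comparingInt(t -> keysByTable.get(t).size()));
        for (int i = 0; i < ordered.size(); i++) {
            String table = ordered.get(i);
            List<String> keys = keysByTable.get(table);
            // The i-th table in the chain must have exactly i + 1 key columns.
            if (keys.size() != i + 1) {
                throw new IllegalStateException(table + " has " + keys.size()
                        + " key column(s); expected " + (i + 1));
            }
            // A submain inherits every key of the table immediately above it.
            if (i > 0 && !keys.containsAll(keysByTable.get(ordered.get(i - 1)))) {
                throw new IllegalStateException(table + " does not inherit the keys of "
                        + ordered.get(i - 1));
            }
        }
        return ordered;
    }
}
```

Throwing at the offending table means a layout like the one in the original question is reported by name, rather than surfacing later as a NullPointerException on an empty slot.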

a.

On 12 December 2015 at 00:12, <[hidden email]> wrote:

Re: [biomart-users] Trying to understand some code for configuring a new 0.9 BioMart

joe carlson
Hi Arek,

Thanks for the reply. I took a quick look at this paper but still need to read it more carefully.

The problem I’m working on is migrating a 0.7 BioMart to 0.9. The initial setup was done a couple of generations ago, and the wisdom of the ancients has been lost.

One thing I had never quite understood in our arrangement is why we had 2 separate MySQL databases for the marts: one for the gene information and the other for the DNA sequence. Over time the gene database has acquired multiple main tables. There is one main table (a ‘gene’ table) whose primary key is used in several other main tables. Some of the other main tables are what you would call ‘submains’: they have a foreign key that points to the primary key of another main table, and they have their own primary key. But some other main tables in the database have the foreign key into the gene table and do not have their own primary key.

The code I was wondering about makes sense if the main tables need to be arranged so that every submain table needs its own primary key, and only one submain table can refer back to the main table. Other submain tables need to be connected to another submain table and every main or submain table can have only 1 submain table. Is this correct?

One other thing I don’t quite understand in the MartConfigurator wizard is what the ‘group’ and ‘partition’ buttons do. Can you give a quick tip on when and how they are used?

Thanks,

Joe


On Dec 12, 2015, at 2:27 AM, Arek Kasprzyk <[hidden email]> wrote:

Re: [biomart-users] Trying to understand some code for configuring a new 0.9 BioMart

Arek Kasprzyk
Hi Joe,

My answers below

Re: two separate datasets for gene and DNA.

This arrangement is necessary to minimise the amount of sequence information that has to be stored. The sequences for all sequence types (coding, UTR, transcript, exon, protein, etc.) are cut out dynamically from the same underlying DNA and processed before being presented to the user. This way we avoid explicitly storing each sequence type for every gene.
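As a rough illustration of the idea (not BioMart's actual sequence code), a transcript or coding sequence can be assembled on demand from one stored genomic string plus exon coordinates, so only the genomic DNA has to be kept. The 1-based inclusive coordinates, the strand handling and the names SequenceSlicer, splice and reverseComplement are assumptions made for this sketch; translation to protein is not shown.

```java
public class SequenceSlicer {
    // Reverse complement of an uppercase DNA string (A<->T, C<->G; anything else becomes N).
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: sb.append('N');
            }
        }
        return sb.toString();
    }

    /**
     * Joins exons cut from the stored genomic sequence.
     * exons holds {start, end} pairs, 1-based and inclusive, in ascending genomic order.
     * For the minus strand the joined sequence is reverse-complemented.
     */
    static String splice(String genome, int[][] exons, boolean minusStrand) {
        StringBuilder sb = new StringBuilder();
        for (int[] exon : exons) {
            // append(CharSequence, start, end) copies [start, end), so shift start to 0-based.
            sb.append(genome, exon[0] - 1, exon[1]);
        }
        String joined = sb.toString();
        return minusStrand ? reverseComplement(joined) : joined;
    }

    public static void main(String[] args) {
        String genome = "AAATGCCCGGGTAGTTT";
        int[][] exons = {{3, 8}, {12, 15}};
        System.out.println(splice(genome, exons, false)); // ATGCCCTAGT
        System.out.println(splice(genome, exons, true));  // ACTAGGGCAT
    }
}
```

Every sequence type then becomes a different set of coordinates over the same stored DNA, which is the storage saving described above.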


Re: mains and submains
"The code I was wondering about makes sense if the main tables need to be arranged so that every submain table needs its own primary key, and only one submain table can refer back to the main table. Other submain tables need to be connected to another submain table and every main or submain table can have only 1 submain table. Is this correct?"

Each mart dataset needs at least one main table plus optional submains. If you see several mains that are not connected to other submains, they will be treated as separate datasets.
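One way to picture this rule, as a hypothetical sketch rather than MartConfigurator's actual logic: link a table to another table whose keys it extends by exactly one key, and treat every connected group of tables as one dataset. Tables that cannot be linked to any other table end up as datasets of their own. The class and method names are illustrative, and the sketch does not enforce the one-submain-per-table rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class DatasetGrouper {
    // Groups main tables into datasets; keysByTable maps a table to its key columns.
    static List<TreeSet<String>> groupIntoDatasets(Map<String, List<String>> keysByTable) {
        List<String> tables = new ArrayList<>(keysByTable.keySet());
        Map<String, String> parent = new HashMap<>();
        for (String t : tables) parent.put(t, t);

        for (String a : tables) {
            for (String b : tables) {
                List<String> ka = keysByTable.get(a);
                List<String> kb = keysByTable.get(b);
                // b looks like a submain of a if it has all of a's keys plus exactly one more.
                if (kb.size() == ka.size() + 1 && kb.containsAll(ka)) {
                    parent.put(find(parent, b), find(parent, a));
                }
            }
        }

        // Collect tables by their root; each TreeSet keeps table names sorted.
        Map<String, TreeSet<String>> byRoot = new HashMap<>();
        for (String t : tables) {
            byRoot.computeIfAbsent(find(parent, t), r -> new TreeSet<>()).add(t);
        }
        return new ArrayList<>(byRoot.values());
    }

    // Union-find lookup without path compression (fine for a handful of tables).
    static String find(Map<String, String> parent, String t) {
        while (!parent.get(t).equals(t)) t = parent.get(t);
        return t;
    }
}
```

For example, a gene main with gene_key and a transcript main with gene_key and transcript_key form one dataset, while an unrelated main keyed only on marker_key becomes a second dataset.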


Re: group vs partition

Partitioning requires that all partitions of the same dataset have exactly the same data model (identical tables); it is used for query optimisation. Grouping, on the other hand, makes it possible to group unrelated datasets (with different tables) so that they can be presented 'visually' to the user as one mart.
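A minimal sketch of the partitioning precondition, assuming each dataset's data model can be summarised as a map from table name to column names, with any dataset-specific table-name prefix already stripped. PartitionCheck and canPartition are hypothetical names; in practice partitions are set up in MartConfigurator, not by this code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartitionCheck {
    /**
     * True if every dataset has exactly the same tables with exactly the same columns.
     * Each model maps a table name (without any dataset-specific prefix) to its column names.
     */
    static boolean canPartition(List<Map<String, Set<String>>> models) {
        for (Map<String, Set<String>> model : models) {
            // Map.equals compares keys and, per key, the column sets.
            if (!model.equals(models.get(0))) {
                return false;
            }
        }
        return true;
    }
}
```

Datasets that fail this check can still be combined for display by grouping.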


HTH,
a.


On 13 December 2015 at 19:21, Joe Carlson <[hidden email]> wrote:

Re: [biomart-users] Trying to understand some code for configuring a new 0.9 BioMart

Arek Kasprzyk

Forgot to add: "this is correct" re: mains and submains.

a. 

On 15 Dec 2015, at 13:58, Arek Kasprzyk <[hidden email]> wrote: